
Download Package Xlsx In R

The R environment, directories, libraries, reading, writing, and working across platforms


Maybe this is your first foray into the world of programming or maybe you just don't know R... yet.
Are you... scared?

We promise, this zombie will be the scariest thing you experience in this course, with the exception of our mugs, of course. Ok, so let's get to it and lay down some ground rules (arrr... more like guidelines):

1) There are no stupid questions when it comes to programming problems, so stop us and ask questions!!

2) Annotate, annotate, annotate ALL programs that you create whether in or out of class (more later today).

3) The only-- yes ONLY way to learn to use a programming language is to practice, practice, practice. We will provide example exercises for you to practice your skills, but you should consider these the minimum amount of practice you should be undertaking.


At the end of this lesson you will know...


INTRODUCTION TO R

R is an open-source, object-oriented language. When we say R is object-oriented, what we mean is that everything that R does or contains is based on objects. Objects in R include:

numerical values, and letters or other symbols that stand for numerical values, which can range from single values (often called scalars) to entire databases (defined in R as dataframes);
other structures, such as lists;
functions, series of mathematical or statistical operations; and
the results of statistical analyses (these often can be confusing-- but not to worry).

For those of you experienced with other programming languages (e.g., C, Fortran), object orientation can initially be confusing. In this course, the structure of objects will be very simple and straightforward; as we get more familiar with R and get into more complex problems the power of the object-oriented approach will hopefully become clearer.

THE R INTERFACE

Unless you are a total geek (see bonus material below), you will more than likely be using a graphical user interface (GUI, pronounced "gooey") to work in R. Fortunately, you have two options (ok, probably more than two, but we will only discuss these two). The first comes with the standard R download from the CRAN website. You can access this by clicking on the "Rgui.exe" file in the "bin" folder that was created during the R install. In the computer lab, R should be listed in the program files list (hint: click the Windows Explorer icon). Open the standard R GUI and take a look. It should look something like this (you may have a different version though):

(Hint: click on the image for a bigger version.) It's fairly bare bones, but quite useful. The alternative to this GUI is Rstudio, which also is open source and freely available. Both interfaces can do roughly the same things, though Rstudio has some additional functions that we think that you-- as newbies-- will prefer (JP definitely prefers Rstudio). Everything we cover in this course can be performed using the standard R GUI or Rstudio, with the exception of a few procedures that we will cover in this lesson. Before we go over Rstudio, let's explore the standard GUI first. The image above shows the R console, which displays the results of analyses and messages associated with any code that is either entered in the command line (after the red arrow " > ") or run from something called a script. A script is a file that contains a bunch of R code that you can save and use over and over again. We will mostly be using scripts throughout the course, but for now let's just use the command line.

We assume that you all know the mathematical operators that we will be using in the course but just in case:

Operator    Description       Example
+           addition          x plus y is x + y
-           subtraction       x minus y is x - y
*           multiplication    product of x and y is x*y
/           division          dividing y by x is y/x
^ or **     exponentiation    y raised to the x power is y^x or y**x

NOTE: The caret or hat operator ^ may not work on Mac versions of R!! Use ** instead.
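To see the operators in action, here is a minimal sketch you can type into the console one line at a time. The numbers 10 and 2 are our own picks for illustration.

```r
10 + 2    # addition
10 - 2    # subtraction
10 * 2    # multiplication
10 / 2    # division: 10 divided by 2
10 ^ 2    # exponentiation: 10 raised to the power 2
10 ** 2   # same calculation; R reads ** as ^
```

The text after each # is a note to ourselves that R ignores (more on that later in the lesson).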


As we mentioned, you can enter commands directly in the R console. For example type in 2+2 and hit return. You should get something like that below.

You can see the command that we typed in after the arrow, "2+2", and the result below, 4. Try a few more commands for grins-- we're computing! Notice that the command console gave us the answer right after we submitted the command.

What about if we want to save the result of the operation? Well, we need to assign the answer to an object (your first!). We do this by using an assignment operator, either an equal sign " = " or an arrow made from a less-than sign followed by a dash " <- ". For many of the objects we will be using, these two assignment operators can be used interchangeably, BUT (warning warning!!) later on we will find that this may not work for all objects. Let's create an object that is the product (reminder: multiply) of 10 and 2.5. Remember the product operator is the asterisk " * ", and assign the value to an object X using the command console. Do the same for an object Y but use the other assignment operator. For example,

You should notice that the results of the operation were not printed in the console-- that's because they were assigned to each object. If we want to see the result, we simply type X or Y in the console and the results should be printed. For example,
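A minimal sketch of those steps as you might type them in the console (the spaces around the operators are optional):

```r
# Assign the product of 10 and 2.5 to X with the equal sign
X = 10 * 2.5

# Assign the same product to Y with the arrow
Y <- 10 * 2.5

# Nothing has been printed yet; type the object names to see the values
X
Y
```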

Both values should be the same, and they are. Object names are not restricted to single characters and can consist of combinations of numbers, letters, periods, and underscores (but a name cannot start with a number or an underscore). Note that special characters (e.g., #, $, @, and commas) are used for other specific purposes, so stick with "a,b,c,yx,dog,cat...", "1,2,3,55,19...", ".", and "_" when naming objects. For example, we could have used "jims.Y" or "jims_X" rather than X and Y. This makes it easier to keep track of data. For example, the object "fish.wt" could contain fish weights. WARNING, WARNING: object names are CASE SENSITIVE, i.e., typing lowercase x in the command window will result in the following error:

More often than not, forgetting this fact can lead to real headaches. Our suggestion: start now to establish a naming convention for yourself, such as only using lowercase for certain types of data and numbers for various versions of the object. As you might have guessed, we can perform operations on objects. To convince yourself, perform a few operations using X and Y, then go ahead and assign the result of one of the operations to an object named "this.is.the.best.course.ever". For example,

The neat thing about the console is that you can recall commands simply by using the up and down arrows on your keyboard. Go ahead and put the cursor in the command window and press the up arrow. The last command should reappear; for the above example, the this.is.the.best.course.ever command should reappear. You can continue to use the up and down arrows to scroll through your previous commands.


The objects that we created are stored in the workspace (the collection of objects R is holding in memory for this session). To list the contents of the workspace, we simply type "ls()" in the command window. For example,

Alternatively, we can go to the "Misc" menu at the top of the GUI and select "List objects":

For now, it may be easier to use the menu driven commands. Eventually, you will find that it will be better to learn and use the commands. We see here that it listed "X", "Y" and "this.is.the.best.course.ever". If we wanted to get rid of one or more of these objects, we use the remove command as "remove(object name)" or for short "rm(object name)", where "object name" is the name of the object you want to remove. If you want to remove more than one object, you separate the object names using a comma. Let's remove "X" and "Y" and list the contents of the workspace:
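As a sketch, the whole sequence looks like this in the console. The values assigned here are stand-ins so the example runs on its own, and remaining is a name we made up to hold the listing.

```r
# Recreate the three objects from earlier (values are illustrative)
X <- 10 * 2.5
Y <- 10 * 2.5
this.is.the.best.course.ever <- X + Y

# Remove X and Y in one call, separating the names with a comma
rm(X, Y)

# List what is left
remaining <- ls()
remaining
```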

Not too shabby. Before we go any further, let's talk R scripts. An R script is a text file that contains R commands (and data if you want). Using a script is infinitely easier than using the console and typing in commands, so let's get started. In the standard R GUI, we click on "File" and select "New script" or use Ctrl-N:

The new script is shown in the window above. You can save and name the script any way you want-- that said, we strongly recommend using ".r" or ".R" as the file name extension. We will be using that convention throughout this course. To create a script, simply type the commands into the script and submit them to R. In the script, type in the following commands:

fish.wt = 10

fish.length = 150

fish.condition = fish.length/ fish.wt

fish.condition

ls()

Then select the commands and right click your mouse as:

Then choose "Run line or selection" (you could also hit Ctrl-R) and you should get the following in the console.

Notice that the console displays the commands just as if you entered them individually and prints out the results. Before we go any further, we need to talk annotation. Annotating an R script or any program is always good practice. It helps explain what you did and helps you communicate it to others or helps you remember what you were doing at each step. To annotate a program, we use the pound sign "#" at the beginning of each line and R will ignore anything after the # and before the "return". For example,

# This program calculates the goofy Peterson and Colvin condition factor

fish.wt = 10

fish.length = 150

## here's the formula..., pure genius

fish.condition = fish.length/ fish.wt

fish.condition

Everything after a # is ignored by R when you submit the script. You can use more than one # as shown above. The comment character "#" also can be used to turn off parts of programs. For example, adding a # before fish.condition:

# This program calculates the goofy Peterson and Colvin condition factor

fish.wt = 10

fish.length = 150

## here's the formula..., pure genius

fish.condition = fish.length/ fish.wt

#fish.condition

ls()

stops R from printing out fish.condition after it is calculated. Go ahead and try it. As we will find later, this feature is very useful for debugging programs. We heartily recommend using this feature to take notes during the course too. Go ahead and save this script, we may use it later.

As we discussed earlier, the objects we created are held in the workspace. If you save the contents of the workspace (think: "massive database that I worked on for hours") to a file, they will be available the next time you start R. But where does that file end up? In the working directory, which by default is more than likely somewhere you don't want it to be (but see profile below). Therefore, it's always good practice to set the working directory at the beginning of your R session. We do this using the "setwd(path)" command, where path is the Windows path to the folder where you want to write and read things. For example, let's say that I wanted to save this session to my memory stick and the memory stick was assigned as the "G" drive. I want to save it in a folder "RRRR" inside of "Jims class stuff":

setwd("G:/Jims class stuff/RRRR") ## NOTICE USE OF FORWARD SLASH

# This program calculates the goofy Peterson and Colvin condition factor

fish.wt = 10

fish.length = 150

## here's the formula..., pure genius

fish.condition = fish.length/ fish.wt

fish.condition

ls()

Or you could use double backslashes. R will not recognize the single backslash format because it treats a single backslash inside quotes as the start of an escape sequence.

setwd("G:\\Jims class stuff\\RRRR") ## NOTICE USE OF DOUBLE BACKSLASH

# This program calculates the goofy Peterson and Colvin condition factor

fish.wt = 10

fish.length = 150

## here's the formula..., pure genius

fish.condition = fish.length/ fish.wt

fish.condition

ls()
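If you don't have a G drive handy, you can try the same idea safely with a temporary folder. This is only a sketch: tempdir() stands in for the memory stick path, and old.wd is a name we made up to remember where we started.

```r
# Remember the current working directory so we can go back
old.wd <- getwd()

# tempdir() returns a folder R creates for this session (a stand-in for G:/...)
setwd(tempdir())

# Confirm where we are now
getwd()

# Go back to where we started
setwd(old.wd)
getwd()
```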

To save all of the contents of your workspace you can use the "save.image(file = "myfilename.Rdata")" command. This command saves everything to "myfilename.Rdata" where myfilename would be your file name. Alternatively, you can save specific objects in the workspace to a file using "save(file = "myfilename.Rdata", list = c(names of objects in quotes separated by commas))". For example, let's say that we wanted to save everything we created using the above script to a file called "firstRclass.Rdata":

setwd("G:\\Jims class stuff\\RRRR")

# This program calculates the goofy Peterson and Colvin condition factor

fish.wt = 10

fish.length = 150

## here's the formula..., pure genius

fish.condition = fish.length/ fish.wt

fish.condition

ls()

### Here's where we save everything

save.image(file = "firstRclass.Rdata")

This would save all of the objects to firstRclass in the folder specified with the setwd command above. Alternatively, let's say that we wanted to only save fish.wt and fish.condition. We would specify:

### Here's where we save just 2 objects
save(file = "firstRclass.Rdata", list = c("fish.condition","fish.wt"))

Whew, we saved all of that work, turned off the computer, and headed to Bombs Away for happy hour. How do we access the objects that we saved? No problem, we use the "load" command, but first we have to set the working directory.

load("firstRclass.Rdata")

You also may encounter another method of specifying the current working directory in the file path using " ./" before the filename. For example:

load("./firstRclass.Rdata")

In the standard R GUI and Rstudio (more below), we won't have to use " ./" but it can come in handy later.

Another way we could have loaded the file was to specify the path inside the load command.

# Load objects from existing file, full path edition
load("G:\\Jims class stuff\\RRRR\\firstRclass.Rdata")
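Putting save, rm, and load together, here is a small round trip you can run anywhere. It is a sketch under one assumption: we write to tempdir() instead of the G drive, and rdata.file is a name we made up for the full path.

```r
fish.wt <- 10
fish.length <- 150
fish.condition <- fish.length / fish.wt

# Build a full path inside R's temporary folder (stand-in for the G drive)
rdata.file <- file.path(tempdir(), "firstRclass.Rdata")

# Save just two of the three objects
save(file = rdata.file, list = c("fish.condition", "fish.wt"))

# Wipe all three objects, then bring the saved ones back
rm(fish.wt, fish.length, fish.condition)
load(rdata.file)

ls()            # fish.length is gone because it was never saved
fish.condition
```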


READING TEXT DATA FILES
Specifying paths inside commands can be used for several other commands, but we think it is a better practice when learning R to set the working directory and read and write files from that directory.
Speaking of reading and writing files, there are several commands that can be used to read and write files in a variety of formats. For now, we'll stick to data files. The two most common formats read (and written) are text files in tab or comma delimited formats. Delimited refers to how the data in columns are separated. In a tab delimited file, the data in each column in a row are separated by a tab. Here is a simple tab delimited file with a header (the first row contains the column names):

pet    length wt age

cat 100 25 15

dog 500 257 5

and the same file in comma delimited format.

pet,length,wt,age

cat,100,25,15

dog,500,257,5

Save the full version of the pets data in tab (pets.txt) and comma (pets.csv) delimited versions by right clicking on the links and saving to your computer/memory stick. These files were created in Excel using "save as" and selecting the "Text (Tab delimited) (*.txt)" and "CSV comma delimited (*.csv)" options. Go ahead and open them up and take a look. Nothing scary there. As with everything in R, there are several ways to read in data files. We will learn to use two methods for text files: read.table, which can read in text files with any type of delimiter, and read.csv, which reads in comma delimited files. Remember to set your working directory to the location of the text files. Here's the syntax for reading in the tab delimited pets data:

# set working directory
setwd("G:/Jims class stuff/RRRR")
## read in tab delimited file

pet.data<-read.table("pets.txt", header = TRUE, sep = "\t")

### print the contents to the console

pet.data


Here the name of the file is specified first. We then indicate that the file contains a header (column names), header = TRUE; and we specify that the file is tab delimited using sep = "\t". Whew, how can anyone remember all that? We can't. That's why we always check the syntax using the help function. Let's do this by typing help(read.table) in the console. You can see all of the options. For example, if the file did not contain column names we would specify header = FALSE (note upper case of FALSE). If you review the help file, you'll see that you can specify the type of delimiter with the sep option. Hmmmm... "\t" represented tab delimited... wonder what we use for comma delimited? Maybe a comma ",". Let's try that.

## read in comma delimited file

pet.data2<-read.table("pets.csv", header = TRUE, sep = ",")
### print the contents to the console

pet.data2

It worked. Now let's try using read.csv to read in the pet data. Let's use the help command to pull up the help file and look at the syntax. Based on the help file our command should look something like this:

## read in comma delimited file

pet.data3<-read.csv("pets.csv")
### print the contents to the console
pet.data3

Notice how many fewer options we needed to specify. This was because the defaults for read.csv are header = TRUE and sep = ",". Important point to remember: the default options for any R command/function are displayed inside the command at the top of the help file.
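If you don't have the pets files yet, the sketch below builds the two-row example from above in temporary files and reads it back both ways. The file paths come from tempfile(), and the names tab.file, csv.file, pet.tab, and pet.csv are ours.

```r
# Write the small example data to temporary files
tab.file <- tempfile(fileext = ".txt")
csv.file <- tempfile(fileext = ".csv")
writeLines(c("pet\tlength\twt\tage", "cat\t100\t25\t15", "dog\t500\t257\t5"), tab.file)
writeLines(c("pet,length,wt,age", "cat,100,25,15", "dog,500,257,5"), csv.file)

# read.table needs the header and delimiter spelled out
pet.tab <- read.table(tab.file, header = TRUE, sep = "\t")

# read.csv already defaults to header = TRUE and sep = ","
pet.csv <- read.csv(csv.file)

# Compare the two dataframes
all.equal(pet.tab, pet.csv)
```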

Commonly Encountered Problems (CEP) reading data:

Throughout the course we will try to provide you with commonly encountered problems and solutions. These will be flagged with CEP. Here's the first CEP of the course. Save the comma separated file Habitat data.csv to your computer and open using Excel. These data were collected in wetlands in central Utah. There are 12 columns of variables ranging from site number to average water depth (Avg H20) and there are 29 observations (rows). Read the data into R using read.csv command-- be sure that the working directory is correct-- and print the contents e.g.,

habitat<-read.csv("habitat data.csv")

habitat

What happened? Does it look like the same data in Excel? It shouldn't. You should see several changes.

1) There are a whole bunch of " NA ". In R, " NA " means that the data are missing.
2) The column names have changed. For example, we now have "Site.." rather than "Site #".
3) The print out also indicates that there are columns X, X.1, X.2...X.10.
4) There are also 35 rows of data (the extra rows are all NA) rather than 29.

What the... @#$#@&!!? This happens all of the time (even to us).

1) The NA are not necessarily a problem. The original data did have some missing data, so R simply replaces missing data with NA. We will learn how to deal with real missing data later in the course.
2) The column names changed because R does not allow spaces or special characters ("#", "(", ")") in the names of objects and columns, and it will automatically change these to periods ".". SOLUTION: Don't use special characters.
3-4) The extra columns and rows are because the csv file contains extra rows and columns. This is often due to a stray character or space in the file. For example, take a look at Habitat data.csv in Excel and you will see a stray character (x) in column W row 36.
SOLUTION: Delete the extra rows and columns.
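You can preview how R will rename a column with make.names(), and, as an alternative to cleaning the file by hand, drop rows and columns that are entirely NA once the data are in R. This is a sketch: the small messy dataframe is invented for illustration, and it assumes the extra rows and columns contain nothing but NA.

```r
# How R rewrites names containing spaces or special characters
make.names(c("Site #", "Avg H20"))

# An invented dataframe with one all-NA row and one all-NA column
messy <- data.frame(Site = c(1, 2, NA),
                    Depth = c(0.5, NA, NA),
                    X = c(NA, NA, NA))

# Keep rows and columns that hold at least one real value
keep.rows <- rowSums(!is.na(messy)) > 0
keep.cols <- colSums(!is.na(messy)) > 0
clean <- messy[keep.rows, keep.cols]
clean
```

Real missing values inside a kept row (like the second Depth value) stay as NA.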

You probably noticed in the above CEP that it was a pain to print out a moderate size dataset in the console. Using the standard R GUI, you may not want to print out the contents of a gigantonormous file. You can use the command head() to print out the column headings and the first 6 lines, e.g.,

habitat<-read.csv("habitat data.csv")
## print out first 6 lines
head(habitat)

You can also obtain the names of the column headings using the names command. For example,

habitat<-read.csv("habitat data.csv")

## print out first 6 lines
head(habitat)
## print out column names

names(habitat)

We created objects when we read in the text files, and these objects are known as dataframes. They will become among the most important and frustrating objects we use in R. We have entire lessons devoted to working with dataframes. For now, we'll do a couple of operations with the dataframes. To access or use the columns of the dataframe, we need to use a special syntax that involves a dollar sign "$". To refer to the length and wt columns in pet.data, we use pet.data$length and pet.data$wt. We can create an R object that contains the data from a column of the dataframe, e.g.,

weight<- pet.data$wt

leng <- pet.data$length

We also can create a new variable (column) in pet.data using the "$". For example, let's say that we wanted to create a variable leng.wt by dividing the weight of each pet by its length. Simple, we do the following:

pet.data$leng.wt <- pet.data$wt / pet.data$length

or we also could have done

weight<- pet.data$wt

leng <- pet.data$length
pet.data$leng.wt <- weight / leng
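Here is the same idea as a self-contained sketch. Instead of reading pets.txt, it types in the two example rows shown earlier with data.frame(), so the only numbers are the cat and dog from that snippet.

```r
pet.data <- data.frame(pet = c("cat", "dog"),
                       length = c(100, 500),
                       wt = c(25, 257),
                       age = c(15, 5))

# Pull a column out by name
pet.data$wt

# Add a new column: weight divided by length
pet.data$leng.wt <- pet.data$wt / pet.data$length

pet.data
```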

If time allows, go ahead and create a couple more variables for the pet.data dataframe.


WRITING DATAFRAMES TO TEXT FILES

Ok, we learned to read files and conduct simple manipulations with dataframes. Now we'll learn how to write data to a text file. The people that gave us R are fairly logical people. They created read.table and read.csv to read text files. What are the commands for writing files? If you guessed write.table and write.csv you deserve a gold star. How do we figure out the syntax? All together now..."USE THE HELP COMMAND." Let's do that. We see that just like read.table, write.table can write text files with any type of delimiter, so to write pet.data to a tab delimited file we can use:

## Write a tab delimited file to working directory

write.table(pet.data, "Look and me.txt", sep = "\t")

This will write the file to your working directory so make sure it is correct!!! Notice that the syntax is similar to read.table. We can do the same and write a comma delimited file using write.table and write.csv, e.g.,

## Write a comma delimited file to working directory

write.table(pet.data, "Look and me.csv", sep = ",")

## Write a comma delimited file to working directory

write.csv(pet.data, "Look and me too.csv")
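One thing to watch: by default these commands also write the row names (1, 2, 3...) as an extra first column. The sketch below turns that off with row.names = FALSE and reads the file back to check it. The tiny pets dataframe and the temporary file are stand-ins for pet.data and your working directory.

```r
pets <- data.frame(pet = c("cat", "dog"), wt = c(25, 257))

# Write to a temporary file without the row-name column
out.file <- tempfile(fileext = ".csv")
write.csv(pets, out.file, row.names = FALSE)

# Look at the raw text, then read it back in
readLines(out.file)
pets.back <- read.csv(out.file)
pets.back
```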

We can read files we can write files, isn't this awesome?

INSTALLING AND LOADING R PACKAGES

We can hear your questions now: but we use Excel. What about us? Well, you could save your spreadsheets as text files and read them into R using the commands we just learned above (probably the safer option). For those who want to live life on the edge, we can use xlsx. Before you try, we first need to install it on your computer and load it into the R environment. Why? Because it is an R package (a.k.a. R library). R packages are sets of functions that perform specialized tasks. Base R comes with several packages that are automatically loaded in the R environment every time you start it up. For example, the read and write commands are in the R base packages. In the base R GUI, we first need to download the package from the internet from an R mirror site. These are various host institutions that have the packages available for download. OSU (go Beavs!) is one of the host institutions. To set the mirror with the base R GUI, go to "Packages" and select "Set CRAN mirror...", e.g.,

Choose your mirror site. We'll pick OSU as:

Click OK and then go to "Packages" and select "Install package(s)". You will get the pop up to the left:

Scroll down and find xlsx and click OK. You will get the following (or something similar) in the console:

trying URL 'http://ftp.osuosl.org/pub/cran/bin/windows/contrib/3.0/xlsx_0.5.1.zip'
Content type 'application/zip' length 314576 bytes (307 Kb)
opened URL
downloaded 307 Kb

package 'xlsx' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\.... some filepath ...\AppData\Local\Temp\Rtmp671J6Q\downloaded_packages

A zip file containing the package was downloaded to your computer in some default folder that will differ from the one shown above. Just make sure that you have the path right because you need to go there to install the package using the zip file. Go to "Packages" and select "Install package(s) from local zip file". Go to the location indicated above " C:\ ...some filepath...\... downloaded_packages ", select the zip file, and click "Open". You will get the following:

package 'xlsx' successfully unpacked and MD5 sums checked

You are now ready to load the package. This can be done for any R package using one of two commands: library(package name) or require(package):

# load xlsx package

library(xlsx)

or

require(xlsx)


JP generally prefers to use require because loading certain packages after they have already been loaded with library can produce screwy things. We usually load packages at the start of a session by including the library or require statements at the top of the R script. Note that these packages need to be reloaded each time you begin an R session (but see setting R profile below for tricks).

CEP loading packages: You may encounter the following (or similar) message when loading a package:

> require(jims.stuff)
Loading required package: jims.stuff
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'jims.stuff'

SOLUTION: You don't have the package installed, so you need to install it using the above procedure.  More on using the xlsx package below in Bonus Material.
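The menu steps have a command equivalent, install.packages(). As a sketch, the helper below (load.or.install is our own made-up name, not part of R) installs a package only if it is missing and then loads it.

```r
# Hypothetical helper: install a package if needed, then load it
load.or.install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from your CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# Try it on a package that ships with R, so nothing is downloaded
load.or.install("stats")

# For this lesson you would call: load.or.install("xlsx")
```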

RSTUDIO

Hey-- what about Rstudio? We haven't forgotten, and best of all, you haven't wasted your time. Almost everything we just learned is applicable to Rstudio. You still use scripts, the same commands, and have to load packages. The benefit of Rstudio is the added Windows-like functionality. Rstudio is just a useful GUI, so R must be installed before installing Rstudio. Once you install Rstudio, open it and it should look something like this:



Rstudio allows you to customize your environment and use multiple panes. Here I have 4 panes shown. Upper left is an Untitled R script; upper right will display all R objects created or read in the R session (Workspace tab) or all of the commands used (History tab); the lower left is the console (just like the base R GUI); and the lower right can display the installed packages, help files, plots, and all files in the working directory. These windows can be moved around or closed. For grins, let's install the xlsx package and read in the files using the R code above. The beauty of Rstudio is that it will automatically install packages for you. Just click the "Packages" tab and select "Install packages" as:

A window will pop up and all you need to do is type the name of the package in the appropriate box and ta-da, it will install the package for you. You can load the package using the library or require commands as shown above, or you can scroll down the packages list and check the box. It will then load the package automatically. Go ahead and use the above code for reading in the Excel files. Be sure to include the code to load the xlsx package and set the working directory. Highlight the code in the script pane and click "Run" in the upper right hand corner of the script pane. The console should show the commands and the output from the commands just like it did in the base R GUI. You should also be able to see the names of the dataframes that you created in the Workspace pane, like this:

Double click on one of the dataframes and it will open up in a separate tab in the script pane. You can now view objects without having to print them to the command console. There are many, many other things Rstudio can do in a Windows-like environment such as reading text files, saving objects or a session to a file, setting a working directory, and several others. Mess around with the GUI and we're sure you can find others. One more thing about the script editor. You should notice that the script editor automatically colors comments and things inside of quotes green. It also colors commands blue and will help you debug problems. This increased functionality can really be helpful when learning R.

Getting in the weeds: SETTING UP YOUR R SITE PROFILE

What is the site profile? Well, basically it is a script that runs every time you open R; it allows you to modify how R starts. It is located in the etc folder of your R installation (C:\Program Files\R\R-2.15.2\etc\Rprofile.site). This script is independent of whether you are using the basic R GUI or Rstudio. In other words, if you specify something in your site profile it will be recognized by Rstudio, which is especially important for the .libPaths below.

Why would I want to alter such a thing?

  1. You have functions you use frequently that you wrote yourself (e.g., inverse logit)
  2. There are pesky problems that come up when you don't have administrative privileges on your computer

Here is what the default profile looks like:

# Things you might want to change

# options(papersize="a4")

# options(editor="notepad")

# options(pager="internal")

# set the default help type

# options(help_type="text")

options(help_type="html")

# set a site library

# .Library.site <- file.path(chartr("\\", "/", R.home()), "site-library")

# set a CRAN mirror

# local({r <- getOption("repos")

#       r["CRAN"] <- "http://my.local.cran"

#       options(repos=r)})

# Give a fortune cookie, but only to interactive sessions

# (This would need the fortunes package to be installed.)

#  if (interactive())

#  fortunes::fortune()

Here are some common things I prefer (change help from html to text, and set a cran mirror):

options(help_type="text")

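local({r <- getOption("repos")

       r["CRAN"] <- "http://my.local.cran"  # placeholder from the default profile; put your mirror's URL here

       options(repos=r)})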

# TELL R WHERE TO LOOK FOR AND SAVE PACKAGES

.libPaths(c("C:/Users/jpeterson/Documents/R/win-library/2.15", "C:/Program Files/R/R-2.15.0/library"))

The above bit of code for .libPaths() is very important as it tells R where to look for packages and where to save them. This is especially useful if you can't save directly to the C drive. The key is to have the first folder location be one where you can save, as this is where R will save downloaded packages. This works for Rstudio as well and is how Aaron M. is installing Rstudio on grad student machines since they were having issues with saving downloaded packages.
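To see what R is currently using, call .libPaths() with no arguments. The sketch below also checks which of those folders you can write to; lib.dirs and can.write are names we made up, and the folders listed will differ on every machine.

```r
# Folders R searches for packages, in order
lib.dirs <- .libPaths()
lib.dirs

# file.access() returns 0 when the requested access (mode 2 = write) is allowed
can.write <- file.access(lib.dirs, mode = 2) == 0
data.frame(library = lib.dirs, can.write = can.write)
```

If the first folder in the list is not writable, that is a hint that downloaded packages may fail to save there.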

Well that is fine and dandy, but I don't want to have to get admin privileges every time I need to modify my site profile.  Also it is a pain in the rear to copy all this code when I upgrade to the latest version of R.  What to do?  Glad you asked.  My site profile has 1 line of code in it:

source("c:/Users/jpeterson/..../jims_site_profile.R")

What does that get us? Well, now all you have to do is modify the R script jims_site_profile.R to change the site profile, and you only have to copy and paste 1 line of code when you upgrade R versions. Plus you can easily share site profiles between your workstation and laptop. Here is what jims_site_profile.R looks like (at least part of it):

options(help_type="text")

local({r <- getOption("repos")

options(repos=r)})

.libPaths(c("C:/Users/jpeterson/Documents/R/win-library/2.15", "C:/Program Files/R/R-2.15.0/library"))

# read in useful functions

source("C:/Documents and Settings/jpeterson/My Documents/myFunctions.R")

# packages I use daily

require(lattice)

require(RODBC)

require(xlsx)

require(reshape2)

require(plyr)

# set up a com to my dbase

com<- odbcConnectAccess2007("C:/Documents and Settings/ jpeterson/My Documents/projects/PSM/

Week 1 Assignment

Due 1 week from today by 5pm Pacific. Read the following three files into R: weather.csv (comma delimited), people.prn (single space delimited), and biota.txt (tab delimited). Using the weather file, create a new variable titled "ppt.in" by calculating precipitation in inches using the variable "total.ppt.mm" (precipitation in mm). Using people, create a new variable titled "wt.kg" using the weight of each person in stone in the dataframe, variable "weight.stone" (1 stone = 6.35 kg). Using the biota dataframe, create a new variable in the dataframe (call it what you want) by dividing the mass (Mass.kg) of each species by its corresponding height (height.m). Each dataframe should be written to a text file in the format of your choice (e.g., comma separated, tab delimited). Please save all of the code you used in a single script and submit the script in an email attachment to both instructors.


Bonus Material (embrace your inner geek!)

READING AND WRITING EXCEL SPREADSHEETS

Ok-- now that we know how to install R packages, we can get back to our original problem. First make sure that we load the xlsx package. After that we could use the help command to get information on the types of commands/functions we can use to read data and their syntax. To save time, the function for reading sheets in an Excel workbook is read.xlsx. Let's use the help command to get the syntax. We should see that it requires, at a minimum, the name of the Excel file and the sheet number or sheet name. There are several other options that we can discuss later in the course. For now, save pets and habitat.xlsx to your computer and open it in Excel. You should see two worksheets: "catch data" (sheet 1) and "habitat data" (sheet 2). We can read these two worksheets using the following (be sure that the working directory is where the file was saved):

# load the package
require(xlsx)

# set working directory
setwd("G:/Jims class stuff/RRRR")

# read data in the catch spreadsheet
catch <- read.xlsx("pets and habitat.xlsx", sheetName = "catch data")

# read data in the habitat spreadsheet
habitat <- read.xlsx("pets and habitat.xlsx", sheetName = "habitat data")
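Since read.xlsx accepts either a sheet name or a sheet number, the same worksheets could also be read by position using the sheetIndex argument. The sketch below assumes that pets and habitat.xlsx is in the working directory and that the xlsx package loads; the object names catch2 and habitat2 are made up for illustration. If either assumption fails, the block simply does nothing.

```r
xl.file <- "pets and habitat.xlsx"   # assumed to be in the working directory

if (requireNamespace("xlsx", quietly = TRUE) && file.exists(xl.file)) {
  # sheet 1 is "catch data", sheet 2 is "habitat data"
  catch2 <- xlsx::read.xlsx(xl.file, sheetIndex = 1)
  habitat2 <- xlsx::read.xlsx(xl.file, sheetIndex = 2)

  # check what was read: variable types and the first few rows
  str(catch2)
  print(head(habitat2))
}
```

Reading by name is usually safer than reading by position, because the result does not change if someone reorders the sheets in Excel.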


Pretty cool, eh? We now have catch and habitat dataframes. Using your noggin, we bet you have figured out that you can write to an Excel file using the command write.xlsx. Let's look at the syntax of the command using help. You will see that it is similar to the read.xlsx and write.table commands. There is, however, a very important option that we need to cover: the append option. If append = FALSE (the default), R will create an entirely new workbook and WILL OVERWRITE AN EXISTING EXCEL FILE of the same name. So if you plan to add a new sheet to an existing Excel file, you need to set append = TRUE. Let's create a new Excel workbook using the catch and habitat dataframes from above. Note that the file will be written to the working directory.

# create excel workbook and write catch dataframe in new catch spreadsheet
write.xlsx(catch, "new pets and habitat.xlsx", sheetName = "new catch data", append = FALSE)

# create new sheet "new habitat data" in existing spreadsheet and write habitat dataframe
write.xlsx(habitat, "new pets and habitat.xlsx", sheetName = "new habitat data", append = TRUE)
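To see the effect of the append option without touching your real files, the sketch below writes two made-up dataframes to a temporary workbook and then lists the sheets it contains. The dataframes a and b, the sheet names, and the temporary file are hypothetical; the block is skipped if the xlsx package cannot be loaded.

```r
if (requireNamespace("xlsx", quietly = TRUE)) {
  # made-up dataframes standing in for catch and habitat
  a <- data.frame(site = c("A", "B"), count = c(3, 7))
  b <- data.frame(site = c("A", "B"), depth.m = c(1.2, 0.8))
  tmp <- tempfile(fileext = ".xlsx")

  # the first call creates the workbook, the second adds a sheet to it
  xlsx::write.xlsx(a, tmp, sheetName = "first", append = FALSE)
  xlsx::write.xlsx(b, tmp, sheetName = "second", append = TRUE)

  # list the sheets now in the workbook
  wb <- xlsx::loadWorkbook(tmp)
  sheet.names <- names(xlsx::getSheets(wb))
  print(sheet.names)
}
```

If the second call used append = FALSE instead, the workbook would be recreated and would contain only the sheet written by that call.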


A word to the wise: the xlsx package can give some strange (Java) error messages that are not readily interpretable. Anything unusual in a spreadsheet, such as borders, special colors, and frozen panes, can (but does not always) result in errors when trying to read data. In addition, memory limitations in the xlsx package will prevent reading very large spreadsheets. Nonetheless, the package is fairly useful and gives you an idea of how adept R is at working across platforms.
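One possible workaround when read.xlsx chokes on a spreadsheet is to save the troublesome sheet as a CSV file from Excel and fall back to read.csv. The sketch below wraps that idea in tryCatch. The helper name read_with_fallback and its arguments are made up for illustration, and the example deliberately points at an Excel file that does not exist so that the fallback is triggered.

```r
# hypothetical helper: try read.xlsx first; if it fails for any reason
# (missing package, Java error, unreadable file), read a CSV copy instead
read_with_fallback <- function(xlsx.file, sheet, csv.file) {
  tryCatch(
    xlsx::read.xlsx(xlsx.file, sheetName = sheet),
    error = function(e) {
      message("read.xlsx failed: ", conditionMessage(e))
      read.csv(csv.file)
    }
  )
}

# made-up CSV copy of a sheet, written to a temporary file
csv.copy <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:2), csv.copy, row.names = FALSE)

# the Excel file here does not exist, so the CSV copy is read instead
d <- read_with_fallback("no such file.xlsx", "catch data", csv.copy)
```

This does not fix the underlying problem, but it keeps a script running and prints the original error message so you can investigate later.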

Problems with xlsx package

When attempting to load the xlsx package in Windows R (& RStudio), people are often confronted with the following error message:

> require(xlsx)
Loading required package: xlsx
Loading required package: rJava
Error in get(Info[i, 1], envir = env) :
lazy-load database 'C:/Users/peterjam/Documents/R/win-library/3.2/rJava/R/rJava.rdb' is corrupt
In addition: Warning messages:
1: package 'xlsx' was built under R version 3.2.5
2: package 'rJava' was built under R version 3.2.5
3: In get(Info[i, 1], envir = env) : internal error -3 in R_decompress1
Failed with error: 'package 'rJava' could not be loaded'

More often than not (5/5 of my lab computers, if that counts as a representative sample), the issue is that you are running the 32-bit version of Java with the 64-bit version of R (and RStudio). So, first you need to check the version of Java that R is using with the following system command in R:

> system("java -version")
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) Client VM (build 25.91-b14, mixed mode, sharing)


Ok, the output does not say outright that R is using a 32-bit version of Java installed on your computer, but this message (a "Client VM" line with no mention of 64-Bit) is what you get when it is 32-bit Java. This means that you need to manually download and install the 64-bit version of Java, which can be found here: https://www.java.com/en/download/manual.jsp. Choose the 64-bit option, currently labeled "Windows Offline (64-bit)", then install. Be sure to uninstall the old version of Java. Now close and reopen R or RStudio. Let's see if R now recognizes the 64-bit version using the system command above.


> system("java -version")
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)


Yea! We now have R using 64-bit Java. Let's try to reload the xlsx package.
> require(xlsx)
Loading required package: xlsx
Loading required package: xlsxjars
Warning message:
package 'xlsx' was built under R version 3.2.5


Now we're ready to rock!
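The check used in this section can also be scripted. The sketch below compares the bitness of the running R session with what java -version reports. The helper name java_is_64bit is made up for illustration, and the test relies on the assumption, based on the output shown in this section, that a 64-bit Java includes the text "64-Bit" in its version message while a 32-bit Java does not.

```r
# architecture of the running R session: 8-byte pointers mean 64-bit R
r.is.64 <- .Machine$sizeof.pointer == 8

# hypothetical helper: TRUE if `java -version` reports a 64-bit VM,
# FALSE if it does not, NA if java cannot be found or run
java_is_64bit <- function() {
  if (!nzchar(Sys.which("java"))) return(NA)
  out <- tryCatch(
    # java -version writes to stderr, so capture both streams
    suppressWarnings(system2("java", "-version", stdout = TRUE, stderr = TRUE)),
    error = function(e) NULL
  )
  if (is.null(out)) return(NA)
  any(grepl("64-Bit", out, fixed = TRUE))
}

java.is.64 <- java_is_64bit()

# warn when the two do not match, the situation described in this section
if (!is.na(java.is.64) && r.is.64 != java.is.64) {
  message("R and Java appear to differ in bitness; rJava may fail to load")
}
```

Running this at the top of a script gives an early hint about the cause before require(xlsx) produces its less readable error.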


RStudio alternative, courtesy Mike Colvin
Notepad++ with NppToR is an alternative to RStudio. I (MC) use it for all my programming. Why? Several reasons:

1) plays well with dual monitors (see here for an image of MC's station set up)
2) plays well with other languages (e.g., html, java)
3) tabbed scripts
4) code folding
5) code is independent of R, therefore if R crashes your code is still intact

Personally, I have found this setup to be efficient for my day-to-day operations, especially if dual monitors are involved. It is not as great for single monitors and laptops, but I still use it in those situations.

Posted by: francismcgoon.blogspot.com

Source: https://sites.google.com/site/rforfishandwildlifegrads/home/week-1?tmpl=%2Fsystem%2Fapp%2Ftemplates%2Fprint%2F&showPrintDialog=1