Intro to R

R is a language and environment that is well suited to analyzing data and creating rich graphics. To get started, make sure you have R installed on your computer. The latest version of R is available from the Comprehensive R Archive Network (CRAN).

When you start R, an interpreter window is launched. You can type commands into it, or cut and paste them from a document.

Use R as a Calculator

The easiest way to get started is to simply use R as a calculator. Type some numerical expressions into R and see what happens (or cut and paste the code below). The lines that start with "#" are comments and do not get evaluated.

# how many seconds in a day?
24*60*60

# what is 2 to the power of 16?
2^16

# what's the square root of pi? (22/7 is a rough approximation of pi)
sqrt(22/7)

# generate a series of numbers using the colon character
1:5

# what's the average value of the series of numbers above?
mean(1:5)

The basic elements of R are variables and functions. Look, you've already used two functions, sqrt() and mean()! (Bonus points: type the word pi on the command line, and you'll see that it's a predefined variable.)

Variables

Variables are used to store data so we can operate on it. R has four basic kinds of variables: vector, matrix, data frame, and list. To assign data to a variable, use the assignment operator "<-". An equals sign also works (=, common in other languages), but the convention in R is to use the little arrow.
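
As a minimal sketch of the two assignment forms described above (the variable names a and b are made up for illustration):

```r
# assign with the conventional arrow
a <- 10

# assign with an equals sign; at the top level this does the same thing
b = 10

# compare the two variables; both hold the same value
identical(a, b)
```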

Let's start with a vector. It can hold one or more values.

# create x and assign it the value of 2
x <- 2

# make other variables
y <- 3.14

# you can name variables using letters, digits, periods, and underscores
temperature <- 37

temperature2 <- 98.6

# but the name cannot start with a number!
# the next line is a syntax error, so it is commented out here
# 2temperature <- 98.6

# we can assign character strings too, but they have to be enclosed in quotes
genotype <- "wt"

Now that you've created them, if you type the name of any of the variables above, the value of the variable is returned.

To see all the objects you've created so far, use the ls() function. Type it on the command line, and the objects in your environment will be listed.
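
A small sketch of what that looks like, assuming a fresh session in which only the two variables below have been created:

```r
# create a couple of variables
x <- 2
temperature <- 37

# ls() returns the names of the objects in the current environment
ls()

# check whether a particular name is defined
"temperature" %in% ls()
```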

To assign more than one value to a variable, we use the concatenation function, c():

# the concentration of DNA in my std curve samples
dna <- c(5, 10, 20, 40, 80, 160, 320)

Now if you type the name of this variable, all the values will be returned. Individual values can be accessed using square brackets. The elements of the vector are numbered beginning with 1 (not 0 like some other languages).

# what concentration was the 3rd sample?
dna[3]

# We can use multiple subscripts
dna[1:5]

# I just need the odd samples
dna[c(1,3,5,7)]

# it's hard to evaluate numbers by looking at them
# let's draw a bar plot to see how they compare
barplot(dna)

Sometimes we don't know how long a variable is. Let's use the length() function to figure that out.

length(dna)
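
One common use of length(), sketched here with the same dna vector (the variable n is introduced only for this example), is to reach the end of a vector without counting by hand:

```r
dna <- c(5, 10, 20, 40, 80, 160, 320)

# number of elements in the vector
n <- length(dna)

# use the length to pick out the last element
dna[n]

# or the last two elements
dna[(n - 1):n]
```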

If we use a vector in an expression, it gets evaluated for each element.

# I need to divide my concentrations by 2
dna/2

# I can assign the results to a new vector
dna_dilution <- dna/2
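
The same element-by-element behaviour applies to functions and to arithmetic between two vectors. A short sketch, using log2() purely as an example function:

```r
dna <- c(5, 10, 20, 40, 80, 160, 320)

# a function applied to a vector is evaluated for each element
log2(dna / 5)

# arithmetic between two vectors of the same length works element by element
dna_dilution <- dna / 2
dna - dna_dilution
```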

Vectors can have multiple values, but they must all be of the same type.

numeric vector      23.2     45.8      63.7
character vector    "red"    "green"   "blue"
boolean vector      TRUE     FALSE     TRUE
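
To see the "same type" rule in action, here is a small sketch using class(), which reports the type of a vector (R's own name for Boolean values is "logical"). The variable name mixed is made up for this example:

```r
# class() reports the type of a vector
class(c(23.2, 45.8, 63.7))       # numeric
class(c("red", "green", "blue")) # character
class(c(TRUE, FALSE, TRUE))      # logical

# mixing types does not give an error; R converts everything to one type
mixed <- c(23.2, "red", TRUE)
class(mixed)
mixed
```

Here everything in mixed ends up as a character string, since that is the only type that can represent all three values.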

Boolean vectors are an interesting concept. They allow us to test things and apply logic.

# which DNA concentrations are greater than 50?
dna > 50

# we can capture the result, and use that as an "index vector" to return the values
# that meet the criteria
iv <- dna > 50

# see which values met our criteria
dna[iv]

# we can also use a shortcut, and use the logical expression directly
dna[dna > 50]
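
A few related things you can do with a Boolean vector, sketched with the same dna values; which() and sum() are standard R functions, and the cutoffs are arbitrary:

```r
dna <- c(5, 10, 20, 40, 80, 160, 320)

# positions (not values) of the concentrations greater than 50
which(dna > 50)

# TRUE counts as 1, so sum() gives the number of matches
sum(dna > 50)

# combine two tests with & (and) to select a range
dna[dna > 10 & dna < 100]
```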

Other variables: Matrix, Dataframe, List

The other main types of variables are matrices, data frames, and lists. Matrices and data frames are two-dimensional tables. Matrices have to be all one data type, whereas data frames can have columns of different types (e.g. a column of gene names and a column of expression values).

To create a matrix, we call a function, set some dimensions, and fill it with data. The cells of the matrix can be accessed using square brackets: matrix[row,column]

# create a 5 x 5 matrix
mm <- matrix(1:25, nrow=5,ncol=5)

# look at the result
mm

# access the 3rd element of the 4th row
mm[4,3]

# return the 3rd row (thus all columns)
mm[3,]

# return the 5th column (thus all rows)
mm[,5]

# re-assign a particular cell with a new number
mm[4,3] <- 125
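
It helps to know how the data gets poured into the cells. The sketch below rebuilds the matrix from scratch and compares the default fill order with the byrow option (mm_byrow is a name made up for this example):

```r
# matrix() fills column by column unless told otherwise
mm <- matrix(1:25, nrow = 5, ncol = 5)
mm[4, 3]   # row 4, column 3

# fill row by row instead
mm_byrow <- matrix(1:25, nrow = 5, ncol = 5, byrow = TRUE)
mm_byrow[4, 3]

# dim() returns the number of rows and columns
dim(mm)
```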

Data frames are similar except the data types can be mixed. We can create a data frame using a function.

mydata <- data.frame(fruit=c("apples", "pears", "peaches"), basket=c(1,3,2), kitchen=c(5,3,6))

# let's summarize the fruit count
summary(mydata)
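
Individual columns and rows of a data frame can be pulled out as well. A minimal sketch using the same fruit data; the total column is added only for illustration:

```r
mydata <- data.frame(fruit = c("apples", "pears", "peaches"),
                     basket = c(1, 3, 2),
                     kitchen = c(5, 3, 6))

# pull out one column by name
mydata$basket

# rows and columns can be indexed like a matrix: second row, all columns
mydata[2, ]

# columns are vectors, so they can be combined element by element
mydata$total <- mydata$basket + mydata$kitchen
mydata
```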

Lists are a little bit odd. A list is an ordered collection of objects. Each object can be a different type.

mylist <- list(driver="fred", mileage=c(2200, 1150, 5000), fruitsupply=mydata)

The list above contains a single-element character vector in the first position, a three-element numeric vector in the second position, and the data frame we created above in the third position. I think of lists like clotheslines, in that you can hang whatever you want at a given position.

Access the elements of a list using double square brackets.

# what's in the first position of the list?
mylist[[1]]

# what's the second mileage number?
mylist[[2]][2]

# since the list has "named" positions, we could also access 
# the information in a special way
mylist$mileage[2]
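
The difference between single and double brackets is easy to trip over, so here is a short sketch that rebuilds the same list and compares the two:

```r
mydata <- data.frame(fruit = c("apples", "pears", "peaches"),
                     basket = c(1, 3, 2),
                     kitchen = c(5, 3, 6))
mylist <- list(driver = "fred", mileage = c(2200, 1150, 5000), fruitsupply = mydata)

# names of the positions in the list
names(mylist)

# single brackets return a shorter list
class(mylist[2])

# double brackets return the object stored at that position
class(mylist[[2]])

# a name in quotes also works inside double brackets
mylist[["mileage"]][2]
```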

Functions

plot

fit a line

generate some points

generate some series
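
The topics above are only listed, not worked through. As one guess at how they might fit together, here is a rough sketch using standard R functions (plot, lm, abline, coef, seq); the data values and the choice of a straight-line fit are assumptions made for illustration:

```r
# generate some points: x is a series, y follows a line plus fixed "noise"
x <- 1:10
y <- 2 * x + 1 + c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.0)

# plot the points
plot(x, y)

# fit a line with lm() and draw it over the points
fit <- lm(y ~ x)
abline(fit)

# intercept and slope of the fitted line
coef(fit)

# generate some series with seq()
seq(0, 1, by = 0.25)
```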

R/R Intro (last edited 2011-09-28 06:59:07 by ChrisSeidel)