R Packages

  • Commonly used R functions are installed with base R
  • R packages containing more specialized R functions can be installed freely from CRAN servers using the function install.packages()
  • After packages are installed, their functions can be loaded into the current R session using the function library()

Installation Demo

We install packages using the install.packages() command:

install.packages("plyr")

You can install multiple packages at once using the c() command:

install.packages(c("dplyr","data.table"))

Note that when you install packages, other packages that they depend on may be installed as well.
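
Installing and loading are often combined into a single step. The following is a minimal sketch, not a standard R function: the helper name load_or_install is hypothetical, and it assumes the session can reach a CRAN mirror whenever a package is missing. It is called here on stats, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
    # requireNamespace() returns FALSE (instead of failing) when pkg is absent
    if (!requireNamespace(pkg, quietly = TRUE)) {
        install.packages(pkg) # assumes a CRAN mirror is reachable
    }
    # character.only = TRUE tells library() that pkg holds the name as a string
    library(pkg, character.only = TRUE)
    invisible(pkg)
}

load_or_install("stats") # stats ships with R, so no installation happens
```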

Finding R Packages

  • How do I locate a package with the desired function?
  • Google ("R project" + search term works well)
  • R website task views to search relevant subjects: http://cran.r-project.org/web/views/
  • ??searchterm will search R help for pages related to the search term
  • sos package adds helpful features for searching for packages related to a particular topic
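
Alongside ??searchterm, base R has a few lookup functions that work on what is already installed. The sketch below uses apropos() and find() from the utils package; the search terms are arbitrary examples, and the exact list that apropos() returns depends on which packages are attached.

```r
# apropos() lists objects on the search path whose names match a pattern.
mean_like <- apropos("^mean")
print(head(mean_like))

# find() reports which attached package provides an object.
where_median <- find("median")
print(where_median)

# help.search("mixed model") is the function form of ??"mixed model".
# It is left as a comment because its results depend on the installed packages.
```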

Handy R Packages

  • ggplot2: Statistical graphics
  • dplyr/tidyr: Manipulating data structures
  • lme4: Mixed models
  • knitr: integrate LaTeX, HTML, or Markdown with R for easy reproducible research

Creating Your Own Functions Outline

Code Skeleton:

foo <- function(arg1, arg2, ...) {
    # Code goes here
    return(output)
}

Example: Finding the mean of a set of numbers

mymean <- function(data) {
    ans <- sum(data) / length(data)
    return(ans)
}
mymean(1:5)
## [1] 3
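
The skeleton allows more than one argument, and arguments can carry default values. As a small sketch, the variant below adds an na.rm argument that drops missing values before averaging; the name mymean2 and the na.rm argument are illustrative additions, not part of the original example.

```r
# Variant of mymean with a second argument that has a default value.
mymean2 <- function(data, na.rm = FALSE) {
    if (na.rm) {
        data <- data[!is.na(data)] # keep only the non-missing values
    }
    ans <- sum(data) / length(data)
    return(ans)
}

mymean2(c(1, 2, NA, 3), na.rm = TRUE)
## [1] 2
```

With the default na.rm = FALSE, the same call returns NA, because sum() propagates missing values.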

If/Else Statements

Code Skeleton:

if (condition) { # Starting bracket if statement
    # Some code that runs if condition is TRUE
} else { # Starting bracket else statement
    # Some code that runs if condition is FALSE
} # Ending bracket else statement

Example: Finding the mean of a set of numbers with a conditional statement

mymean <- function(data) { # Starting bracket for function
    if (!is.numeric(data)) { # Starting bracket if statement
        stop("Numeric input is required")
    } else { # Starting bracket else statement
        ans <- sum(data) / length(data) # computing division
        return(ans)
    } # Ending bracket else statement
} # Ending bracket for function

This new function checks that its argument is numeric before continuing to the calculation; otherwise it stops with an error.
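
To see the check in action, the sketch below repeats the function in compact form and calls it once with numbers and once with character data. Wrapping the second call in tryCatch() is an addition for illustration, so that the error message can be captured and printed instead of halting the script.

```r
# Same check as above, written compactly so this block runs on its own.
mymean <- function(data) {
    if (!is.numeric(data)) {
        stop("Numeric input is required")
    }
    sum(data) / length(data)
}

mymean(c(2, 4, 6))
## [1] 4

# tryCatch() intercepts the error and returns its message as a string.
msg <- tryCatch(mymean(c("a", "b")), error = function(e) conditionMessage(e))
print(msg)
## [1] "Numeric input is required"
```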

Looping

  • Reducing the amount of typing we do can be nice
  • If we have a lot of code that is essentially the same we can take advantage of looping.
  • R offers several loops: for, while, repeat.

Code Skeleton:

for (i in Indexset) { # Starting bracket for loop
    # Do something
} # Ending bracket for loop

Example: Printing first 5 players drafted in 2008

nba <- read.csv("NBA Draft Class.csv")
for (i in 1:5) {  # Indexset is {1,2,3,4,5}
    print(nba$Player[i], max.levels = 0) # Print statement
    # max.levels = 0 suppresses output of all the levels.
}
## [1] Derrick Rose
## [1] Michael Beasley
## [1] O.J. Mayo
## [1] Russell Westbrook
## [1] Kevin Love

For Loops More Involved Example

Our index set can contain elements other than numbers, such as character strings.

id <- c("Total.Points", "Minutes", "Games")
# Loops through id and prints out each string
for (colname in id) {
    print(colname)
}
## [1] "Total.Points"
## [1] "Minutes"
## [1] "Games"
for (colname in id) {
    print(paste(colname, mymean(nba[, colname])))
    # paste() is used to print variables and strings together
}
## [1] "Total.Points 2037.2899408284"
## [1] "Minutes 4783.63313609467"
## [1] "Games 200.615384615385"

In the second for loop, we are cycling through our index set and printing each column name along with its associated average.
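
Printing inside a loop is fine for a quick look, but the results are often needed later. The sketch below stores each average in a named vector instead. Because the nba file is not bundled here, it uses a small made-up data frame called toy with the same column names; its values are illustrative only.

```r
# Made-up stand-in for the nba data; the values are illustrative only.
toy <- data.frame(Total.Points = c(100, 200, 300),
                  Minutes      = c(50, 60, 70),
                  Games        = c(10, 20, 30))

id <- c("Total.Points", "Minutes", "Games")

# Pre-allocate a named vector with one slot per column name.
averages <- numeric(length(id))
names(averages) <- id

for (colname in id) {
    column <- toy[, colname]
    averages[colname] <- sum(column) / length(column) # store, do not print
}

print(averages)
```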

While Loops

While loops are similar to for loops, but they keep running as long as a condition is TRUE and stop once it becomes FALSE, whereas a for loop continues until every element of the index set has been iterated through.

Motivating Example

Constructing a while loop to print out the first 5 draft picks in the nba data set

pick <- 1 # Initialize our starting Value
while (pick <= 5) {
    print(nba$Player[pick], max.levels = 0) # Prints each value while hiding the level.
    pick <- pick + 1 # Add one to pick 
}
## [1] Derrick Rose
## [1] Michael Beasley
## [1] O.J. Mayo
## [1] Russell Westbrook
## [1] Kevin Love

Having done the same operation with a for loop, you can decide which is easier to use.
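
The third loop type, repeat, has no built-in stopping condition and must be ended explicitly with break. The sketch below performs the same task once more. It assumes a plain character vector of the five names shown in the output above rather than the nba data frame, and the vector printed is added only to record what the loop visited.

```r
# The five names from the output above, stored as a plain character vector.
players <- c("Derrick Rose", "Michael Beasley", "O.J. Mayo",
             "Russell Westbrook", "Kevin Love")

pick <- 1                 # Initialize our starting value
printed <- character(0)   # Records each name the loop prints

repeat {
    if (pick > 5) {
        break             # Without break, repeat would run forever
    }
    print(players[pick])
    printed <- c(printed, players[pick])
    pick <- pick + 1      # Add one to pick
}
```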

Your Turn

  1. Create a function that takes numeric input and provides the mean and a 95% confidence interval for the mean for the data (Hint: The t.test() function computes both)
  2. Add a check to your function to make sure the data is numeric. If it is not, stop the program.
  3. Add on a (for or while) loop to your function. Apply this to the Points Per Game and Win Share of the nba data set.

Answers

1.

mean.interval <- function(data) {
    output <- t.test(data)
    return(output)
}

2.

mean.interval <- function(data) {
    if (!is.numeric(data)) {
        stop("Numeric input is required")
    } else {
        output <- t.test(data)
        return(output)
    }
}

3.

mean.interval(nba$Points.Per.Game)
## 
##  One Sample t-test
## 
## data:  data
## t = 23.079, df = 168, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  7.858401 9.328582
## sample estimates:
## mean of x 
##  8.593491

mean.interval(nba$Win.Share)
## 
##  One Sample t-test
## 
## data:  data
## t = 11.733, df = 168, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   7.870519 11.054925
## sample estimates:
## mean of x 
##  9.462722
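
The two calls above can also be driven by a for loop over the column names, which is one way to read exercise 3. The following is a sketch under stated assumptions, not the original answer: it uses a small made-up data frame called toy in place of the nba data, and it keeps only the mean and interval bounds from each t.test() result.

```r
mean.interval <- function(data) {
    if (!is.numeric(data)) {
        stop("Numeric input is required")
    }
    t.test(data)
}

# Made-up stand-in for the nba data; the values are illustrative only.
toy <- data.frame(Points.Per.Game = c(8.1, 12.4, 5.6, 9.9, 7.3),
                  Win.Share       = c(3.2, 10.5, 1.1, 6.8, 4.4))

results <- list()
for (colname in c("Points.Per.Game", "Win.Share")) {
    fit <- mean.interval(toy[, colname])
    # estimate holds the sample mean; conf.int holds the 95% interval bounds
    results[[colname]] <- c(mean  = unname(fit$estimate),
                            lower = fit$conf.int[1],
                            upper = fit$conf.int[2])
}

print(results)
```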