## R Packages

• Commonly used R functions are installed with base R
• R packages containing more specialized R functions can be installed freely from CRAN servers using the function install.packages()
• After packages are installed, their functions can be loaded into the current R session using the function library()
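As a small illustration of the difference between an installed package and a loaded one, the sketch below uses the tools package, which ships with R but is not attached to a session by default. The file name is only an example string; no file is read.

```r
# 'tools' is installed with R, but its functions are not available
# by name until the package is attached with library()
library(tools)
file_ext("NBA Draft Class.csv")        # returns "csv"

# A single function can also be called without attaching the package,
# using the package::function form
tools::file_path_sans_ext("NBA Draft Class.csv")   # returns "NBA Draft Class"
```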

## Installation Demo

We install packages using the install.packages() command:

install.packages("plyr")

You can install multiple packages at once by combining their names with c():

install.packages(c("dplyr","data.table"))

Note that when you install a package, other packages it depends on (its dependencies) may be installed as well.
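A common pattern is to install a package only when it is not already available. The following is a minimal sketch of that idea; install_if_missing is a hypothetical helper name, not a base R function. It is called here with "stats", which is always present, so nothing is downloaded.

```r
# Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when a package can be loaded
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    install.packages(to_install)  # needs a CRAN mirror and network access
  }
  invisible(to_install)           # names of the packages that were missing
}

# "stats" ships with R, so this call installs nothing
missing_pkgs <- install_if_missing("stats")
length(missing_pkgs)   # 0
```

In a real session you would pass names such as c("dplyr", "data.table") instead.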

## Finding R Packages

• How do I locate a package with the desired function?
• Google ("R project" + search term works well)
• R website task views to search relevant subjects: http://cran.r-project.org/web/views/
• ??searchterm will search R help for pages related to the search term
• sos package adds helpful features for searching for packages related to a particular topic
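Besides ??searchterm, base R has a couple of functions for searching what is already available in the current session. The sketch below uses only functions from the packages attached by default; it does not search CRAN.

```r
# apropos() lists objects on the search path whose names match a pattern
mean_functions <- apropos("^mean")
mean_functions

# find() reports which attached package provides an object
find("t.test")   # "package:stats"

# Interactively, ??"t test" searches the installed help pages
# (it is shorthand for help.search("t test"))
```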

## Handy R Packages

• ggplot2: Statistical graphics
• dplyr/tidyr: Manipulating data structures
• lme4: Mixed models
• knitr: integrate LaTeX, HTML, or Markdown with R for easy reproducible research

## Creating Your Own Functions Outline

Code Skeleton:

foo <- function(arg1, arg2, ...) {
  # Code goes here
  return(output)
}

Example: Finding the mean of a set of numbers

mymean <- function(data) {
  ans <- sum(data) / length(data)
  return(ans)
}
mymean(1:5)
## [1] 3
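Functions can also take arguments with default values. As a sketch, the variant below adds an na.rm argument (the name is borrowed from base R's mean(); mymean2 is a hypothetical name used to avoid overwriting mymean).

```r
# Variant of mymean with an optional argument that has a default value
mymean2 <- function(data, na.rm = FALSE) {
  if (na.rm) {
    data <- data[!is.na(data)]   # drop missing values before averaging
  }
  ans <- sum(data) / length(data)
  return(ans)
}

mymean2(1:5)                           # 3
mymean2(c(1, 2, NA, 5))                # NA, because na.rm defaults to FALSE
mymean2(c(1, 2, NA, 5), na.rm = TRUE)  # (1 + 2 + 5) / 3 = 2.666667
```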

## If/Else Statements

Code Skeleton:

if (condition) { # Starting bracket if statement
  # Some code that runs if condition is TRUE
} else { # Starting bracket else statement
  # Some code that runs if condition is FALSE
} # Ending bracket else statement

Example: Finding the mean of a set of numbers with a conditional statement

mymean <- function(data) { # Starting bracket function
  if (!is.numeric(data)) { # Starting bracket if statement
    stop("Numeric input is required")
  } else { # Starting bracket else statement
    ans <- sum(data) / length(data) # computing the mean
    return(ans)
  } # Ending bracket else statement
} # Ending bracket function

This new function checks that its argument is numeric before continuing to the calculation; if it is not, the function stops with an error message.
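To see the check in action without halting a script, the error can be caught with tryCatch(). This is a minimal sketch; the function is restated so the block runs on its own.

```r
mymean <- function(data) {
  if (!is.numeric(data)) {
    stop("Numeric input is required")
  } else {
    ans <- sum(data) / length(data)
    return(ans)
  }
}

mymean(c(2, 4, 6))   # 4

# Character input triggers stop(); tryCatch() captures the message
result <- tryCatch(
  mymean(c("a", "b")),
  error = function(e) conditionMessage(e)
)
result               # "Numeric input is required"
```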

## Looping

• Reducing the amount of typing we do can be nice
• If we have a lot of code that is essentially the same we can take advantage of looping.
• R offers several loops: for, while, repeat.

Code Skeleton:

for (i in Indexset) { # Starting bracket for loop
  # Do something
} # Ending bracket for loop
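As a small sketch of the skeleton, the loop below adds up a vector one element at a time. The values in scores are made up for illustration. seq_along() is used to build the index set because, unlike 1:length(x), it gives an empty index set for an empty vector.

```r
scores <- c(12, 7, 21)   # hypothetical values, for illustration only

total <- 0
for (i in seq_along(scores)) {   # index set is {1, 2, 3}
  total <- total + scores[i]
}
total   # 40

# With an empty vector the loop body never runs
empty <- numeric(0)
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
count   # 0; with 1:length(empty) the body would run twice (i = 1, then i = 0)
```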

Example: Printing the first 5 players drafted in 2008

nba <- read.csv("NBA Draft Class.csv")
for (i in 1:5) { # Indexset is {1,2,3,4,5}
  # max.levels = 0 suppresses output of all the levels
  print(nba$Player[i], max.levels = 0)
}
## [1] Derrick Rose
## [1] Michael Beasley
## [1] O.J. Mayo
## [1] Russell Westbrook
## [1] Kevin Love

## For Loops: More Involved Example

Our indexing set can be elements aside from numbers.

id <- c("Total.Points", "Minutes", "Games")
# Loops through id and prints out each string
for (colname in id) {
  print(colname)
}
## [1] "Total.Points"
## [1] "Minutes"
## [1] "Games"

for (colname in id) {
  # paste() is used to print variables and strings together
  print(paste(colname, mymean(nba[, colname])))
}
## [1] "Total.Points 2037.2899408284"
## [1] "Minutes 4783.63313609467"
## [1] "Games 200.615384615385"

In the second for loop, we are cycling through our index set and printing each column name along with its associated average.

## While Loops

While loops are similar to for loops, but they keep running as long as a condition is TRUE and stop once it becomes FALSE, unlike a for loop, which continues until the index set has been fully iterated through.

## Motivating Example

Constructing a while loop to print out the first 5 draft picks in the nba data set

pick <- 1 # Initialize our starting value
while (pick <= 5) {
  print(nba$Player[pick], max.levels = 0) # Prints each value while hiding the levels
  pick <- pick + 1 # Add one to pick
}
## [1] Derrick Rose
## [1] Michael Beasley
## [1] O.J. Mayo
## [1] Russell Westbrook
## [1] Kevin Love

Having done the same operation with a for loop, you can decide which is easier to use.
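The third loop type, repeat, has no condition of its own and runs until break is called. The sketch below mirrors the while loop above; since the CSV file may not be at hand, players is a stand-in vector holding the five names from the output shown earlier.

```r
# Stand-in for nba$Player: the five names printed in the examples above
players <- c("Derrick Rose", "Michael Beasley", "O.J. Mayo",
             "Russell Westbrook", "Kevin Love")

pick <- 1
repeat {
  if (pick > length(players)) {
    break                # leave the loop once every player has been printed
  }
  print(players[pick])
  pick <- pick + 1
}
```

Forgetting the break (or the increment) gives an infinite loop, which is one reason for and while are usually preferred when they fit.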

1. Create a function that takes numeric input and provides the mean and a 95% confidence interval for the mean of the data (Hint: The t.test() function computes both)
2. Add checks to your function to make sure the data is numeric. If it is not, stop the program.
3. Add on a (for or while) loop to your function. Apply this to the Points Per Game and Win Share of the nba data set.

### 1.

mean.interval <- function(data) {
  output <- t.test(data)
  return(output)
}

### 2.

mean.interval <- function(data) {
  if (!is.numeric(data)) {
    stop("Numeric input is required")
  } else {
    output <- t.test(data)
    return(output)
  }
}

### 3.

mean.interval(nba$Points.Per.Game)
##
##  One Sample t-test
##
## data:  data
## t = 23.079, df = 168, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  7.858401 9.328582
## sample estimates:
## mean of x
##  8.593491

mean.interval(nba$Win.Share)
##
##  One Sample t-test
##
## data:  data
## t = 11.733, df = 168, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   7.870519 11.054925
## sample estimates:
## mean of x
##  9.462722
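One possible way to bring the loop from part 3 into the solution is sketched below. It is an alternative, not the answer shown above: it pulls only the mean and interval out of the t.test() result and loops over column names. Because the NBA CSV may not be available, the built-in mtcars data set and its columns mpg and hp are used as stand-ins; the result names mean, lower and upper are assumptions chosen for this sketch.

```r
mean.interval <- function(data) {
  if (!is.numeric(data)) {
    stop("Numeric input is required")
  }
  tt <- t.test(data)                 # default conf.level is 0.95
  c(mean  = unname(tt$estimate),     # sample mean
    lower = tt$conf.int[1],          # lower bound of the 95% interval
    upper = tt$conf.int[2])          # upper bound of the 95% interval
}

# Stand-in columns; with the nba data these would be
# c("Points.Per.Game", "Win.Share")
cols <- c("mpg", "hp")

results <- list()
for (colname in cols) {
  results[[colname]] <- mean.interval(mtcars[, colname])
}
results
```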