---
title: "R Basics"
output:
  ioslides_presentation:
    smaller: true
---
## Loading NBA Data set
This data set contains statistics, recorded through 2014, for players drafted from 2008 to 2012.
```{r}
nba <- read.csv("NBA Draft Class.csv")
names(nba) # See all column names
```
## Some Computations
Recall that `head()` displays the first 6 rows of our data.
```{r}
head(nba)
```
How many more games did Michael Beasley play in his first 6 seasons compared to Derrick Rose?
```{r}
# Addition and Subtraction
409-289
#How many combined?
409+289
```
## Multiplication/Division
How many games could Derrick Rose have played in his first 6 seasons? Note that there are 82 regular season games in the NBA.
```{r}
6*82
#How durable is Derrick Rose?
289/492
```
To be specific, in his first 6 seasons, Derrick Rose only played in about 59% of possible regular season games.
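As a small sketch, the same ratio can be turned into a percentage in one line. It reuses the 289 and 6 * 82 from above and relies on `round()`, which has not been introduced yet on these slides.

```{r}
# Share of possible games played, as a percentage rounded to 1 decimal
round(100 * 289 / (6 * 82), 1)
```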
## More Calculator Operations
```{r}
# Integer division
82 %/% 10
# Modulo operator (Remainder)
82 %% 10
# Powers
8^3
```
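Integer division and the modulo operator fit together: quotient times divisor plus remainder gives back the original number. The sketch below checks this and then, purely as an illustration, splits 289 games into full 82-game seasons plus leftover games.

```{r}
# Quotient times divisor plus remainder returns the original number
(82 %/% 10) * 10 + (82 %% 10)
# 289 games expressed as full 82-game seasons and leftover games
289 %/% 82
289 %% 82
```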
## Even More Functions
- Exponentiation
    - `exp(x)`
- Logarithms
    - `log(x)`
    - `log(x, base = 10)`
- Trigonometric functions
    - `sin(x)`
    - `asin(x)`
    - `cos(x)`
    - `tan(x)`
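A few of these functions in action. The input values are arbitrary and chosen only so the results are easy to check by hand.

```{r}
exp(1)               # Euler's number e
log(exp(2))          # Natural log undoes exp()
log(1000, base = 10) # Logarithm with base 10
sin(pi / 2)          # Trig functions take angles in radians
asin(1)              # Inverse sine, which equals pi / 2
```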
## Creating Variables
We can create variables using the assignment operator `<-`:
```{r}
paul.george <- 13
```
We can then perform any of the functions on the variables:
```{r}
# Logarithm
log(paul.george)
# Square root
sqrt(paul.george)
# Square
paul.george^2
```
## Rules for Variable Creation
- Variable names can't start with a number
- Variables in R are case-sensitive
- Some common letters are used internally by R and should be avoided as variable names (c, q, t, C, D, F, T, I)
- There are reserved words that R won't let you use for variable names e.g. (for, in, while, if, else, repeat, break, next)
- R will let you use the name of a predefined function. Try not to overwrite those though!
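A short sketch of two of these rules. The names `rebounds` and `Rebounds` and their values are made up for illustration.

```{r}
rebounds <- 7        # Illustrative value
Rebounds <- 10       # A different variable, because names are case-sensitive
rebounds == Rebounds
sum <- 99            # R allows reusing the name of a predefined function
sum                  # Now prints our number instead of the function
rm(sum)              # Remove our variable so the name refers to sum() again
sum(1, 2)
```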
## Vectors
A variable does not need to be a single value. We can create a **vector** using the `c()` (combine) function.
For example, how many total points did each of the 2008 top 5 draft picks score through their first 6 seasons?
```{r}
head(nba)
```
##
```{r}
y <- c(6017, 5416, 6447, 8834, 6989) # Creates vector of total points by top 5 picks
```
Operations are then done element-wise. Dividing by 6 displays each player's average number of points scored per season:
```{r}
y / 6
#Average points per game through first 6 seasons
z <- y / 6
z / 82
```
The last line displays each player's average number of points per game, assuming all 82 games were played every season.
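The two divisions can also be written as one step. This is only a sketch: `top5.points` is a name made up here to hold the same totals, and `round()` is used to keep the output short.

```{r}
top5.points <- c(6017, 5416, 6447, 8834, 6989) # Same totals as above
# Points per scheduled game in one step (6 seasons of 82 games each)
round(top5.points / (6 * 82), 1)
```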
## Getting Help
We will talk MUCH more about vectors in a later section, but for now, let's talk about a couple of ways to get help. The primary function to use is the `help` function. Just pass in the name of the function you need help with:
```{r, eval=FALSE}
help(head)
```
The `?` function also works:
```{r, eval=FALSE}
?head
```
Googling for help is a bit hard. You might need to search for R + CRAN + your topic to get good results.
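A few other built-in helpers may be useful when you only half remember a function. This is a sketch and not a complete list; the search term `"mean"` is just an example.

```{r, eval=FALSE}
apropos("mean")   # Lists objects whose names contain "mean"
args(head)        # Shows the arguments a function accepts
example(head)     # Runs the examples from the function's help page
```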
## R Reference Card
You can download an R reference card from:
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
Having this open or printed off and near you while working is helpful until you master the basics.
## Your Turn
Using the R Reference Card (and the Help pages, if needed), do the following:
1. Find out how many rows and columns the nba data set has. Figure out at least 2 ways to do this.
2. Create a vector with the number of games played for the top 5 players.
3. On average, how many games per season did each of those 5 players play over their 6 seasons?
## Answers
### 1.
```{r}
dim(nba) # Finds dimension of data frame
str(nba) # Finds structure of data
```
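Two more ways are `nrow()` and `ncol()`. The sketch below uses a tiny stand-in data frame called `toy` (a made-up name, holding two players and their totals from earlier) so that it runs without the CSV file.

```{r}
# Tiny stand-in data frame, not the full nba data set
toy <- data.frame(Player = c("Derrick Rose", "Michael Beasley"),
                  Total.Points = c(6017, 5416))
nrow(toy)   # Number of rows
ncol(toy)   # Number of columns
dim(toy)    # Both at once: rows, then columns
```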
##
### 2.
```{r}
games <- c(289,409,435,440,364) # Vector of games
```
##
### 3.
```{r}
games/6 # Divide games by 6 seasons
```
## Some Useful Functions
There are a whole variety of useful functions to operate on vectors. A couple of the more common ones are `length()`, which returns the length (number of elements) of a vector, and `sum()`, which adds up all the elements of a vector.
```{r}
length(games) # calculates the length of this vector
sum(games) # Calculates the sum of the vector elements
```
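These functions combine naturally with element-wise arithmetic. As a sketch, the chunk below reuses the top 5 picks' totals under the made-up names `top5.points` and `top5.games`.

```{r}
top5.points <- c(6017, 5416, 6447, 8834, 6989) # Total points from earlier
top5.games  <- c(289, 409, 435, 440, 364)      # Same values as games
sum(top5.games) / length(top5.games)  # Average games played per player
top5.points / top5.games              # Points per game actually played
```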
## Data Frames Introduction
- `nba` is a data frame.
- Data frames hold data sets
- Not every column need be the same type - like an Excel spreadsheet
- Each column in a data frame is a vector - so each column needs to have values that are all the same type.
- We can access different columns using the `$` operator.
```{r}
draft <- nba$Year # Creates vector named draft from the Year column
school <- nba$College # Creates vector named school from the College column
points <- nba$Total.Points # Creates vector named points from the Total.Points column
```
## More about Vectors
A vector is a sequence of values that are all the same type. We have seen that we can create them using the `c()` function; the `rep()` function is another option. We can also use the `:` operator if we wish to create consecutive values:
```{r}
a <- 10:15
a
```
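For comparison, here is a sketch of `rep()` and its relative `seq()`. The values are arbitrary.

```{r}
rep(82, times = 6)           # 82 repeated 6 times
rep(c(2008, 2009), each = 2) # Each value repeated twice
seq(2008, 2012)              # Same values as 2008:2012
seq(0, 1, by = 0.25)         # Evenly spaced values from 0 to 1
```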
We can extract the specific elements of the vector like so:
```{r}
school[3] # Selects the 3rd school in the school column
```
The 69 levels listed in the output are the 69 distinct schools that had a player drafted in this data set.
## Indexing Vectors
We saw that we can access individual elements of the vector. But **indexing** is a lot more powerful than that:
```{r}
head(school)
school[c(1, 3, 5)] # Selects the 1st, 3rd, and 5th school
```
##
```{r}
school[1:6] # Selects the 1st through 6th school
```
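Indexing also accepts negative numbers, which drop elements instead of selecting them. The sketch below uses the top 5 picks' point totals under the made-up name `top5.points` so that it runs on its own.

```{r}
top5.points <- c(6017, 5416, 6447, 8834, 6989) # Total points from earlier
top5.points[-1]        # Everything except the 1st element
top5.points[-c(1, 2)]  # Everything except the 1st and 2nd elements
top5.points[5:1]       # All elements in reverse order
```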
## Logical Values
- R has built-in support for logical values
- `TRUE` and `FALSE` are built in. `T` (for `TRUE`) and `F` (for `FALSE`) also work, but they are ordinary variables that can be overwritten, so the full names are safer
- Logicals can result from a comparison using
    - `<`
    - `>`
    - `<=`
    - `>=`
    - `==`
    - `!=`
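A quick sketch of a few comparisons, followed by a demonstration that `T` can be reassigned while `TRUE` cannot. The numbers are the games totals used earlier.

```{r}
409 > 289     # TRUE
409 == 289    # FALSE
409 != 289    # TRUE
T <- FALSE    # Allowed, because T is an ordinary variable
isTRUE(T)     # FALSE now, which is why TRUE is the safer spelling
rm(T)         # Remove our copy so T means TRUE again
```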
## Indexing with Logicals
We can index vectors using logical values as well:
```{r}
x <- points[1:5] # Pulls the total points for the first 5 players
x > 6000 # TRUE/FALSE for each player: is the total greater than 6000?
x[x < 6000] # Keeps only the totals that are less than 6000
```
We interpret this to mean that the second player, Michael Beasley, did not score 6000 points by the end of his 6th season.
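Logical vectors can also be counted and located. As a sketch, `first5` below is a made-up name holding the same five totals, so the chunk runs on its own.

```{r}
first5 <- c(6017, 5416, 6447, 8834, 6989) # Same five totals as above
sum(first5 > 6000)    # Counts the TRUEs: 4 players are above 6000
which(first5 < 6000)  # Position of the player below 6000: 2
mean(first5 > 6000)   # Proportion of players above 6000
```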
## Logical Examples
We gather the total minutes played for the players in the 2008 NBA draft.
```{r}
minutes <- (nba$Minutes[nba$Year == 2008])
# creates variable, minutes, which is the minutes played by everyone in the 2008 draft class
str(minutes)
```
We check which players logged fewer than 8000 minutes; here we label those players busts.
```{r}
bust <- minutes < 8000 # Finds players that play less than 8000 minutes
minutes[bust]
```
##
This code locates players from 2008 who correspond to those minutes.
```{r}
# Filter to the 2008 class first, so that bust lines up with those players
(nba$Player[nba$Year == 2008][bust])
```
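The order of the two subsetting steps matters, because a logical index needs one entry per element of the vector it is applied to. The sketch below shows this on a toy data frame; the players `A` to `D` and their minutes are invented purely for illustration.

```{r}
# Made-up players and minutes, only to show the order of subsetting
toy <- data.frame(Player  = c("A", "B", "C", "D"),
                  Year    = c(2008, 2009, 2008, 2008),
                  Minutes = c(9000, 5000, 7000, 3000))
in.2008  <- toy$Year == 2008        # One TRUE/FALSE per row of toy
toy.min  <- toy$Minutes[in.2008]    # Minutes for the 2008 rows only
toy.bust <- toy.min < 8000          # One TRUE/FALSE per 2008 player
toy$Player[in.2008][toy.bust]       # Filter by year first, then by bust
toy$Player[in.2008 & toy$Minutes < 8000] # Same result in a single step
```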
## Your Turn
1. Which college did DeMar DeRozan attend? Note: There are many ways to answer this. Some are faster than others.
2. Find out how many players from the 2008 draft scored more than 6,000 points in their first 6 seasons.
**Challenge**: Calculate the sum of the total points for everyone who scored more than 6,000 points. (Hint: Make use of `%in%`)
## Answers
### 1.
```{r}
nba$College[nba$Player == "DeMar DeRozan"]
```
##
### 2.
```{r}
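# Finds the 2008 draftees whose total points are above 6000
players.6000 <- nba$Player[nba$Year == 2008 & nba$Total.Points > 6000]
players.6000
length(players.6000) # Counts how many players satisfy the condition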
```
##
### **Challenge**
```{r}
# Finds the total points for those players
each.points <- nba$Total.Points[nba$Player %in% players.6000]
sum.each.points <- sum(each.points) # Sums each element in vector
sum.each.points
```
## Modifying Vectors
We can modify vectors using indexing as well. Here we create a new data frame that consists of the first 5 columns of the NBA data set.
```{r}
x <- nba[1:5]
head(x)
```
##
We replace all the years with 2014.
```{r}
x[1] <- 2014
head(x)
```
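The same idea applies to a plain vector. In this sketch `g` is a made-up copy of the games values from earlier, and the replacement values 300 and 420 are arbitrary.

```{r}
g <- c(289, 409, 435, 440, 364) # Copy of the games values from earlier
g[1] <- 300                     # Replace the 1st element
g[g > 420] <- 420               # Replace every element above 420
g
```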
## Vector Elements
Elements of a vector must all be the same type:
```{r}
head(minutes)
minutes[bust] <- ":-(" # Replacing minutes below 8000 with a frowny face.
head(minutes)
minutes
```
By changing some values to a string, all the other values were converted to strings as well.
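This conversion happens whenever types are mixed inside `c()`. A minimal sketch with arbitrary values:

```{r}
class(c(1, 2, 3))       # "numeric"
class(c(1, 2, "three")) # "character": the numbers become strings
c(1, 2, "three")
class(c(1, TRUE))       # "numeric": TRUE becomes 1
```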
## Data Types in R
- Can use `mode()` or `class()` to find out information about variables
- `str()` is useful to find information about the structure of your data
- Many data types exist: numeric, integer, character, date, and factor are the most common
```{r}
str(nba)
```
## Converting Between Types
We can convert between different types using the `as` series of functions:
```{r}
assists <- head(nba$Total.Assists) # Creates vector of first 6 players assist totals
assists
as.character(assists) # Converts to character
as.numeric("2")
```
Notice that one result is printed with quotation marks and the other is not: the first is a character and the second is numeric.
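Conversions do not always do what one might guess. The sketch below shows three cases worth knowing; the factor `f` is made up for illustration.

```{r}
as.integer(3.9)                     # Drops the decimal part, giving 3
suppressWarnings(as.numeric("abc")) # Not a number, so the result is NA
f <- factor(c("10", "20"))          # Illustrative factor
as.numeric(f)                       # Internal level codes: 1 and 2
as.numeric(as.character(f))         # The actual values: 10 and 20
```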
## Statistical Functions
Using the basic functions we've learned it wouldn't be hard to compute some basic statistics.
```{r}
(n <- length(points)) # Assigns n to be the number of elements in points
(meanpoints <- sum(points)/n) # Calculates mean by usual formula
# Calculates standard deviation by usual formula
(standdev <- sqrt(sum((points - meanpoints)^2) / (n - 1)))
```
This is fairly easy, that is, if you know the formulas!
## Built-in Statistical Functions
We don't need to memorize formulas. R does the work for us!
```{r}
mean(points) # Calculates mean
sd(points) # Calculates standard deviation
summary(points) # Calculates the five-number summary plus the mean
quantile(points, c(.025, .975)) # 2.5% and 97.5% quantiles
```
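As a sanity check, the hand-written formulas and the built-in functions should agree. This sketch compares them on the top 5 picks' totals; the names `top5.points`, `n5`, `manual.mean`, and `manual.sd` are made up here.

```{r}
top5.points <- c(6017, 5416, 6447, 8834, 6989) # Total points from earlier
n5 <- length(top5.points)
manual.mean <- sum(top5.points) / n5
manual.sd <- sqrt(sum((top5.points - manual.mean)^2) / (n5 - 1))
all.equal(manual.mean, mean(top5.points)) # TRUE
all.equal(manual.sd, sd(top5.points))     # TRUE
median(top5.points)                       # Middle value of the five totals
```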
## Element-wise Logical Operators
- `&` (elementwise AND)
- `|` (elementwise OR)
```{r}
c(T, T, F, F) & c(T, F, T, F)
c(T, T, F, F) | c(T, F, T, F)
```
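A few related helpers, shown as a sketch: `!` negates each element, while `any()` and `all()` collapse a logical vector to a single value.

```{r}
!c(TRUE, FALSE)              # Elementwise NOT
any(c(TRUE, FALSE, FALSE))   # TRUE if at least one element is TRUE
all(c(TRUE, FALSE, FALSE))   # TRUE only if every element is TRUE
xor(TRUE, FALSE)             # TRUE when exactly one side is TRUE
```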
##
Which players averaged more than 20 points and 5 assists a game?
```{r}
condition <- which(nba$Points.Per.Game > 20.0 & nba$Assists.Per.Game > 5.0)
nba[condition,]
```
Just Derrick Rose, Russell Westbrook, Stephen Curry, and Kyrie Irving!
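One reason to wrap a condition in `which()` is that it ignores missing values. The sketch below uses a made-up vector `ppg`; whether the NBA data actually contains missing values is not something these slides establish.

```{r}
ppg <- c(25, NA, 10)   # Made-up values; NA marks a missing entry
ppg[ppg > 20]          # Logical indexing keeps the NA: 25 NA
ppg[which(ppg > 20)]   # which() drops the NA: 25
```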
## Your Turn
1. Is it more common to average 20 points and 5 assists a game or 20 points and 5 rebounds a game?
2. Determine which player(s) from the 2009 draft class averaged more than 7.0 rebounds per game.
3. Who scored more points in their first 6 seasons, Russell Westbrook or John Wall?
## Answers
### 1.
```{r}
condition1 <- which(nba$Points.Per.Game > 20.0 & nba$Assists.Per.Game > 5.0)
condition2 <- which(nba$Points.Per.Game > 20.0 & nba$Rebounds.Per.Game > 5.0)
length(condition1) #Finds how many players satisfy condition
length(condition2) #Finds how many players satisfy condition
```
##
### 2.
```{r}
condition <- which(nba$Rebounds.Per.Game > 7.0 & nba$Year == 2009)
nba[condition,]
```
##
### 3.
```{r}
westbrook.points <- nba$Total.Points[nba$Player == "Russell Westbrook"]
wall.points <- nba$Total.Points[nba$Player == "John Wall"]
westbrook.points < wall.points # Compares the two totals; TRUE means Wall scored more
```