---
title: "Data Structures"
output:
  ioslides_presentation:
    smaller: true
---
## Data Frames
- Data frames are the workhorse of R objects
- Structured by rows and columns and can be indexed
- Each column holds a single variable type
- Column names can be used to index a variable
- Advice for naming variables applies to editing column names
- Can be specified by grouping vectors of equal length as columns
## Data Frame Indexing
- Elements are indexed similarly to a vector, using `[` `]`
- `df[i,j]` will select the element in the $i^{th}$ row and $j^{th}$ column
- `df[ ,j]` will select the entire $j^{th}$ column and treat it as a vector
- `df[i, ]` will select the entire $i^{th}$ row and return it as a one-row data frame
- Logical vectors can be used in place of `i` and `j` to subset the rows and columns
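
The indexing rules above can be sketched on a small made-up data frame. The name `toy` and its values are assumptions for illustration only and are not part of the NBA data.

```{r}
# Hypothetical toy data frame, used only to illustrate indexing
toy <- data.frame(id  = 1:4,
                  grp = c("a", "b", "a", "b"),
                  val = c(2.5, 3.0, 4.5, 1.0))

toy[2, 3]                     # single element: row 2, column 3
toy[, 3]                      # whole third column, returned as a vector
toy[2, ]                      # whole second row, returned as a one-row data frame
toy[toy$val > 2, ]            # logical vector keeps rows where val exceeds 2
toy[, c(TRUE, FALSE, TRUE)]   # logical vector keeps columns 1 and 3
```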
## Adding a New Variable to a Data Frame
- Create a new vector that is the same length as the other columns
- Append the new column to the data frame using the `$` operator
- The new column takes the name given after the `$`
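
A minimal sketch of these steps, assuming a made-up data frame called `players` (the names and numbers are hypothetical):

```{r}
# Hypothetical example data; `players` and its columns are made up
players <- data.frame(name   = c("A", "B", "C"),
                      points = c(12, 30, 21))

games <- c(4, 10, 7)        # new vector, same length as the existing columns
players$games <- games      # append with `$`; the new column is named "games"

# A new column can also be derived from existing ones
players$ppg <- players$points / players$games
players
```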
## Data Frame Demo
Loading previously used NBA data set:
```{r}
nba <- read.csv("NBA Draft Class.csv")
```
Select position column (5th column):
```{r}
nba[,5]
```
## Demo (Continued)
Select team column with the `$` operator:
```{r}
nba$Team
```
## Demo (Continued)
We now determine the row location, in our data, where the team is the Milwaukee Bucks.
```{r}
bucks <- nba$Team == "MIL" # Logical vector: TRUE where the team is MIL
head(bucks)
```
This output doesn't show much. It would be much easier if we could see which positions are `TRUE`!
```{r}
which(bucks) # Row numbers where the team is labeled MIL
```
## Demo (Continued)
Displaying part of the NBA data set where the team is Milwaukee by subsetting rows.
```{r}
nba[nba$Team=="MIL", ]
```
## Creating our own Data Frame
Creating our own data frame using the `data.frame()` function:
```{r}
mydf <- data.frame(NUMS = 1:5,
                   LETS = letters[1:5],
                   SHOES = c("Nike", "Adidas", "Reebok", "Big Baller Brand", "Adidas"))
mydf
```
Note that in a data frame, each column has to have the same length!
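
As a rough illustration of what happens when the lengths do not match, the sketch below passes columns of length 3 and 2 to `data.frame()`. The `tryCatch()` wrapper and the name `msg` are additions for this example, used so the error message is captured instead of stopping the document.

```{r}
# Columns of length 3 and 2 cannot be combined, so data.frame() raises an error.
# tryCatch() captures the error message so the document still knits.
msg <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
msg
```

One caveat: `data.frame()` recycles a shorter vector when its length divides the longest column exactly, so supplying columns of equal length yourself is the safest habit.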
## Renaming columns
We can use the `names()` function to rename the first column in lowercase:
```{r}
names(mydf)[1] <- "nums" # Changes the name of the first column in mydf
mydf
```
We can also rename all the columns at once using the `colnames()` function.
```{r}
colnames(mydf) <- c("numbers","letters","shoes") # Changes all columns at once
mydf
```
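
Renaming by position breaks if the column order changes. A possible alternative, sketched here on a hypothetical data frame `shoes_df`, is to match on the current name instead:

```{r}
# Hypothetical data frame; names chosen only for illustration
shoes_df <- data.frame(NUMS = 1:3, SHOES = c("Nike", "Adidas", "Reebok"))

# Rename by matching the current name rather than relying on column position
names(shoes_df)[names(shoes_df) == "SHOES"] <- "brand"

# Lowercase every column name in one step
names(shoes_df) <- tolower(names(shoes_df))
names(shoes_df)
```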
## Your Turn
1. Construct a data frame where column 1 contains 5 Milwaukee Bucks players and column 2 is their Pick number.
2. Select only the rows where the Pick number is even.
3. Determine which rows of the nba data set contain the Chicago Bulls.
## Answers
### 1.
```{r}
mydf <- data.frame(Player = c("Jennings", "Sanders", "Fredette", "Henson", "Antetokounmpo"),
                   Pick = c(10, 15, 10, 14, 15))
mydf
```
##
### 2.
```{r}
mydf[mydf$Pick %% 2 == 0, ] # Keeps the rows whose Pick number is even
```
##
### 3.
```{r}
bulls <- nba$Team == "CHI"
which(bulls)
```
## Lists
- Lists are a structured collection of R objects
- R objects in a list need not be the same type
- Create lists using the `list()` function
- Lists are indexed using double square brackets `[[ ]]` to select an object
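
A short sketch of list indexing, assuming a made-up list called `info`. It also contrasts `[[ ]]` with single brackets and with `$` for named elements.

```{r}
# Hypothetical list with named elements of different types
info <- list(team   = "MIL",
             picks  = c(10, 15, 14),
             active = TRUE)

info[[2]]         # double brackets return the element itself (a numeric vector)
info[["picks"]]   # the same element, selected by name
info$picks        # `$` is shorthand for selecting by name
info[2]           # single brackets return a list of length one
class(info[2])
```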
## List Example
Creating a list containing a 2 by 5 matrix, a vector of length 5, and a string:
```{r}
mylist <- list(matrix(letters[1:10], nrow = 2, ncol = 5),
               c("Brady", "Rodgers", "Romo", "Newton", "Wilson"),
               "The Chicago Cubs won the 2016 World Series")
mylist
```
Note that unlike data frames, lists can contain elements of varying sizes and structures.
Use indexing to select the third list element:
```{r}
mylist[[3]] # Selects the third element of mylist
```
## Your Turn
1. Create a list containing `mydf` as well as a vector of length 5 containing NFL wide receivers
2. Use indexing to select mydf from your list
## Answers
### 1.
```{r}
mylist <- list(mydf,
               c("Nelson", "Bryant", "Crabtree", "Fitzgerald", "Jones"))
```
### 2.
```{r}
mylist[[1]]
```
## Examining Objects
- `head(x)` - View top 6 rows of a data frame
- `tail(x)` - View bottom 6 rows of a data frame
- `summary(x)` - Summary statistics
- `str(x)` - View structure of object
- `dim(x)` - View dimensions of object
- `length(x)` - Returns the length of a vector
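
These functions can be tried on a small example before moving to the NBA data. The data frame `obj` below is a hypothetical stand-in with 10 rows.

```{r}
# Hypothetical data frame with 10 rows, built only for this illustration
obj <- data.frame(x = 1:10, y = seq(0.5, 5, by = 0.5))

head(obj)          # first 6 rows by default
tail(obj, n = 3)   # last 3 rows
summary(obj)       # summary statistics for each column
str(obj)           # compact view of the structure
dim(obj)           # number of rows, then number of columns
length(obj$x)      # length of one column vector
length(obj)        # for a data frame, length() counts the columns
```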
## Examining Objects Demo
We can examine the first two rows of an object by passing the `n` parameter to the `head()` function:
```{r}
head(nba, n = 2) # n = 2 displays only the first two rows.
```
##
What's its structure?
```{r}
str(nba)
```
## Your Turn
1. View the top 8 rows of the nba data
2. What type of object is the nba data set?
3. How many rows are in the nba data set? (try finding this using `dim()` or indexing + `length()`)
## Answers
### 1.
```{r}
head(nba,n = 8)
```
##
### 2.
```{r}
str(nba)
# data frame
```
##
### 3.
```{r}
dim(nba)
dim(nba)[1] # First element of dim() is the number of rows
```
## Working with Output from a Function
- The output from a function can be saved as an object
- That object is often a list of results
- Items can be extracted from the output for further computing
- Examine objects using functions like `str()`
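
As a rough sketch of this workflow, the example below saves the result of `t.test()` on two made-up samples. The vectors `grp_a` and `grp_b` and the name `res` are assumptions for illustration.

```{r}
# Two small made-up samples, used only to illustrate working with the result
grp_a <- c(5.1, 4.9, 6.2, 5.8, 5.5)
grp_b <- c(6.9, 7.4, 6.8, 7.9, 7.1)

res <- t.test(grp_a, grp_b)   # save the output instead of only printing it

is.list(res)    # the result is stored as a list
names(res)      # names of the items available for extraction
res$p.value     # extract one item by name
res$conf.int    # extract another item: the confidence interval
```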
## Saving Output Demo
- Apply a t-test to the NBA data set to see if the Points Per Game for players drafted in 2008 and 2010 are statistically different
- `t.test()` can only handle two groups, so we subset the data to keep only those two years.
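
The subsetting step can be sketched with `%in%` on a made-up data frame. The name `drafts` and its values are hypothetical.

```{r}
# Hypothetical data with three draft years; values are made up
drafts <- data.frame(Year = c(2008, 2009, 2010, 2008, 2010, 2009),
                     PPG  = c(10.2, 8.1, 7.5, 12.4, 6.3, 9.9))

keep <- drafts$Year %in% c(2008, 2010)   # TRUE for rows in either year
two_years <- drafts[keep, ]              # keep only those rows
table(two_years$Year)                    # confirm that only two groups remain
```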
## Demo (Continued)
Save the output of the t-test to an object:
```{r}
tout <- t.test(Points.Per.Game ~ Year, data = nba[nba$Year %in% c("2008","2010"), ])
tout
```
An interpretation of this is that there is a statistically significant difference in the average points scored between the 2008 and 2010 NBA draft classes. This is one possible way to compare the strength of draft classes.
##
Let's look at the structure of this object:
```{r}
str(tout)
```
## Demo: Extracting the P-Value
Since this is simply a list, we can use our regular indexing:
```{r}
tout$p.value
tout[[3]]
```
## Your Turn
1. Pull the p-value from a `t.test()` comparing Win Shares between the 2008 and 2010 NBA draft classes.
2. What does this p-value imply?
## Answers
### 1.
```{r}
tout <- t.test(Win.Share ~ Year, data = nba[nba$Year %in% c("2008","2010"), ])
tout
tout$p.value # Pulls the p-value from the saved output
```
##
### 2.
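Since `p = 1.027e-05` is less than .05, we reject the null hypothesis of equal means at the 5% significance level, so there is evidence that the mean Win Shares of the two groups differ. Together with the group means reported in `tout`, this suggests that the 2008 NBA draft class was stronger than the 2010 class by this measure.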