---
title: "Introduction to Probability and Statistics"
output:
  ioslides_presentation:
    smaller: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Outline
- Give a brief overview of the probability and statistics needed for data analytics
## Summary Statistics
A few common summary statistics for interpreting data are:
- **mean**: Average value
- **median**: 50th percentile (middle) value
- **mode**: Most frequently occurring value
- **minimum**: Minimum value
- **maximum**: Maximum value
- **range**: Distance from the smallest to the largest value (maximum minus minimum)
- **standard deviation**: Measures how spread out the data are around the mean
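To make these definitions concrete, here is a small sketch on a made-up vector `x` (hypothetical data, chosen so every statistic is easy to check by hand). Base R has no function for the statistical mode, so it is taken from a frequency table; the standard deviation is written out from the sample formula, which divides by n - 1 and is the one `sd()` uses.

```{r}
# Hypothetical toy data, chosen so each statistic is easy to verify by hand
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x.mean   <- sum(x) / length(x)          # mean: 40 / 8 = 5
x.median <- median(x)                   # median: average of 4th and 5th sorted values, (4 + 5) / 2
x.counts <- table(x)                    # how often each distinct value occurs
x.mode   <- as.numeric(names(x.counts)[x.counts == max(x.counts)])  # most frequent value(s)
x.range  <- max(x) - min(x)             # range as a single distance: 9 - 2
# Sample standard deviation: sqrt of summed squared deviations divided by (n - 1)
x.sd     <- sqrt(sum((x - x.mean)^2) / (length(x) - 1))

c(mean = x.mean, median = x.median, mode = x.mode, range = x.range, sd = x.sd)
```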
## Example with Passing Yards of Aaron Rodgers
Below we create a vector in which each element is one season's passing yards for Aaron Rodgers of the Green Bay Packers, covering the 12 seasons of his career up to the 2017-2018 NFL season.
```{r}
passing.yards.ar <- c(65,46,218,4038,4434,3922,4643,4295,2536,4381,3821,4428)
passing.yards.ar
```
```{r}
mean(passing.yards.ar) # Mean
median(passing.yards.ar) # Median
```
##
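```{r}
# Base R's mode() returns the storage mode ("numeric"), not the most frequent value,
# so count each value with table() and keep the most frequent one(s)
yard.counts <- table(passing.yards.ar)
as.numeric(names(yard.counts)[yard.counts == max(yard.counts)]) # Mode (every value occurs once here, so all tie)
min(passing.yards.ar) # Min
max(passing.yards.ar) # Max
range(passing.yards.ar) # Smallest and largest values (range() returns both)
diff(range(passing.yards.ar)) # Range as a distance: max - min
sd(passing.yards.ar) # Standard deviation
```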
## Median vs. Mean
Both are statistical measures of central tendency, that is, of where the middle of a set of data points lies. Depending on the data, one can be more informative than the other.
## Example Median vs. Mean
Let's take Aaron Rodgers' first 4 years in the NFL. Which value would be a more accurate indicator of his passing ability in those 4 years?
```{r}
passing.yards.ar.4 <- passing.yards.ar[1:4] # Selects first 4 elements
passing.yards.ar.4
median(passing.yards.ar.4) # Median
mean(passing.yards.ar.4) # Mean
```
##
The average is the most popular and natural measure of a midpoint, but a single value that is much higher or lower than the rest can pull it far away from the typical values. Here, the one 4038-yard season pulls the mean up to 1091.75, even though the other three seasons are all under 220 yards, while the median of 141.5 stays close to them. This is an example of why one may choose the median over the mean.
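One way to see this sensitivity is to change only the largest of the four values and recompute both statistics. The replacement value of 400 below is hypothetical, chosen purely for illustration: the median does not move, while the mean drops sharply. The `trim` argument of `mean()`, which discards a fraction of the observations from each end before averaging, is shown as one possible compromise between the two.

```{r}
# Aaron Rodgers' first 4 seasons, as on the previous slide
first4 <- c(65, 46, 218, 4038)
# Same data with the one large season replaced by a hypothetical value of 400
first4.alt <- c(65, 46, 218, 400)

c(mean = mean(first4), median = median(first4))          # mean pulled up by 4038
c(mean = mean(first4.alt), median = median(first4.alt))  # median unchanged, mean drops
# Trimmed mean: drop 25% of observations from each end (1 value per side here), then average
mean(first4, trim = 0.25)
```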
## Your Turn
1. Apply the `summary()` function to Aaron Rodgers' passing yards over 12 seasons.
2. Based on Aaron Rodgers' passing yard values, would you take the mean or the median to be a better estimate of his typical passing yards per season?
## Answers
### 1.
```{r}
summary(passing.yards.ar)
```
### 2.
From a statistical point of view, I would take his median value of 3980. His first 3 seasons, when he had little playing time, are not representative of his passing yardage, and they pull the mean down to about 3069. Apart from those seasons, he is mostly around 4000 passing yards per season.
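A quick check of this reasoning is to compare the mean and median of all 12 seasons with those of seasons 4 through 12 only. This sketch simply drops the first three elements; treating those seasons as unrepresentative is the judgment made in the answer, not something the data decide on their own.

```{r}
# Same data as above, repeated so this chunk runs on its own
passing.yards.ar <- c(65,46,218,4038,4434,3922,4643,4295,2536,4381,3821,4428)
later.seasons <- passing.yards.ar[-(1:3)]  # negative indices drop the first 3 seasons

rbind(all.12.seasons  = c(mean = mean(passing.yards.ar), median = median(passing.yards.ar)),
      seasons.4.to.12 = c(mean = mean(later.seasons),    median = median(later.seasons)))
```

Dropping those three seasons raises the mean from about 3069 to about 4055, while the median only moves from 3980 to 4295.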
## Confidence Intervals
A confidence interval is a range of values computed from a sample by a procedure that, over repeated samples, contains the true (population) mean with a chosen high probability, such as 95%.
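The probability in this definition describes the interval-building procedure over repeated samples rather than any single computed interval. A small simulation sketch makes this concrete: draw many samples from a population whose mean we know, build a 95% interval from each with `t.test()`, and count how often the interval contains the true mean. The normal population with mean 3500 and standard deviation 900, the sample size of 20, and the 1000 repetitions are all arbitrary assumptions for illustration.

```{r}
set.seed(42)                 # fixed seed so the result is reproducible
true.mean <- 3500            # assumed population mean (hypothetical)
covered <- replicate(1000, {
  x  <- rnorm(20, mean = true.mean, sd = 900)  # one hypothetical sample of 20 seasons
  ci <- t.test(x)$conf.int                     # its 95% confidence interval
  ci[1] <= true.mean && true.mean <= ci[2]     # TRUE if the interval captured the true mean
})
mean(covered)                # proportion of intervals containing the true mean, close to 0.95
```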
## Example of Confidence Interval with Passing Yards of Brett Favre
Below we create a vector in which each element is one season's passing yards for Brett Favre, covering his entire career.
```{r}
# Create vector of Brett Favre's passing yards
passing.yards.bf <- c(0,3227,3303,3882,4413,3899,3867,4212,4091,
                      3812,3921,3658,3361,4088,3881,3885,4155,
                      3472,4202,2509)
passing.yards.bf
```
##
`t.test()` computes the 95% confidence interval for the mean, along with a one-sample t-test:
```{r}
t.test(passing.yards.bf)
```
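Keeping our eyes on the important part, the line `95 percent confidence interval` shows the interval `(3146.78, 4037.02)`. Our confidence in this interval comes from its 95% confidence level, not from the p-value: the $p$-value reported here tests the null hypothesis that the true mean is 0, which is not the question we are asking. Loosely speaking, we can be 95% confident that the true mean of Brett Favre's seasonal passing yards lies in this interval.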
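The interval reported by `t.test()` is the sample mean plus or minus a t critical value times the standard error, $\bar{x} \pm t_{0.975,\,n-1}\, s/\sqrt{n}$. The sketch below recomputes it from that formula, with `qt()` supplying the 97.5th percentile of the t distribution with $n-1$ degrees of freedom, and compares it with the `conf.int` component of the `t.test()` result.

```{r}
# Same data as above, repeated so this chunk runs on its own
passing.yards.bf <- c(0,3227,3303,3882,4413,3899,3867,4212,4091,
                      3812,3921,3658,3361,4088,3881,3885,4155,
                      3472,4202,2509)
n      <- length(passing.yards.bf)
x.bar  <- mean(passing.yards.bf)           # sample mean
se     <- sd(passing.yards.bf) / sqrt(n)   # standard error of the mean
t.crit <- qt(0.975, df = n - 1)            # two-sided 95% critical value
manual.ci <- c(x.bar - t.crit * se, x.bar + t.crit * se)
manual.ci
as.numeric(t.test(passing.yards.bf)$conf.int)  # should match manual.ci
```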
## Comparing Brett Favre and Aaron Rodgers
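Passing two vectors to `t.test(x, y)` runs a two-sample (Welch) t-test: it reports both means, a confidence interval for their difference, and a p-value for the null hypothesis that the two true means are equal.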
```{r}
t.test(passing.yards.bf,passing.yards.ar)
```
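Without going into too much detail about the statistics: the sample means of $x$, Brett Favre, and $y$, Aaron Rodgers, differ by about 523 yards per season (3591.9 vs. about 3068.9). However, the p-value is well above 0.05 and the 95% confidence interval for the difference includes 0, so this test does not show a clear difference between the two. The large spread in Rodgers' data, driven by his low-yardage early seasons, makes the gap in means hard to distinguish from chance.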
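With two vectors and its default settings, `t.test()` performs Welch's two-sample t-test, which does not assume equal variances. The sketch below recomputes its t statistic, Welch-Satterthwaite degrees of freedom, and two-sided p-value from the sample means and variances, and compares them with the `statistic`, `parameter` and `p.value` components of the result.

```{r}
x <- c(0,3227,3303,3882,4413,3899,3867,4212,4091,
       3812,3921,3658,3361,4088,3881,3885,4155,
       3472,4202,2509)                                          # Brett Favre (same data as above)
y <- c(65,46,218,4038,4434,3922,4643,4295,2536,4381,3821,4428)  # Aaron Rodgers (same data as above)

v.x <- var(x) / length(x)    # squared standard error of mean(x)
v.y <- var(y) / length(y)    # squared standard error of mean(y)
t.stat <- (mean(x) - mean(y)) / sqrt(v.x + v.y)
# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (v.x + v.y)^2 / (v.x^2 / (length(x) - 1) + v.y^2 / (length(y) - 1))
p.value <- 2 * pt(-abs(t.stat), df = df.welch)  # two-sided p-value

res <- t.test(x, y)
c(t = t.stat, df = df.welch, p = p.value)
c(t = unname(res$statistic), df = unname(res$parameter), p = res$p.value)  # should match
```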
## Your Turn
Number of interceptions from Aaron Rodgers and Brett Favre by season:
```{r}
intercepts.ar <- c(1,0,0,13,7,11,6,8,6,5,8,7) # Aaron Rodgers
intercepts.bf <- c(2,13,24,14,13,13,16,23,23, # Brett Favre
                   16,15,16,21,17,29,18,15,22,7,19)
```
1. What are the 95% confidence intervals for the mean number of interceptions per season for Aaron Rodgers and for Brett Favre?
2. Is there a clear difference in the mean number of interceptions per season between the two?
## Answers
### 1.
```{r}
t.test(intercepts.ar)
```
##
```{r}
t.test(intercepts.bf)
```
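The 95% confidence interval for Aaron Rodgers is `(3.4, 8.6)` interceptions per season.
The 95% confidence interval for Brett Favre is `(14.0, 19.6)` interceptions per season.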
##
### 2.
```{r}
t.test(intercepts.ar,intercepts.bf)
```
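There is a clear distinction between their average numbers of interceptions: the p-value is far below 0.05, and the 95% confidence interval for the difference (Rodgers minus Favre) lies entirely below 0. On average, Favre threw about 16.8 interceptions per season versus about 6 for Rodgers.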
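Rather than reading the printed output, the same conclusion can be drawn in code: `t.test()` returns an object of class `htest` whose `estimate`, `conf.int` and `p.value` components can be accessed with `$`. A minimal sketch, where the name `ci.excludes.zero` is just an illustrative variable:

```{r}
intercepts.ar <- c(1,0,0,13,7,11,6,8,6,5,8,7)        # Aaron Rodgers (same data as above)
intercepts.bf <- c(2,13,24,14,13,13,16,23,23,
                   16,15,16,21,17,29,18,15,22,7,19) # Brett Favre (same data as above)

res <- t.test(intercepts.ar, intercepts.bf)
res$estimate   # the two sample means
res$conf.int   # 95% CI for mean(intercepts.ar) - mean(intercepts.bf)
res$p.value
# The interval excludes 0 when it lies entirely above or entirely below 0
ci.excludes.zero <- res$conf.int[1] > 0 || res$conf.int[2] < 0
ci.excludes.zero
```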