Dates and Times

  • Dates and times are deceptively tricky to work with
  • Formats - Is 02/05/2017 February 5 or May 2?
  • Time Zones
  • The POSIXct and POSIXlt formats in R are difficult to work with
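
To make the last two bullets concrete, here is a minimal base R sketch. The timestamp is an arbitrary example chosen for illustration, and the time zone name "America/Chicago" is an assumption about which zone you might care about.

```r
# Arbitrary example timestamp, fixed to UTC so the result does not depend on the local machine
x <- as.POSIXct("2017-08-22 16:40:10", tz = "UTC")

# POSIXct stores a single number: seconds since 1970-01-01 00:00:00 UTC
as.numeric(x)

# POSIXlt stores a list of components with easy-to-forget offsets
lt <- as.POSIXlt(x)
lt$mon + 1      # months are stored as 0-11, so add 1 to get 8 (August)
lt$year + 1900  # years are stored as years since 1900, so add 1900 to get 2017

# The same instant shows a different clock time in another time zone
format(x, tz = "America/Chicago", usetz = TRUE)
```

Offsets like these are easy to get wrong by hand, which is one reason a helper package is attractive.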

Lubridate Package

  • available from CRAN (install.packages("lubridate"))
  • Written by Garrett Grolemund and Hadley Wickham
  • associated paper

Instants of Time Examples

library(lubridate)

One moment in time, usually named, e.g.

now() # Date with time
## [1] "2017-08-22 16:40:10 CDT"
as.Date(now()) # Just the date
## [1] "2017-08-22"

Lubridate turns strings into instants with functions that have y, m, and d in their names

ymd("2013-05-14")
## [1] "2013-05-14"
mdy("05/14/2013")
## [1] "2013-05-14"
dmy("14052013")
## [1] "2013-05-14"
ymd_hms("2013:05:14 14:50:30")
## [1] "2013-05-14 14:50:30 UTC"

Order matters!
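
A short sketch of why the order matters, using the ambiguous string 02/05/2017 from the first slide. The variable names are hypothetical and chosen only for this illustration.

```r
library(lubridate)

# The same string read with two different component orders
us_style <- mdy("02/05/2017")  # month-day-year: February 5, 2017
eu_style <- dmy("02/05/2017")  # day-month-year: May 2, 2017

us_style
eu_style
us_style == eu_style  # FALSE: the two readings are different dates
```

Neither call raises an error, so the parsing function has to be chosen to match how the data were recorded.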

Working with Instants

Standard arithmetic operations now work on dates:

ymd("2017-07-23") > ymd("1970-01-01")
## [1] TRUE
myd("07-2017-23") - ymd("1970-01-01")
## Time difference of 17370 days
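
Subtracting two dates returns a difftime object rather than a plain number. The sketch below, which assumes you want the gap as a number for further calculation, shows one way to convert it. The name gap is hypothetical.

```r
library(lubridate)

gap <- ymd("2017-07-23") - ymd("1970-01-01")  # a difftime object

as.numeric(gap, units = "days")   # plain number of days: 17370
as.numeric(gap, units = "weeks")  # the same gap expressed in weeks

# Adding a number to a Date moves it forward by that many days
ymd("2017-07-23") + 30
```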

Functions for extracting pieces of dates:

month(now())
## [1] 8
wday(now())
## [1] 3
wday(now(), label=TRUE)
## [1] Tues
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat

Your Turn

  1. The last time the Cleveland Indians won the World Series of Baseball was October 11th, 1948. How long has it been in days?
  2. Wayne Gretzky's first NHL game was 1980-04-08 (ymd) and his last was 1999-04-18 (ymd). How many days was he an NHL player?

Answers

1.

as.Date(now()) - mdy("10-11-1948")
## Time difference of 25152 days

2.

ymd("1999-04-18") - ymd("1980-04-08")
## Time difference of 6949 days

Accessor Functions

Component          Function
Year               year()
Month              month()
Day of the year    yday()
Day of the month   mday()
Day of the week    wday()
Hour               hour()
Minute             minute()
Second             second()
Time zone          tz()
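
As a quick illustration, the sketch below applies each accessor to one fixed instant, so the results do not depend on when the code is run. The instant is the same example timestamp parsed with ymd_hms() earlier.

```r
library(lubridate)

# Fixed example instant; ymd_hms() uses UTC unless told otherwise
x <- ymd_hms("2013-05-14 14:50:30")

year(x)    # 2013
month(x)   # 5
yday(x)    # 134, the 134th day of 2013
mday(x)    # 14
wday(x)    # 3, a Tuesday when Sunday is counted as day 1 (the default)
hour(x)    # 14
minute(x)  # 50
second(x)  # 30
tz(x)      # "UTC"
```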

Example

What day of the week did the Boston Celtics play their first game in the 2008 NBA Playoffs?

nba.playoffs <- read.csv("NBA Playoffs.csv")
head(nba.playoffs, n = 2)
##      Team Opp      Date Number Round Game Location W.L Importance Points
## 1 Celtics ATL 4/20/2008      1     1    1     Home   W          0    104
## 2 Celtics ATL 4/23/2008      2     1    2     Home   W          0     96
##   Oppo Diff Team.FG Team.FGA Team.FG. Team.3P Team.3PA Team.3P. Team.FT
## 1   81   23      38       81    0.469       9       16    0.563      19
## 2   77   19      35       84    0.417       7       18    0.389      19
##   Team.FTA Team.FT. Team.ORB Team.TRB Team.AST Team.STL Team.BLK Team.TOV
## 1       24    0.792       13       40       22        6        7       10
## 2       26    0.731       13       45       23       15        3       12
##   Team.PF Opponent.FG Opponent.FGA Opponent.FG. Opponent.3P Opponent.3PA
## 1      21          29           76        0.382           3           14
## 2      33          23           60        0.383           0            5
##   Opponent.3P. Opponent.FT Opponent.FTA Opponent.FT. Opponent.ORB
## 1        0.214          20           28        0.714           16
## 2        0.000          31           40        0.775            5
##   Opponent.TRB Opponent.AST Opponent.STL Opponent.BLK Opponent.TOV
## 1           41           16            4            8           15
## 2           35           10            4            5           21
##   Opponent.PF Advanced.ORtg Advanced.DRtg Advanced.Pace Advanced.FTr
## 1          18         123.6          96.2          84.2        0.296
## 2          22         105.4          84.5          91.1        0.310
##   Advanced.3PAr Advanced.TS. Advanced.TRB. Advanced.AST. Advanced.STL.
## 1         0.198        0.568          49.4          57.9           7.1
## 2         0.214        0.503          56.3          65.7          16.5
##   Advanced.BLK. Offensive.Four.Factors.eFG. Offensive.Four.Factors.TOV.
## 1          11.3                       0.525                         9.8
## 2           5.5                       0.458                        11.2
##   Offensive.Four.Factors.ORB. Offensive.Four.Factors.FT.FGA
## 1                        34.2                         0.235
## 2                        30.2                         0.226
##   Defensive.Four.Factors.eFG. Defensive.Four.Factors.TOV.
## 1                       0.401                        14.5
## 2                       0.383                        21.3
##   Defensive.Four.Factors.DRB. Defensive.Four.Factors.FT.FGA
## 1                        62.8                         0.263
## 2                        86.5                         0.517

# wday() tells us which day of the week this date falls on
wday("2008-04-20", label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
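
The Date column in the file is stored as text in month/day/year form, so a slightly more general sketch parses the string first and then asks for the weekday. The variable name first_game is hypothetical.

```r
library(lubridate)

# Date string copied from the first row of the head() output above
first_game <- mdy("4/20/2008")

first_game        # "2008-04-20"
wday(first_game)  # 1, i.e. Sunday when weeks start on Sunday (the default)
```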

Your Turn

  1. What day is most common for NBA playoff games? (Hint: Make a bar plot and use the as.Date(mdy()) command)

Answers

1.

Sunday

library(ggplot2)
library(lubridate)
nba.playoffs <- read.csv("NBA Playoffs.csv")

playoffs.dates <- nba.playoffs$Date                   # Vector of date strings (month/day/year)
playoffs.dates <- as.Date(mdy(playoffs.dates))        # Parse the strings into Date objects
playoffs.dates <- wday(playoffs.dates, label = TRUE)  # Day of the week as an ordered factor

qplot(playoffs.dates, geom = "bar",
      main = "Barplot of Days for NBA Playoff Games")
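
The bar plot can be cross-checked with a plain count. The sketch below uses a small made-up vector in place of nba.playoffs$Date, so the counts it prints are illustrative only and say nothing about the real data.

```r
library(lubridate)

# Made-up stand-in for nba.playoffs$Date (month/day/year strings), for illustration only
dates <- c("4/20/2008", "4/23/2008", "4/27/2008")

# Count how many dates fall on each day of the week
day_counts <- table(wday(mdy(dates), label = TRUE))
day_counts

# Name of the most frequent day in this toy vector
names(which.max(day_counts))
```

Applied to the real column, the tallest entry of the table should match the tallest bar in the plot.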

What is a Time Series?

  • A series of data points ordered in time
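
A minimal sketch of the idea in base R, using made-up values that are deliberately listed out of order. The data frame and column names are hypothetical.

```r
# Made-up observations, purely for illustration, listed out of time order on purpose
obs <- data.frame(
  date  = as.Date(c("2008-04-23", "2008-04-20", "2008-04-26")),
  value = c(2, 1, 3)
)

# Sorting by date gives the observations their time ordering
ts_data <- obs[order(obs$date), ]
ts_data

# Gaps between consecutive observations, in days
diff(ts_data$date)
```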

Example

Time series of points scored by Boston Celtics in 2008 NBA Playoffs using ggplot()

boston <- read.csv("Boston Celtics.csv")
# Convert the Date column from text to the Date class
boston$Date <- as.Date(boston$Date, format = "%m/%d/%Y")

ggplot(data = boston, aes(x = Date, y = Points, group = 1)) +
  geom_point() +
  geom_line() +
  # date_labels uses strptime notation; '%a' is the abbreviated weekday name
  scale_x_date(name = 'Date', # Creates x-axis scale
               date_breaks = '4 days',
               date_labels = '%a') +
  ggtitle("Time Series for Points Scored by Boston Celtics in 2008 Playoffs")
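
The strptime notation appears twice in the code above: once to read the dates and once to label the axis. The base R sketch below shows the same codes on a single date, taken from the first row of the playoff data shown earlier.

```r
# Parse text in month/day/year form into a Date
d <- as.Date("4/20/2008", format = "%m/%d/%Y")

format(d, "%Y-%m-%d")  # "2008-04-20"
format(d, "%m/%d/%Y")  # "04/20/2008"
format(d, "%a")        # abbreviated weekday name; the exact text depends on the locale
```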

From this time series plot, we can see that the Celtics scored fewer points in the middle stages of the playoffs. This could perhaps be due to fatigue or to the strength of the opponent.

Your Turn

  1. Make a time series plot of team assists (Team.AST) for the 2008 Boston Celtics playoff run.

  2. What can you interpret from this?

Answers

1.

library(ggplot2)
boston <- read.csv("Boston Celtics.csv")
# Convert the Date column from text to the Date class
boston$Date <- as.Date(boston$Date, format = "%m/%d/%Y")

ggplot(data = boston, aes(x = Date, y = Team.AST, group = 1)) +
  geom_point() +
  geom_line() +
  # date_labels uses strptime notation; '%a' is the abbreviated weekday name
  scale_x_date(name = 'Date',
               date_breaks = '4 days',
               date_labels = '%a') +
  ggtitle("Time Series for Team Assists by Boston Celtics in 2008 Playoffs")

2.

As with the time series of points scored, there are fewer total assists in the middle stages of the playoffs. This is most likely a result of the strength of the opponent or some other explainable factor.