- Dates and times are deceptively tricky to work with
- Formats - Is 02/05/2017 February 5 or May 2?
- Time Zones
- The POSIXct and POSIXlt formats in R are difficult to work with
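As a rough illustration of the last two bullets, here is a minimal base R sketch. The timestamp and the `America/Chicago` zone are assumptions chosen only for the example; the point is that the same instant prints differently per time zone, and that POSIXlt stores its fields with offsets that are easy to forget.

```r
# Assumed example timestamp, stored as POSIXct in US Central time
x <- as.POSIXct("2017-08-22 16:40:10", tz = "America/Chicago")

# The same instant, displayed in UTC (Central Daylight Time is UTC-5)
format(x, tz = "UTC", usetz = TRUE)
## [1] "2017-08-22 21:40:10 UTC"

# POSIXlt exposes components, but with awkward offsets
lt <- as.POSIXlt(x)
lt$year   # years since 1900
## [1] 117
lt$mon    # months are 0-based, so August is 7
## [1] 7

# The calendar values have to be reconstructed by hand
c(year = lt$year + 1900, month = lt$mon + 1)
```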
install.packages("lubridate")
library(lubridate)
An instant is one specific moment in time, e.g.
now() # Date with time
## [1] "2017-08-22 16:40:10 CDT"
as.Date(now()) # Just the date
## [1] "2017-08-22"
Lubridate turns strings into instants with functions that have y, m, and d in their names:
ymd("2013-05-14")
## [1] "2013-05-14"
mdy("05/14/2013")
## [1] "2013-05-14"
dmy("14052013")
## [1] "2013-05-14"
ymd_hms("2013:05:14 14:50:30")
## [1] "2013-05-14 14:50:30 UTC"
Order matters!
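A small sketch of why the order matters, using the ambiguous string from the opening slide. The parser name tells lubridate which component comes first, so the same text yields two different dates.

```r
library(lubridate)

x <- "02/05/2017"

mdy(x)  # month first: February 5
## [1] "2017-02-05"
dmy(x)  # day first: May 2
## [1] "2017-05-02"
```

Neither call is wrong on its own; the right parser is the one that matches how the data were recorded.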
Standard arithmetic operations now work on dates:
ymd("2017-07-23") > ymd("1970-01-01")
## [1] TRUE
myd("07-2017-23") - ymd("1970-01-01")
## Time difference of 17370 days
month(now())
## [1] 8
wday(now())
## [1] 3
wday(now(), label=TRUE)
## [1] Tues
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
as.Date(now()) - mdy("10-11-1948")
## Time difference of 25152 days
ymd("1999-04-18") - ymd("1980-04-08")
## Time difference of 6949 days
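Subtracting two dates returns a difftime measured in days. The sketch below shows one possible way to express the same gap in other units, using base R's `as.numeric()` on the difftime and lubridate's `interval()` and `time_length()`. The variable names are hypothetical.

```r
library(lubridate)

start <- ymd("1980-04-08")
end   <- ymd("1999-04-18")

gap <- end - start           # difftime, in days
as.numeric(gap, units = "days")
## [1] 6949
as.numeric(gap, units = "weeks")

# Length of the same span in years (fractional)
span_years <- time_length(interval(start, end), "years")
floor(span_years)            # completed years
## [1] 19
```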
Component | Function |
---|---|
Year | year() |
Month | month() |
Day of the year | yday() |
Day of the month | mday() |
Day of the week | wday() |
Hour | hour() |
Minute | minute() |
Second | second() |
Time zone | tz() |
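To make the table concrete, here is a short sketch that applies each accessor to the date-time parsed earlier. The `wday()` result assumes lubridate's default setting, in which weeks start on Sunday.

```r
library(lubridate)

x <- ymd_hms("2013-05-14 14:50:30")  # parsed as UTC by default

year(x)    # 2013
month(x)   # 5
yday(x)    # 134 (day of the year)
mday(x)    # 14  (day of the month)
wday(x)    # 3   (Tuesday, counting from Sunday = 1)
hour(x)    # 14
minute(x)  # 50
second(x)  # 30
tz(x)      # "UTC"
```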
What day of the week did the Boston Celtics play their first game in the 2008 NBA Playoffs?
nba.playoffs <- read.csv("NBA Playoffs.csv")
head(nba.playoffs, n = 2)
##      Team Opp      Date Number Round Game Location W.L Importance Points
## 1 Celtics ATL 4/20/2008      1     1    1     Home   W          0    104
## 2 Celtics ATL 4/23/2008      2     1    2     Home   W          0     96
##   Oppo Diff Team.FG Team.FGA Team.FG. Team.3P Team.3PA Team.3P. Team.FT
## 1   81   23      38       81    0.469       9       16    0.563      19
## 2   77   19      35       84    0.417       7       18    0.389      19
##   Team.FTA Team.FT. Team.ORB Team.TRB Team.AST Team.STL Team.BLK Team.TOV
## 1       24    0.792       13       40       22        6        7       10
## 2       26    0.731       13       45       23       15        3       12
##   Team.PF Opponent.FG Opponent.FGA Opponent.FG. Opponent.3P Opponent.3PA
## 1      21          29           76        0.382           3           14
## 2      33          23           60        0.383           0            5
##   Opponent.3P. Opponent.FT Opponent.FTA Opponent.FT. Opponent.ORB
## 1        0.214          20           28        0.714           16
## 2        0.000          31           40        0.775            5
##   Opponent.TRB Opponent.AST Opponent.STL Opponent.BLK Opponent.TOV
## 1           41           16            4            8           15
## 2           35           10            4            5           21
##   Opponent.PF Advanced.ORtg Advanced.DRtg Advanced.Pace Advanced.FTr
## 1          18         123.6          96.2          84.2        0.296
## 2          22         105.4          84.5          91.1        0.310
##   Advanced.3PAr Advanced.TS. Advanced.TRB. Advanced.AST. Advanced.STL.
## 1         0.198        0.568          49.4          57.9           7.1
## 2         0.214        0.503          56.3          65.7          16.5
##   Advanced.BLK. Offensive.Four.Factors.eFG. Offensive.Four.Factors.TOV.
## 1          11.3                       0.525                         9.8
## 2           5.5                       0.458                        11.2
##   Offensive.Four.Factors.ORB. Offensive.Four.Factors.FT.FGA
## 1                        34.2                         0.235
## 2                        30.2                         0.226
##   Defensive.Four.Factors.eFG. Defensive.Four.Factors.TOV.
## 1                       0.401                        14.5
## 2                       0.383                        21.3
##   Defensive.Four.Factors.DRB. Defensive.Four.Factors.FT.FGA
## 1                        62.8                         0.263
## 2                        86.5                         0.517
# wday() will tell us which day of the week this date is
wday("2008-04-20", label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
Answer: Sunday
(The date could also be converted first with the as.Date(mdy()) command.)
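The same answer can be reached without retyping the date by hand. The sketch below is only an illustration: since the CSV file is not available here, it builds a hypothetical two-row data frame `games` from the two `Date` values visible in the `head()` output, then parses and labels them.

```r
library(lubridate)

# Hypothetical stand-in for the first two rows of nba.playoffs
games <- data.frame(Date = c("4/20/2008", "4/23/2008"),
                    stringsAsFactors = FALSE)

games$Date    <- mdy(games$Date)               # month/day/year strings to Date
games$Weekday <- wday(games$Date, label = TRUE) # ordered factor of day names

games
wday(games$Date)  # numeric form: 1 = Sunday, 4 = Wednesday
## [1] 1 4
```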
library(ggplot2)
nba.playoffs <- read.csv("NBA Playoffs.csv")
playoffs.dates <- nba.playoffs$Date                    # Creates vector of dates
playoffs.dates <- as.Date(mdy(playoffs.dates))         # Changes structure
playoffs.dates <- wday(playoffs.dates, label = TRUE)   # Computes day of week
qplot(playoffs.dates, geom = "bar", main = "Barplot of Days for NBA Playoff Games")
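`qplot()` is deprecated in recent ggplot2 releases, so the same kind of barplot can also be sketched with `ggplot()` and `geom_bar()`. This is an alternative, not the original slide code, and the `days` data frame is a hypothetical stand-in built from the two dates shown in the `head()` output rather than the full file.

```r
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for nba.playoffs$Date (two games only)
dates <- mdy(c("4/20/2008", "4/23/2008"))
days  <- data.frame(Weekday = wday(dates, label = TRUE))

# geom_bar() counts the rows per weekday, like qplot(geom = "bar")
p <- ggplot(days, aes(x = Weekday)) +
  geom_bar() +
  ggtitle("Barplot of Days for NBA Playoff Games")
p
```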
Time series of points scored by Boston Celtics in 2008 NBA Playoffs using ggplot()
boston <- read.csv("Boston Celtics.csv")
# Convert date to a date structure type
boston$Date <- as.Date(boston$Date, "%m/%d/%Y")
ggplot(data = boston, aes(x = Date, y = Points, group = 1)) +
  geom_point() +
  geom_line() +
  # For date_labels we use the strptime notation
  scale_x_date(name = 'Date', # Creates x-axis scale
               date_breaks = '4 days',
               date_labels = '%a') +
  ggtitle("Time Series for Points Scored by Boston Celtics in 2008 Playoffs")
From this time series plot, we can see that the Celtics scored fewer points in the middle stages of the playoffs. This could be due to fatigue or to the strength of their opponents.
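The plot code relies on strptime notation twice: `"%m/%d/%Y"` to read the dates and `'%a'` to label the axis. As a quick sketch of how those codes behave, using one date from the data as an assumed example:

```r
# Read a month/day/year string into a Date
d <- as.Date("4/20/2008", format = "%m/%d/%Y")

format(d, "%Y-%m-%d")  # four-digit year, month, day
## [1] "2008-04-20"
format(d, "%m/%d")     # month and day only
## [1] "04/20"

# Name-based codes depend on the session locale
format(d, "%a")        # abbreviated weekday name
format(d, "%b %d")     # abbreviated month name and day
```

Swapping `'%a'` for something like `'%m/%d'` in `date_labels` would label the axis with calendar dates instead of weekday names.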
Make a time series plot of team assists (Team.AST) for the 2008 Boston Celtics playoff run.
What can you interpret from this?
library(ggplot2)
boston <- read.csv("Boston Celtics.csv")
# Convert date to a date structure type
boston$Date <- as.Date(boston$Date, "%m/%d/%Y")
ggplot(data = boston, aes(x = Date, y = Team.AST, group = 1)) +
  geom_point() +
  geom_line() +
  # For date_labels we use the strptime notation
  scale_x_date(name = 'Date',
               date_breaks = '4 days',
               date_labels = '%a') +
  ggtitle("Time Series for Team Assists by Boston Celtics in 2008 Playoffs")
As with the time series of points scored, there are fewer total assists in the middle stages of the playoffs. This is most likely a result of the strength of the opponent or some other explainable factor.