class: center, middle, inverse, title-slide

.title[
# Visualizing Time Series Data
]
.author[
### Thiyanga S. Talagala
]

---

### R packages needed for time series visualization

```r
library(tsibble)
library(lubridate)
library(tidyverse)
```

---

# Data package

```r
# install.packages("devtools")
#devtools::install_github("thiyangt/denguedatahub")
library(denguedatahub)
```

More information: https://denguedatahub.netlify.app/

---

# Data sets in `denguedatahub`

```r
vcdExtra::datasets("denguedatahub")
```

```
##                       Item      class        dim
## 1     americas_annual_data data.frame   899134x5
## 2 cdc_usa_dengue_infection data.frame     9039x6
## 3        china_annual_data data.frame       16x5
## 4        india_annual_data data.frame      432x5
## 5            level_of_risk data.frame      138x3
## 6   philippines_daily_data data.frame    32701x5
## 7    singapore_weekly_data data.frame      272x3
## 8     srilanka_weekly_data data.frame    21934x6
## 9             world_annual data.frame 2773284x10
##                                                                      Title
## 1 Dengue and severe dengue cases and deaths for subregions of the Americas
## 2                       Annual number of dengue fever infections in the USA
## 3                                              Dengue related data in china
## 4                                  DENGUE/DHF situation in India since 2017
## 5                                     Level of Dengue risk around the world
## 6                   Daily number of dengue fever infections in Philippines
## 7                    Weekly number of dengue fever infections in Sri Lanka
## 8                    Weekly number of dengue fever infections in Sri Lanka
## 9                 Annual number of dengue fever infections around the world
```

---
class: inverse, center, middle

# Univariate Time Series Visualization

---

# `srilanka_weekly_data`

```r
srilanka_weekly_data
```

```
## # A tibble: 21,934 × 6
##     year  week start.date end.date   district    cases
##  * <dbl> <dbl> <date>     <date>     <chr>       <dbl>
##  1  2006    52 2006-12-23 2006-12-29 Colombo        71
##  2  2006    52 2006-12-23 2006-12-29 Gampaha        12
##  3  2006    52 2006-12-23 2006-12-29 Kalutara       12
##  4  2006    52 2006-12-23 2006-12-29 Kandy          20
##  5  2006    52 2006-12-23 2006-12-29 Matale          4
##  6  2006    52 2006-12-23 2006-12-29 NuwaraEliya     1
##  7  2006    52 2006-12-23 2006-12-29 Galle           1
##  8  2006    52 2006-12-23 2006-12-29 Hambanthota     1
##  9  2006    52 2006-12-23 2006-12-29 Matara         11
## 10  2006    52 2006-12-23 2006-12-29 Jaffna          0
## # ℹ 21,924 more rows
```

---

# Filter Colombo data

```r
colombodf <- srilanka_weekly_data |>
  filter(district == "Colombo")
colombodf
```

```
## # A tibble: 844 × 6
##     year  week start.date end.date   district cases
##    <dbl> <dbl> <date>     <date>     <chr>    <dbl>
##  1  2006    52 2006-12-23 2006-12-29 Colombo     71
##  2  2007     1 2006-12-30 2007-01-05 Colombo     40
##  3  2007     2 2007-01-06 2007-01-12 Colombo     43
##  4  2007     3 2007-01-13 2007-01-19 Colombo     38
##  5  2007     4 2007-01-20 2007-01-26 Colombo     52
##  6  2007     5 2007-01-27 2007-02-02 Colombo     69
##  7  2007     6 2007-02-03 2007-02-09 Colombo     36
##  8  2007     7 2007-02-10 2007-02-16 Colombo     42
##  9  2007     8 2007-02-17 2007-02-23 Colombo     28
## 10  2007     9 2007-02-24 2007-03-02 Colombo     20
## # ℹ 834 more rows
```

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_point()
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_point()
```

What limitations or shortcomings do you observe in the current presentation?
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_point()
```

How can the graph be further improved?
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-11-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_point()
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-13-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line()
```

We can connect neighbouring points with lines to emphasise the time-dependent relationship.
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-15-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line()
```

- Lines do not represent actual data.
- Lines are used to guide the eye.
- It is not appropriate to connect points with lines if the observed values are far apart or unevenly spaced in time.
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-17-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
  geom_point()
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-19-1.png)<!-- -->
]

---

**Which is the better option: `Lines only` or `Points and Lines both`?**

.pull-left[
![](tsviz_files/figure-html/unnamed-chunk-20-1.png)<!-- -->
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-21-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_area()
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-23-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_area()
```

- Visually distinguishes the area below the curve from the area above it.
- Places more emphasis on the overall trend in the series.
- This visualisation works only if the y-axis starts at zero.
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-25-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line()
```

How can the graph be further improved?
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-27-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line()
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-29-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
* scale_x_date(date_breaks = "1 year")
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-31-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year") +
* theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-33-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
* scale_x_date(date_breaks = "1 month") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-35-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
* scale_x_date(date_breaks = "1 year") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-37-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
* scale_x_date(date_breaks = "1 year") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-39-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
* scale_x_date(date_breaks = "1 year", date_labels = "%b-%y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  xlab("Time") +
  ylab("Dengue Cases")
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-41-1.png)<!-- -->
]

---

## start.date vs cases

.pull-left[
```r
ggplot(data=colombodf, aes(x=start.date, y=cases)) +
  geom_line() +
* scale_x_date(date_breaks = "1 year", date_labels = "%Y-%m-%d") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  xlab("Time") +
  ylab("Dengue Cases")
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-43-1.png)<!-- -->
]

---

![](tsviz_files/figure-html/unnamed-chunk-44-1.png)<!-- -->

--

What are the key takeaways from this chart?

---
class: inverse, middle, center

# Formatting time axis

---

# Filter 2019 data for Colombo district

```r
colombo2019 <- srilanka_weekly_data |>
  filter(district == "Colombo", year==2019)
colombo2019
```

```
## # A tibble: 52 × 6
##     year  week start.date end.date   district cases
##    <dbl> <dbl> <date>     <date>     <chr>    <dbl>
##  1  2019     1 2018-12-29 2019-01-04 Colombo    299
##  2  2019     2 2019-01-05 2019-01-11 Colombo    326
##  3  2019     3 2019-01-12 2019-01-18 Colombo    259
##  4  2019     4 2019-01-19 2019-01-25 Colombo    341
##  5  2019     5 2019-01-26 2019-02-01 Colombo    232
##  6  2019     6 2019-02-02 2019-02-08 Colombo    201
##  7  2019     7 2019-02-09 2019-02-15 Colombo    215
##  8  2019     8 2019-02-16 2019-02-22 Colombo    144
##  9  2019     9 2019-02-23 2019-03-01 Colombo    153
## 10  2019    10 2019-03-02 2019-03-08 Colombo    170
## # ℹ 42 more rows
```

---
class: middle, center, inverse

# Your turn: Obtain the following chart

![](tsviz_files/figure-html/unnamed-chunk-46-1.png)<!-- -->

---

.pull-left[
```r
base
```

![](tsviz_files/figure-html/unnamed-chunk-47-1.png)<!-- -->
]

.pull-right[
- How can the plot be improved?
]

---

```r
base + geom_point()
```

![](tsviz_files/figure-html/unnamed-chunk-48-1.png)<!-- -->

---
class: middle, center

# Breaks: `date_breaks`

---

```r
base +
  geom_point() +
  scale_x_date(date_breaks = "2 months")
```

```
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
```

![](tsviz_files/figure-html/unnamed-chunk-49-1.png)<!-- -->

---
class: middle, center

# Minor breaks: `date_minor_breaks`

---

.pull-left[
```r
base +
  geom_point() +
  scale_x_date(date_breaks = "1 month")
```

![](tsviz_files/figure-html/unnamed-chunk-50-1.png)<!-- -->
]

.pull-right[
```r
base +
  geom_point() +
  scale_x_date(date_breaks = "1 month",
               date_minor_breaks = "1 week")
```

![](tsviz_files/figure-html/unnamed-chunk-51-1.png)<!-- -->
]

---
class: middle, center

# Labels: `date_labels`

---

```r
base +
  geom_point() +
  scale_x_date(date_breaks = "1 month",
               date_minor_breaks = "1 week",
               date_labels = "%b")
```

![](tsviz_files/figure-html/unnamed-chunk-52-1.png)<!-- -->

---

.pull-left[
|Label | Description|
|---|---|
|`%S` | seconds (00-59)|
|`%M`| minutes (00-59)|
|`%l`| hour in 12-hour clock, 1, 2, 3..12|
|`%I`| hour in 12-hour clock, 01, 02,...|
|`%p`| AM/PM indicator|
|`%H`| hour in 24-hour clock|
|`%a`| day of week, Mon, Tue,..|
|`%A`| full name of day of week|
]

.pull-right[
|Label | Description|
|---|---|
|`%e` | day of month, 1, 2, 3,.. |
|`%d` | day of month, 01, 02, 03,.. |
|`%m`| month, 01, 02, 03,..|
|`%b`| month, Jan, Feb,..|
|`%B`| full name of month|
|`%y`| two-digit year, 01, 02,..|
|`%Y`| four-digit year, 2001, 2002,..|
]

---
class: middle, inverse, center

## Your turn: Modify the chart as follows

![](tsviz_files/figure-html/unnamed-chunk-53-1.png)<!-- -->

---

### Axis label modifications

.pull-left[
```r
base +
  geom_point() +
  scale_x_date(
    date_breaks = "1 month",
    date_minor_breaks = "1 week",
    date_labels = "%b %y")
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-55-1.png)<!-- -->
]

---

### Axis limits

.pull-left[
```r
lim <- as.Date(c("2019-01-01", "2019-12-31"))
base +
  geom_point() +
  scale_x_date(
    limits=lim,
    date_breaks = "1 month",
    date_minor_breaks = "1 week",
    date_labels = "%b %y")
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-57-1.png)<!-- -->
]

---

## Modify axis labels

.pull-left[
```r
base +
  geom_point() +
  scale_x_date(
    limits=lim,
    date_breaks = "1 month",
    date_minor_breaks = "1 week",
    date_labels = "%B\n%Y")
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-59-1.png)<!-- -->
]

---

## Modify axis labels

.pull-left[
```r
base +
  geom_point() +
  scale_x_date(
    limits=lim,
    date_breaks = "1 month",
    date_minor_breaks = "1 week",
    labels = scales::label_date("%B\n%Y"))
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-61-1.png)<!-- -->
]

---

## Modify axis labels

.pull-left[
```r
base +
  geom_point() +
  scale_x_date(
    limits=lim,
    date_breaks = "1 month",
    date_minor_breaks = "1 week",
    labels = scales::label_date_short())
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-63-1.png)<!-- -->
]

---
class: inverse, middle, center

# Grouping

---

# Seasonal plots

.pull-left[
```r
ggplot(data=colombodf, aes(x=as.factor(week), y=cases, group=year)) +
  geom_line()
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-65-1.png)<!-- -->
]

---

# Seasonal plots

.pull-left[
```r
library(viridis)
ggplot(data=colombodf, aes(x=as.factor(week), y=cases, group=year, col=year)) +
  geom_line() +
  scale_color_viridis()
```
]

.pull-right[
![](tsviz_files/figure-html/unnamed-chunk-67-1.png)<!-- -->
]

---
class: inverse, center, middle

.pull-left[
![](tsviz_files/figure-html/unnamed-chunk-68-1.png)<!-- -->
]

.pull-right[
Task 1: Level up the plot.

Task 2: Draw alternative plots that show the seasonal behaviour.
]
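---

## Seasonal plots: one possible alternative

A minimal sketch of one way to approach Task 2, not a model answer. It draws one panel per year with `facet_wrap()` instead of overlaying all years in a single panel. The names `colombo_sample` and `seasonal_facets` are hypothetical, and the data are only the Colombo rows printed earlier in these slides (2007 weeks 1-9 and 2019 weeks 1-10), so the picture is illustrative rather than a full seasonal view.

```r
library(ggplot2)

# Assumption: a small stand-in for `colombodf`, typed in from the rows
# printed earlier in the deck (2007 weeks 1-9, 2019 weeks 1-10).
colombo_sample <- data.frame(
  year  = c(rep(2007, 9), rep(2019, 10)),
  week  = c(1:9, 1:10),
  cases = c(40, 43, 38, 52, 69, 36, 42, 28, 20,
            299, 326, 259, 341, 232, 201, 215, 144, 153, 170)
)

# One panel per year; a free y-axis keeps low-count years readable
# next to high-count years.
seasonal_facets <- ggplot(colombo_sample, aes(x = week, y = cases)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ year, ncol = 1, scales = "free_y") +
  labs(x = "Week of year", y = "Dengue cases")

seasonal_facets
```

With the package installed, replacing `colombo_sample` with `colombodf` should give one panel for every year in the series. Free y-axes make the within-year pattern easier to see, at the cost of making levels harder to compare across years.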