2026-02-05


The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
The core tidyverse contains these packages:
- ggplot2, for data visualisation.
- dplyr, for data manipulation.
- tidyr, for data tidying.
- readr, for data import.
- purrr, for functional programming.
- tibble, for tibbles, a modern re-imagining of data frames.
- stringr, for strings.
- forcats, for factors.
- lubridate, for date/times.

dplyr
dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges.

Some core dplyr functions:
- mutate() creates new columns
- select() subsets existing columns
- filter() subsets rows by a logical condition
- summarize() creates summary statistics for one or more columns
- group_by() creates grouping variables that are respected by the other functions
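The five verbs are usually chained into a single pipeline. The sketch below is illustrative and not taken from the walkthrough: it uses the `mtcars` data frame that ships with base R, and the column choices, the `hp > 100` threshold, and the derived `mpg_per_hp` column are arbitrary assumptions made only to exercise each verb once.

```r
library(dplyr)

# mtcars ships with base R, so no data import is needed
by_cyl <- mtcars %>%
  select(mpg, cyl, hp) %>%            # keep only three columns
  filter(hp > 100) %>%                # keep rows where horsepower exceeds 100
  mutate(mpg_per_hp = mpg / hp) %>%   # add a derived column (illustrative)
  group_by(cyl) %>%                   # later verbs now operate per cylinder count
  summarize(
    mean_mpg = mean(mpg),
    mean_mpg_per_hp = mean(mpg_per_hp),
    n = n()                           # number of rows in each group
  )

by_cyl
```

Because `summarize()` runs after `group_by(cyl)`, it returns one row per cylinder count rather than a single overall summary.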
tidyr

The goal of tidyr is to help you create tidy data. Tidy data is data where:

- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.
We will primarily use tidyr to reshape data:
- pivot_longer() reshapes wide data to long data
- pivot_wider() reshapes long data to wide data

For example, the same summary statistics in wide format:

| Dataset | Mean | SD | Var |
|---|---|---|---|
| A | 10 | 3 | 9 |
| B | 7 | 4 | 16 |

and in long format:

| Dataset | Stat | Value |
|---|---|---|
| A | Mean | 10 |
| A | SD | 3 |
| A | Var | 9 |
| B | Mean | 7 |
| B | SD | 4 |
| B | Var | 16 |
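A minimal sketch of moving between the two tables with tidyr, using exactly the values shown in them. The object names `wide`, `long`, and `wide_again` are illustrative choices, not names from the course materials.

```r
library(tibble)
library(tidyr)

# Wide format: one row per dataset, one column per statistic
wide <- tibble(
  Dataset = c("A", "B"),
  Mean = c(10, 7),
  SD = c(3, 4),
  Var = c(9, 16)
)

# Wide -> long: the three statistic columns become Stat/Value pairs
long <- pivot_longer(
  wide,
  cols = c(Mean, SD, Var),
  names_to = "Stat",
  values_to = "Value"
)

# Long -> wide: spread the Stat names back out into columns
wide_again <- pivot_wider(long, names_from = Stat, values_from = Value)

long
```

`pivot_longer()` keeps `Dataset` as an identifier column because it is not listed in `cols`, which is why each dataset appears once per statistic in the result.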
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics… Data analysts typically spend the majority of their time in the process of data wrangling compared to the actual analysis of the data.

For a view through the tidyverse lens of data wrangling, see Chapters 9-16 of R for Data Science.
In our context, we want to use data wrangling to get our data in a format conducive to visualization.
For example, to create visualizations with ggplot2, we typically want our data in long rather than wide format.
Conversely, calculating derived attributes (columns) is typically easier in wide rather than long format.
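To make the wide-versus-long trade-off concrete, here is a sketch built on the Mean/SD/Var example tables. The derived column `CV` (coefficient of variation, SD / Mean) and the dodged bar chart are illustrative assumptions, not part of the original notes.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)

# Wide format: every statistic for a dataset sits in the same row
wide <- tibble(
  Dataset = c("A", "B"),
  Mean = c(10, 7),
  SD = c(3, 4),
  Var = c(9, 16)
)

# Derived attribute in wide format: one mutate() combining columns of the same row
wide_cv <- mutate(wide, CV = SD / Mean)

# Long format: the statistic name becomes a variable that ggplot2 can map
long <- pivot_longer(wide, cols = c(Mean, SD, Var),
                     names_to = "Stat", values_to = "Value")

# Stat goes on the x-axis and Dataset on fill; bars are placed side by side
p <- ggplot(long, aes(x = Stat, y = Value, fill = Dataset)) +
  geom_col(position = "dodge")
```

Calling `print(p)` draws the chart. Producing the same chart from `wide` would first require reshaping it, while computing `CV` from `long` would require pairing the `SD` and `Mean` rows of each dataset.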
We are going to do some data wrangling in the code walkthrough (next).
But for even more details, see: