Lecture 14: Reducing Items and Attributes

Brian J. Smith

2026-03-26

Reducing Items and Attributes

The beginning of this lecture is based on Chapter 13 of Visualization Analysis & Design.


“Reduce Items and Attributes”


For this lecture, I’m going to start with the VAD Text, then transition over to R.

Visualization Analysis & Design Cover

Fig. 13.1


Reducing Items and Attributes

Why?

  • Manage complexity in our visualizations.
  • Find a strategy that reduces the visual complexity while minimizing the chances of hiding important information from the user.

Reducing Items and Attributes

How?

  • Filter:
    • Eliminating elements (items or attributes).
    • Easy to understand and compute.
    • But, “out of sight, out of mind.”
  • Aggregate:
    • Creating a new single element to replace multiple others.
    • Challenging to design and convey.
    • Cognitively safer than filtering: every item still contributes to what is shown, so nothing is simply forgotten.
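
To make the contrast concrete, here is a minimal sketch using `mtcars`. The choice of `cyl` as the grouping attribute and `mpg` as the summarized attribute is only an illustration, not something taken from the VAD text.

```r
library(dplyr)

data("mtcars")

# Filter: keep only the 4-cylinder cars; all other rows are simply gone.
filtered <- mtcars %>%
  filter(cyl == 4)

# Aggregate: replace each cylinder group with a single derived row.
aggregated <- mtcars %>%
  group_by(cyl) %>%
  summarize(n = n(),
            mpg_mean = mean(mpg),
            mpg_sd = sd(mpg))

nrow(filtered)   # fewer rows than mtcars
aggregated       # one row per value of cyl; every car still contributes
```

After filtering, the 6- and 8-cylinder cars are out of sight. After aggregating, they are still represented, just not individually.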

Filtering

  • Straightforward and intuitive.
    • Just don’t show some things!
    • Common when exploring a new dataset.
  • Plot only some attributes (columns).
    • Often based on knowledge of what the attributes represent.
  • Plot only some items (rows).
    • Often based on values.
    • Sometimes based on a random subset.

Filtering

Attribute Filtering

  • Often, we have an idea which attributes are most important to us.
    • Prioritize visualizations with those attributes.
data("mtcars")

plot(mtcars)
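
For comparison, here is a sketch of the same scatterplot matrix after attribute filtering. The four attributes kept (`mpg`, `disp`, `hp`, `wt`) are an arbitrary choice for illustration; in practice the choice would come from knowing what the attributes mean.

```r
library(dplyr)

data("mtcars")

# Attribute filtering: keep four columns (an illustrative choice).
mtcars_small <- mtcars %>%
  select(mpg, disp, hp, wt)

# The scatterplot matrix shrinks from 11 x 11 panels to 4 x 4.
plot(mtcars_small)
```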

Filtering

Attribute Filtering

  • Attribute filtering may be combined with attribute ordering.
    • Ordering arranges attributes based on their similarity.
    • E.g., highly correlated attributes show partly redundant information, so one of them can often be filtered out.
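# Packages used from here on (assuming they are not already loaded)
library(dplyr)      # data manipulation, %>%, and the starwars data
library(ggplot2)    # plotting
library(patchwork)  # combine plots with +, /, and plot_layout()

# Spearman rank correlation between displacement and horsepower
rho <- cor(mtcars$disp, mtcars$hp, method = "spearman")

a <- mtcars %>% 
  ggplot(aes(x = disp, y = hp)) +
  geom_point() +
  # annotate() draws the label once; geom_label() would draw one per row
  annotate("label", x = 150, y = 300,
           label = paste("Spearman rho =", round(rho, 2))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_classic()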

b <- mtcars %>% 
  ggplot(aes(x = disp, y = mpg)) +
  geom_point() +
  theme_classic()

c <- mtcars %>% 
  ggplot(aes(x = hp, y = mpg)) +
  geom_point() +
  theme_classic()

des <- "AB
        AC"

a + b + c + plot_layout(widths = c(0.3, 0.7),
                        design = des)
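
One way to order attributes by similarity is to cluster them on their correlations. The sketch below is an assumption about how this could be done in base R, not the method used in the VAD text: it uses `1 - |correlation|` as a distance and takes the leaf order from hierarchical clustering.

```r
data("mtcars")

# Spearman correlation between every pair of attributes.
cm <- cor(mtcars, method = "spearman")

# Turn similarity into a distance: strongly correlated attributes
# (positively or negatively) end up close together.
d <- as.dist(1 - abs(cm))

# Hierarchical clustering gives one possible ordering of the attributes.
ord <- hclust(d)$order
ordered_names <- colnames(cm)[ord]
ordered_names

# Scatterplot matrix with similar attributes placed next to each other.
plot(mtcars[, ordered_names])
```

Other distance definitions and clustering methods would give different orderings, so treat this as one option among several.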

Filtering

Item Filtering

  • Filter items based on values.
    • E.g., remove outliers or extreme values.
data("starwars")

a <- starwars %>% 
  ggplot(aes(x = height, y = mass)) +
  geom_point() +
  coord_cartesian(xlim = c(50, 250)) +
  ggtitle("Without Filtering") +
  theme_classic()

b <- starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(x = height, y = mass)) +
  geom_point() +
  coord_cartesian(xlim = c(50, 250)) +
  ggtitle("Filter Extreme Values") +
  theme_classic()

a/b
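
The cutoff of 500 above was picked by eye. As an alternative sketch, the cutoff can be derived from the data. The rule used here (1.5 times the IQR beyond the quartiles) is a common convention that I am assuming for illustration; it is not part of the slide above.

```r
library(dplyr)

data("starwars")

# Fences at 1.5 * IQR beyond the quartiles (an assumed convention).
q <- unname(quantile(starwars$mass, c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])
lower <- q[1] - fence
upper <- q[2] + fence

# Keep items inside the fences (this also drops rows with missing mass).
sw_no_outliers <- starwars %>%
  filter(mass >= lower, mass <= upper)

# List what was removed, so the filtering decision stays visible.
removed <- starwars %>%
  filter(mass < lower | mass > upper) %>%
  select(name, mass)
removed
```

Printing the removed items is a small guard against "out of sight, out of mind."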

Filtering

Item Filtering

  • Randomly subset items.
    • Useful for (truly) big datasets where plotting all items would be difficult or impossible.
# See ?dplyr::slice_sample

# Can specify number of items
mtcars %>% 
  slice_sample(n = 5)

# Can specify proportion of rows
mtcars %>% 
  slice_sample(prop = 0.1)
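
Because the subset is random, it changes every time the code runs. A minimal sketch of making it reproducible with `set.seed()` follows; the seed value is arbitrary.

```r
library(dplyr)

data("mtcars")

# Setting the same seed before sampling gives the same subset.
set.seed(5200)
sub1 <- mtcars %>%
  slice_sample(n = 5)

set.seed(5200)
sub2 <- mtcars %>%
  slice_sample(n = 5)

identical(sub1, sub2)
```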

Aggregation

  • Creating a new derived element to represent the group.
    • Simple examples include summary statistics.
    • Remember Anscombe’s Quartet: nearly identical summary statistics can hide very different data.
Anscombe’s quartet. Image from Wikipedia.
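
The quartet ships with base R as `anscombe`, so the point is easy to check. This sketch computes the same summary statistics for each of the four x/y pairs and then plots them.

```r
data("anscombe")  # columns x1..x4 and y1..y4

# Same summary statistics for each of the four datasets.
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y),
    r = cor(x, y))
})
colnames(stats) <- paste("set", 1:4)
round(stats, 2)

# The plots tell a very different story than the table.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = "x", ylab = "y", main = paste("Set", i))
}
par(op)
```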

Aggregation

Item Aggregation

  • Easily accomplished with idioms that:
    • Display summary statistics.
      • Mean, SD, etc. (moments)
      • Counts, densities
    • Show distributions.

Aggregation

Item Aggregation

1D Summary Stats

a <- starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(x = 0, y = mass)) +
  geom_point(show.legend = FALSE) +
  ggtitle("Without Aggregation") +
  labs(x = NULL, y = "Mass") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

b <- starwars %>%
  filter(mass < 500) %>% 
  ggplot(aes(y = mass)) +
  geom_boxplot(show.legend = FALSE) +
  ggtitle("Aggregate with Summary Stats (Boxplot)") +
  labs(x = NULL, y = "Mass") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

a/b
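
The boxplot replaces all of the points with a handful of derived numbers. A sketch of computing those numbers directly is below; note that the whiskers drawn by `geom_boxplot()` extend at most 1.5 times the IQR from the box, so they need not reach the minimum and maximum shown here.

```r
library(dplyr)

data("starwars")

# Quantile-based summary of mass (rows with missing mass are dropped
# by the filter, as in the plots above).
mass_summary <- starwars %>%
  filter(mass < 500) %>%
  summarize(n = n(),
            min = min(mass),
            q1 = unname(quantile(mass, 0.25)),
            median = median(mass),
            q3 = unname(quantile(mass, 0.75)),
            max = max(mass))
mass_summary
```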

Aggregation

Item Aggregation

2D Summary Stats

spp <- starwars %>% 
  filter(!is.na(species)) %>% 
  count(species) %>% 
  filter(n > 1)

sw <- starwars %>% 
  filter(species %in% spp$species) %>% 
  mutate(species = factor(species))

a <- sw %>% 
  ggplot(aes(x = species, y = mass, color = species)) +
  geom_point(show.legend = FALSE) +
  ggtitle("Without Aggregation") +
  labs(x = "Species", y = "Mass") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

b <- sw %>%
  ggplot(aes(x = species, y = mass, 
             color = species)) +
  geom_boxplot(show.legend = FALSE) +
  ggtitle("Aggregate by Species (Boxplot)") +
  labs(x = "Species", y = "Mass") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

a/b

Aggregation

Item Aggregation

3D Summary Stats

# 'sw' created on previous slide

a <- sw %>% 
  ggplot(aes(x = height, y = mass, color = species)) +
  geom_point(show.legend = FALSE) +
  coord_cartesian(xlim = c(50, 250)) +
  ggtitle("Without Aggregation") +
  labs(x = "Height", y = "Mass") +
  theme_classic()

b <- sw %>% 
  group_by(species) %>% 
  summarize(mass_mean = mean(mass, na.rm = TRUE),
            mass_sd = sd(mass, na.rm = TRUE),
            height_mean = mean(height, na.rm = TRUE),
            height_sd = sd(height, na.rm = TRUE)) %>% 
  mutate(mass_lwr = mass_mean - mass_sd,
         mass_upr = mass_mean + mass_sd,
         height_lwr = height_mean - height_sd,
         height_upr = height_mean + height_sd) %>% 
  ggplot(aes(x = height_mean, y = mass_mean, 
             color = species)) +
  geom_errorbar(aes(xmin = height_lwr, xmax = height_upr),
                width = 5) +
  geom_errorbar(aes(ymin = mass_lwr, ymax = mass_upr),
                width = 2) +
  geom_point(size = 2) +
  coord_cartesian(xlim = c(50, 250)) +
  ggtitle("Aggregate by Species (mean +/- SD)") +
  labs(x = "Height", y = "Mass") +
  theme_classic()

a/b + plot_layout(guides = "collect")

Aggregation

Item Aggregation

1D Counts/Densities

a <- starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(y = 0, x = height)) +
  geom_point(show.legend = FALSE) +
  ggtitle("Without Aggregation") +
  labs(x = "Height", y = NULL) +
  theme_classic()

b <- starwars %>%
  filter(mass < 500) %>% 
  ggplot(aes(x = height)) +
  geom_histogram(bins = 15, color = "black", fill = "gray") +
  ggtitle("Aggregate with Counts (Histogram)") +
  labs(x = "Height", y = "Count") +
  theme_classic()

c <- starwars %>%
  filter(mass < 500) %>% 
  ggplot(aes(x = height)) +
  geom_density(color = "black", fill = "gray") +
  ggtitle("Aggregate with Density (KDE)") +
  labs(x = "Height", y = "Density") +
  theme_classic()

a/(b+c)
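
Both plots are drawn from derived tables, not from the raw items. The sketch below builds those tables with base R's `hist()` and `density()`. The bin edges and bandwidth are base R defaults, so they will not exactly match what `geom_histogram()` and `geom_density()` drew above.

```r
library(dplyr)

data("starwars")

heights <- starwars %>%
  filter(mass < 500) %>%
  pull(height)

# Histogram: items are replaced by a count per bin.
# 'breaks = 15' is only a suggestion to hist(), not an exact bin count.
h <- hist(heights, breaks = 15, plot = FALSE)
data.frame(bin_mid = h$mids, count = h$counts)

# Kernel density estimate: items are replaced by a smooth curve
# evaluated on a grid of x values.
d <- density(heights, na.rm = TRUE)
head(data.frame(height = d$x, density = d$y))
```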

Aggregation

Item Aggregation

2D Counts/Densities

# 'sw' created on an earlier slide

a <- sw %>% 
  ggplot(aes(x = species, y = mass, color = species)) +
  geom_point(show.legend = FALSE) +
  ggtitle("Without Aggregation") +
  labs(x = "Species", y = "Mass") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

b <- sw %>%
  filter(!is.na(mass)) %>% 
  mutate(mass_bin = cut_number(mass, n = 5)) %>% 
  group_by(species, mass_bin) %>% 
  tally() %>% 
  ggplot(aes(x = species, y = mass_bin, 
             fill = n)) +
  geom_raster() +
  ggtitle("Aggregate with Counts (Heatmap)") +
  labs(x = "Species", y = "Mass") +
  scale_fill_viridis_c() +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

a/b

Aggregation

Item Aggregation

2D Counts/Densities

# 'sw' created on an earlier slide

a <- sw %>% 
  ggplot(aes(x = height, y = mass)) +
  geom_point(show.legend = FALSE) +
  ggtitle("Without Aggregation") +
  labs(x = "Height", y = "Mass") +
  theme_classic()

b <- sw %>%
  filter(!is.na(mass)) %>% 
  ggplot(aes(x = height, y = mass)) +
  geom_density_2d_filled(show.legend = FALSE) +
  ggtitle("Aggregate with Density (2D KDE)") +
  labs(x = "Height", y = "Mass") +
  scale_fill_viridis_d() +
  theme_classic()

a/b

Aggregation

Attribute Aggregation

  • Generally referred to as dimensionality reduction (Wikipedia).
  • The goal is to preserve meaningful structure while using fewer attributes.
  • Sometimes, simple solutions exist.
  • Most methods, however, involve more complex mathematics.

Aggregation

Attribute Aggregation

  • Simple solution:

\[\mathrm{BMI} = \frac{\text{mass (kg)}}{\left(\text{height (m)}\right)^2}\]

# 'sw' created on an earlier slide

a <- sw %>% 
  ggplot(aes(x = height, y = mass, color = species)) +
  geom_point() +
  coord_cartesian(xlim = c(50, 250)) +
  ggtitle("Without Aggregation") +
  labs(x = "Height", y = "Mass") +
  theme_classic()

b <- sw %>% 
  mutate(bmi = mass / (height / 100)^2) %>%  # height is in cm; BMI uses m
  ggplot(aes(x = species, y = bmi, 
             color = species)) +
  geom_boxplot(show.legend = FALSE) +
  ggtitle("Aggregate with BMI") +
  labs(x = "Species", y = "BMI") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

a / b + plot_layout(guides = "collect")

Aggregation

Attribute Aggregation

PCA

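# 'sw' created earlier; adding BMI here (height is in cm, so convert to m)
sw <- sw %>% 
  mutate(bmi = mass / (height / 100)^2)

# PCA on height and mass (centered, not scaled);
# the formula interface drops rows with missing values
pca <- prcomp(~ height + mass, data = sw)
# Scores for every row of 'sw'; rows with missing height or mass get NA
pred <- predict(pca, sw)
sw$pc1 <- pred[, 1]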

par(mfrow = c(1, 2))
biplot(pca)
summ <- summary(pca)
barplot(summ$importance[2, ],
        ylim = c(0, 1), ylab = "Proportion",
        main = "Variance Explained")
box()
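
To show what PCA is aggregating, here is a sketch that recomputes two pieces of the `prcomp()` output by hand: the proportion of variance explained, and the PC1 scores as a weighted sum of the centered attributes. It rebuilds the input from `starwars` directly (all species, complete cases only) so that it runs on its own. The name `hm` is mine, and the numbers will differ from the `sw`-based slide.

```r
library(dplyr)

data("starwars")

# Complete cases of the two attributes being aggregated.
hm <- starwars %>%
  filter(!is.na(height), !is.na(mass)) %>%
  select(height, mass)

pca <- prcomp(~ height + mass, data = hm)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
var_explained

# PC1 is a linear combination of the centered attributes,
# with weights given by the first column of the rotation matrix.
centered <- scale(hm, center = TRUE, scale = FALSE)
pc1_manual <- centered %*% pca$rotation[, 1]

all.equal(as.numeric(pc1_manual), as.numeric(pca$x[, 1]))
```

Because the attributes are centered but not scaled, the attribute with the larger variance dominates PC1. Passing `scale. = TRUE` to `prcomp()` would put height and mass on equal footing.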

Aggregation

Attribute Aggregation

PCA

# 'sw' created earlier
a <- sw %>% 
  ggplot(aes(x = height, y = bmi)) +
  geom_point() +
  labs(x = "Height", y = "BMI") +
  theme_classic()

b <- sw %>% 
  ggplot(aes(x = mass, y = bmi)) +
  geom_point() +
  labs(x = "Mass", y = "BMI") +
  theme_classic()

c <- sw %>% 
  ggplot(aes(x = pc1, y = bmi)) +
  geom_point() +
  labs(x = "PC1", y = "BMI") +
  ggtitle("Attribute Agg. with PCA") +
  theme_classic()

des <- "AC
        BC"

a + b + c + plot_layout(widths = c(0.4, 0.6),
                        design = des)

Questions?


