Post hoc synopsis

Во овој документ започнавме да разгледуваме процедури за линеарна регресија на групирани податоци. Со пакетите од tidyverse и broom можеме многу лесно да направиме регресија за секоја категорија (пол, тежина, натпреварување). Графирање на ваквите резултати е исто така лесно со овие платформи. Клучната техника тука е вгнездување (nest()) на подгрупите, вршење операции врз ваквите вгнездени податоци, и потоа ‘одгнездување’ на посакуваните елементи. Оваа е моќна техника, но не е наједноставна за совладување, бидејќи бара подетално познавање на структурата на tibble, data.frame и list. Овој документ може да служи како минимален вовед за овие процедури и објекти.

Неопходни пакети

library(tidyverse)
library(broom)

Податоците

lifting <-
  ipf_lifts <-
  readr::read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-08/ipf_lifts.csv"
  )
glimpse(lifting)
## Rows: 41,152
## Columns: 16
## $ name             <chr> "Hiroyuki Isagawa", "David Mannering", "Eddy Penge...
## $ sex              <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", ...
## $ event            <chr> "SBD", "SBD", "SBD", "SBD", "SBD", "SBD", "SBD", "...
## $ equipment        <chr> "Single-ply", "Single-ply", "Single-ply", "Single-...
## $ age              <dbl> NA, 24.0, 35.5, 19.5, NA, NA, 32.5, 31.5, NA, NA, ...
## $ age_class        <chr> NA, "24-34", "35-39", "20-23", NA, NA, "24-34", "2...
## $ division         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ bodyweight_kg    <dbl> 67.5, 67.5, 67.5, 67.5, 67.5, 67.5, 67.5, 90.0, 90...
## $ weight_class_kg  <chr> "67.5", "67.5", "67.5", "67.5", "67.5", "67.5", "6...
## $ best3squat_kg    <dbl> 205.0, 225.0, 245.0, 195.0, 240.0, 200.0, 220.0, 2...
## $ best3bench_kg    <dbl> 140.0, 132.5, 157.5, 110.0, 140.0, 100.0, 140.0, 2...
## $ best3deadlift_kg <dbl> 225.0, 235.0, 270.0, 240.0, 215.0, 230.0, 235.0, 3...
## $ place            <chr> "1", "2", "3", "4", "5", "6", "7", "1", "2", "2", ...
## $ date             <date> 1985-08-03, 1985-08-03, 1985-08-03, 1985-08-03, 1...
## $ federation       <chr> "IPF", "IPF", "IPF", "IPF", "IPF", "IPF", "IPF", "...
## $ meet_name        <chr> "World Games", "World Games", "World Games", "Worl...

Fitting models to grouped data

Convert to long format such that we have one column and weight lifted.

ll <- lifting_long <- 
  lifting %>% gather(type, weight, starts_with("best"))

ll <- group_by(ll, sex, type)

Fit a model

Now we can fit a linear model of weight ~ boyweight while keeping track of relevant groupings.

Check these step by step to see what nest does what the map within mutate does and unnest.

LMs_by_sex_and_type <- 
  ll %>%
  tidyr::nest() %>%
  mutate(fit = purrr::map(data, ~ lm(weight ~ bodyweight_kg, data = .))) %>%
  mutate(results = purrr::map(fit, broom::tidy)) %>% 
  tidyr::unnest(results)

# if you get this error: 
# Error: 'vec_dim' is not an exported object from 'namespace:vctrs'
# reinstall pillar with install.packages('pillar')
# https://github.com/r-lib/vctrs/issues/487

Evaluate the model

Now we can ask what are the coefficients of the linear regressions for each group

LMs_by_sex_and_type %>% 
  select(-data, -fit) %>% 
  filter(term == "(Intercept)") %>% 
  knitr::kable(caption = "Inercepts for lifted weight by bodyweight by sex and type of lifting")
Inercepts for lifted weight by bodyweight by sex and type of lifting
sex type term estimate std.error statistic p.value
M best3squat_kg (Intercept) 98.74112 1.3982126 70.61954 0
F best3squat_kg (Intercept) 63.31427 1.4309542 44.24619 0
M best3bench_kg (Intercept) 50.68200 0.9550533 53.06719 0
F best3bench_kg (Intercept) 31.15649 0.8731337 35.68352 0
M best3deadlift_kg (Intercept) 144.95591 1.1529829 125.72252 0
F best3deadlift_kg (Intercept) 96.62448 1.1283408 85.63413 0
  # caption will render in Rmd 

LMs_by_sex_and_type %>% 
  select(-data, -fit) %>% 
  filter(term == "bodyweight_kg") %>% 
  knitr::kable(caption = "Inercepts for lifted weight by bodyweight by sex and type of lifting")
Inercepts for lifted weight by bodyweight by sex and type of lifting
sex type term estimate std.error statistic p.value
M best3squat_kg bodyweight_kg 1.7301136 0.0154233 112.17530 0
F best3squat_kg bodyweight_kg 1.3466930 0.0213294 63.13781 0
M best3bench_kg bodyweight_kg 1.3672329 0.0104211 131.19822 0
F best3bench_kg bodyweight_kg 0.9039786 0.0129411 69.85324 0
M best3deadlift_kg bodyweight_kg 1.2396643 0.0127121 97.51838 0
F best3deadlift_kg bodyweight_kg 0.9465361 0.0168002 56.34075 0

Neat, everything is significant.

Visualize

To visualize, we can use this table, or plot the raw data and use geom_smooth while faceting by sex and type.

LMs_by_sex_and_type_wide <- 
  LMs_by_sex_and_type %>% 
  select(sex, type, term, estimate) %>% 
  spread(term, estimate) %>% 
  rename("Intercept" = `(Intercept)`) %>% 
  rename("Slope" = bodyweight_kg)

ggplot(data = LMs_by_sex_and_type_wide) +
  # aes(group=interaction(sex, type)) +
  geom_abline(size = 1,
              aes(
                slope = Slope,
                intercept = Intercept,
                color = type,
                linetype = sex
              )) +
  # since we don't have mapping for x and y
  # the slope lines are not visible
  # add manual axis limits to see them
  scale_x_continuous(limits = c(0, 100)) +
  scale_y_continuous(limits = c(0, 200)) +
  labs(x="bodyweight (kg)") +
  labs(y="weight lifter (kg)") +
  theme_bw() +
  theme(legend.position="top")

Then you can ask are slopes for women higher than men? Are slopes significantly different from each other (maybe use likelihood ratio test or anova to compare models).