Post hoc synopsis

Во овој документ започнавме да разгледуваме процедури за линеарна регресија на групирани податоци. Со пакетите од tidyverse и broom можеме многу лесно да направиме регресија за секоја категорија (пол, тежина, натпреварување). Графирање на ваквите резултати е исто така лесно со овие платформи. Клучната техника тука е вгнездување (nest()) на подгрупите, вршење операции врз ваквите вгнездени податоци, и потоа ‘одгнездување’ на посакуваните елементи. Оваа е моќна техника, но не е наједноставна за совладување, бидејќи бара подетално познавање на структурата на tibble, data.frame и list. Овој документ може да служи како минимален вовед за овие процедури и објекти.

Неопходни пакети

library(tidyverse)
library(broom)

Податоците

lifting <-
  ipf_lifts <-
  readr::read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-08/ipf_lifts.csv"
  )

glimpse(lifting)

## Rows: 41,152
## Columns: 16
## $ name             <chr> "Hiroyuki Isagawa", "David Mannering", "Eddy Penge...
## $ sex              <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", ...
## $ event            <chr> "SBD", "SBD", "SBD", "SBD", "SBD", "SBD", "SBD", "...
## $ equipment        <chr> "Single-ply", "Single-ply", "Single-ply", "Single-...
## $ age              <dbl> NA, 24.0, 35.5, 19.5, NA, NA, 32.5, 31.5, NA, NA, ...
## $ age_class        <chr> NA, "24-34", "35-39", "20-23", NA, NA, "24-34", "2...
## $ division         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ bodyweight_kg    <dbl> 67.5, 67.5, 67.5, 67.5, 67.5, 67.5, 67.5, 90.0, 90...
## $ weight_class_kg  <chr> "67.5", "67.5", "67.5", "67.5", "67.5", "67.5", "6...
## $ best3squat_kg    <dbl> 205.0, 225.0, 245.0, 195.0, 240.0, 200.0, 220.0, 2...
## $ best3bench_kg    <dbl> 140.0, 132.5, 157.5, 110.0, 140.0, 100.0, 140.0, 2...
## $ best3deadlift_kg <dbl> 225.0, 235.0, 270.0, 240.0, 215.0, 230.0, 235.0, 3...
## $ place            <chr> "1", "2", "3", "4", "5", "6", "7", "1", "2", "2", ...
## $ date             <date> 1985-08-03, 1985-08-03, 1985-08-03, 1985-08-03, 1...
## $ federation       <chr> "IPF", "IPF", "IPF", "IPF", "IPF", "IPF", "IPF", "...
## $ meet_name        <chr> "World Games", "World Games", "World Games", "Worl...

Fitting models to grouped data

Convert to long format such that we have one column and weight lifted.

ll <- lifting_long <- 
  lifting %>% gather(type, weight, starts_with("best"))

ll <- group_by(ll, sex, type)

Fit a model

Now we can fit a linear model of weight ~ boyweight while keeping track of relevant groupings.

Check these step by step to see what nest does what the map within mutate does and unnest.

LMs_by_sex_and_type <- 
  ll %>%
  tidyr::nest() %>%
  mutate(fit = purrr::map(data, ~ lm(weight ~ bodyweight_kg, data = .))) %>%
  mutate(results = purrr::map(fit, broom::tidy)) %>% 
  tidyr::unnest(results)

# if you get this error: 
# Error: 'vec_dim' is not an exported object from 'namespace:vctrs'
# reinstall pillar with install.packages('pillar')
# https://github.com/r-lib/vctrs/issues/487

Evaluate the model

Now we can ask what are the coefficients of the linear regressions for each group

LMs_by_sex_and_type %>% 
  select(-data, -fit) %>% 
  filter(term == "(Intercept)") %>% 
  knitr::kable(caption = "Inercepts for lifted weight by bodyweight by sex and type of lifting")

Inercepts for lifted weight by bodyweight by sex and type of lifting
sex	type	term	estimate	std.error	statistic
M	best3squat_kg	(Intercept)	98.74112	1.3982126	70.61954
F	best3squat_kg	(Intercept)	63.31427	1.4309542	44.24619
M	best3bench_kg	(Intercept)	50.68200	0.9550533	53.06719
F	best3bench_kg	(Intercept)	31.15649	0.8731337	35.68352
M	best3deadlift_kg	(Intercept)	144.95591	1.1529829	125.72252
F	best3deadlift_kg	(Intercept)	96.62448	1.1283408	85.63413

  # caption will render in Rmd 

LMs_by_sex_and_type %>% 
  select(-data, -fit) %>% 
  filter(term == "bodyweight_kg") %>% 
  knitr::kable(caption = "Inercepts for lifted weight by bodyweight by sex and type of lifting")

Inercepts for lifted weight by bodyweight by sex and type of lifting
sex	type	term	estimate	std.error	statistic
M	best3squat_kg	bodyweight_kg	1.7301136	0.0154233	112.17530
F	best3squat_kg	bodyweight_kg	1.3466930	0.0213294	63.13781
M	best3bench_kg	bodyweight_kg	1.3672329	0.0104211	131.19822
F	best3bench_kg	bodyweight_kg	0.9039786	0.0129411	69.85324
M	best3deadlift_kg	bodyweight_kg	1.2396643	0.0127121	97.51838
F	best3deadlift_kg	bodyweight_kg	0.9465361	0.0168002	56.34075

Neat, everything is significant.

Visualize

To visualize, we can use this table, or plot the raw data and use geom_smooth while faceting by sex and type.

LMs_by_sex_and_type_wide <- 
  LMs_by_sex_and_type %>% 
  select(sex, type, term, estimate) %>% 
  spread(term, estimate) %>% 
  rename("Intercept" = `(Intercept)`) %>% 
  rename("Slope" = bodyweight_kg)

ggplot(data = LMs_by_sex_and_type_wide) +
  # aes(group=interaction(sex, type)) +
  geom_abline(size = 1,
              aes(
                slope = Slope,
                intercept = Intercept,
                color = type,
                linetype = sex
              )) +
  # since we don't have mapping for x and y
  # the slope lines are not visible
  # add manual axis limits to see them
  scale_x_continuous(limits = c(0, 100)) +
  scale_y_continuous(limits = c(0, 200)) +
  labs(x="bodyweight (kg)") +
  labs(y="weight lifter (kg)") +
  theme_bw() +
  theme(legend.position="top")

Then you can ask are slopes for women higher than men? Are slopes significantly different from each other (maybe use likelihood ratio test or anova to compare models).

#TidyTuesday 8.10.2019: Powerlifting

teofil