Remember to change the author: field in this Rmd file to your own name.

Learning objectives

In today’s lab, you will practice the following concepts from today’s class:

  • Interpreting linear regression coefficients of numeric covariates
  • Interpreting linear regression coefficients of categorical variables
  • Fitting linear regression models with interaction terms
  • Interpreting linear regression coefficients of interaction terms

We’ll begin by loading some packages and importing the data.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.3.2     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## ── Conflicts ────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# Import data
gapminder <- read_delim("http://www.andrew.cmu.edu/user/achoulde/94842/data/gapminder_five_year.txt", delim = "\t")
## Parsed with column specification:
## cols(
##   country = col_character(),
##   year = col_double(),
##   pop = col_double(),
##   continent = col_character(),
##   lifeExp = col_double(),
##   gdpPercap = col_double()
## )

Interaction terms in regression

(a) Fit a linear regression of birthweight on the mother’s age and smoking status. Do not include interaction terms.

# Edit me
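
As a rough sketch of what an additive (no-interaction) fit looks like, the example below uses simulated stand-in data rather than the lab's dataset. The column names birthwt.grams and mother.smokes, and all effect sizes, are assumptions made for illustration; only mother.age is named in this lab.

```r
# Simulated stand-in data: names and effect sizes are assumptions,
# not the lab's actual dataset.
set.seed(1)
n <- 200
sim <- data.frame(
  mother.age = round(runif(n, 15, 45)),
  mother.smokes = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$birthwt.grams <- 3000 + 10 * sim$mother.age -
  250 * (sim$mother.smokes == "yes") + rnorm(n, sd = 400)

# `+` adds each predictor as a main effect; no interaction term is included.
fit.additive <- lm(birthwt.grams ~ mother.age + mother.smokes, data = sim)
summary(fit.additive)
```

The same formula pattern should carry over to the real data once the corresponding column names are substituted.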

(b) What is the coefficient of mother.age in your regression? How do you interpret this coefficient?

# Edit me
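
One way to check an interpretation of a numeric coefficient is to compare predictions for two observations that differ by one unit in that covariate and are otherwise identical. The sketch below does this on simulated data; birthwt.grams, mother.smokes, and the effect sizes are hypothetical.

```r
# Simulated stand-in data (hypothetical names and effect sizes).
set.seed(2)
n <- 200
sim <- data.frame(
  mother.age = round(runif(n, 15, 45)),
  mother.smokes = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$birthwt.grams <- 3000 + 10 * sim$mother.age -
  250 * (sim$mother.smokes == "yes") + rnorm(n, sd = 400)
fit <- lm(birthwt.grams ~ mother.age + mother.smokes, data = sim)

# Estimated coefficient of the numeric covariate.
age.slope <- coef(fit)["mother.age"]

# Two nonsmoking mothers whose ages differ by exactly one year.
new.moms <- data.frame(
  mother.age = c(30, 31),
  mother.smokes = factor(c("no", "no"), levels = c("no", "yes"))
)
preds <- predict(fit, newdata = new.moms)

# The difference in predicted birthweight equals the age coefficient.
all.equal(unname(diff(preds)), unname(age.slope))
```

In other words, the coefficient is the expected change in the response per one-unit increase in the covariate, holding the other predictors fixed.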

(c) How many coefficients are estimated for the mother’s smoking status variable? How do you interpret these coefficients?

# Edit me
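
R encodes a factor with k levels as k - 1 indicator columns, with the first level acting as the baseline. The small sketch below uses a made-up two-level factor named smoke to show this through the design matrix.

```r
# A made-up two-level factor (illustrative only).
smoke <- factor(c("no", "yes", "no", "yes", "no"))
levels(smoke)  # the first level, "no", is the baseline

# The design matrix has an intercept plus one indicator for the non-baseline level.
X <- model.matrix(~ smoke)
colnames(X)  # "(Intercept)" "smokeyes"
```

The coefficient on such an indicator compares its level against the baseline, with the other predictors held fixed.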

(d) What does the intercept mean in this model?
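
The intercept can be read as the model's prediction when every numeric covariate is 0 and every factor is at its baseline level. The sketch below checks this on simulated data; birthwt.grams, mother.smokes, and the effect sizes are assumptions for illustration.

```r
# Simulated stand-in data (hypothetical names and effect sizes).
set.seed(3)
n <- 200
sim <- data.frame(
  mother.age = round(runif(n, 15, 45)),
  mother.smokes = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$birthwt.grams <- 3000 + 10 * sim$mother.age -
  250 * (sim$mother.smokes == "yes") + rnorm(n, sd = 400)
fit <- lm(birthwt.grams ~ mother.age + mother.smokes, data = sim)

# Prediction at age 0 for the baseline ("no") smoking level.
baseline <- data.frame(
  mother.age = 0,
  mother.smokes = factor("no", levels = c("no", "yes"))
)
pred.baseline <- predict(fit, newdata = baseline)

# This prediction matches the fitted intercept.
all.equal(unname(pred.baseline), unname(coef(fit)["(Intercept)"]))
```

Age 0 lies far outside the simulated age range here, so this prediction is an extrapolation rather than a quantity of practical interest.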

(e) Using ggplot, construct a scatterplot with birthweight on the y-axis and mother’s age on the x-axis. Color the points by mother’s smoking status, and add a separate linear regression line for each smoking status using the stat_smooth layer.

# Edit me
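
A minimal sketch of a grouped scatterplot with per-group regression lines follows, again on simulated data with hypothetical column names (birthwt.grams, mother.smokes). Mapping colour inside ggplot() means stat_smooth inherits the grouping and fits one line per group.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical names and effect sizes).
set.seed(4)
n <- 200
sim <- data.frame(
  mother.age = round(runif(n, 15, 45)),
  mother.smokes = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$birthwt.grams <- 3000 + 10 * sim$mother.age -
  250 * (sim$mother.smokes == "yes") + rnorm(n, sd = 400)

# colour is set in the top-level aes(), so both layers are grouped by it.
p <- ggplot(sim, aes(x = mother.age, y = birthwt.grams, colour = mother.smokes)) +
  geom_point(alpha = 0.6) +
  # method = "lm" requests straight-line fits, one per colour group.
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

Setting se = FALSE hides the confidence bands, which is a presentation choice and not a requirement.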

(f) Do the regression lines plotted in part (e) correspond to the model you fit in part (a)? How can you tell?
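
A useful fact when comparing a plot against an additive model is that the additive model forces the group-specific lines to be parallel. The sketch below shows this on simulated data (hypothetical names birthwt.grams and mother.smokes): the predicted gap between the two groups is the same at every age.

```r
# Simulated stand-in data (hypothetical names and effect sizes).
set.seed(5)
n <- 200
sim <- data.frame(
  mother.age = round(runif(n, 15, 45)),
  mother.smokes = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$birthwt.grams <- 3000 + 10 * sim$mother.age -
  250 * (sim$mother.smokes == "yes") + rnorm(n, sd = 400)
fit.additive <- lm(birthwt.grams ~ mother.age + mother.smokes, data = sim)

# Predictions for both groups at three ages.
grid <- expand.grid(
  mother.age = c(20, 30, 40),
  mother.smokes = factor(c("no", "yes"))
)
grid$pred <- unname(predict(fit.additive, newdata = grid))

# Gap between the groups at each age: constant, and equal to the smoking coefficient.
gap <- grid$pred[grid$mother.smokes == "yes"] - grid$pred[grid$mother.smokes == "no"]
gap
```

Lines drawn by fitting each group separately are free to have different slopes, so they need not be parallel.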

(g) Fit a linear regression model that allows an interaction between mother’s age and smoking status in their effect on birthweight.

# Edit me
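
In an R formula, a * b expands to a + b + a:b, so it includes both main effects and their interaction. The sketch below fits such a model on simulated data; birthwt.grams, mother.smokes, and the simulated effect sizes are assumptions for illustration.

```r
# Simulated stand-in data with a built-in interaction (hypothetical names and sizes).
set.seed(6)
n <- 200
sim <- data.frame(
  mother.age = round(runif(n, 15, 45)),
  mother.smokes = factor(sample(c("no", "yes"), n, replace = TRUE))
)
smoker <- sim$mother.smokes == "yes"
sim$birthwt.grams <- 3000 + 10 * sim$mother.age - 100 * smoker -
  8 * sim$mother.age * smoker + rnorm(n, sd = 400)

# `*` includes both main effects plus the mother.age:mother.smokes interaction.
fit.inter <- lm(birthwt.grams ~ mother.age * mother.smokes, data = sim)
names(coef(fit.inter))
```

The last coefficient name, mother.age:mother.smokesyes, identifies the interaction term.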

(h) Interpret your model. Is the interaction term statistically significant? What does the interaction coefficient mean?

# Edit me
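
With one numeric covariate and one two-level factor, the interaction coefficient is the difference in slopes between the two groups. The sketch below illustrates this on simulated data (hypothetical names and effect sizes): the smoker slope implied by the interaction model matches the slope from a regression fit to smokers alone, and the p-value for the interaction can be read from the coefficient table.

```r
# Simulated stand-in data with a built-in interaction (hypothetical names and sizes).
set.seed(7)
n <- 200
sim <- data.frame(
  mother.age = round(runif(n, 15, 45)),
  mother.smokes = factor(sample(c("no", "yes"), n, replace = TRUE))
)
smoker <- sim$mother.smokes == "yes"
sim$birthwt.grams <- 3000 + 10 * sim$mother.age - 100 * smoker -
  8 * sim$mother.age * smoker + rnorm(n, sd = 400)
fit.inter <- lm(birthwt.grams ~ mother.age * mother.smokes, data = sim)

b <- coef(fit.inter)
slope.nonsmokers <- b["mother.age"]
# Smoker slope = baseline slope + interaction coefficient.
slope.smokers <- b["mother.age"] + b["mother.age:mother.smokesyes"]

# Cross-check against a regression fit to smokers only.
fit.smokers <- lm(birthwt.grams ~ mother.age,
                  data = subset(sim, mother.smokes == "yes"))
all.equal(unname(slope.smokers), unname(coef(fit.smokers)["mother.age"]))

# p-value for the test that the two slopes are equal.
p.interaction <- summary(fit.inter)$coefficients["mother.age:mother.smokesyes", "Pr(>|t|)"]
p.interaction
```

A small p-value here would suggest that the association between age and birthweight differs by smoking status in the simulated data; it says nothing about the lab's real data.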