Lecture 5: plyr and introduction to graphics

Prof. Alexandra Chouldechova
94842

Agenda

  • split-apply-combine (plyr)
  • Introduction to R graphics (ggplot2)

Packages

library(MASS)
library(plyr)
library(dplyr)
library(tibble)
library(ggplot2)

Getting started: birthwt dataset

  • We're going to start by operating on the birthwt dataset from the MASS package

  • Let's get it loaded and see what we're working with

str(birthwt) 
'data.frame':   189 obs. of  10 variables:
 $ low  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
 $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
 $ race : int  2 3 1 1 1 3 1 3 1 1 ...
 $ smoke: int  0 0 1 1 1 0 0 0 1 1 ...
 $ ptl  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ht   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ui   : int  1 0 0 1 1 0 0 0 0 0 ...
 $ ftv  : int  0 3 1 2 0 0 1 1 1 0 ...
 $ bwt  : int  2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...
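The examples that follow refer to columns such as mother.age, mother.smokes, birthwt.grams and birthwt.below.2500, and to race as a labelled factor, so a renaming and recoding step is implied between the str() output above and the later code. The following is one possible sketch of that step, not necessarily the code used in lecture. Only the four names just listed come from these notes; the other six column names are illustrative assumptions. The integer codings (race 1 = white, 2 = black, 3 = other; smoke and low as 0/1 indicators) follow the MASS help page for birthwt.

```r
library(MASS)  # provides the birthwt data frame

# Start from a fresh copy so this block can be re-run safely
birthwt <- MASS::birthwt

# Descriptive column names. Only birthwt.below.2500, mother.age,
# mother.smokes and birthwt.grams appear in these notes; the other
# names are illustrative assumptions.
colnames(birthwt) <- c("birthwt.below.2500", "mother.age", "mother.weight",
                       "race", "mother.smokes", "previous.prem.labor",
                       "hypertension", "uterine.irr", "physician.visits",
                       "birthwt.grams")

# Recode integer codes as labelled factors (codings per ?birthwt).
# factor() on a character vector orders the levels alphabetically.
birthwt$race <- factor(c("white", "black", "other")[birthwt$race])
birthwt$mother.smokes <- factor(c("no", "yes")[birthwt$mother.smokes + 1])
birthwt$birthwt.below.2500 <-
  factor(c("no", "yes")[birthwt$birthwt.below.2500 + 1])

str(birthwt[, c("race", "mother.smokes", "birthwt.below.2500")])
```

After this step, expressions such as birthwt.below.2500 == "yes" and grouping by race or mother.smokes should behave as in the examples that follow.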

Multiple splitting factors

  • We can split on more than one variable by adding terms to the formula
ddply(birthwt, ~ race + mother.smokes, summarize,
      mean.age = mean(mother.age),
      mean.bwt = mean(birthwt.grams),
      low.bwt.prop = mean(birthwt.below.2500 == "yes"))
   race mother.smokes mean.age mean.bwt low.bwt.prop
1 black            no 19.93750 2854.500   0.31250000
2 black           yes 24.10000 2504.000   0.60000000
3 other            no 22.36364 2815.782   0.36363636
4 other           yes 22.50000 2757.167   0.41666667
5 white            no 26.02273 3428.750   0.09090909
6 white           yes 22.82692 2826.846   0.36538462
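To make the split-apply-combine idea behind the ddply call above more concrete, here is a rough base R sketch that performs the three steps by hand. It is not how plyr is implemented internally. The data frame bw and the result by.hand are hypothetical names, and the setup assumes the recoding implied by the output above (race 1 = white, 2 = black, 3 = other; smoke 0 = no, 1 = yes).

```r
library(MASS)  # provides the birthwt data frame

# Minimal setup under the assumed recoding (bw is a hypothetical name)
bw <- data.frame(
  race          = factor(c("white", "black", "other")[MASS::birthwt$race]),
  mother.smokes = factor(c("no", "yes")[MASS::birthwt$smoke + 1]),
  mother.age    = MASS::birthwt$age,
  birthwt.grams = MASS::birthwt$bwt
)

# Split: one data frame per observed race x smoking combination
pieces <- split(bw, list(bw$race, bw$mother.smokes), drop = TRUE)

# Apply: reduce each piece to a one-row summary
summaries <- lapply(pieces, function(d) {
  data.frame(race          = d$race[1],
             mother.smokes = d$mother.smokes[1],
             mean.age      = mean(d$mother.age),
             mean.bwt      = mean(d$birthwt.grams))
})

# Combine: stack the one-row summaries into a single data frame
by.hand <- do.call(rbind, summaries)
print(by.hand)
```

The rows may come out in a different order than in the ddply output, but the group means should agree. ddply wraps these three steps in a single call.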

Example: Association between mother's age and birth weight?

  • Is the mother's age correlated with birth weight?
with(birthwt, cor(birthwt.grams, mother.age))  # Calculate correlation
[1] 0.09031781
  • Does the correlation vary with smoking status?
    • tapply can't help us here: it splits a single vector, while cor needs two columns from each group. But ddply still works!
ddply(birthwt, ~ mother.smokes, summarize,
      cor.bwt.age = cor(birthwt.grams, mother.age))
  mother.smokes cor.bwt.age
1            no   0.2014558
2           yes  -0.1441649
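As a brief illustration of why tapply falls short here, the sketch below contrasts a one-vector summary, which tapply handles, with a two-column summary, which needs whole data frame pieces. The base R split/sapply approach is shown only as one alternative to ddply, not as the method from lecture. The names bw and cors are hypothetical, and the setup assumes smoke 0 = no, 1 = yes.

```r
library(MASS)  # provides the birthwt data frame

# Minimal setup under the assumed recoding (bw is a hypothetical name)
bw <- data.frame(
  mother.smokes = factor(c("no", "yes")[MASS::birthwt$smoke + 1]),
  mother.age    = MASS::birthwt$age,
  birthwt.grams = MASS::birthwt$bwt
)

# tapply splits ONE vector by a factor, so a per-group mean is fine
print(tapply(bw$birthwt.grams, bw$mother.smokes, mean))

# cor needs two columns from each group, so split the whole data frame
# and compute the correlation within each piece
cors <- sapply(split(bw, bw$mother.smokes),
               function(d) cor(d$birthwt.grams, d$mother.age))
print(cors)
```

If the recoding assumption holds, cors should reproduce the two correlations in the ddply output above.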

Graphics in R

  • We now know a lot about how to tabulate data

  • It's often easier to look at plots instead of tables

  • We'll now talk about some of the standard plotting options

  • Easier to do this in a live demo…

  • Please refer to .Rmd version of lecture notes for the graphics material
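For readers without the .Rmd at hand, here is a small ggplot2 sketch of the kind of plot that could replace the correlation tables above. It is not taken from the lecture's graphics material. The data frame bw, the plot object p and the axis labels are illustrative assumptions, and the setup assumes smoke 0 = no, 1 = yes.

```r
library(MASS)     # provides the birthwt data frame
library(ggplot2)

# Minimal setup under the assumed recoding (bw is a hypothetical name)
bw <- data.frame(
  mother.smokes = factor(c("no", "yes")[MASS::birthwt$smoke + 1]),
  mother.age    = MASS::birthwt$age,
  birthwt.grams = MASS::birthwt$bwt
)

# Scatterplot of birth weight against mother's age, with one fitted
# line per smoking group
p <- ggplot(bw, aes(x = mother.age, y = birthwt.grams,
                    colour = mother.smokes)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Mother's age (years)", y = "Birth weight (grams)",
       colour = "Mother smokes")

print(p)
```

The two fitted lines give a visual counterpart to the per-group correlations computed earlier: the slopes should have opposite signs for smokers and non-smokers.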