Agenda

Packages

## ── Attaching packages ───────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
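
The startup message above is what R prints when the tidyverse is attached. The original call is not shown, so this is only a minimal sketch, assuming the tidyverse meta-package is installed; the version numbers in your own message will likely differ.

```r
# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, readr, ...).
library(tidyverse)

# Re-display which functions the tidyverse masks from other attached packages.
tidyverse_conflicts()
```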

Getting started: birthwt dataset

tibbles

## # A tibble: 189 x 10
##      low   age   lwt  race smoke   ptl    ht    ui   ftv   bwt
##    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1     0    19   182     2     0     0     0     1     0  2523
##  2     0    33   155     3     0     0     0     0     3  2551
##  3     0    20   105     1     1     0     0     0     1  2557
##  4     0    21   108     1     1     0     0     1     2  2594
##  5     0    18   107     1     1     0     0     1     0  2600
##  6     0    21   124     3     0     0     0     0     0  2622
##  7     0    22   118     1     0     0     0     0     1  2637
##  8     0    17   103     3     0     0     0     0     1  2637
##  9     0    29   123     1     1     0     0     0     1  2663
## 10     0    26   113     1     1     0     0     0     0  2665
## # … with 179 more rows
##     low age lwt race smoke ptl ht ui ftv  bwt
## 85    0  19 182    2     0   0  0  1   0 2523
## 86    0  33 155    3     0   0  0  0   3 2551
## 87    0  20 105    1     1   0  0  0   1 2557
## 88    0  21 108    1     1   0  0  1   2 2594
## 89    0  18 107    1     1   0  0  1   0 2600
## 91    0  21 124    3     0   0  0  0   0 2622
## 92    0  22 118    1     0   0  0  0   1 2637
## 93    0  17 103    3     0   0  0  0   1 2637
## 94    0  29 123    1     1   0  0  0   1 2663
## 95    0  26 113    1     1   0  0  0   0 2665
## 96    0  19  95    3     0   0  0  0   0 2722
## 97    0  19 150    3     0   0  0  0   1 2733
## 98    0  22  95    3     0   0  1  0   0 2751
## 99    0  30 107    3     0   1  0  1   2 2750
## 100   0  18 100    1     1   0  0  0   0 2769
## 101   0  18 100    1     1   0  0  0   0 2769
## 102   0  15  98    2     0   0  0  0   0 2778
## 103   0  25 118    1     1   0  0  0   3 2782
## 104   0  20 120    3     0   0  0  1   0 2807
## 105   0  28 120    1     1   0  0  0   1 2821
## 106   0  32 121    3     0   0  0  0   2 2835
## 107   0  31 100    1     0   0  0  1   3 2835
## 108   0  36 202    1     0   0  0  0   1 2836
## 109   0  28 120    3     0   0  0  0   0 2863
## 111   0  25 120    3     0   0  0  1   2 2877
## 112   0  28 167    1     0   0  0  0   0 2877
## 113   0  17 122    1     1   0  0  0   0 2906
## 114   0  29 150    1     0   0  0  0   2 2920
## 115   0  26 168    2     1   0  0  0   0 2920
## 116   0  17 113    2     0   0  0  0   1 2920
## 117   0  17 113    2     0   0  0  0   1 2920
## 118   0  24  90    1     1   1  0  0   1 2948
## 119   0  35 121    2     1   1  0  0   1 2948
## 120   0  25 155    1     0   0  0  0   1 2977
## 121   0  25 125    2     0   0  0  0   0 2977
## 123   0  29 140    1     1   0  0  0   2 2977
## 124   0  19 138    1     1   0  0  0   2 2977
## 125   0  27 124    1     1   0  0  0   0 2922
## 126   0  31 215    1     1   0  0  0   2 3005
## 127   0  33 109    1     1   0  0  0   1 3033
## 128   0  21 185    2     1   0  0  0   2 3042
## 129   0  19 189    1     0   0  0  0   2 3062
## 130   0  23 130    2     0   0  0  0   1 3062
## 131   0  21 160    1     0   0  0  0   0 3062
## 132   0  18  90    1     1   0  0  1   0 3062
## 133   0  18  90    1     1   0  0  1   0 3062
## 134   0  32 132    1     0   0  0  0   4 3080
## 135   0  19 132    3     0   0  0  0   0 3090
## 136   0  24 115    1     0   0  0  0   2 3090
## 137   0  22  85    3     1   0  0  0   0 3090
## 138   0  22 120    1     0   0  1  0   1 3100
## 139   0  23 128    3     0   0  0  0   0 3104
## 140   0  22 130    1     1   0  0  0   0 3132
## 141   0  30  95    1     1   0  0  0   2 3147
## 142   0  19 115    3     0   0  0  0   0 3175
## 143   0  16 110    3     0   0  0  0   0 3175
## 144   0  21 110    3     1   0  0  1   0 3203
## 145   0  30 153    3     0   0  0  0   0 3203
## 146   0  20 103    3     0   0  0  0   0 3203
## 147   0  17 119    3     0   0  0  0   0 3225
## 148   0  17 119    3     0   0  0  0   0 3225
## 149   0  23 119    3     0   0  0  0   2 3232
## 150   0  24 110    3     0   0  0  0   0 3232
## 151   0  28 140    1     0   0  0  0   0 3234
## 154   0  26 133    3     1   2  0  0   0 3260
## 155   0  20 169    3     0   1  0  1   1 3274
## 156   0  24 115    3     0   0  0  0   2 3274
## 159   0  28 250    3     1   0  0  0   6 3303
## 160   0  20 141    1     0   2  0  1   1 3317
## 161   0  22 158    2     0   1  0  0   2 3317
## 162   0  22 112    1     1   2  0  0   0 3317
## 163   0  31 150    3     1   0  0  0   2 3321
## 164   0  23 115    3     1   0  0  0   1 3331
## 166   0  16 112    2     0   0  0  0   0 3374
## 167   0  16 135    1     1   0  0  0   0 3374
## 168   0  18 229    2     0   0  0  0   0 3402
## 169   0  25 140    1     0   0  0  0   1 3416
## 170   0  32 134    1     1   1  0  0   4 3430
## 172   0  20 121    2     1   0  0  0   0 3444
## 173   0  23 190    1     0   0  0  0   0 3459
## 174   0  22 131    1     0   0  0  0   1 3460
## 175   0  32 170    1     0   0  0  0   0 3473
## 176   0  30 110    3     0   0  0  0   0 3544
## 177   0  20 127    3     0   0  0  0   0 3487
## 179   0  23 123    3     0   0  0  0   0 3544
## 180   0  17 120    3     1   0  0  0   0 3572
## 181   0  19 105    3     0   0  0  0   0 3572
## 182   0  23 130    1     0   0  0  0   0 3586
## 183   0  36 175    1     0   0  0  0   0 3600
## 184   0  22 125    1     0   0  0  0   1 3614
## 185   0  24 133    1     0   0  0  0   0 3614
## 186   0  21 134    3     0   0  0  0   2 3629
## 187   0  19 235    1     1   0  1  0   0 3629
## 188   0  25  95    1     1   3  0  1   0 3637
## 189   0  16 135    1     1   0  0  0   0 3643
## 190   0  29 135    1     0   0  0  0   1 3651
## 191   0  29 154    1     0   0  0  0   1 3651
## 192   0  19 147    1     1   0  0  0   0 3651
## 193   0  19 147    1     1   0  0  0   0 3651
## 195   0  30 137    1     0   0  0  0   1 3699
## 196   0  24 110    1     0   0  0  0   1 3728
## 197   0  19 184    1     1   0  1  0   0 3756
## 199   0  24 110    3     0   1  0  0   0 3770
## 200   0  23 110    1     0   0  0  0   1 3770
## 201   0  20 120    3     0   0  0  0   0 3770
## 202   0  25 241    2     0   0  1  0   0 3790
## 203   0  30 112    1     0   0  0  0   1 3799
## 204   0  22 169    1     0   0  0  0   0 3827
## 205   0  18 120    1     1   0  0  0   2 3856
## 206   0  16 170    2     0   0  0  0   4 3860
## 207   0  32 186    1     0   0  0  0   2 3860
## 208   0  18 120    3     0   0  0  0   1 3884
## 209   0  29 130    1     1   0  0  0   2 3884
## 210   0  33 117    1     0   0  0  1   1 3912
## 211   0  20 170    1     1   0  0  0   0 3940
## 212   0  28 134    3     0   0  0  0   1 3941
## 213   0  14 135    1     0   0  0  0   0 3941
## 214   0  28 130    3     0   0  0  0   0 3969
## 215   0  25 120    1     0   0  0  0   2 3983
## 216   0  16  95    3     0   0  0  0   1 3997
## 217   0  20 158    1     0   0  0  0   1 3997
## 218   0  26 160    3     0   0  0  0   0 4054
## 219   0  21 115    1     0   0  0  0   1 4054
## 220   0  22 129    1     0   0  0  0   0 4111
## 221   0  25 130    1     0   0  0  0   2 4153
## 222   0  31 120    1     0   0  0  0   2 4167
## 223   0  35 170    1     0   1  0  0   1 4174
## 224   0  19 120    1     1   0  0  0   0 4238
## 225   0  24 116    1     0   0  0  0   1 4593
## 226   0  45 123    1     0   0  0  0   1 4990
## 4     1  28 120    3     1   1  0  1   0  709
## 10    1  29 130    1     0   0  0  1   2 1021
## 11    1  34 187    2     1   0  1  0   0 1135
## 13    1  25 105    3     0   1  1  0   0 1330
## 15    1  25  85    3     0   0  0  1   0 1474
## 16    1  27 150    3     0   0  0  0   0 1588
## 17    1  23  97    3     0   0  0  1   1 1588
## 18    1  24 128    2     0   1  0  0   1 1701
## 19    1  24 132    3     0   0  1  0   0 1729
## 20    1  21 165    1     1   0  1  0   1 1790
## 22    1  32 105    1     1   0  0  0   0 1818
## 23    1  19  91    1     1   2  0  1   0 1885
## 24    1  25 115    3     0   0  0  0   0 1893
## 25    1  16 130    3     0   0  0  0   1 1899
## 26    1  25  92    1     1   0  0  0   0 1928
## 27    1  20 150    1     1   0  0  0   2 1928
## 28    1  21 200    2     0   0  0  1   2 1928
## 29    1  24 155    1     1   1  0  0   0 1936
## 30    1  21 103    3     0   0  0  0   0 1970
## 31    1  20 125    3     0   0  0  1   0 2055
## 32    1  25  89    3     0   2  0  0   1 2055
## 33    1  19 102    1     0   0  0  0   2 2082
## 34    1  19 112    1     1   0  0  1   0 2084
## 35    1  26 117    1     1   1  0  0   0 2084
## 36    1  24 138    1     0   0  0  0   0 2100
## 37    1  17 130    3     1   1  0  1   0 2125
## 40    1  20 120    2     1   0  0  0   3 2126
## 42    1  22 130    1     1   1  0  1   1 2187
## 43    1  27 130    2     0   0  0  1   0 2187
## 44    1  20  80    3     1   0  0  1   0 2211
## 45    1  17 110    1     1   0  0  0   0 2225
## 46    1  25 105    3     0   1  0  0   1 2240
## 47    1  20 109    3     0   0  0  0   0 2240
## 49    1  18 148    3     0   0  0  0   0 2282
## 50    1  18 110    2     1   1  0  0   0 2296
## 51    1  20 121    1     1   1  0  1   0 2296
## 52    1  21 100    3     0   1  0  0   4 2301
## 54    1  26  96    3     0   0  0  0   0 2325
## 56    1  31 102    1     1   1  0  0   1 2353
## 57    1  15 110    1     0   0  0  0   0 2353
## 59    1  23 187    2     1   0  0  0   1 2367
## 60    1  20 122    2     1   0  0  0   0 2381
## 61    1  24 105    2     1   0  0  0   0 2381
## 62    1  15 115    3     0   0  0  1   0 2381
## 63    1  23 120    3     0   0  0  0   0 2410
## 65    1  30 142    1     1   1  0  0   0 2410
## 67    1  22 130    1     1   0  0  0   1 2410
## 68    1  17 120    1     1   0  0  0   3 2414
## 69    1  23 110    1     1   1  0  0   0 2424
## 71    1  17 120    2     0   0  0  0   2 2438
## 75    1  26 154    3     0   1  1  0   1 2442
## 76    1  20 105    3     0   0  0  0   3 2450
## 77    1  26 190    1     1   0  0  0   0 2466
## 78    1  14 101    3     1   1  0  0   0 2466
## 79    1  28  95    1     1   0  0  0   2 2466
## 81    1  14 100    3     0   0  0  0   2 2495
## 82    1  23  94    3     1   0  0  0   0 2495
## 83    1  17 142    2     0   0  1  0   0 2495
## 84    1  21 130    1     1   0  1  0   3 2495
##   [1] 19 33 20 21 18 21 22 17 29 26 19 19 22 30 18 18 15 25 20 28 32 31 36
##  [24] 28 25 28 17 29 26 17 17 24 35 25 25 29 19 27 31 33 21 19 23 21 18 18
##  [47] 32 19 24 22 22 23 22 30 19 16 21 30 20 17 17 23 24 28 26 20 24 28 20
##  [70] 22 22 31 23 16 16 18 25 32 20 23 22 32 30 20 23 17 19 23 36 22 24 21
##  [93] 19 25 16 29 29 19 19 30 24 19 24 23 20 25 30 22 18 16 32 18 29 33 20
## [116] 28 14 28 25 16 20 26 21 22 25 31 35 19 24 45 28 29 34 25 25 27 23 24
## [139] 24 21 32 19 25 16 25 20 21 24 21 20 25 19 19 26 24 17 20 22 27 20 17
## [162] 25 20 18 18 20 21 26 31 15 23 20 24 15 23 30 22 17 23 17 26 20 26 14
## [185] 28 14 23 17 21
##   [1] 19 33 20 21 18 21 22 17 29 26 19 19 22 30 18 18 15 25 20 28 32 31 36
##  [24] 28 25 28 17 29 26 17 17 24 35 25 25 29 19 27 31 33 21 19 23 21 18 18
##  [47] 32 19 24 22 22 23 22 30 19 16 21 30 20 17 17 23 24 28 26 20 24 28 20
##  [70] 22 22 31 23 16 16 18 25 32 20 23 22 32 30 20 23 17 19 23 36 22 24 21
##  [93] 19 25 16 29 29 19 19 30 24 19 24 23 20 25 30 22 18 16 32 18 29 33 20
## [116] 28 14 28 25 16 20 26 21 22 25 31 35 19 24 45 28 29 34 25 25 27 23 24
## [139] 24 21 32 19 25 16 25 20 21 24 21 20 25 19 19 26 24 17 20 22 27 20 17
## [162] 25 20 18 18 20 21 26 31 15 23 20 24 15 23 30 22 17 23 17 26 20 26 14
## [185] 28 14 23 17 21
## [1] 19
## # A tibble: 1 x 1
##     age
##   <int>
## 1    19
## [1] 19
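
The output above contrasts how a tibble and a plain data frame print, and then pulls out the age column in several ways. The commands themselves are not shown, so the following is only a sketch of calls that would give output of this shape, assuming the data come from MASS::birthwt. The name birthwt_df is introduced here purely for illustration.

```r
library(tibble)

# birthwt ships with the MASS package; as_tibble() drops its row names.
birthwt_df <- MASS::birthwt
birthwt <- as_tibble(birthwt_df)

birthwt        # tibble: prints the first 10 rows plus column types
birthwt_df     # data frame: prints every row

birthwt$age          # the column as an integer vector
birthwt[["age"]]     # the same vector, extracted by name
birthwt$age[1]       # a single value
birthwt[1, "age"]    # single-bracket indexing keeps a 1 x 1 tibble
birthwt[[1, "age"]]  # double-bracket indexing returns the bare value
```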

Note: To import data directly into tibble format, use read_delim() and read_csv() from the readr package instead of their base-R counterparts (such as read.csv()). Even though we started with the base functions, I recommend using these improved import functions going forward.
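
As a rough illustration of the difference, the sketch below reads the same file with both functions. The file is a hypothetical two-row CSV written to a temporary path just so the example runs; it is not part of the course data.

```r
library(readr)

# Hypothetical file: write a tiny CSV to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age,bwt", "19,2523", "33,2551"), csv_path)

from_readr <- read_csv(csv_path)   # returns a tibble
from_base  <- read.csv(csv_path)   # returns a plain data frame

class(from_readr)
class(from_base)
```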

Renaming the variables

##  [1] "low"   "age"   "lwt"   "race"  "smoke" "ptl"   "ht"    "ui"   
##  [9] "ftv"   "bwt"
## # A tibble: 189 x 10
##    birthwt.below.2… mother.age mother.weight  race mother.smokes
##               <int>      <int>         <int> <int>         <int>
##  1                0         19           182     2             0
##  2                0         33           155     3             0
##  3                0         20           105     1             1
##  4                0         21           108     1             1
##  5                0         18           107     1             1
##  6                0         21           124     3             0
##  7                0         22           118     1             0
##  8                0         17           103     3             0
##  9                0         29           123     1             1
## 10                0         26           113     1             1
## # … with 179 more rows, and 5 more variables: previous.prem.labor <int>,
## #   hypertension <int>, uterine.irr <int>, physician.visits <int>,
## #   birthwt.grams <int>
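
One way to get the renamed tibble shown above is to assign a full vector of names with colnames(). This is a sketch rather than the original code; the new names are copied from the printed output.

```r
birthwt <- tibble::as_tibble(MASS::birthwt)
colnames(birthwt)   # the original, terse names

# Assign all ten names at once; the order must match the existing columns.
colnames(birthwt) <- c("birthwt.below.2500", "mother.age", "mother.weight",
                       "race", "mother.smokes", "previous.prem.labor",
                       "hypertension", "uterine.irr", "physician.visits",
                       "birthwt.grams")
birthwt
```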

An alternative renaming approach: the rename() command

rename() takes new_name = old_name pairs, so you list only the variables whose names you want to change; every other column is left as it is.

##  [1] "birthwt.below.2500"  "mother.age"          "mother.weight"      
##  [4] "race"                "mother.smokes"       "previous.prem.labor"
##  [7] "hypertension"        "uterine.irr"         "physician.visits"   
## [10] "birthwt.grams"
## # A tibble: 189 x 10
##    birthwt.below.2… mother.age mother.weight  race mother.smokes
##               <int>      <int>         <int> <int>         <int>
##  1                0         19           182     2             0
##  2                0         33           155     3             0
##  3                0         20           105     1             1
##  4                0         21           108     1             1
##  5                0         18           107     1             1
##  6                0         21           124     3             0
##  7                0         22           118     1             0
##  8                0         17           103     3             0
##  9                0         29           123     1             1
## 10                0         26           113     1             1
## # … with 179 more rows, and 5 more variables: previous.prem.labor <int>,
## #   hypertension <int>, uterine.irr <int>, physician.visits <int>,
## #   birthwt.grams <int>

Note that in this command we didn’t rename the race variable because it already had a good name.
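
A sketch of such a rename() call, assuming we start again from the original MASS::birthwt column names. Each argument has the form new_name = old_name, and race is simply left out.

```r
library(dplyr)

birthwt <- tibble::as_tibble(MASS::birthwt) %>%
  rename(birthwt.below.2500  = low,
         mother.age          = age,
         mother.weight       = lwt,
         mother.smokes       = smoke,
         previous.prem.labor = ptl,
         hypertension        = ht,
         uterine.irr         = ui,
         physician.visits    = ftv,
         birthwt.grams       = bwt)

colnames(birthwt)
```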

Renaming the factors

## # A tibble: 189 x 10
##    birthwt.below.2… mother.age mother.weight  race mother.smokes
##               <int>      <int>         <int> <int>         <int>
##  1                0         19           182     2             0
##  2                0         33           155     3             0
##  3                0         20           105     1             1
##  4                0         21           108     1             1
##  5                0         18           107     1             1
##  6                0         21           124     3             0
##  7                0         22           118     1             0
##  8                0         17           103     3             0
##  9                0         29           123     1             1
## 10                0         26           113     1             1
## # … with 179 more rows, and 5 more variables: previous.prem.labor <int>,
## #   hypertension <int>, uterine.irr <int>, physician.visits <int>,
## #   birthwt.grams <int>
## # A tibble: 189 x 10
##    birthwt.below.2… mother.age mother.weight race  mother.smokes
##    <fct>                 <int>         <int> <fct> <fct>        
##  1 no                       19           182 black no           
##  2 no                       33           155 other no           
##  3 no                       20           105 white yes          
##  4 no                       21           108 white yes          
##  5 no                       18           107 white yes          
##  6 no                       21           124 other no           
##  7 no                       22           118 white no           
##  8 no                       17           103 other no           
##  9 no                       29           123 white yes          
## 10 no                       26           113 white yes          
## # … with 179 more rows, and 5 more variables: previous.prem.labor <int>,
## #   hypertension <fct>, uterine.irr <fct>, physician.visits <int>,
## #   birthwt.grams <int>

Recall that the syntax ~ recode_factor(.x, ...) defines an anonymous function that will be applied to every column specified in the first part of the mutate_at() call. In this case, all of the specified variables are binary 0/1 coded, and are being recoded to no/yes.
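
The recoding described here could look roughly like the sketch below. It is a reconstruction, not the original code: the race labels (1 = white, 2 = black, 3 = other) follow the MASS::birthwt documentation, and the four binary columns are the ones shown as factors in the output above.

```r
library(dplyr)

# Start from the renamed tibble (names copied from the output above).
birthwt <- setNames(tibble::as_tibble(MASS::birthwt),
                    c("birthwt.below.2500", "mother.age", "mother.weight",
                      "race", "mother.smokes", "previous.prem.labor",
                      "hypertension", "uterine.irr", "physician.visits",
                      "birthwt.grams"))

birthwt <- birthwt %>%
  # race has three codes, so it gets its own recode
  mutate(race = recode_factor(race, `1` = "white", `2` = "black",
                              `3` = "other")) %>%
  # the same anonymous function is applied to each listed 0/1 column
  mutate_at(c("birthwt.below.2500", "mother.smokes",
              "hypertension", "uterine.irr"),
            ~ recode_factor(.x, `0` = "no", `1` = "yes"))

birthwt
```

In dplyr 1.0.0 and later, mutate(across(...)) is the recommended replacement for mutate_at(), although mutate_at() still works.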

Summary of the data

##  birthwt.below.2500   mother.age    mother.weight      race   
##  no :130            Min.   :14.00   Min.   : 80.0   white:96  
##  yes: 59            1st Qu.:19.00   1st Qu.:110.0   black:26  
##                     Median :23.00   Median :121.0   other:67  
##                     Mean   :23.24   Mean   :129.8             
##                     3rd Qu.:26.00   3rd Qu.:140.0             
##                     Max.   :45.00   Max.   :250.0             
##  mother.smokes previous.prem.labor hypertension uterine.irr
##  no :115       Min.   :0.0000      no :177      no :161    
##  yes: 74       1st Qu.:0.0000      yes: 12      yes: 28    
##                Median :0.0000                              
##                Mean   :0.1958                              
##                3rd Qu.:0.0000                              
##                Max.   :3.0000                              
##  physician.visits birthwt.grams 
##  Min.   :0.0000   Min.   : 709  
##  1st Qu.:0.0000   1st Qu.:2414  
##  Median :0.0000   Median :2977  
##  Mean   :0.7937   Mean   :2945  
##  3rd Qu.:1.0000   3rd Qu.:3487  
##  Max.   :6.0000   Max.   :4990

A simple table

## # A tibble: 6 x 3
## # Groups:   race [3]
##   race  mother.smokes mean.birthwt
##   <fct> <fct>                <dbl>
## 1 white no                    3429
## 2 white yes                   2827
## 3 black no                    2854
## 4 black yes                   2504
## 5 other no                    2816
## 6 other yes                   2757
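
A table like the one above can be produced with group_by() and summarize(). This is a sketch under the assumption that the means were rounded to whole grams; the short name bw and the compact transmute() setup are used here only to keep the example self-contained.

```r
library(dplyr)

# Compact setup: keep only the columns needed, with the names used above.
bw <- tibble::as_tibble(MASS::birthwt) %>%
  transmute(race = recode_factor(race, `1` = "white", `2` = "black",
                                 `3` = "other"),
            mother.smokes = recode_factor(smoke, `0` = "no", `1` = "yes"),
            birthwt.grams = bwt)

# Mean birth weight for every race x smoking combination.
tbl <- bw %>%
  group_by(race, mother.smokes) %>%
  summarize(mean.birthwt = round(mean(birthwt.grams), 0))
tbl
```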

A simple reshape

## # A tibble: 3 x 3
## # Groups:   race [3]
##   race     no   yes
##   <fct> <dbl> <dbl>
## 1 white  3429  2827
## 2 black  2854  2504
## 3 other  2816  2757
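
The reshape moves the smoking levels into columns. The sketch below uses tidyr's pivot_wider(); the original may equally have used the older spread(), which gives the same layout. As before, bw is a name introduced only for this example.

```r
library(dplyr)
library(tidyr)

bw <- tibble::as_tibble(MASS::birthwt) %>%
  transmute(race = recode_factor(race, `1` = "white", `2` = "black",
                                 `3` = "other"),
            mother.smokes = recode_factor(smoke, `0` = "no", `1` = "yes"),
            birthwt.grams = bwt)

wide <- bw %>%
  group_by(race, mother.smokes) %>%
  summarize(mean.birthwt = round(mean(birthwt.grams), 0)) %>%
  # one column per smoking level, filled with the group means
  pivot_wider(names_from = mother.smokes, values_from = mean.birthwt)
wide
```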

What if we wanted nicer looking output?

race     no    yes
white  3429   2827
black  2854   2504
other  2816   2757
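
Formatted tables of this kind are commonly produced with knitr::kable(). The sketch below assumes that is what was used here; the data frame is rebuilt by hand from the values printed above so that the example stands alone.

```r
library(knitr)

# Values copied from the reshaped table above.
wide <- data.frame(race = c("white", "black", "other"),
                   no   = c(3429, 2854, 2816),
                   yes  = c(2827, 2504, 2757))

# Render as a markdown table; in an R Markdown document this becomes
# a formatted table in the knitted output.
out <- kable(wide, format = "markdown")
out
```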

Example: Association between mother’s age and birth weight?

## [1] 0.09031781
## # A tibble: 2 x 2
##   mother.smokes cor_bwt_age
##   <fct>               <dbl>
## 1 no                  0.201
## 2 yes                -0.144
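
The two results above are an overall correlation and a correlation within each smoking group. A sketch of calls that would produce them, assuming cor() with its default Pearson method (bw is a name introduced only for this example):

```r
library(dplyr)

bw <- tibble::as_tibble(MASS::birthwt) %>%
  transmute(mother.smokes = recode_factor(smoke, `0` = "no", `1` = "yes"),
            mother.age = age,
            birthwt.grams = bwt)

# Overall correlation between birth weight and mother's age.
overall <- with(bw, cor(birthwt.grams, mother.age))
overall

# The same correlation computed separately for smokers and non-smokers.
by_smoke <- bw %>%
  group_by(mother.smokes) %>%
  summarize(cor_bwt_age = cor(birthwt.grams, mother.age))
by_smoke
```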

Does the association between birthweight and mother’s age vary by race?

## # A tibble: 3 x 2
##   race  cor_bwt_age
##   <fct>       <dbl>
## 1 white      0.166 
## 2 black     -0.329 
## 3 other     -0.0293

There does look to be variation, but we don’t know if it’s statistically significant without further investigation.
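
A sketch of the by-race calculation, with one possible first step toward the "further investigation": a p-value from cor.test() within each group. The p.value column is an addition for illustration and is not part of the output above. Note also that these per-group tests ask whether each correlation differs from zero, not whether the correlations differ from one another.

```r
library(dplyr)

bw <- tibble::as_tibble(MASS::birthwt) %>%
  transmute(race = recode_factor(race, `1` = "white", `2` = "black",
                                 `3` = "other"),
            mother.age = age,
            birthwt.grams = bwt)

by_race <- bw %>%
  group_by(race) %>%
  summarize(cor_bwt_age = cor(birthwt.grams, mother.age),
            # p-value for H0: the within-group correlation is zero
            p.value = cor.test(birthwt.grams, mother.age)$p.value)
by_race
```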

Graphics in R

Standard graphics in R

Single-variable plots

Let’s continue with the birthwt data from the MASS package.

Here are some basic single-variable plots.

Note that the result of calling plot(x, ...) varies depending on what x is.
- When x is numeric, you get a plot showing the value of x at every index.
- When x is a factor, you get a bar plot of counts for every level.
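
The two cases can be seen side by side in the sketch below. The variables age and smokes are built directly from MASS::birthwt for this example, so they are stand-ins for the renamed columns used earlier.

```r
age    <- MASS::birthwt$age
smokes <- factor(MASS::birthwt$smoke, levels = c(0, 1),
                 labels = c("no", "yes"))

plot(age)     # numeric: each value plotted against its index
plot(smokes)  # factor: bar plot of the count in each level
```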

Let’s add more information to the smoking bar plot, and also change the color by setting the col option.
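
A sketch of such a call, using the standard main, xlab, ylab, and col arguments. The title, axis labels, and the particular colour are illustrative choices, not the ones from the original plot.

```r
smokes <- factor(MASS::birthwt$smoke, levels = c(0, 1),
                 labels = c("no", "yes"))

plot(smokes,
     main = "Mothers who smoked during pregnancy",  # plot title
     xlab = "Smoking during pregnancy",             # x-axis label
     ylab = "Count of mothers",                     # y-axis label
     col  = "lightgreen")                           # fill colour of the bars
```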

(much) better graphics with ggplot2

Introduction to ggplot2

ggplot2 has a slightly steeper learning curve than the base graphics functions, but it also generally produces far better and more easily customizable graphics.

There are two basic calls in ggplot:

  • qplot(x, y, ..., data): a “quick-plot” routine, which essentially replaces the base plot()
  • ggplot(data, aes(x, y, ...), ...): defines a graphics object from which plots can be generated, along with aesthetic mappings that specify how variables are mapped to visual properties.
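
The two calls can produce the same scatterplot, as in the sketch below, which uses the original MASS::birthwt column names. Be aware that qplot() has been deprecated since ggplot2 3.4.0, so recent versions emit a warning when it is called; the ggplot() form is the one to rely on.

```r
library(ggplot2)

bw <- MASS::birthwt

# Quick-plot form: variables first, data last.
p_quick <- qplot(age, bwt, data = bw)

# Full form: data and aesthetic mappings first, then a geom layer.
p_full <- ggplot(bw, aes(x = age, y = bwt)) +
  geom_point()

print(p_quick)
print(p_full)
```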

plot vs qplot

Here’s how the default scatterplots look in ggplot2 compared to the base graphics. We’ll illustrate things by continuing to use the birthwt data from the MASS package.
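
A sketch of the comparison, assuming a scatterplot of birth weight against mother's age (the choice of variables is a guess, since the original plots are not shown). The original MASS::birthwt column names are used to keep the example short.

```r
library(ggplot2)

bw <- MASS::birthwt

# Base graphics: default scatterplot of bwt against age.
with(bw, plot(age, bwt))

# ggplot2 quick-plot equivalent (qplot() warns in ggplot2 >= 3.4.0).
p <- qplot(x = age, y = bwt, data = bw)
print(p)
```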