---
title: "Lab 7 Solutions"
author: "Alexandra Chouldechova"
date: ""
output: html_document
---

##### Remember to change the `author: ` field on this Rmd file to your own name.

We'll begin by loading all the packages we might need.

```{r}
library(tidyverse)
Cars93 <- as_tibble(MASS::Cars93)
```

### Testing means between two groups

Here is a command that generates density plots of `MPG.highway` from the Cars93 data. Separate densities are constructed for US and non-US vehicles.

```{r}
qplot(data = Cars93, x = MPG.highway, fill = Origin, geom = "density", alpha = I(0.5))
```

**(a)** Using the Cars93 data and the `t.test()` function, run a t-test to see if average `MPG.highway` is different between US and non-US vehicles. *Interpret the results.*

Try doing this using both the formula-style input and the `x`, `y`-style input.

```{r}
# Formula version: compares the two levels of Origin ("USA" vs "non-USA")
mpg.t.test <- t.test(MPG.highway ~ Origin, data = Cars93)
mpg.t.test

# x, y version: pass the two subsets explicitly (same groups, same order)
with(Cars93, t.test(x = MPG.highway[Origin == "USA"],
                    y = MPG.highway[Origin == "non-USA"]))
```

There is no statistically significant difference in average highway fuel economy (MPG) between US and non-US vehicles.

**(b)** What is the confidence interval for the difference?

```{r}
mpg.t.test$conf.int
```

This 95% interval for the difference in means (USA minus non-USA) contains 0, which is consistent with the non-significant test in part (a).

**(c)** Repeat part (a) using the `wilcox.test()` function.

```{r}
mpg.wilcox.test <- wilcox.test(MPG.highway ~ Origin, data = Cars93)
mpg.wilcox.test
```

**(d)** Are your results for (a) and (c) very different?

> The p-value from the t-test is somewhat smaller than the one reported by `wilcox.test()`. Since the `MPG.highway` distributions are right-skewed, we might expect some differences between the t-test and the Wilcoxon test. Neither test is statistically significant, however, so both lead to the same conclusion.

### Is the data normal?

**(a)** Modify the density plot code provided in problem 1 to produce a plot with better axis labels. Also add a title.
```{r}
qplot(data = Cars93, x = MPG.highway, fill = Origin, geom = "density", alpha = I(0.5),
      xlab = "Highway fuel economy (MPG)",
      main = "Highway fuel economy density plots")
```

**(b)** Does the data look to be normally distributed?

> The densities don't really look normally distributed. They appear right-skewed.

**(c)** Construct normal Q-Q plots of `MPG.highway`, one plot for each `Origin` category. Overlay a line on each plot using the `qqline()` function.

```{r, fig.height = 4}
par(mfrow = c(1, 2))
# USA cars: fit the reference line to the same subset that is plotted
with(Cars93, qqnorm(MPG.highway[Origin == "USA"], main = "USA"))
with(Cars93, qqline(MPG.highway[Origin == "USA"], col = "blue"))
# Foreign cars
with(Cars93, qqnorm(MPG.highway[Origin == "non-USA"], main = "non-USA"))
with(Cars93, qqline(MPG.highway[Origin == "non-USA"], col = "blue"))
```

**(d)** Does the data look to be normally distributed?

The non-USA `MPG.highway` data look quite far from normally distributed. This distribution appears to have a heavier upper tail than a normal distribution, consistent with the right skew seen in the density plots.

### Testing 2 x 2 tables

Doll and Hill's 1950 article studying the association between smoking and lung cancer contains one of the most important 2 x 2 tables in history. Here's their data:

```{r}
smoking <- as.table(rbind(c(688, 650), c(21, 59)))
dimnames(smoking) <- list(has.smoked = c("yes", "no"),
                          lung.cancer = c("yes", "no"))
smoking
```

**(a)** Use `fisher.test()` to test if there's an association between smoking and lung cancer.

```{r}
smoking.fisher.test <- fisher.test(smoking)
smoking.fisher.test
```

**(b)** What is the odds ratio?

```{r}
smoking.fisher.test$estimate
```

**(c)** Are your findings significant?

```{r}
smoking.fisher.test$p.value
```

The findings are highly significant: the p-value is far below conventional thresholds such as 0.05, so there is strong evidence of an association between smoking and lung cancer.
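Note that `fisher.test()` reports the conditional maximum likelihood estimate of the odds ratio, which generally differs slightly from the familiar cross-product ratio $ad/bc$. As a hand cross-check on part (b), here is a minimal sketch that computes the sample odds ratio directly from Doll and Hill's counts, together with an approximate 95% Wald interval on the log-odds scale. The Wald interval is a swapped-in large-sample approximation, not the exact conditional interval that `fisher.test()` reports, and the names `counts`, `sample.or`, `se.log.or`, and `wald.ci` are illustrative only.

```{r}
# Re-enter Doll and Hill's counts so this chunk runs on its own
# (rows: has.smoked yes/no; columns: lung.cancer yes/no)
counts <- rbind(c(688, 650), c(21, 59))

# Sample (cross-product) odds ratio: (a * d) / (b * c)
sample.or <- (counts[1, 1] * counts[2, 2]) / (counts[1, 2] * counts[2, 1])
sample.or

# Approximate 95% Wald interval: log(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d),
# then exponentiate back to the odds-ratio scale
se.log.or <- sqrt(sum(1 / counts))
wald.ci <- exp(log(sample.or) + c(-1, 1) * qnorm(0.975) * se.log.or)
wald.ci
```

Both the cross-product ratio and the `fisher.test()` estimate come out close to 3. Because the odds ratio is symmetric in rows and columns, it can be read either as the odds of having smoked among lung cancer patients relative to controls, or as the odds of lung cancer among smokers relative to non-smokers.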