Lecture 14: The End ==== author: Prof. Alexandra Chouldechova date: font-family: Gill Sans autosize: false width:1480 height:720 Agenda ==== - What have we learned? - Where do you go from here? - Useful packages you should know about - Shiny demo ====
What have we learned?
Packages ==== - `base`, `stats` - `MASS` - Contains a lot of simple data sets - `ggplot2` - Awesome graphics - `plyr` - Enables simple syntax for split-apply-combine operations - `mapvalues()` is from here - `dplyr` Programming basics ==== - Loops, apply/sapply/lapply alternatives - Functions - If-else statements Tabular summaries ==== - `table()` - `tapply()` - `aggregate()` - `plyr` functions - `dplyr::summarise()` Graphical summaries ====

ggplot2
Statistics: Quantitative outcomes ==== - t-tests - Does the mean of `y` differ between 2 groups? - $k$-way ANOVA (analysis of variance) - Does the mean of `y` differ across various combinations of $k$ factors? - linear regression - (How) does the mean of `y` differ across various covariates? - Interpreting coefficients of categorical variables - Interpreting interaction terms - Using `anova()` to compare 2 nested models Statistics: Binary outcomes ==== - odds ratios - fisher test, chi-squared test - (2 x 2 tables) Is smoking associated with lung cancer? - (j x k tables) Is there an association between political party affiliation and gender? - logistic regression - how to fit it with the `glm()` command. Data challenges ==== - Missing values - Corrupted data - Collinearity - `pairs()` and `GGally::ggpairs()` plots - Regression diagnostics ====
Where do we go from here?
Data import/export ==== [foreign](http://www.rdocumentation.org/packages/foreign) - Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, Weka, dBase, ... [xlsx](https://cran.rstudio.com/web/packages/xlsx/) - Read/write Excel data [RSQLite](http://www.rdocumentation.org/packages/RSQLite) - SQLite Interface for R [RMySQL](http://www.r-bloggers.com/mysql-and-r/) - MySQL Inferface for R Data summarization and manipulation ==== [tidyr](http://blog.rstudio.org/2014/07/22/introducing-tidyr/) - Tools for reshaping your data into "tidy" formatting [R for Data Science](http://r4ds.had.co.nz/) - New book by Garrett Grolemund and Hadley Wickham, available for **free** online. - Introduces the "tidyverse" set of R pacakges and workflows The handy [Data wrangling cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf) provides a quick reference to the various `dplyr` and `tidyr` functions. Interfacing R with other languages ==== [Rcpp](http://dirk.eddelbuettel.com/code/rcpp.html) - Call C++ functions from R. [RPython](https://rpython.readthedocs.org/en/latest/) - Call Python functions from R. _R Notebooks make it even easier to interface with Python, C++, SQL, and bash_ Visualization, interactive graphics ==== [shiny](http://shiny.rstudio.com/) - A web application framework for R [ggvis](http://ggvis.rstudio.com/) - Interactive web-based graphics [plotly](https://plot.ly/r/) - Make ggplots interactive [htmlwidgets](http://www.htmlwidgets.org) - "Bring the best of JavaScript data visualization to R" Todo ==== - Course evaluations - I really appreciate your feedback - Today is the last day to submit evaluations.