---
title: 'Lecture 5: Loops and some of their alternatives'
author: "Prof. Alexandra Chouldechova"
date: "Fall 2020"
output: 
  ioslides_presentation:
    highlight: tango
    widescreen: true
    smaller: true
---

## Agenda

- For/while loops to iterate over data
- `apply`
- `map`, `map_<type>`, `map_at`, `map_if`
- `mutate_at`, `mutate_if`
- `summarize_at`, `summarize_if`

## Package and data loading

```{r, message=FALSE, warning=FALSE}
# Our favourite library
library(tidyverse)
# For Cars93 data again
Cars93 <- MASS::Cars93
# For the clean survey data:
survey <- read.csv("http://www.andrew.cmu.edu/user/achoulde/94842/data/survey_data2020.csv", 
                   header=TRUE, stringsAsFactors = FALSE)
```

## More programming basics: loops

- We'll now learn about loops and some more efficient, syntactically simpler alternatives to them
- **Loops** are ways of iterating over data

## For loops: a pair of examples

```{r}
for(i in 1:4) {
  print(i)
}

phrase <- "Good Night,"
for(word in c("and", "Good", "Luck")) {
  phrase <- paste(phrase, word)
  print(phrase)
}
```

## For loops: syntax

> A **for loop** executes a chunk of code for every value of an **index variable** in an **index set**

- The basic syntax takes the form

```{r, eval=FALSE}
for(index.variable in index.set) {
  code to be repeated at every value of index.variable
}
```

- The index set is often a vector of integers, but it can be more general (e.g., a character vector or a list)

## Example: looping over a list

```{r}
index.set <- list(name="Michael", weight=185, is.male=TRUE) # a list
for(i in index.set) {
  print(c(i, typeof(i)))
}
```

## Example: Calculate sum of each column

```{r}
fake.data <- matrix(rnorm(500), ncol=5) # create fake 100 x 5 data set
head(fake.data, 2) # print first two rows

col.sums <- numeric(ncol(fake.data)) # variable to store running column sums
for(i in 1:nrow(fake.data)) {
  col.sums <- col.sums + fake.data[i,] # add ith observation to the sum
}
col.sums

colSums(fake.data) # A better approach (see also colMeans())
```

## while loops

- **while loops** repeat a chunk of code while the specified condition remains true
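While loops are most natural when you don't know in advance how many iterations will be needed. As a small sketch (the variable names `value` and `num.halvings` are made up for this illustration), this loop counts how many times 1000 can be halved before it drops below 1:

```{r}
# Hypothetical example: repeatedly halve a value until it drops below 1
value <- 1000
num.halvings <- 0
while(value >= 1) {
  value <- value / 2               # halve the current value
  num.halvings <- num.halvings + 1 # count this halving
}
num.halvings # 10, since 2^9 = 512 <= 1000 < 1024 = 2^10
```

By contrast, the counter example that follows stops after a number of iterations that is fixed in advance.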
```{r, eval=FALSE}
day <- 1
num.days <- 365
while(day <= num.days) {
  day <- day + 1
}
```

- We won't really be using while loops in this class
- Just be aware that they exist, and that they may become useful to you at some point in your analytics career

## Loop alternatives

Command | Description
--------|------------
`apply(X, MARGIN, FUN)` | Obtain a vector/array/list by applying `FUN` along the specified `MARGIN` of an array or matrix `X`
`map(.x, .f, ...)` | Obtain a *list* by applying `.f` to every element of a list or atomic vector `.x`
`map_<type>(.x, .f, ...)` | For `<type>` given by `lgl` (logical), `int` (integer), `dbl` (double) or `chr` (character), return a *vector* of this type obtained by applying `.f` to each element of `.x`
`map_at(.x, .at, .f)` | Obtain a *list* by applying `.f` to the elements of `.x` specified by name or index given in `.at`
`map_if(.x, .p, .f)` | Obtain a *list* by applying `.f` to the elements of `.x` specified by `.p` (a predicate function, or a logical vector)
`mutate_all/_at/_if` | Mutate all variables, specified (at) variables, or those selected by a predicate (if)
`summarize_all/_at/_if` | Summarize all variables, specified (at) variables, or those selected by a predicate (if)

- These take practice to get used to, but when used effectively they make analysis easier to debug and less prone to error
- The best way to learn them is by looking at a bunch of examples; the end of each help file contains some
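## Example: map_at(), map_if()

The table describes `map_at()` and `map_if()` without showing them in action. Here is a small sketch on a made-up three-element list `x` (hypothetical toy data, not part of the course data sets), alongside `map_chr()` as one of the `map_<type>()` variants:

```{r}
library(purrr) # already attached by library(tidyverse); repeated so the chunk stands alone

# Hypothetical toy list, made up for this illustration
x <- list(a = 1:3, b = c(4.5, 5.5), c = c("u", "v"))

# map_at(): apply sum() only to the elements named "a" and "b"
map_at(x, c("a", "b"), sum)

# map_if(): apply length() only to elements for which is.character() is TRUE
map_if(x, is.character, length)

# map_chr(): apply class() to every element and return a named character vector
map_chr(x, class)
```

Notice that `map_at()` and `map_if()` still return the whole list: elements that aren't selected pass through unchanged.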
## Example: apply()

```{r}
colMeans(fake.data)
apply(fake.data, MARGIN=2, FUN=mean) # MARGIN = 1 for rows, 2 for columns

# Function that calculates the proportion of vector entries that are > 0
propPositive <- function(x) mean(x > 0)
apply(fake.data, MARGIN=2, FUN=propPositive)
```

## Example: map(), map_<type>()

```{r}
map(survey, is.numeric) # Returns a list
map_lgl(survey, is.numeric) # Returns a logical vector with named elements
```

## Example: apply(), map(), map_<type>()

```{r}
apply(cars, 2, FUN=mean) # apply() coerces the data frame to a matrix
map(cars, mean) # Data frames are also lists (of columns)
map_dbl(cars, mean) # map output as a double vector
```

## Example: mutate_if

Let's convert all factor variables in Cars93 to lowercase

```{r}
head(Cars93$Type)
Cars93.lower <- mutate_if(Cars93, is.factor, tolower)
head(Cars93.lower$Type)
```

- Note: this produces a copy of the `Cars93` data in which every factor variable has been replaced by a version containing lowercase values. Because `tolower()` returns a character vector, these columns are now character rather than factor.

## Example: mutate_if, adding instead of replacing columns

If you pass the functions in as a list with named elements, those names are appended to the variable names to create new, modified versions of the variables instead of replacing the existing ones

```{r}
Cars93.lower <- mutate_if(Cars93, is.factor, list(lower = tolower))
head(Cars93.lower$Type)
head(Cars93.lower$Type_lower)
```

## Example: mutate_at

Let's convert from MPG to KMPL (kilometers per liter), but this time using `mutate_at`

```{r}
Cars93.metric <- Cars93 %>% 
  mutate_at(c("MPG.city", "MPG.highway"), list(KMPL = ~ 0.425 * .x))
tail(colnames(Cars93.metric))
```

Here, `~ 0.425 * .x` is an example of specifying a "lambda" (anonymous) function.
It is shorthand for

```{r, eval = FALSE}
function(.x){0.425 * .x}
```

## Example: summarize_if

Let's get the mean of every numeric column in Cars93

```{r}
Cars93 %>% summarize_if(is.numeric, mean)
Cars93 %>% summarize_if(is.numeric, list(mean = mean), na.rm=TRUE)
```

- Columns containing missing values return `NA` in the first call; in the second call, `na.rm=TRUE` is passed along to `mean()` so that missing values are dropped

## Example: summarize_at

Let's get the average fuel economy of all vehicles, grouped by their Type

```{r}
Cars93 %>%
  group_by(Type) %>%
  summarize_at(c("MPG.city", "MPG.highway"), mean)
```

## Another approach

We'll learn about a bunch of select helper functions like `contains()` and `starts_with()`. Here's one way of performing the previous operation with the help of these functions, appending `_mean` to the resulting column names.

```{r}
Cars93 %>%
  group_by(Type) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))
```

## More than one grouping variable

```{r}
Cars93 %>%
  group_by(Origin, AirBags) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))
```

## Assignments

- **Homework 2** will be posted today
    - **Due: Wednesday, November 11, 1:30pm ET**
    - Submit your .Rmd and .html files on Canvas
- **Lab 5** is available on Canvas and the course website
    - You have until Friday evening to complete it
    - Friday's lab session will go over this week's material and help you complete the labs