---
title: 'Lecture 5:
Loops and some of their alternatives'
author: "Prof. Alexandra Chouldechova"
date: "Fall 2020"
output:
  ioslides_presentation:
    highlight: tango
    widescreen: true
    smaller: true
---
## Agenda
- For/while loops to iterate over data
- `apply`
- `map`, `map_<type>`, `map_at`, `map_if`
- `mutate_at`, `mutate_if`
- `summarize_at`, `summarize_if`
## Package and data loading
```{r, message=FALSE, warning=FALSE}
# Our favourite library
library(tidyverse)
# For Cars93 data again
Cars93 <- MASS::Cars93
# For the clean survey data:
survey <- read.csv("http://www.andrew.cmu.edu/user/achoulde/94842/data/survey_data2020.csv",
                   header = TRUE, stringsAsFactors = FALSE)
```
## More programming basics: loops
- We'll now learn about loops and some loop alternatives that are often more efficient or syntactically simpler
- **Loops** are ways of iterating over data
## For loops: a pair of examples
```{r}
for(i in 1:4) {
  print(i)
}
phrase <- "Good Night,"
for(word in c("and", "Good", "Luck")) {
  phrase <- paste(phrase, word)
  print(phrase)
}
```
## For loops: syntax
> A **for loop** executes a chunk of code for every value of an **index variable** in an **index set**
- The basic syntax takes the form
```{r, eval=FALSE}
for(index.variable in index.set) {
  # code to be repeated at every value of index.variable
}
```
- The index set is often a vector of integers, but can be more general
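
As a small sketch of an index set that is not a vector of integers, the loop below iterates over the names of a list. The list `scores` and its values are made up for illustration.

```{r}
# Hypothetical data for illustration only
scores <- list(alice = c(90, 85), bob = c(70, 95, 80))

avg <- numeric(length(scores))   # preallocate the result vector
names(avg) <- names(scores)

for (person in names(scores)) {  # the index set is a character vector
  avg[person] <- mean(scores[[person]])
}
avg
```

Preallocating `avg` before the loop avoids growing the vector one element at a time.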
## Example
```{r}
index.set <- list(name="Michael", weight=185, is.male=TRUE) # a list
for(i in index.set) {
  print(c(i, typeof(i)))
}
```
## Example: Calculate sum of each column
```{r}
fake.data <- matrix(rnorm(500), ncol=5) # create fake 100 x 5 data set
head(fake.data,2) # print first two rows
col.sums <- numeric(ncol(fake.data)) # variable to store running column sums
for(i in 1:nrow(fake.data)) {
  col.sums <- col.sums + fake.data[i,] # add ith observation to the sum
}
col.sums
colSums(fake.data) # A better approach (see also colMeans())
```
## while loops
- **while loops** repeat a chunk of code while the specified condition remains true
```{r, eval=FALSE}
day <- 1
num.days <- 365
while(day <= num.days) {
  day <- day + 1
}
```
- We won't really be using while loops in this class
- Just be aware that they exist, and that they may become useful to you at some point in your analytics career
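
A while loop is most natural when the number of iterations isn't known in advance. Here is a minimal sketch with made-up numbers: count how many years it takes a balance to double at a fixed annual interest rate. The variables `balance`, `rate`, and `years` are hypothetical.

```{r}
balance <- 100   # hypothetical starting balance
rate <- 0.05     # hypothetical annual interest rate
years <- 0

while (balance < 200) {            # keep going until the balance has doubled
  balance <- balance * (1 + rate)  # add one year of interest
  years <- years + 1
}
years
```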
## Loop alternatives
Command | Description
--------|------------
`apply(X, MARGIN, FUN)` | Obtain a vector/array/list by applying `FUN` along the specified `MARGIN` of an array or matrix `X`
`map(.x, .f, ...)` | Obtain a *list* by applying `.f` to every element of a list or atomic vector `.x`
`map_<type>(.x, .f, ...)` | For `<type>` given by `lgl` (logical), `int` (integer), `dbl` (double) or `chr` (character), return a *vector* of this type obtained by applying `.f` to each element of `.x`
`map_at(.x, .at, .f)` | Obtain a *list* by applying `.f` to the elements of `.x` specified by name or index given in `.at`
`map_if(.x, .p, .f)` | Obtain a *list* by applying `.f` to the elements of `.x` selected by `.p` (a predicate function, or a logical vector)
`mutate_all/_at/_if` | Mutate all variables, specified (at) variables, or those selected by a predicate (if)
`summarize_all/_at/_if` | Summarize all variables, specified variables, or those selected by a predicate (if)
- These take practice to get used to, but make analysis easier to debug and less prone to error when used effectively
- The best way to learn them is by looking at a bunch of examples. The end of each help file contains some examples.
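
To make `map_if` and `map_at` from the table concrete, here is a small sketch on a made-up list `x`. In both cases, elements that are not selected are returned unchanged.

```{r}
library(purrr)

# Hypothetical list for illustration
x <- list(a = 1:3, b = c("u", "v"), c = c(2.5, 3.5))

# map_if: apply mean only to the numeric elements
res_if <- map_if(x, is.numeric, mean)

# map_at: apply toupper only to the element named "b"
res_at <- map_at(x, "b", toupper)

str(res_if)
str(res_at)
```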
## Example: apply()
```{r}
colMeans(fake.data)
apply(fake.data, MARGIN=2, FUN=mean) # MARGIN = 1 for rows, 2 for columns
# Function that calculates proportion of vector elements that are > 0
propPositive <- function(x) mean(x > 0)
apply(fake.data, MARGIN=2, FUN=propPositive)
```
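
To see the difference between the two `MARGIN` values, it can help to use a matrix small enough to check by hand. The matrix `m` below is made up for illustration.

```{r}
m <- matrix(1:6, nrow = 2)  # 2 x 3 matrix, filled column by column
m

row.sums <- apply(m, MARGIN = 1, FUN = sum)  # one value per row
col.sums.m <- apply(m, MARGIN = 2, FUN = sum)  # one value per column
row.sums
col.sums.m
```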
## Example: map, `map_<type>()`
```{r}
map(survey, is.numeric) # Returns a list
map_lgl(survey, is.numeric) # Returns a logical vector with named elements
```
## Example: apply(), map(), `map_<type>()`
```{r}
apply(cars, 2, FUN=mean) # apply() coerces the data frame to a matrix first
map(cars, mean) # Data frames are also lists
map_dbl(cars, mean) # map output as a double vector
```
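
For comparison, here is a sketch of the explicit for loop that `map_dbl(cars, mean)` replaces. The name `col.means` is introduced here for illustration.

```{r}
library(purrr)

# Loop version: preallocate, then fill in one column at a time
col.means <- numeric(ncol(cars))
names(col.means) <- names(cars)
for (j in seq_along(cars)) {
  col.means[j] <- mean(cars[[j]])
}
col.means

# map_dbl version: same result in one line
all.equal(col.means, map_dbl(cars, mean))
```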
## Example: mutate_if
Let's convert all factor variables in Cars93 to lowercase
```{r}
head(Cars93$Type)
Cars93.lower <- mutate_if(Cars93, is.factor, tolower)
head(Cars93.lower$Type)
```
- Note: this produces a copy of the `Cars93` data in which every factor variable has been replaced with a lowercase version. Because `tolower` returns character vectors, the replaced columns are character rather than factor
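
One detail worth checking: `tolower` returns a character vector, so the converted columns are no longer factors. The sketch below verifies this and shows one possible way to keep them as factors. This is an assumption about what you might want downstream, not part of the lecture, and `Cars93.lower2` is a name introduced here for illustration.

```{r}
library(dplyr)
Cars93 <- MASS::Cars93

Cars93.lower <- mutate_if(Cars93, is.factor, tolower)
class(Cars93$Type)        # factor in the original data
class(Cars93.lower$Type)  # character after tolower

# Wrap the result in factor() to get factor columns back
Cars93.lower2 <- mutate_if(Cars93, is.factor, ~ factor(tolower(.x)))
class(Cars93.lower2$Type)
```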
## Example: mutate_if, adding instead of replacing columns
If you pass the functions in as a list with named elements, those names are appended to the original column names, creating new columns instead of replacing the existing ones
```{r}
Cars93.lower <- mutate_if(Cars93, is.factor, list(lower = tolower))
head(Cars93.lower$Type)
head(Cars93.lower$Type_lower)
```
## Example: mutate_at
Let's convert from MPG to KMPL (kilometers per liter), this time using `mutate_at`
```{r}
Cars93.metric <- Cars93 %>%
  mutate_at(c("MPG.city", "MPG.highway"),
            list(KMPL = ~ 0.425 * .x))
tail(colnames(Cars93.metric))
```
Here, `~ 0.425 * .x` is an example of specifying a "lambda" (anonymous) function. It is shorthand for
```{r, eval = FALSE}
function(.x){0.425 * .x}
```
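
As a quick check that the two forms behave the same way, the sketch below applies both to a made-up vector `mpg` using `map_dbl`, which accepts the same `~` shorthand.

```{r}
library(purrr)

mpg <- c(20, 30, 40)  # hypothetical MPG values

via.lambda <- map_dbl(mpg, ~ 0.425 * .x)
via.function <- map_dbl(mpg, function(.x) { 0.425 * .x })

identical(via.lambda, via.function)
```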
## Example: summarize_if
Let's get the mean of every numeric column in Cars93
```{r}
Cars93 %>% summarize_if(is.numeric, mean)
Cars93 %>% summarize_if(is.numeric, list(mean = mean), na.rm=TRUE)
```
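
The second call above forwards `na.rm=TRUE` to `mean`. Without it, any column containing a missing value gets an `NA` mean. A minimal sketch with a made-up vector `v`:

```{r}
v <- c(10, 20, NA)     # hypothetical values with one missing entry

mean(v)                # missing values propagate by default
mean(v, na.rm = TRUE)  # the NA is dropped before averaging
```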
## Example: summarize_at
Let's get the average fuel economy of all vehicles, grouped by their Type
```{r}
Cars93 %>%
  group_by(Type) %>%
  summarize_at(c("MPG.city", "MPG.highway"), mean)
```
## Another approach
We'll learn about a bunch of select helper functions like `contains()` and `starts_with()`.
Here's one way of performing the previous operation with the help of these functions, and appending `_mean` to the resulting output.
```{r}
Cars93 %>%
  group_by(Type) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))
```
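
The named list can hold more than one function. The sketch below uses `starts_with()` and adds `sd` alongside `mean`, so each selected column gets one output column per function. The name `mpg.summary` is introduced here for illustration.

```{r}
library(dplyr)
Cars93 <- MASS::Cars93

mpg.summary <- Cars93 %>%
  group_by(Type) %>%
  summarize_at(vars(starts_with("MPG")), list(mean = mean, sd = sd))

colnames(mpg.summary)  # grouping column plus one column per variable/function pair
```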
## More than one grouping variable
```{r}
Cars93 %>%
  group_by(Origin, AirBags) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))
```
## Assignments
- **Homework 2** will be posted today
    - **Due: Wednesday, November 11, 1:30pm ET**
    - Submit your .Rmd and .html files on Canvas
- **Lab 5** is available on Canvas and the course website
    - You have until Friday evening to complete it
    - Friday's lab session will go over this week's material and help you complete the labs