Fall 2020

What are we trying to accomplish?

The sample analysis was shown only in class and is not viewable in this version of the notes.

Agenda

• Course overview

• Introduction to R, RStudio and R Markdown

• Programming basics

How this class will work

• No programming knowledge presumed

• Some stats knowledge presumed. E.g.:
• Hypothesis testing (t-tests, confidence intervals)
• Linear regression
• Synchronous attendance is encouraged, but not required

• Class will be very cumulative

Mechanics

• Two 80-minute lectures a week:
• First 60-70 minutes: concepts, methods, examples
• Last 10-20 minutes: short labs
• Class participation (10%)
• Quizzes (10%)
• Weekly homework (40%)
• Final project (2.5 weeks) (40%)
• Disclaimer: To pass the class, you must achieve a passing score on the final project (at least 21 / 40)

Mechanics

• Class participation (10%)
• Labs: Each lecture has an accompanying lab assignment.
• Course website shows how participation grade will be calculated
• Quizzes (10%)
• 4 quizzes in the second half of term. Dates TBA.
• Homework assignments (40%)
• There will be 5 weekly HW assignments
• Single lowest HW score will be dropped
• HW assigned on Wednesdays, due Wednesdays at 1:30PM ET
• Late homework will not be accepted for credit
• Final project (40%)
• You will write a report analysing a policy question using a publicly available data set

Course resources

• Assignments, office hours, class notes, grading policies, useful references on R: http://www.andrew.cmu.edu/~achoulde/94842/

• Canvas for gradebook and for turning in homework

• Piazza for discussion forum (embedded in Canvas)
• Please post class/homework-related questions on Piazza instead of emailing the teaching staff
• Check the class website for everything else

• No required textbook, but I highly recommend:

Goal of this class

This class will teach you to use R to:

• Generate graphical and tabular data summaries

• Efficiently manipulate data using tidyverse libraries

• Perform statistical analyses (e.g., hypothesis testing, regression modeling)

• Produce reproducible statistical reports using R Markdown

• Near the end of class we’ll also preview how to integrate R with other tools (e.g., databases, the web)

Why R?

• Free (open-source)

• Programming language (not point-and-click)

• Excellent graphics

• Offers broadest range of statistical tools

• Easy to generate reproducible reports

• Easy to integrate with other tools

The R Console

Basic interaction with R is through typing in the console

This is the terminal or command-line interface
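
As a minimal sketch of what console interaction looks like, the lines below can be typed one at a time at the `>` prompt. The variable name `x` is just an illustration and does not come from the course materials.

```r
# Type an expression and press Enter; R evaluates it and prints the result.
2 + 3        # prints [1] 5

# Store a value under a name with the assignment operator <-.
# Assignment prints nothing.
x <- 10

# Typing an expression that uses the name prints its value.
x * 2        # prints [1] 20
```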

RStudio is an IDE for R

RStudio has 4 main windows (‘panes’):

• Source
• Console
• Workspace/History
• Files/Plots/Packages/Help

RStudio: Panes overview

1. Source pane: create a file that you can save and run later

2. Console pane: type or paste in commands to get output from R

3. Workspace/History pane: see a list of variables or previous commands

4. Files/Plots/Packages/Help pane: see plots, help pages, and other items in this window.

Console pane

• Use the Console pane to type or paste commands to get output from R

• To look up the help file for a function or data set, type `?function` into the Console
• E.g., try typing in `?mean`
• Use the `tab` key to auto-complete function and object names
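
The help lookups above might look like this in practice. This is only a sketch: `help()` and `args()` are standard base R functions, and the variable name `avg` is hypothetical.

```r
# Open the help page for mean(); these two lines are equivalent.
?mean
help("mean")

# args() shows a function's arguments without opening the full help page.
args(sd)

# The help page for mean() documents the na.rm argument,
# which drops missing values before averaging.
avg <- mean(c(1, 2, NA), na.rm = TRUE)
avg          # prints [1] 1.5
```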

Source pane

• Use the Source pane to create and edit R and Rmd files

• The menu bar of this pane contains handy shortcuts for sending code to the Console for evaluation
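
For illustration, a short script that could be typed into the Source pane and saved is sketched below. The data values and the names `heights` and `avg_height` are made up for this example. In RStudio, the Run button in the pane's menu bar sends the current line or selection to the Console.

```r
# A small script: each line can be sent to the Console in turn.

# Made-up heights in centimeters (illustrative data only).
heights <- c(160, 172, 181, 158, 175)

# Compute and display the average height.
avg_height <- mean(heights)
print(avg_height)
```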

Files/Plots/Packages/Help pane

• By default, any figures you produce in R will be displayed in the Plots tab
• Menu bar allows you to Zoom, Export, and Navigate back to older plots
• When you request a help file (e.g., `?mean`), the documentation will appear in the Help tab
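
A quick way to see both tabs in action is sketched below, assuming the built-in `cars` data set that ships with R (it is not part of the course materials). Each plotting call should show up in the Plots tab when run inside RStudio.

```r
# Documentation for a built-in data set appears in the Help tab.
?cars

# A scatterplot of stopping distance against speed appears in the Plots tab.
plot(cars)

# A second figure replaces it; use the arrows in the Plots tab
# to navigate back to the scatterplot.
hist(cars$speed)
```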