Chapter 3 | Description

Welcome Back

CHECK-IN : tinyurl.com/miniclassexit

Agenda

2:10 - 2:20. Check-In & Announcements.
2:20 - 3:40. Describing Data with Brain and R.
3:40 - 3:50. BREAK TIME
3:50 - 4:00. Milestone #2 (Intros)
4:00 - 5:00. Okay.

Part 1 : Describing Data with Brain.

KEY IDEA : The Mean as Prediction

How would you feel… GOOD / BAD / INDIFFERENT
- …if I told you that the average on the R Exam last semester was a 50%?
- …if I told you that the average on the R Exam last semester was a 90%?
- …if everyone you knew and loved called you “average”?
Why would you feel this way?

LET’S PLAY : WHERE THE LINE?

Where would a vertical line best fit through these data?

Where the Line?
There the Line!
What’s Going On?

The histogram organizes the data, but you can see the same patterns in both graphs.

The mean family size in this dataset is :

[1] 2.522053

The Mean As Prediction (and Error)

How can you visualize each of these components?
How might you translate this equation into a simpler language?

\[ \Huge y_i = \bar{Y} + \epsilon_i \]

\(y_i\) = individual \(i\)’s actual score on the variable (\(y\)) we are trying to predict.
\(\bar{Y}\) = our prediction (the mean, in this case)
\(\epsilon_i\) = residual error (how far individual \(i\)’s actual score falls from the prediction)
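
One way to see each component of the equation is to compute it by hand in R. The sketch below is only an illustration: the scores in y are made up (they are not from any class dataset), and the names y_bar, residuals, and reconstructed are chosen here for readability.

```r
# Hypothetical scores, made up for illustration (not course data)
y <- c(2, 4, 4, 5, 10)

y_bar <- mean(y)          # Y-bar: the prediction, the same value for everyone
residuals <- y - y_bar    # epsilon_i: each score's distance from the prediction

# Each actual score equals the prediction plus that individual's residual
reconstructed <- y_bar + residuals

# Side-by-side view of the three components
print(data.frame(y, prediction = y_bar, residual = residuals))

# Residuals around the mean cancel out (sum to zero, up to rounding)
print(sum(residuals))
```

Reading across any row of the printed table gives the equation in plain language: actual score = prediction + error.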

Part 2 : Describing Data in R.

hey, another check-in. tinyurl.com/interruptlifewithR

load the interruption data
name the dataset d to follow along with the professor.

Lab 3, Problem 1. Interruption Problems

Download the “interruption” data from bCourses, and import this data into R. This dataset has two variables: the number of interruptions counted before (int1) and after (int2) our operationalization.

Load the data (I’ll call it d if you want to follow along with my code), check to make sure it loaded correctly, and report the sample size and names of the variables.
Graph these variables as a histogram (use the par() function to graph them side by side). Change the arguments so the graphs have the same x-axis and y-axis ranges, and nice labels.
Report the mean, median, range, and standard deviation for each variable. Add vertical lines to your graphs above that illustrate the mean (in red), the median (in blue), and the standard deviation (in dashed red).
Describe how these statistics changed after operationalizing an interruption, and why these changes make sense given the nature of our operationalizations. Then, explain why the median is closer to the mean for Time 2.
Just to make sure we can still do this, graph a categorical variable from the dataset, report the frequency of each group, and describe what you learn from the distribution.
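
The graphing and summary steps above might be sketched roughly as follows. This is not the official solution: the data frame d is simulated here with made-up values (in the lab you would import the real file instead), the helper describe() is a hypothetical name, and drawing the standard deviation as dashed lines one SD on either side of the mean is one reasonable reading of the prompt.

```r
# Simulated stand-in for the interruption data (values are made up)
set.seed(3)
d <- data.frame(int1 = rpois(40, lambda = 12),
                int2 = rpois(40, lambda = 6))

nrow(d)    # sample size
names(d)   # variable names

# Hypothetical helper: summary statistics for one variable
describe <- function(x) {
  c(mean = mean(x), median = median(x),
    min = min(x), max = max(x), sd = sd(x))
}
stats <- sapply(d, describe)   # one column of statistics per variable
print(stats)

# Shared bins and axis ranges so the two histograms are comparable
x_range <- range(d$int1, d$int2)
breaks <- seq(x_range[1] - 0.5, x_range[2] + 0.5, by = 1)
h1 <- hist(d$int1, breaks = breaks, plot = FALSE)
h2 <- hist(d$int2, breaks = breaks, plot = FALSE)
y_range <- c(0, max(h1$counts, h2$counts))

old_par <- par(mfrow = c(1, 2))   # two plots side by side
for (var_name in names(d)) {
  hist(d[[var_name]], breaks = breaks,
       xlim = range(breaks), ylim = y_range,
       main = var_name, xlab = "Number of interruptions",
       ylab = "Frequency")
  m <- stats["mean", var_name]
  s <- stats["sd", var_name]
  abline(v = m, col = "red", lwd = 2)                       # mean
  abline(v = stats["median", var_name], col = "blue", lwd = 2)  # median
  abline(v = m + c(-1, 1) * s, col = "red", lty = 2)        # mean +/- 1 SD
}
par(old_par)   # restore the previous plotting layout
```

Computing the bins and the y-axis limit once, before plotting, is what keeps the two panels on the same scale.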

BREAK TIME : MEET BACK AT 4:45

Part 3 : Final Project Workshop

Looking at some literature reviews.
Refining our Linear Models.
Introducing the idea of a moderator variable