Lecture 11 - Multilevel Models (MLM)

Check-In : Let’s GOOOOOOOOOO.

[Check-in figures : Graph A and Graph B]

Announcements

  1. Final Project due….12/10?
    • Pre-register your analyses…did not mean to scare y’all!
      • Think carefully and clearly about your dataset (outliers, measures, models)
      • Be consistent in your analyses; keep things simple.
      • When you deviate from your analysis plan, that’s okay! You are learning and exploring. Just good to recognize and state this.
      • Review a few? Or y’all good?
    • Next Week : Draft of What Ya Got.
  2. The End Is Near.
    • TODAY : Multilevel models.
    • THE END. The Learning Has Stopped; Would You Like to Know More?

Not a Lab 9

  • Professor Approach vs. Y’all’s Work : we did things differently!

    • why is that okay / not a problem? :

    • why is that not okay / a problem? :

  • The Original Paper.

  • Professor Attempt to Replicate

  • Comments / Thoughts on the Article or the Analyses?

    • How effectively / well did the researcher address the question about dehumanizing language and attitudes about immigration?

    • What other considerations (measures, methods, analyses, etc.) might the researcher have included if he wanted to really study dehumanization and immigration?

    • What are our takeaways from this exercise?

Presentation

Multilevel Models (MLM) : Conceptual Understanding

Definition

  • Multilevel Models (MLM)

    • Also Called : Hierarchical linear models, mixed effects models, random effects models, etc.

    • MODELS MODELS MODELS : It’s just a line! Lots of lines. (Lots of lines models?)

  • Level 1 → Level ? :

    • Level 1 : the smallest unit of analysis (where the DV is measured)

    • Level 2 : where level 1 data is “nested”; Level 2 groups organize non-independent responses at Level 1.

Some Examples of Multilevel Models

Example 1 (Source : analyticsvidhya.com)

  • Level 1 : Students in a school
  • Level 2 : The school
  • Why Non-Independence?

Example 2 (Source : Meta Analysis in R)

  • Level 1 : Participant data in a study.
  • Level 2 : The results from a study.
  • Why Non-Independence?

Example 3 (Source : Youtube video on HLM)

  • Why Non-Independence?

Why These Models Matter?

[Figures : “Fixed Effects” vs. “Random Effects”]

The Formulas

  • The Linear Model Equation

\[\huge y = \beta_0 + \beta_1x_1 + ... + \beta_kx_k + \epsilon\]

  • The MLM Equation (Level Notation)
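A standard way to fill in this equation (two levels, with a random intercept and a random slope), written to match the single-equation form above :

Level 1 : \[\huge y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \epsilon_{ij}\]

Level 2 : \[\huge \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j}\]

Here i indexes the Level 1 units (the individual observations) and j indexes the Level 2 groups; the u terms are the group-level (“random”) deviations from the average intercept and slope.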

Why Are We Doing This?

  1. The Assumption of Independence Has Been Violated! (MLM increases our power and reliability as scientists.)
    • Multiple measures of an individual give you a more reliable estimate of what and who they are.
    • A person serves as their own control; we can examine how an individual changes over time (as a result of some other variable or an experimental manipulation).
  2. Better than a purely “fixed effects” approach.
    • While we could account for group variation by adding dummy-coded group identifiers to our model…
    • …the MLM results in a simpler model (fewer coefficients; we just allow the intercepts and slopes to vary). See the sketch after this list.
  3. Model more complex phenomena.
    • How people change over time (within-person variation).
    • Simpson’s Paradox
    • Other Examples?
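A minimal sketch of the contrast in point 2 (not from the original slides), using the sleep dataset that we’ll meet later in this lecture :

library(lme4)
# "Purely fixed effects" : one dummy-coded coefficient per person
fixed_only <- lm(extra ~ as.factor(group) + as.factor(ID), data = sleep)
length(coef(fixed_only))  # 11 coefficients : intercept, group2, and 9 ID dummies
# The MLM : the same person-to-person variation summarized by a single intercept variance
mixed <- lmer(extra ~ as.factor(group) + (1 | ID), data = sleep)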

How Do We Do This in R?

Example 1 : The Sleep Dataset

From the ?sleep dataset; “Data which show the effect of two soporific drugs (increase in hours of sleep compared to control).”

  • Extra : increase in hours of sleep
  • Group : drug given (1 = control; 2 = drug)
  • ID : patient ID
The Linear Model (a “Between Person” Study)
library(ggplot2)
ggplot(sleep, aes(y = extra, x = group)) + 
  geom_point(size=2) + 
  stat_summary(fun.data=mean_se, color = 'red', size = 1.25, linewidth = 2)

lmod <- lm(extra ~ as.factor(group), data = sleep)
summary(lmod)

Call:
lm(formula = extra ~ as.factor(group), data = sleep)

Residuals:
   Min     1Q Median     3Q    Max 
-2.430 -1.305 -0.580  1.455  3.170 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)  
(Intercept)         0.7500     0.6004   1.249   0.2276  
as.factor(group)2   1.5800     0.8491   1.861   0.0792 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.899 on 18 degrees of freedom
Multiple R-squared:  0.1613,    Adjusted R-squared:  0.1147 
F-statistic: 3.463 on 1 and 18 DF,  p-value: 0.07919
The Linear Model, with ID as a grouping factor (a “Within-Person” Study)

Many linear models! Look at the graph below. What’s going on? What do you observe? How might this help us understand the relationship between these two variables?

ggplot(sleep, aes(y = extra, x = group, color = ID)) + 
  geom_point(size=2) + 
  geom_line(aes(group = ID), linewidth = 0.75)

Random Intercepts : Still just one equation. But a lot more lines!

#install.packages("lme4")
library(lme4)
Loading required package: Matrix
library(lmerTest)

Attaching package: 'lmerTest'
The following object is masked from 'package:lme4':

    lmer
The following object is masked from 'package:stats':

    step
library(Matrix)
mlmod <- lmer(extra ~ as.factor(group) + (1 | ID), data = sleep)
summary(mlmod)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: extra ~ as.factor(group) + (1 | ID)
   Data: sleep

REML criterion at convergence: 70

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.63372 -0.34157  0.03346  0.31511  1.83859 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 2.8483   1.6877  
 Residual             0.7564   0.8697  
Number of obs: 20, groups:  ID, 10

Fixed effects:
                  Estimate Std. Error      df t value Pr(>|t|)   
(Intercept)         0.7500     0.6004 11.0814   1.249  0.23735   
as.factor(group)2   1.5800     0.3890  9.0000   4.062  0.00283 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
as.fctr(g)2 -0.324
  • Fixed Effects : The “Average” across all the grouping variables. Our friend the linear model!

    • Intercept : ???? (discussed in lecture!)

    • Slope : ???? (discussed in lecture!)

    • Correlation of Fixed Effects : How our intercept and slope are related to each other. ????? (discussed in lecture!)

  • Random Effects :

    • ICC = Intraclass Correlation Coefficient = how much of the total variation is explained by our grouping variable (here : subject).

    • To calculate : intercept variance / (intercept variance + residual variance)

      2.8483 / (2.8483 + 0.7564)
      [1] 0.7901628
    • OR :

      library(performance)
      icc(mlmod)
      # Intraclass Correlation Coefficient
      
          Adjusted ICC: 0.790
        Unadjusted ICC: 0.668
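    • OR, pulling the variances straight from the fitted model (a small sketch, not in the original code) :

      vc <- as.data.frame(VarCorr(mlmod))  # one row per variance component
      vc$vcov[1] / sum(vc$vcov)            # ID intercept variance / total variance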

More about those random effects : we can examine them for the individuals in the study. They are adjustments to the intercept (people start with different baselines of sleep).

fixef(mlmod)
      (Intercept) as.factor(group)2 
             0.75              1.58 
ranef(mlmod)
$ID
   (Intercept)
1   -0.2118668
2   -1.7125900
3   -0.9622284
4   -1.8450067
5   -1.4477565
6    2.0833569
7    2.7013017
8   -0.3001446
9    0.6709115
10   1.0240229

with conditional variances for "ID" 
var(ranef(mlmod)$ID)
            (Intercept)
(Intercept)    2.514447
But Professor, Where are the S***s???

By default, lme4’s lmer does not report p-values. I heard this was because the author of the package was philosophically opposed to them, but I think it’s also because there are continued debates about how best to calculate the degrees of freedom (and therefore the p-values) for estimates in mixed models.

You can report confidence intervals from the results of the lmer model.

confint(mlmod)
Computing profile confidence intervals ...
                       2.5 %   97.5 %
.sig01             0.9660521 2.786017
.sigma             0.5627986 1.377856
(Intercept)       -0.4630090 1.963009
as.factor(group)2  0.7814283 2.378572

However, if you really want the stars, the lmerTest package adds them and provides some other useful features.

#install.packages("lmerTest")
library(lmerTest) # note that the function lmer from package lme4 has been masked.
mlmod <- lmer(extra ~ as.factor(group) + (1 | ID), data = sleep) # the same model; same equation
summary(mlmod) # new output!
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: extra ~ as.factor(group) + (1 | ID)
   Data: sleep

REML criterion at convergence: 70

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.63372 -0.34157  0.03346  0.31511  1.83859 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 2.8483   1.6877  
 Residual             0.7564   0.8697  
Number of obs: 20, groups:  ID, 10

Fixed effects:
                  Estimate Std. Error      df t value Pr(>|t|)   
(Intercept)         0.7500     0.6004 11.0814   1.249  0.23735   
as.factor(group)2   1.5800     0.3890  9.0000   4.062  0.00283 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
as.fctr(g)2 -0.324
ranova(mlmod) # a way to test whether inclusion of random effect improves the model fit or not.
ANOVA-like table for random-effects: Single term deletions

Model:
extra ~ as.factor(group) + (1 | ID)
         npar  logLik    AIC    LRT Df Pr(>Chisq)   
<none>      4 -34.978 77.956                        
(1 | ID)    3 -39.384 84.768 8.8118  1   0.002993 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
detach("package:lmerTest") # won't be using this?
A Neat Thing : The “Paired T-Test” is Just a Narrow Form of the MLM
d <- sleep # copying the data
sleepwide <- data.frame(d[1:10,1], d[11:20,1]) # moving into wide format
names(sleepwide) <- c("Extra1", "Extra2") # renaming variables
sleepwide # new data; in the "wide" format.
   Extra1 Extra2
1     0.7    1.9
2    -1.6    0.8
3    -0.2    1.1
4    -1.2    0.1
5    -0.1   -0.1
6     3.4    4.4
7     3.7    5.5
8     0.8    1.6
9     0.0    4.6
10    2.0    3.4
t.test(sleepwide$Extra1, sleepwide$Extra2, paired = T) # comparing mean of T1 to mean of T2, assuming a paired distribution....

    Paired t-test

data:  sleepwide$Extra1 and sleepwide$Extra2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.4598858 -0.7001142
sample estimates:
mean difference 
          -1.58 
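Notice that the t statistic (4.06) and p-value (.0028) match the as.factor(group)2 fixed effect from the random-intercept MLM above; only the sign flips, because the t-test was set up as Extra1 minus Extra2.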
What About Random Slopes?

For random intercepts and random slopes : still just one equation…but with only two observations per person, there isn’t enough data to estimate a random slope on top of a random intercept, so the model below fails to converge (which is why it’s commented out).

#mlmod2 <- lmer(extra ~ as.factor(group) + (group | ID), data = sleep)

Example 2 : Another Sleep Dataset

A classic teaching dataset from the lme4 package. Hooray!

?sleepstudy
s <- sleepstudy

A Question : Is there a relationship between number of days of sleep deprivation and reaction time?

The Graph

How might we graph this in ggplot?

library(ggplot2)
ggplot(sleepstudy, aes(y = Reaction, x = Days, color = Subject)) + 
  geom_point(size=2) + 
  # facet_wrap(~Subject) + 
  geom_smooth(method = "lm")
  # geom_line(aes(group = Subject), linewidth = 0.75)
`geom_smooth()` using formula = 'y ~ x'

Interpreting the Model (Random Intercepts)

What model would we define?

  • Reaction is the DV

  • I’m adding Days as a Fixed IV (so I’ll get the average effect of # of days of sleep deprivation on reaction time)

  • I’m also adding a random intercept : (1 | Subject), which estimates how much the intercept (the 1 term) for individual reaction times (the Level 1 outcome) varies across Subjects (the Level 2 grouping variable).

library(lme4)
l2 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(l2)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (1 | Subject)
   Data: sleepstudy

REML criterion at convergence: 1786.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2257 -0.5529  0.0109  0.5188  4.2506 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1378.2   37.12   
 Residual              960.5   30.99   
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept) 251.4051     9.7467   25.79
Days         10.4673     0.8042   13.02

Correlation of Fixed Effects:
     (Intr)
Days -0.371

How do we interpret the results of this model?

  • Fixed Effects : these deal with the “average” effects - ignoring all those important individual differences (which are accounted for in the random effects.)

    • Intercept = 251.41 = the average person’s reaction time at 0 days of sleep deprivation is 251.4 milliseconds.

    • Days = 10.47 = for every additional day of sleep deprivation, the average person’s reaction time increases by 10.47 ms; the standard error is an estimate of how much variation we’d expect in this average slope due to sampling variability.

  • Random Effects : these describe those individual differences in people’s starting responses to the DV (random intercepts) and individual differences in the relationship between the IV and the DV (random slopes).

    • Subject (Intercept) = 37.12 = the standard deviation of individual subjects’ baseline (Day 0) reaction times around the average intercept.

    • Residual = 30.99 = the standard deviation of the leftover within-subject variation the model doesn’t explain.
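To see what those random intercepts imply for each person, one option (a sketch, not part of the original slides) is to pull the per-subject coefficients out of the fitted model :

library(lme4)
l2 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)  # same model as above
head(coef(l2)$Subject)   # each subject's own intercept (fixed intercept + random deviation); the Days slope is identical for everyone in this model
head(ranef(l2)$Subject)  # just the random deviations from the average intercept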

Interpreting the Model (Random Intercepts and Slopes)

What model would we define?

lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (Days | Subject)
   Data: sleepstudy
REML criterion at convergence: 1743.628
Random effects:
 Groups   Name        Std.Dev. Corr
 Subject  (Intercept) 24.741       
          Days         5.922   0.07
 Residual             25.592       
Number of obs: 180, groups:  Subject, 18
Fixed Effects:
(Intercept)         Days  
     251.41        10.47  

How do we interpret the results of this model?
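One way to probe this (a sketch, not in the original slides) is to compare the random-intercept and random-slope models directly; anova() refits both with maximum likelihood and runs a likelihood-ratio test :

library(lme4)
l2 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
l3 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
anova(l2, l3)  # lower AIC and a significant chi-square favor the random-slope model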

Would You Like to Learn More?