Lecture 4 | Sampling Error and Linear Models

Check-In Here

Load the onboarding dataset (name this d to follow along with professor code in lecture).

The variable can.forloop asked students whether they could write a for-loop or not.

What is the difference in the number of students who said that YES they could for-loop, compared to the number who said either NO, MAYBE, or had NO IDEA? 

Great to find the answer, but try to find a way to get R to calculate this difference using code so you don’t have to manually type in any numbers for the calculation. (Hint : use indexing and the summary function.)

d <- read.csv("~/Dropbox/!GRADSTATS/Datasets/CLASS DATASETS - 102 - FA25/Onboarding Data/honor_onboard_FA25.csv", stringsAsFactors = T)

head(d)
                        time
1 2025/09/08 12:17:08 PM MDT
2 2025/09/08 12:19:21 PM MDT
3  2025/09/08 1:43:08 PM MDT
4  2025/09/09 2:29:56 PM MDT
5  2025/09/09 5:29:25 PM MDT
6 2025/09/09 11:32:21 PM MDT
                                                   area has.laptop write.code
1 developmental, looking into industrial organizational        Yes        Yes
2                                              Clinical        Yes        Yes
3                                  Social / Personality        Yes        Yes
4                   Behavioral and Systems Neuroscience        Yes        Yes
5                                  Social / Personality        Yes        Yes
6                                  Social / Personality        Yes        Yes
  know.prog has.data know.r can.import         can.clean         can.graph
1        No      Yes      2      Maybe Could Not Do This Could Not Do This
2       Yes      Yes      3        Yes               Yes               Yes
3        No       No      3        Yes             Maybe             Maybe
4       Yes      Yes      3        Yes             Maybe               Yes
5                Yes      3        Yes             Maybe               Yes
6       Yes      Yes      3        Yes               Yes               Yes
             can.render            can.lm can.interp can.pvalue can.sevsd
1 No Idea What This Is. Could Not Do This        Yes        Yes     Maybe
2                 Maybe               Yes        Yes        Yes       Yes
3     Could Not Do This             Maybe        Yes        Yes       Yes
4                 Maybe             Maybe        Yes        Yes       Yes
5                 Maybe             Maybe      Maybe        Yes       Yes
6                 Maybe               Yes      Maybe        Yes     Maybe
           can.95ci           can.forloop self.skills class.skills self.excite
1             Maybe No Idea What This Is.           3            4           5
2             Maybe     Could Not Do This           2            4           4
3               Yes     Could Not Do This           3            3           5
4               Yes                   Yes           4            3           5
5             Maybe                 Maybe           3            3           4
6 Could Not Do This     Could Not Do This           3            3           4
  class.excite self.prep class.prep learn.concept learn.r can.science
1            5         4          4             4       3           3
2            3         3          3             4       4           4
3            4         4          4             4       4           3
4            5         5          5             5       5           4
5            4         4          4             4       4           4
6            4         4          4             5       4           5
  should.science truth.people not.psychsci perf.pred noperf.pred too.complex
1              5            2            1         1           5           4
2              3            2            1         1           4           3
3              3            3            2         1           4           4
4              4            4            1         3           2           4
5              3            3            2         2           4           4
6              5            4            1         2           5           5
                                                                                                                                    epistemology
1 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
2 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
3 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
4 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
5 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
6 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
                                            consent
1 yes, you can add my responses to a class dataset.
2 yes, you can add my responses to a class dataset.
3 yes, you can add my responses to a class dataset.
4 yes, you can add my responses to a class dataset.
5 yes, you can add my responses to a class dataset.
6 yes, you can add my responses to a class dataset.
summary(d$can.forloop)
    Could Not Do This                 Maybe No Idea What This Is. 
                    5                     9                     1 
                  Yes 
                    5 
# asking R to find the # of yeses
length(d$can.forloop[d$can.forloop == "yes"]) # says Zero because it's Yes not yes
[1] 0
length(d$can.forloop[d$can.forloop == "Yes"]) # returns 5 now that the capitalization matches the level "Yes"
[1] 5
d$can.forloop == "Yes"
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
sum(d$can.forloop == "Yes") # adds up all the True values.
[1] 5
summary(d$can.forloop) # shows me the frequency for each level
    Could Not Do This                 Maybe No Idea What This Is. 
                    5                     9                     1 
                  Yes 
                    5 
summary(d$can.forloop)[4] # the number of Yes
Yes 
  5 
# asking R to find the other groups.
sum(!d$can.forloop == "Yes") # the ! flips the comparison, so this counts every answer that is NOT "Yes"
[1] 15
# finding the difference.
sum(d$can.forloop == "Yes") - sum(!d$can.forloop == "Yes")
[1] -10
## OR; define quick objects to make the code a little more readable.
notYes <- sum(!d$can.forloop == "Yes")
Yes <- sum(d$can.forloop == "Yes")

Yes - notYes
[1] -10
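
The hint mentions indexing the output of summary(). Here is a minimal sketch of that route. It uses a toy factor rebuilt from the frequencies printed above (the names toy, counts, and diff_yes are made up for illustration) and indexes by level name rather than by position, so it should keep working if the order of the levels changes.

```r
# Toy stand-in for d$can.forloop, rebuilt from the frequencies shown above.
toy <- factor(rep(c("Could Not Do This", "Maybe", "No Idea What This Is.", "Yes"),
                  times = c(5, 9, 1, 5)))

counts <- summary(toy)                           # named vector: one count per level
yes    <- counts["Yes"]                          # index by name instead of position [4]
notyes <- sum(counts[names(counts) != "Yes"])    # add up every other level

diff_yes <- unname(yes - notyes)                 # Yes minus everyone else
diff_yes
```

With the real data, swapping toy for d$can.forloop should give the same kind of result without typing any counts by hand.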

Announcements & Agenda

Agenda

  • 11:10 - 12:00 : Check-In and Bootstrapping Review
  • 12:00 - 12:40 : Break + Presentation
  • 12:40 - 1:00 : Linear Models (the basics)
  • 1:00 - 2:00 : Linear Models (there’s more)

Announcements

  • Lab 3. Just giving everyone credit. Four rooms was wild I don’t like thinking about it.

  • No Brian Exam. Professor did not know what he was doing but thinks two exams in two weeks is a BAD IDEA.

  • Mini Exam is in TWO Weeks. Yikes!!

    • I give you data and a question, you generate a report (in Quarto? Or just screenshot copy/paste for now is fine too.)

      • Data loading and cleaning.

      • Scale creating & descriptive statistics.

      • Linear Models

      • Bootstrapping

      • A fun challenge problem worth 1 point.

    • Ask questions if / when you have them. Don’t struggle on your own. Plenty of time to do that in other spaces!

    • We will practice / review next week (Lab 5 is a practice exam.)

    • Think it will be chill, and if not then professor takes the blame, alright?

RECAP : Estimating Sampling Error

the for-loop stuff we ended on.

We defined a fake dataset, called fakey with mean = 100 and sd = 30.

fakey <- rnorm(10000000, mean = 100, sd = 30)
hist(fakey)
abline(v = mean(fakey), lwd = 5)

mean(fakey)
[1] 100.0031

We then took 1000 random samples (n = 10) from this fake dataset, calculated a statistic (the mean) for each of these samples, saved each one in truthbucket, and then plotted the distribution of those statistics.

truthbucket <- array()
for(i in c(1:1000)){
  lilfakey <- fakey[sample(1:length(fakey), 10)] # ten random individuals from fakey.
  truthbucket[i] <- mean(lilfakey)
}

hist(truthbucket)
abline(v = mean(truthbucket), lwd = 5, col = 'red')

mean(truthbucket) # the mean
[1] 99.84733
sd(truthbucket) # the standard deviation 
[1] 9.471462

DISCUSSION :

  • ICE-BREAKER : least favorite candy?

  • why are we doing this (taking random samples from this fake dataset?)

    • TLDR : we are checking how well a sample represents the population it came from. One sample is not necessarily representative because of a) sampling bias (the people in the sample differ from the population in systematic ways) and b) sampling error (the random people who land in our sample differ from the population and can throw off our results in non-systematic ways).

    • TLDR4R : estimating effects of sampling error

  • what does the distribution of fakey describe?

    • a NORMAL distribution of the entire population of individual scores.

    • the variation in our distribution illustrates the fact that people in the population differ.

  • what does the distribution of truthbucket describe?

    • a NORMAL distribution of a test statistic (the mean, in this case), with one value from each of our 1000 samples of (n = 10) individuals.

    • the variation in our distribution illustrates sampling error (the fact that each sample will yield a slightly different mean from our “true” population mean).

  • which graph (TRUTHBUCKET or FAKEY) would you want to minimize variation in?

    • minimize variation in TRUTHBUCKET = each sample will be more representative of the “true” population.

    • minimize variation in FAKEY = making people be more similar to each other…which can be bad from a celebrating individual differences perspective (yay bay area; boo fascism) but…increased variation —> “more error”

  • what are TWO things you could do in order to decrease the standard deviation of truthbucket?

    • increase the sample size

    • increase the number of samples

    • decrease the standard deviation of our original population

ACTIVITY : the TWO things you could do in order to decrease the standard deviation of truthbucket.

  • Thing One : increase the sample size?
truthbucket100 <- array()
for(i in c(1:1000)){
  lilfakey <- fakey[sample(1:length(fakey), 100)] # 100 random individuals from fakey.
  truthbucket100[i] <- mean(lilfakey)
}

hist(truthbucket100)
abline(v = mean(truthbucket100), lwd = 5, col = 'red')

mean(truthbucket100) # the mean
[1] 100.0627
sd(truthbucket100) # the standard deviation 
[1] 3.084897
  • Thing Two : Increase the number of samples.
megatruthbucket <- array()
for(i in c(1:10000)){
  lilfakey <- fakey[sample(1:length(fakey), 10)] # ten random individuals from fakey.
  megatruthbucket[i] <- mean(lilfakey)
}

hist(megatruthbucket)
abline(v = mean(megatruthbucket), lwd = 5, col = 'red')

mean(megatruthbucket) # the mean
[1] 99.88343
sd(megatruthbucket) # the standard deviation 
[1] 9.540754
  • Why does increasing the number of samples not influence the standard deviation of TRUTHBUCKET???

    • too long, zoned out : each sample is still taking from the same population and the same number of people…so increasing the number of samples doesn’t really influence how far away each individual statistic will be.

    • the standard deviation of truthbucket = an estimate of sampling error = equivalent to the “standard error” of the mean.
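
For the mean specifically, the standard error has a well-known formula: the population standard deviation divided by the square root of the sample size. Below is a rough sketch that compares the formula with the simulation approach. It assumes a smaller fake population than the one in lecture so it runs quickly, and sim_se is a made-up helper name, not part of the lecture code.

```r
set.seed(102)                                  # arbitrary seed, for reproducibility
pop <- rnorm(100000, mean = 100, sd = 30)      # smaller stand-in for fakey

# Draw `reps` random samples of size n and return the sd of their means.
sim_se <- function(pop, n, reps = 1000) {
  means <- numeric(reps)                       # preallocate the bucket
  for (i in seq_len(reps)) {
    means[i] <- mean(sample(pop, n))           # mean of one random sample
  }
  sd(means)
}

se10  <- sim_se(pop, n = 10)
se100 <- sim_se(pop, n = 100)

c(simulated = se10,  formula = 30 / sqrt(10))   # formula gives about 9.49
c(simulated = se100, formula = 30 / sqrt(100))  # formula gives 3
```

The number of samples (reps) does not appear in the formula at all, which lines up with what megatruthbucket showed: more samples only make the estimate of the standard error steadier. Only the sample size and the population standard deviation change the standard error itself.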

bootstrapping.

the logic.

Okay, let’s work through a real example of using a for-loop to estimate sampling error with real data.

The idea sounds somewhat impossible (just like pulling yourself up by your bootstraps). We are going to :

  1. use our ONE dataset….to generate NEW datasets
  2. from each NEW dataset, we will :
    1. calculate the test statistic we are interested in (e.g., the mean, but could be anything.)
    2. save this test statistic.
  3. examine the variation in the saved test statistics.
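
The steps above can be sketched on a small made-up vector. The names scores and bootbucket, and the values themselves, are hypothetical and are not the class data; this is only meant to show the shape of the loop.

```r
set.seed(4)                                   # arbitrary seed, for reproducibility
scores <- c(3, 2, 3, 4, 3, 3, 4, 1, 4, 3)     # hypothetical ratings, NOT the class data

bootbucket <- numeric(1000)                   # one slot per NEW dataset
for (i in seq_along(bootbucket)) {
  # resample the ONE dataset with replacement, keeping the original size
  newscores <- sample(scores, length(scores), replace = TRUE)
  # calculate the statistic from this NEW dataset and save it
  bootbucket[i] <- mean(newscores)
}

# examine the variation in the saved statistics
hist(bootbucket)
sd(bootbucket)                                # bootstrap estimate of the standard error
```

Sampling with replacement is what makes each NEW dataset different: some scores show up more than once and others get left out.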

Discussion : what’s a problem with this method? why do people do this??

an example.

Remember that in the onboarding survey, we saw people rated their own skills as lower than their classmates’ skills.

d <- read.csv("../datasets/Onboarding Data/honor_onboard_FA25.csv", stringsAsFactors = T, na.strings = "")
par(mfrow = c(1,2))

hist(d$self.skills, breaks = c(0:5), 
     col = 'black', border = 'white', main = "Computer Skills\n(Self-Perceptions)")

hist(d$class.skills, breaks = c(0:5),
     col = 'black', border = 'white', main = "Computer Skills\n(Perceptions of Classmates)")

mean(d$self.skills)
[1] 3.25

But would we expect to observe this same difference in a different sample of students???

Let’s use a for-loop to find out. Here’s the logic of bootstrapping, starting with how to make one NEW dataset from our original data.

d[sample(1:nrow(d), # taking our original dataset (d) and indexing new rows from 1-nrow(d)
         nrow(d), # tells R to resample the same size as our original dataset
         replace = T), ] # randomly sampling with replacement. so one person in the original data can be in the new dataset multiple times.
                           time
9     2025/09/12 2:31:07 PM MDT
4     2025/09/09 2:29:56 PM MDT
5     2025/09/09 5:29:25 PM MDT
2    2025/09/08 12:19:21 PM MDT
18   2025/09/15 12:48:04 AM MDT
18.1 2025/09/15 12:48:04 AM MDT
12    2025/09/12 3:33:14 PM MDT
16   2025/09/14 10:28:13 PM MDT
15    2025/09/14 1:06:27 PM MDT
11    2025/09/12 2:54:38 PM MDT
17   2025/09/15 12:23:22 AM MDT
3     2025/09/08 1:43:08 PM MDT
19   2025/09/15 10:51:27 AM MDT
20   2025/09/15 11:04:10 AM MDT
18.2 2025/09/15 12:48:04 AM MDT
9.1   2025/09/12 2:31:07 PM MDT
19.1 2025/09/15 10:51:27 AM MDT
6    2025/09/09 11:32:21 PM MDT
13   2025/09/13 10:27:55 AM MDT
1    2025/09/08 12:17:08 PM MDT
                                                      area has.laptop
9                                                 Clinical        Yes
4                      Behavioral and Systems Neuroscience        Yes
5                                     Social / Personality        Yes
2                                                 Clinical        Yes
18                                           Developmental        Yes
18.1                                         Developmental        Yes
12                                           Developmental        Yes
16                                    Social / Personality        Yes
15                                           Developmental        Yes
11                                           Developmental        Yes
17                                    Social / Personality        Yes
3                                     Social / Personality        Yes
19                                           Developmental        Yes
20                                                Clinical        Yes
18.2                                         Developmental        Yes
9.1                                               Clinical        Yes
19.1                                         Developmental        Yes
6                                     Social / Personality        Yes
13                                    Social / Personality        Yes
1    developmental, looking into industrial organizational        Yes
     write.code know.prog has.data know.r can.import         can.clean
9           Yes       Yes      Yes      3      Maybe             Maybe
4           Yes       Yes      Yes      3        Yes             Maybe
5           Yes      <NA>      Yes      3        Yes             Maybe
2           Yes       Yes      Yes      3        Yes               Yes
18          Yes       Yes      Yes      3      Maybe             Maybe
18.1        Yes       Yes      Yes      3      Maybe             Maybe
12          Yes       Yes     <NA>      3      Maybe Could Not Do This
16          Yes        No      Yes      3        Yes             Maybe
15          Yes        No      Yes      2        Yes             Maybe
11          Yes       Yes       No      2      Maybe             Maybe
17          Yes       Yes      Yes      4        Yes               Yes
3           Yes        No       No      3        Yes             Maybe
19          Yes       Yes      Yes      4        Yes               Yes
20          Yes       Yes       No      2      Maybe             Maybe
18.2        Yes       Yes      Yes      3      Maybe             Maybe
9.1         Yes       Yes      Yes      3      Maybe             Maybe
19.1        Yes       Yes      Yes      4        Yes               Yes
6           Yes       Yes      Yes      3        Yes               Yes
13          Yes       Yes       No      3      Maybe             Maybe
1           Yes        No      Yes      2      Maybe Could Not Do This
             can.graph            can.render            can.lm can.interp
9                  Yes     Could Not Do This             Maybe        Yes
4                  Yes                 Maybe             Maybe        Yes
5                  Yes                 Maybe             Maybe      Maybe
2                  Yes                 Maybe               Yes        Yes
18               Maybe     Could Not Do This             Maybe      Maybe
18.1             Maybe     Could Not Do This             Maybe      Maybe
12               Maybe No Idea What This Is.             Maybe      Maybe
16               Maybe                 Maybe             Maybe      Maybe
15                 Yes No Idea What This Is.             Maybe      Maybe
11                 Yes No Idea What This Is.             Maybe        Yes
17                 Yes                   Yes               Yes        Yes
3                Maybe     Could Not Do This             Maybe        Yes
19                 Yes                   Yes               Yes        Yes
20               Maybe                 Maybe             Maybe      Maybe
18.2             Maybe     Could Not Do This             Maybe      Maybe
9.1                Yes     Could Not Do This             Maybe        Yes
19.1               Yes                   Yes               Yes        Yes
6                  Yes                 Maybe               Yes      Maybe
13               Maybe     Could Not Do This Could Not Do This      Maybe
1    Could Not Do This No Idea What This Is. Could Not Do This        Yes
            can.pvalue         can.sevsd          can.95ci
9                  Yes             Maybe             Maybe
4                  Yes               Yes               Yes
5                  Yes               Yes             Maybe
2                  Yes               Yes             Maybe
18               Maybe Could Not Do This             Maybe
18.1             Maybe Could Not Do This             Maybe
12               Maybe Could Not Do This             Maybe
16               Maybe             Maybe Could Not Do This
15               Maybe Could Not Do This             Maybe
11               Maybe Could Not Do This Could Not Do This
17                 Yes Could Not Do This               Yes
3                  Yes               Yes               Yes
19                 Yes               Yes               Yes
20               Maybe             Maybe             Maybe
18.2             Maybe Could Not Do This             Maybe
9.1                Yes             Maybe             Maybe
19.1               Yes               Yes               Yes
6                  Yes             Maybe Could Not Do This
13   Could Not Do This Could Not Do This Could Not Do This
1                  Yes             Maybe             Maybe
               can.forloop self.skills class.skills self.excite class.excite
9                      Yes           3            4           4            4
4                      Yes           4            3           5            5
5                    Maybe           3            3           4            4
2        Could Not Do This           2            4           4            3
18                     Yes           3            5           5            5
18.1                   Yes           3            5           5            5
12       Could Not Do This           4            4           5            5
16       Could Not Do This           3            3           5            5
15                   Maybe           3            4           4            4
11                   Maybe           1            4           3            3
17                     Yes           4            4           5            5
3        Could Not Do This           3            3           5            4
19                   Maybe           4            5           5            5
20                   Maybe           4            5           4            2
18.2                   Yes           3            5           5            5
9.1                    Yes           3            4           4            4
19.1                 Maybe           4            5           5            5
6        Could Not Do This           3            3           4            4
13                   Maybe           4            5           4            4
1    No Idea What This Is.           3            4           5            5
     self.prep class.prep learn.concept learn.r can.science should.science
9            3          3             4       5           3              3
4            5          5             5       5           4              4
5            4          4             4       4           4              3
2            3          3             4       4           4              3
18           3          5             4       3           3              5
18.1         3          5             4       3           3              5
12           5          5             4       4           4              4
16           5          5             5       5           4              4
15           3          4             4       3           4              4
11           2          4             4       2           5              5
17           5          5             5       5           4              3
3            4          4             4       4           3              3
19           4          5             5       4           5              5
20           3          4             5       3           4              3
18.2         3          5             4       3           3              5
9.1          3          3             4       5           3              3
19.1         4          5             5       4           5              5
6            4          4             5       4           5              5
13           3          5             4       3           4              3
1            4          4             4       3           3              5
     truth.people not.psychsci perf.pred noperf.pred too.complex
9               4            3         3           4           4
4               4            1         3           2           4
5               3            2         2           4           4
2               2            1         1           4           3
18              3            2         4           5           5
18.1            3            2         4           5           5
12              1            1         3           3           5
16              4            2         3           2           4
15              3            3         2           4           4
11              3            1         3           4           3
17              2            3         1           4           4
3               3            2         1           4           4
19              4            1         4           2           2
20              2            1         2           4           3
18.2            3            2         4           5           5
9.1             4            3         3           4           4
19.1            4            1         4           2           2
6               4            1         2           5           5
13              4            1         2           4           2
1               2            1         1           5           4
                                                                                                                                       epistemology
9    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
4    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
5    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
2    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
18   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
18.1 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
12   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
16                                                    There is a REAL TRUTH about what makes people think, feel, and act that we can someday learn.
15   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
11   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
17   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
3    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
19   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
20   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
18.2 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
9.1  Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
19.1 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
6    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
13   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
1    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
                                               consent
9    yes, you can add my responses to a class dataset.
4    yes, you can add my responses to a class dataset.
5    yes, you can add my responses to a class dataset.
2    yes, you can add my responses to a class dataset.
18   yes, you can add my responses to a class dataset.
18.1 yes, you can add my responses to a class dataset.
12   yes, you can add my responses to a class dataset.
16   yes, you can add my responses to a class dataset.
15   yes, you can add my responses to a class dataset.
11   yes, you can add my responses to a class dataset.
17   yes, you can add my responses to a class dataset.
3    yes, you can add my responses to a class dataset.
19   yes, you can add my responses to a class dataset.
20   yes, you can add my responses to a class dataset.
18.2 yes, you can add my responses to a class dataset.
9.1  yes, you can add my responses to a class dataset.
19.1 yes, you can add my responses to a class dataset.
6    yes, you can add my responses to a class dataset.
13   yes, you can add my responses to a class dataset.
1    yes, you can add my responses to a class dataset.
d[sample(1:nrow(d), nrow(d), replace = T), ] # same code, all as one line.
                           time                                area has.laptop
10    2025/09/12 2:39:32 PM MDT                Social / Personality        Yes
19   2025/09/15 10:51:27 AM MDT                       Developmental        Yes
19.1 2025/09/15 10:51:27 AM MDT                       Developmental        Yes
12    2025/09/12 3:33:14 PM MDT                       Developmental        Yes
19.2 2025/09/15 10:51:27 AM MDT                       Developmental        Yes
18   2025/09/15 12:48:04 AM MDT                       Developmental        Yes
2    2025/09/08 12:19:21 PM MDT                            Clinical        Yes
19.3 2025/09/15 10:51:27 AM MDT                       Developmental        Yes
11    2025/09/12 2:54:38 PM MDT                       Developmental        Yes
5     2025/09/09 5:29:25 PM MDT                Social / Personality        Yes
16   2025/09/14 10:28:13 PM MDT                Social / Personality        Yes
6    2025/09/09 11:32:21 PM MDT                Social / Personality        Yes
4     2025/09/09 2:29:56 PM MDT Behavioral and Systems Neuroscience        Yes
20   2025/09/15 11:04:10 AM MDT                            Clinical        Yes
3     2025/09/08 1:43:08 PM MDT                Social / Personality        Yes
13   2025/09/13 10:27:55 AM MDT                Social / Personality        Yes
4.1   2025/09/09 2:29:56 PM MDT Behavioral and Systems Neuroscience        Yes
6.1  2025/09/09 11:32:21 PM MDT                Social / Personality        Yes
20.1 2025/09/15 11:04:10 AM MDT                            Clinical        Yes
8     2025/09/12 2:09:53 PM MDT                            Clinical        Yes
     write.code know.prog has.data know.r can.import         can.clean
10          Yes       Yes      Yes      4        Yes               Yes
19          Yes       Yes      Yes      4        Yes               Yes
19.1        Yes       Yes      Yes      4        Yes               Yes
12          Yes       Yes     <NA>      3      Maybe Could Not Do This
19.2        Yes       Yes      Yes      4        Yes               Yes
18          Yes       Yes      Yes      3      Maybe             Maybe
2           Yes       Yes      Yes      3        Yes               Yes
19.3        Yes       Yes      Yes      4        Yes               Yes
11          Yes       Yes       No      2      Maybe             Maybe
5           Yes      <NA>      Yes      3        Yes             Maybe
16          Yes        No      Yes      3        Yes             Maybe
6           Yes       Yes      Yes      3        Yes               Yes
4           Yes       Yes      Yes      3        Yes             Maybe
20          Yes       Yes       No      2      Maybe             Maybe
3           Yes        No       No      3        Yes             Maybe
13          Yes       Yes       No      3      Maybe             Maybe
4.1         Yes       Yes      Yes      3        Yes             Maybe
6.1         Yes       Yes      Yes      3        Yes               Yes
20.1        Yes       Yes       No      2      Maybe             Maybe
8           Yes       Yes      Yes      3      Maybe             Maybe
     can.graph            can.render            can.lm        can.interp
10         Yes                   Yes               Yes               Yes
19         Yes                   Yes               Yes               Yes
19.1       Yes                   Yes               Yes               Yes
12       Maybe No Idea What This Is.             Maybe             Maybe
19.2       Yes                   Yes               Yes               Yes
18       Maybe     Could Not Do This             Maybe             Maybe
2          Yes                 Maybe               Yes               Yes
19.3       Yes                   Yes               Yes               Yes
11         Yes No Idea What This Is.             Maybe               Yes
5          Yes                 Maybe             Maybe             Maybe
16       Maybe                 Maybe             Maybe             Maybe
6          Yes                 Maybe               Yes             Maybe
4          Yes                 Maybe             Maybe               Yes
20       Maybe                 Maybe             Maybe             Maybe
3        Maybe     Could Not Do This             Maybe               Yes
13       Maybe     Could Not Do This Could Not Do This             Maybe
4.1        Yes                 Maybe             Maybe               Yes
6.1        Yes                 Maybe               Yes             Maybe
20.1     Maybe                 Maybe             Maybe             Maybe
8        Maybe                 Maybe             Maybe Could Not Do This
            can.pvalue         can.sevsd          can.95ci       can.forloop
10                 Yes             Maybe               Yes             Maybe
19                 Yes               Yes               Yes             Maybe
19.1               Yes               Yes               Yes             Maybe
12               Maybe Could Not Do This             Maybe Could Not Do This
19.2               Yes               Yes               Yes             Maybe
18               Maybe Could Not Do This             Maybe               Yes
2                  Yes               Yes             Maybe Could Not Do This
19.3               Yes               Yes               Yes             Maybe
11               Maybe Could Not Do This Could Not Do This             Maybe
5                  Yes               Yes             Maybe             Maybe
16               Maybe             Maybe Could Not Do This Could Not Do This
6                  Yes             Maybe Could Not Do This Could Not Do This
4                  Yes               Yes               Yes               Yes
20               Maybe             Maybe             Maybe             Maybe
3                  Yes               Yes               Yes Could Not Do This
13   Could Not Do This Could Not Do This Could Not Do This             Maybe
4.1                Yes               Yes               Yes               Yes
6.1                Yes             Maybe Could Not Do This Could Not Do This
20.1             Maybe             Maybe             Maybe             Maybe
8    Could Not Do This             Maybe             Maybe             Maybe
     self.skills class.skills self.excite class.excite self.prep class.prep
10             4            4           4            4         5          5
19             4            5           5            5         4          5
19.1           4            5           5            5         4          5
12             4            4           5            5         5          5
19.2           4            5           5            5         4          5
18             3            5           5            5         3          5
2              2            4           4            3         3          3
19.3           4            5           5            5         4          5
11             1            4           3            3         2          4
5              3            3           4            4         4          4
16             3            3           5            5         5          5
6              3            3           4            4         4          4
4              4            3           5            5         5          5
20             4            5           4            2         3          4
3              3            3           5            4         4          4
13             4            5           4            4         3          5
4.1            4            3           5            5         5          5
6.1            3            3           4            4         4          4
20.1           4            5           4            2         3          4
8              4            4           5            5         5          5
     learn.concept learn.r can.science should.science truth.people not.psychsci
10               5       5           5              5            5            2
19               5       4           5              5            4            1
19.1             5       4           5              5            4            1
12               4       4           4              4            1            1
19.2             5       4           5              5            4            1
18               4       3           3              5            3            2
2                4       4           4              3            2            1
19.3             5       4           5              5            4            1
11               4       2           5              5            3            1
5                4       4           4              3            3            2
16               5       5           4              4            4            2
6                5       4           5              5            4            1
4                5       5           4              4            4            1
20               5       3           4              3            2            1
3                4       4           3              3            3            2
13               4       3           4              3            4            1
4.1              5       5           4              4            4            1
6.1              5       4           5              5            4            1
20.1             5       3           4              3            2            1
8                4       4           3              3            2            1
     perf.pred noperf.pred too.complex
10           5           2           3
19           4           2           2
19.1         4           2           2
12           3           3           5
19.2         4           2           2
18           4           5           5
2            1           4           3
19.3         4           2           2
11           3           4           3
5            2           4           4
16           3           2           4
6            2           5           5
4            3           2           4
20           2           4           3
3            1           4           4
13           2           4           2
4.1          3           2           4
6.1          2           5           5
20.1         2           4           3
8            3           5           5
                                                                                                                                       epistemology
10   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
19   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
19.1 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
12   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
19.2 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
18   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
2    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
19.3 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
11   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
5    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
16                                                    There is a REAL TRUTH about what makes people think, feel, and act that we can someday learn.
6    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
4    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
20   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
3    Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
13   Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
4.1  Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
6.1  Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
20.1 Reality exists, but we may not be able to ever understand REAL TRUTH because our ability to study reality will always be influenced by people.
8      Reality and “truth” are made up by people (and scientists); scientific knowledge is just a reflection of our society’s pre-existing beliefs.
                                               consent
10   yes, you can add my responses to a class dataset.
19   yes, you can add my responses to a class dataset.
19.1 yes, you can add my responses to a class dataset.
12   yes, you can add my responses to a class dataset.
19.2 yes, you can add my responses to a class dataset.
18   yes, you can add my responses to a class dataset.
2    yes, you can add my responses to a class dataset.
19.3 yes, you can add my responses to a class dataset.
11   yes, you can add my responses to a class dataset.
5    yes, you can add my responses to a class dataset.
16   yes, you can add my responses to a class dataset.
6    yes, you can add my responses to a class dataset.
4    yes, you can add my responses to a class dataset.
20   yes, you can add my responses to a class dataset.
3    yes, you can add my responses to a class dataset.
13   yes, you can add my responses to a class dataset.
4.1  yes, you can add my responses to a class dataset.
6.1  yes, you can add my responses to a class dataset.
20.1 yes, you can add my responses to a class dataset.
8    yes, you can add my responses to a class dataset.
newclass <- d[sample(1:nrow(d), nrow(d), replace = T), ]
mean(newclass$self.skills)
[1] 3.05
allmyclassesaregreat <- array()
for(i in c(1:1000)){
  newclass <- d[sample(1:nrow(d), nrow(d), replace = T), ]
  allmyclassesaregreat[i] <- mean(newclass$self.skills)
}

hist(allmyclassesaregreat)

Defining the 95% Confidence Interval
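A minimal sketch of two common ways to turn a pile of bootstrapped means into a 95% confidence interval. The vector x is made-up ratings (not the class data), and the names boot_means, ci_normal, and ci_percentile are just labels I chose for illustration.

```r
# Hypothetical 1-5 ratings (NOT the class data)
set.seed(1)
x <- c(4, 4, 4, 3, 2, 1, 3, 3, 3, 4, 4, 3, 4, 4, 3, 4, 4, 5, 3, 4)

boot_means <- numeric(1000)                            # storage for 1000 means
for (i in 1:1000) {
  resampled <- sample(x, length(x), replace = TRUE)    # a new "class"
  boot_means[i] <- mean(resampled)
}

# Option 1: sample mean +/- 1.96 * the SD of the bootstrapped means
ci_normal <- mean(x) + c(-1.96, 1.96) * sd(boot_means)

# Option 2: the middle 95% of the bootstrapped means (percentile interval)
ci_percentile <- quantile(boot_means, probs = c(0.025, 0.975))

ci_normal
ci_percentile
```

The two intervals will usually be close when the histogram of bootstrapped means looks roughly symmetric.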

BREAK TIME : Meet Back at 12:50

Presentations

RECAP : The Mean as Prediction

The Mean is a Prediction (of the Sample)

plot(d$self.skills, 
     ylab = "Self-Perception of Skills",
     xlab = "Index") 
abline(h = mean(d$self.skills, na.rm = T), lwd = 0)

plot(d$self.skills, 
     ylab = "Self-Perception of Skills",
     xlab = "Index") 
abline(h = mean(d$self.skills, na.rm = T), lwd = 5)

hist(d$self.skills, 
     ylab = "Frequency",
     xlab = "Self-Perception of Skills") 
abline(v = mean(d$self.skills, na.rm = T), lwd = 5)

There is Error in Our Prediction of the Sample (Residual Error)

This prediction of the sample has some error (residual error). We can (and will need to) quantify this error.

## quantifying errors (residuals)
residuals <- d$self.skills - mean(d$self.skills, na.rm = T)
SST <- sum(residuals^2)
SST
[1] 13.75
SST/length(residuals) # average of squared residuals (variance)
[1] 0.6875
sqrt(SST/length(residuals)) # square root of the average squared residual (standard deviation, dividing by n)
[1] 0.8291562
sd(d$self.skills) # slightly higher, because sd() divides by n-1 instead of n
[1] 0.8506963
sqrt(SST/(length(residuals)-1)) # the 'real' equation; n-1 to inflate our estimate / adjust for small samples.
[1] 0.8506963
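To see how the n and n-1 versions relate, here is a small sketch on made-up scores (the vector x and the names sd_n and sd_n1 are hypothetical). The two versions differ only by a factor of sqrt((n-1)/n), which shrinks toward 1 as the sample gets bigger.

```r
x <- c(2, 4, 4, 5)             # hypothetical scores, NOT the class data
n <- length(x)

res <- x - mean(x)             # residuals around the mean
ss  <- sum(res^2)              # sum of squared residuals

sd_n  <- sqrt(ss / n)          # divide by n
sd_n1 <- sqrt(ss / (n - 1))    # divide by n - 1 (what sd() does)

all.equal(sd_n1, sd(x))                      # TRUE
all.equal(sd_n, sd(x) * sqrt((n - 1) / n))   # TRUE
```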

The Mean is a Prediction of our Population (with Sampling Error)

m <- array()
for(i in c(1:1000)){
 nd <- d[sample(1:nrow(d), nrow(d), replace = T),] # a new sample
 m[i] <- mean(nd$self.skills, na.rm = T)
}
mean(d$self.skills, na.rm = T)
[1] 3.25
mean(m) # similar!
[1] 3.2496
sum(m > 2.5) # nearly all of them (999 of 1000, i.e. 99.9%) are above 2.5; note that the midpoint of a 1-5 scale is 3.
[1] 999
sd(m) # sampling error!
[1] 0.1880298
hist(m, xlim = c(1,5)) # our distribution of sampling estimates 
abline(v = c(mean(d$self.skills),
             mean(d$self.skills) + 1.96 * sd(m),
             mean(d$self.skills) - 1.96 * sd(m)),
       lwd = c(5,2,2), # thick line for the mean, thinner lines for the two bounds
       lty = c(1,2,2)) # solid line for the mean, dashed lines for the two bounds
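As a sanity check on the idea that the SD of the bootstrapped means is our sampling error, this sketch compares it with the textbook standard error formula, sd / sqrt(n). The data are made up, and replicate() is used here only as a compact stand-in for the for-loop.

```r
# Hypothetical 1-5 ratings (NOT the class data)
set.seed(42)
x <- c(3, 2, 3, 4, 3, 3, 4, 4, 3, 4, 1, 4, 4, 2, 3, 3, 4, 3, 4, 4)

# 1000 bootstrapped means, same logic as the loop
boot_means <- replicate(1000, mean(sample(x, length(x), replace = TRUE)))

se_boot    <- sd(boot_means)           # sampling error from bootstrapping
se_formula <- sd(x) / sqrt(length(x))  # sampling error from the formula

c(bootstrap = se_boot, formula = se_formula)
```

The two numbers should land close to each other; the bootstrap version wobbles a little from run to run because it is based on random resampling.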

Linear Models : Improving our Predictions (Numeric IV)

The Mean as a Linear Model

We can define a linear model to predict a variable from some constant value (1), and the result of that will be the mean, since the mean is our best prediction (minimizes the residual errors) when we don’t have any other information about the variable.

lm(self.skills ~ 1, data = d) # predicting self.skills from a constant (1), using the dataset = d

Call:
lm(formula = self.skills ~ 1, data = d)

Coefficients:
(Intercept)  
       3.25  
mod0 <- lm(self.skills ~ 1, data = d) # saving this as a model object
coef(mod0) # looking at the coefficients
(Intercept) 
       3.25 
mod0$residuals # finding the residuals
    1     2     3     4     5     6     7     8     9    10    11    12    13 
-0.25 -1.25 -0.25  0.75 -0.25 -0.25  0.75  0.75 -0.25  0.75 -2.25  0.75  0.75 
   14    15    16    17    18    19    20 
-1.25 -0.25 -0.25  0.75 -0.25  0.75  0.75 
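A quick sketch of the claim that the mean minimizes the residual errors. Assuming a made-up vector x, I ask R to search for the single guess with the smallest sum of squared errors and compare it with the mean and with the intercept-only model. The function name sse is my own label.

```r
x <- c(1, 2, 3, 3, 4, 4, 4, 5)   # hypothetical scores, NOT the class data

# total squared error if we predict the same value (guess) for everyone
sse <- function(guess) sum((x - guess)^2)

# search between the smallest and largest score for the best guess
best <- optimize(sse, interval = range(x))

best$minimum               # the guess with the smallest squared error
mean(x)                    # 3.25
unname(coef(lm(x ~ 1)))    # the intercept-only model gives the same answer
```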

Use information in the IV to predict the DV

Let’s try the same activity, but now we will graph each individual’s self-skill (still on the y-axis) in relation to their confidence in learning R (learn.r, on the x-axis).

plot(jitter(self.skills) ~ learn.r, data = d, 
     ylab = "Self-Perception of Skills", ylim = c(1,5),
     xlab = "Confidence in Learning R", xlim = c(1,5)) 
abline(lm(self.skills ~ learn.r, data = d), lwd = 0)

plot(jitter(self.skills) ~ learn.r, data = d, 
     ylab = "Self-Perception of Skills", ylim = c(1,5),
     xlab = "Confidence in Learning R", xlim = c(1,5)) 
abline(lm(self.skills ~ learn.r, data = d), lwd = 5)

The Linear Model :

To define a linear model, we will first use the lm() function to predict some DV from an IV.

Then, we will graph the relationship between these two variables using the plot() function. I’m using jitter() on the DV in order to shift the points a little, since they are overlapping.

Then, I draw a line (defined by the linear model) using the abline() function. I’ve made the line width = 5 and color red to make it POP.

I can look at the coefficients of the model with the coef() function. The line is described by two coefficients: its starting place when the x value is zero (the intercept), and the adjustment we make to our prediction of Y each time X increases by one (the slope).

mod1 <- lm(self.skills ~ learn.r, data = d)

plot(jitter(self.skills) ~ learn.r, # dv is jittered
     data = d, 
     main = "Jittered Data",
     xlim = c(1,5), ylim = c(1,5))

abline(mod1, lwd = 5, col = 'red')

coef(mod1)
(Intercept)     learn.r 
  1.4639175   0.4639175 
# intercept = 1.46 = the predicted value of Y when ALL X values are ZERO.
# slope = .46 = relationship between learn.r and our DV (self.skills)
### as learn.r increases by ONE, our predicted self.skills increases by .46
### these units are in the original unit of measurement (1-5 likert scale.)
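To make the intercept and slope concrete, here is a sketch on a made-up dataset (toy, iv, and dv are hypothetical names, not class variables). It builds a prediction by hand from the coefficients, checks it against predict(), and shows that moving X up by one changes the prediction by exactly the slope.

```r
# Hypothetical data, NOT the class data
toy <- data.frame(
  iv = c(1, 2, 3, 4, 5, 2, 3, 4),
  dv = c(1, 2, 2, 4, 5, 3, 3, 4)
)

mod <- lm(dv ~ iv, data = toy)
b   <- coef(mod)                 # b[1] = intercept, b[2] = slope

# prediction for iv = 3, by hand and with predict()
by_hand    <- b[1] + b[2] * 3
by_predict <- predict(mod, newdata = data.frame(iv = 3))

# change in the prediction when iv goes from 3 to 4
one_step <- predict(mod, newdata = data.frame(iv = 4)) - by_predict

all.equal(unname(by_hand), unname(by_predict))   # TRUE
all.equal(unname(one_step), unname(b[2]))        # TRUE: one step = the slope
```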

There is Error in Our Prediction (residual error –> R^2)

In the graph above, I can see that the dots are not all exactly on the line. My predictions are wrong; this is residual error!

For example, a person who rated their confidence in learning R as a 2 rated their skills as a 1. But this is different from our prediction (the line.)

To calculate this specific prediction, we can plug the person’s learn.r rating of 2 into our linear model :

  • self.skill ~ 1.46 + .46 * learn.r

  • self.skill ~ 1.46 + .46 * 2

  • self.skill ~ 2.38

So we predict a person whose learn.r score is 2 would have a self-skill of 2.38. But this person actually rated their skills as a 1 - we call this difference a residual score = the difference between their actual score and our prediction = 1 - 2.38 = -1.38 (R reports -1.39 below because it uses the unrounded coefficients). The fact this value is negative means the person is below the line.
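The same arithmetic as a short sketch, typing in the unrounded coefficients printed by coef(mod1) for a person with learn.r = 2 and self.skills = 1. The variable names (b0, b1, learn_r, actual) are my own labels.

```r
b0 <- 1.4639175    # intercept, copied from the coef(mod1) output
b1 <- 0.4639175    # slope, copied from the coef(mod1) output

learn_r <- 2       # this person's learn.r rating
actual  <- 1       # this person's actual self.skills rating

predicted <- b0 + b1 * learn_r     # where the line is at learn.r = 2
residual  <- actual - predicted    # actual minus predicted

round(predicted, 2)   # 2.39
round(residual, 2)    # -1.39
```

This lines up with entry 11 in the mod1$residuals output. Rounding the coefficients to two decimals before doing the math gives the slightly different 2.38 and -1.38.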

Fortunately, R does the residual calculations for us, from the linear model object.

mod1$residuals # R does the residual calculation for us. what will happen if we add this up?
         1          2          3          4          5          6          7 
 0.1443299 -1.3195876 -0.3195876  0.2164948 -0.3195876 -0.3195876  0.6804124 
         8          9         10         11         12         13         14 
 0.6804124 -0.7835052  0.2164948 -1.3917526  0.6804124  1.1443299 -0.8556701 
        15         16         17         18         19         20 
 0.1443299 -0.7835052  0.2164948  0.1443299  0.6804124  1.1443299 
sum(mod1$residuals) # they add to....
[1] 4.440892e-16
SSE <- sum(mod1$residuals^2) # so I square them
SSE # the total squared error when I use my model to make predictions.
[1] 10.61856
## Visualizing Our Errors. (distance between actual scores and the line).
par(mfrow = c(1,2))
plot(d$self.skills, 
     ylab = "Self-Perception of Skills",
     xlab = "Index", main = "Mean as Model \n(SST = Total Sum of Squared Errors)") 
abline(h = mean(d$self.skills, na.rm = T), lwd = 5)
plot(jitter(self.skills) ~ learn.r, data = d, main = "Linear Model \n(SSE = Sum of Squared Errors When Model Making Predictions)", 
     xlim = c(1,5)) # jittered
abline(mod1, lwd = 5, col = 'red')

SST <- sum((d$self.skills - mean(d$self.skills))^2) # defining the total error

SST # the total squared error when using the mean to make predictions.
[1] 13.75
SST - SSE # a difference in errors when using the mean vs. our model
[1] 3.131443
(SST - SSE)/SST # the relative difference in errors = R^2 (R-squared.)
[1] 0.2277413
summary(mod1)$r.squared # R does this for us. But good to do "by hand" to understand.
[1] 0.2277413
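Here is the same R^2 logic as a self-contained sketch on made-up data (toy, iv, and dv are hypothetical). It also checks one extra fact that holds for a model with a single numeric IV: R^2 equals the squared correlation between the IV and the DV.

```r
# Hypothetical data, NOT the class data
toy <- data.frame(
  iv = c(1, 2, 3, 4, 5, 2, 3, 4),
  dv = c(1, 2, 2, 4, 5, 3, 3, 4)
)

mod <- lm(dv ~ iv, data = toy)

sst <- sum((toy$dv - mean(toy$dv))^2)   # error when the mean is the model
sse <- sum(residuals(mod)^2)            # error when the line is the model

r2_by_hand <- (sst - sse) / sst         # proportion of error removed

all.equal(r2_by_hand, summary(mod)$r.squared)   # TRUE
all.equal(r2_by_hand, cor(toy$iv, toy$dv)^2)    # TRUE
```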

There is Error in Our Prediction of the Population (sampling error)

Yes, it’s bootstrapping time!

bucket <- array()
for(i in c(1:1000)){
  nd <- d[sample(1:nrow(d), nrow(d), replace = T), ]
  modx <- lm(self.skills ~ learn.r, data = nd)
  bucket[i] <- coef(modx)[2]
}
hist(bucket) # what do we expect to see?
abline(v = mean(bucket), lwd = 5)
abline(v = mean(bucket) + 1.96*sd(bucket), lwd = 2, lty = 'dashed')
abline(v = mean(bucket) - 1.96*sd(bucket), lwd = 2, lty = 'dashed')
mean(bucket)
sd(bucket)
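One way to summarize a bucket of bootstrapped slopes, sketched on made-up data (toy, slopes, and ci are hypothetical names): take the middle 95% of the slopes as an interval, and count how often the slope came out positive.

```r
# Hypothetical data, NOT the class data
set.seed(2025)
toy <- data.frame(
  iv = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 3, 2, 4, 5, 1, 3, 4),
  dv = c(2, 2, 3, 3, 2, 4, 4, 3, 5, 4, 5, 4, 4, 3, 1, 3, 5, 1, 4, 4)
)

slopes <- numeric(1000)
for (i in 1:1000) {
  nd <- toy[sample(1:nrow(toy), nrow(toy), replace = TRUE), ]  # a new sample
  slopes[i] <- coef(lm(dv ~ iv, data = nd))[2]                 # keep the slope
}

# middle 95% of the bootstrapped slopes (percentile interval)
ci <- quantile(slopes, probs = c(0.025, 0.975), na.rm = TRUE)
ci

# proportion of resamples where the slope was positive
mean(slopes > 0, na.rm = TRUE)
```

If the interval stays away from zero, that suggests the direction of the relationship would hold up across other samples like ours.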

Time for Another Example?

Probably not! But maybe!!!?

names(d) # what other (numeric, for now) variable might predict self.skills?
 [1] "time"           "area"           "has.laptop"     "write.code"    
 [5] "know.prog"      "has.data"       "know.r"         "can.import"    
 [9] "can.clean"      "can.graph"      "can.render"     "can.lm"        
[13] "can.interp"     "can.pvalue"     "can.sevsd"      "can.95ci"      
[17] "can.forloop"    "self.skills"    "class.skills"   "self.excite"   
[21] "class.excite"   "self.prep"      "class.prep"     "learn.concept" 
[25] "learn.r"        "can.science"    "should.science" "truth.people"  
[29] "not.psychsci"   "perf.pred"      "noperf.pred"    "too.complex"   
[33] "epistemology"   "consent"       

FOR LAB 4.

  1. Define linear models to predict a numeric DV from a numeric IV.
  2. Interpret the intercept, slope, and R^2 value.
  3. Do some bootstrapping.
  4. Repeat w/ a different dataset.
  5. Yeah!