Class 2 | Working With Data

Check-In, Agenda, and Announcements

Welcome Back! Access this Document Here : catterson.github.io/calstats/calstatsFA25.html

PLEASE COMPLETE THIS CHECK-IN : tinyurl.com/againmodels

Agenda

  • 3:10 - 3:20. Check-In & Review

  • 3:20 - 4:00. No stats in this class.

  • 4:00 - 4:10. BREAK TIME #1

  • 4:10 - 5:00. R in this class (loading data & graphing variables)

  • 5:00 - 5:05. Break Time #2

  • 5:05 - 5:30. Operationalization, Construct, and Measurement, Oh My!

  • 5:30 - 6:00. Wrap Up & Questions About Grad School.

Announcements

  • DISCORD IS BUMPIN :

    • students (and sometimes professor) helping students : with the R stuff; the real stuff; the gobears stuff.

    • students helping professor : thinking about non-binary, non-categorical measures of gender (and race).

  • Lab 2 is Posted. We are going to work on this today. Yay.

  • Brain Exam in Two Weeks.

    • We will practice next week. Gonna be chill.

    • If you can’t attend your section, attend another section or talk to your GSI.

    • DSP : you’ll have extra time as needed. Can find quiet space as needed (just leave TECH in the classroom?)

  • “I want more work!”, said the student(s).

    • ggplot2 : for fancy graphs

    • Quarto / RMarkdown : for more authentic ways of embedding code, output, and words together. (This is what I’m writing the textbook, class notes, and lab assignments in.)

PART 1 : No Statistics In This Class

A Picture Is Worth 1000 Words

Florence Nightingale :

  1. Did you learn about Florence Nightingale in other classes?
  2. What do you remember learning about Florence Nightingale in other classes?

Discussion.

Look at the graph below, and use it to answer the following questions.

  1. Ice Breaker : What’s the worst time you’ve been sick? What’s your best way of trying to feel better / self-care when sick?
  2. Look at the graph below : What’s going on in this graph / who cares / how can we use this knowledge??
    1. there are three colors!!!
      1. each color = a specific cause of death = A VARIABLE! = A CATEGORICAL FACTOR VARIABLE = in the way this variable was measured, you could only die for one reason.
        1. LEVELS = the specific causes = wounds, disease, other
        2. pattern : see more gray (death from disease)
      2. months is also happening on this graph!! = A CATEGORICAL VARIABLE = you cannot be in multiple monthses; you must be in one month.
        1. FACTOR = months
        2. LEVEL = january, february, march, etc…
      3. number of deaths = NUMERIC VARIABLE (COUNT DATA)
    2. lots of dying [war is awful]
  3. It’s all linear models : What are the variables in this graph? How would you organize these as a linear model?
    1. death ~ type of death + month + error
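
The variable types and the model above can be sketched in R. This is only an illustration: the counts are made up (they are not Nightingale's numbers), and the names `nightingale`, `deaths`, `cause`, and `month` are placeholders chosen for this sketch.

```r
# Made-up counts for illustration only; NOT Nightingale's real data.
nightingale <- data.frame(
  month  = factor(rep(c("january", "february", "march"), each = 3),
                  levels = c("january", "february", "march")),
  cause  = factor(rep(c("wounds", "disease", "other"), times = 3)),
  deaths = c(10, 90, 20, 12, 80, 15, 8, 60, 10)
)

# A factor is a categorical variable; its levels are the possible categories.
levels(nightingale$cause)
levels(nightingale$month)

# Count data is stored as a numeric variable.
is.numeric(nightingale$deaths)

# deaths ~ cause + month : R handles the error term on its own.
mod <- lm(deaths ~ cause + month, data = nightingale)
coef(mod)
```

Each row is one month-by-cause combination, so every death count belongs to exactly one month and one cause, which is what makes both of those variables categorical.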

Learning from Histograms (No Statistics Terms!)

Below are some data that I graphed [1]. Take 1-2 minutes and SILENTLY (on your own) think about what you learn about the variable from this graph. Avoid FANCY STATS LANGUAGE - just explain the main ideas without those labels for now.

[1] You will work with these later in the semester once we review how to create a Likert scale (that combines 10 questions into one variable).

# Histogram of self-esteem scores (assumes the class data are already loaded as d)
hist(d$SELFES, col = 'black', border = 'white', 
     main = "Histogram of Self-Esteem", 
     xlab = "Self-Esteem Score", breaks = 15)
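
The histogram call above assumes the class dataset is already loaded as `d`. To try it without the real file, here is a minimal sketch that uses simulated scores; the simulated numbers are an assumption for illustration and will not match the class graph.

```r
# Simulated scores for illustration only; the real class dataset is not included here.
set.seed(1)
d <- data.frame(SELFES = pmin(pmax(rnorm(500, mean = 2.7, sd = 0.6), 1), 4))

# plot = FALSE returns the bars as numbers instead of drawing the graph.
h <- hist(d$SELFES, breaks = 15, plot = FALSE)
h$breaks   # where each bar starts and ends
h$counts   # how many people fall in each bar

# Same arguments as the class histogram, drawn to the screen.
hist(d$SELFES, col = 'black', border = 'white',
     main = "Histogram of Self-Esteem (simulated)",
     xlab = "Self-Esteem Score", breaks = 15)
```

Looking at `h$counts` next to the picture is one way to see that each bar groups people together, so individual scores are not visible in the graph.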

Things We Learned From the Graph

  • the graph is about self-esteem

  • the graph goes from a self-esteem of 1.0 to 4.0

  • RIVA : most self-esteem scores are about 2.5

  • MIA : as you go toward each end (1 or 4) there are fewer people with that specific self-esteem.

    • more people have a self-esteem in the middle than either really high or really low –> twice-ish as many people reported a 4 vs. a 1.
  • graph is a little lumpier on the RIGHT side; more people tend to say they have a higher self-esteem than a lower self-esteem.

  • this is a LARGE dataset

Things We Cannot Learn From the Graph

  • RYAN : no idea the full range of the scale; or people’s individual scores (people are grouped into categories); we don’t know how self-esteem was measured.

  • other variables related to the demographics of these people : age, gender, occupation, height, weight, nation, education, etc.

  • REASONS FOR WHY GRAPH IS A LITTLE LUMPIER ON THE RIGHT SIDE THAN THE LEFT SIDE????

    • healthy to have a high self-esteem, and most people are healthy in 2025 CAPITALISM RULES NO PROBLEMS!!!

    • social desirability to have higher self-esteem, so people tending to report that.

    • the people who were surveyed had high self-esteem.

BREAK TIME : MEET BACK AT 4:25!!!!

[Image : a still from the movie THE MATRIX, where we see a man sitting at a variety of computer terminals with many wires.]

PART 2 : Working in R

CODE BOOK : The Covid-19 Behavior Dataset

  • Look over the codebook (below).

    • What is one variable from the dataset that is interesting to you (if any)?

    • Is this categorical or numeric data?

    • What predictions do you have about this variable?

    • How might you use this variable in a linear model (as a DV or as an IV?)

  • Loading Data Issues :

    • rename the dataset to something short (like d)!

    • posit.cloud : clicking on the name to load (vs. the “Import Dataset”)

Link to Data (also on bCourses)

IN R : The Covid-19 Behavior Dataset

Things we will do.

  • Open up Lab 2

  • Create an RScript

  • Load the Covid-19 Behavior Dataset (.csv file) and the CODE BOOK (.pdf)

    • the CODEBOOK explains what the variables measure

    • the .csv data file contains the data.

    • Make sure the data loaded correctly into R

  • Graph some variables and learn about the individuals from this graph

    • numeric data

    • categorical data

  • Save your work for Lab 2, Questions 1 and 2 and 3. Yeah!
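
The steps above can be sketched in one short script. This is a sketch under assumptions, not the Lab 2 answer: the file is a tiny stand-in written to a temporary location so the code runs on its own, and the column names (`id`, `age`, `wears_mask`) are invented rather than taken from the codebook.

```r
# Hypothetical stand-in file so this sketch runs by itself.
# In Lab 2 you would give read.csv() the path to the real .csv instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,wears_mask",
             "1,19,yes",
             "2,22,no",
             "3,20,yes",
             "4,35,yes"), csv_path)

# Load the data into a short, easy-to-type name.
d <- read.csv(csv_path, stringsAsFactors = TRUE)

# Did it load correctly? Check the size, the first rows, and the variable types.
dim(d)
head(d)
str(d)

# Numeric variable : histogram.
hist(d$age, main = "Histogram of Age", xlab = "Age")

# Categorical variable : count each category, then graph the counts.
mask_counts <- table(d$wears_mask)
mask_counts
barplot(mask_counts, main = "Mask Wearing")
```

Checking `dim()`, `head()`, and `str()` right after loading is a quick way to catch a file that was read with the wrong number of rows or with numbers stored as text.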

BREAK TIME : MEET BACK AT 5:18 [5 min]

WE BACK, AND MINI CLASS DATA IS LIVE : tinyurl.com/miniclassexit

MEANWHILE, SOME STUDENT QUESTIONS :

  • can u share your script?

  • how does your code wrap around (like NEO’s sunglasses??)

PART 3 : Defining Data

Operationalization, Construct, and Measurement Error

  • operationalization : how researchers define the variable(s) they will study; this is a process; often the focus of a researcher’s question in the scientific method.
  • construct : some operationalized psychological phenomenon of interest. some examples below :
    • voxel : three dimensional area of brain activation
    • self-esteem : how a person feels about themselves
    • secure attachment style : how much a person seeks out and trusts a relationship partner.
  • measurement error : when there is a lack of validity in our measures. The more error in our measures, the more error there will be in our predictions (“garbage in → garbage out”).
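
The “garbage in → garbage out” idea can be illustrated with a small simulation. This sketch treats measurement error as random noise added to a true score, which is only one simple way to picture it; all of the data are simulated and the variable names are made up for this example.

```r
# Simulated data for illustration only.
set.seed(42)
n <- 1000
true_score <- rnorm(n)                 # the construct we wish we could see directly
outcome    <- true_score + rnorm(n)    # something the construct helps predict

# Two measures of the same construct : one with a little error, one with a lot.
good_measure <- true_score + rnorm(n, sd = 0.2)
bad_measure  <- true_score + rnorm(n, sd = 2)

# How closely does each measure track the outcome? (closer to 1 = tracks better)
r_good <- cor(good_measure, outcome)
r_bad  <- cor(bad_measure, outcome)
c(good = r_good, bad = r_bad)
```

The construct and the outcome are identical in both cases; only the quality of the measure changes, and the noisier measure tracks the outcome less closely.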

KEY IDEA : the way a variable is measured is CRITICAL.

  • The News Article : What comes to mind when you think of a “Cognitive Test”?

  • The Scientific Operationalization of this “Cognitive Test”

ACTIVITY : Counting Interruptions

Check-in : tinyurl.com/dudesinterrupting

  1. Count the number of interruptions in the video (which professor will play below). 

  2. Submit your answer, then wait for the letter of the day.

DISCUSSION TOPICS :

  • Why is this a problem for science???

    • everyone has a different definition of the number of interruptions AND everyone watched the SAME video.

    • if we had a PERFECT measure of an interruption, then everyone’s number would be the same.

  • How do we OPERATIONALIZE an INTERRUPTION?

    • ONLY COUNT the number of times the guy on the RIGHT is interrupted.

    • ONE INTERRUPTION = the person on the RIGHT has to STOP their sentence and RESTART.

  • What PREDICTIONS can we make about counting interruptions a second time?

    • there will be more similarity in our data the second time (LESS variation in t2 responses vs. t1)

    • people will count fewer interruptions in T2 vs. T1

      • we are only counting ONE person’s interruption.

      • the person had to RESTART their sentence.
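
Here is one way the two predictions above could be checked in R once the counts are collected. The numbers are made up for illustration (they are not the class's responses), and `t1` and `t2` are placeholder names for the first and second round of counts.

```r
# Made-up counts for illustration only; NOT the class's real responses.
t1 <- c(4, 9, 12, 6, 15, 7, 10, 3)   # first viewing, no shared definition
t2 <- c(5, 6, 5, 4, 6, 5, 5, 6)      # second viewing, shared definition

# Prediction 1 : less variation at T2 (answers are closer together).
range(t1)
range(t2)
sd(t1)
sd(t2)

# Prediction 2 : fewer interruptions counted at T2 (lower typical count).
mean(t1)
mean(t2)
```

With real class data, the same comparisons would show whether agreeing on a definition actually pulled people's counts closer together.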

FOR NEXT WEEK

  1. Discussion Section : working on Lab 2; chatting about final project ideaz.
  2. Lab 2
  3. Read Chapter 3 and Complete Quiz 3
  4. Stay safe out there; be kind to self and others.