Lab 3: Sampling Error and Bias
Answer the questions below as a .PDF (you can either render from Quarto as .PDF, or render as .html and then print this html file as a .PDF, or just copy/paste the questions into Google Sheets and take screenshots of R output.) Make sure to reference any external sources (e.g., stack exchange; ChatGPT; peer help) that you used at the top of your lab assignment.
DO IN SECTION, NO NEED TO SUBMIT :
In Lab 2, we worked with the self-esteem dataset. We will work with these data again in Lab 3. Feel free to re-use your code from Lab 2, or look at & learn from my key (posted to the course page) as needed.Load the data again and check to make sure the data loaded correctly.Graph the variable age as a histogram, and report the mean, median, standard deviation, and range. What do these statistics tell you about the data?Remove the outliers in age from the dataset, and explain why you considered them to be outliers. Graph the newly-cleaned variable as a histogram, and report the mean, median, standard deviation, and range.Describe what you learn about this variable (and the participants in the sample) from this distribution. Do these data appear representative of the population? Why / why not?
Now, use bootstrapping to estimate the sampling error of the age variable above. (In other words, resample the selfesteem dataset, calculate the average age from this “new” sample, save the value somewhere, and then repeat this 1000 times). Graph these 1000 average ages, and report the mean of this distribution of sample estimates (it should be very close to the mean of age in the original sample) and the standard deviation of the distribution of sample estimates. Why is the standard deviation of the distribution of sample estimates so small? Why is the graph of these 1000 average ages normal when the original distribution of scores is skewed??? Make sure you’ve removed the outliers.In Lab 2, we created a self-esteem scale from the 10-item Rosenberg (1965) self-esteem scale. The mean of this variable was higher than the mid-point of the scale (2.5), suggesting that people (on average) tend to see them a little more positively than truly neutral. Another possibility is that this difference is just due to sampling error. Create the scale again (you can copy / paste your code from Lab 2, or use / learn from the key if you struggled with this.) Then, use bootstrapping to resample the original dataset 1000 times, and estimate the average sampling error of the mean. Report the estimate of sampling error, and the 95% Confidence Interval of the mean. Do these statistics give you more or less confidence that people tend to see themselves as higher in self-esteem (than the mid-point)? Why / why not?NO R REQUIRED! Find an empirical article relevant to your research interests. This should be an original study, where the authors collected their own data. Submit your answers to this question as part of this lab, and also share your ideas in the Vision Board. (You’ll have to log in to your @berkeley.edu google account to edit the Google Sheet.)
- Export the article citation in APA format.
- What is the reseach question & theory. Just explain in a few words what the authors are studying.
- Who is the Population for this research question?
- Who was in the sample of this study? Report whatever characteristics were reported in the article. If there were multiple studies in the article, just pick one.
- Is the sample representative of the population? Why / Why Not?
- How might this bias influence the results? Be specific here! Okay not to know, but come up with a few ideas.
- EXTRA BONUS TIME : how did the authors write about their results? Were they careful to make it seem like they are specific to their sample, or did they generalize to all populations (and was this warranted)?