Lecture 2 : Describing Data with Pictures and Numbers

Check-In (Loading Data)

Hi class, please work on the check-in. We will get started soon. Ask for help if you are stuck!

Announcements

Agenda and Goals

This week, we’ll learn how to describe data using R, statistics, and human language.

  • REVIEW : Announcements, Check-In, and Week 1 (30 Minutes)

  • PART 1 : For-Loops for Fun and Profit.

  • PART 2 : Describing Data with R, Statistics, and Human Language (50 Minutes)

    • Describing Categorical Data
    • Describing Numeric Data
  • BREAK TIME (10 Minutes) + Article Presentation (30 Minutes)

  • PART 3 : QUARTO

    • Why Quarto?
    • How Quarto?!

Week 1 Recap and Review

R Stuff

  • For-Loops

  • Monty Hall with 100 Doors.
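
As a refresher, here is one way the 100-door Monty Hall game could be simulated with a for-loop. This is a minimal sketch, not necessarily the version from Week 1: it assumes the host always opens 98 empty doors, and the variable names and the number of games are made up for illustration.

```r
set.seed(1)            # make the simulation reproducible
n_games <- 10000       # number of simulated games (arbitrary choice)
n_doors <- 100

stay_wins   <- 0
switch_wins <- 0

for (i in 1:n_games) {
  prize <- sample(1:n_doors, 1)   # door hiding the prize
  pick  <- sample(1:n_doors, 1)   # contestant's first pick

  # Assumption: the host opens 98 empty doors, leaving the first pick
  # and one other door closed. Staying wins only if the first pick was
  # right; switching wins in every other game.
  if (pick == prize) {
    stay_wins <- stay_wins + 1
  } else {
    switch_wins <- switch_wins + 1
  }
}

stay_rate   <- stay_wins / n_games
switch_rate <- switch_wins / n_games
c(stay = stay_rate, switch = switch_rate)
```

Switching should win in roughly 99 out of 100 games, because the first pick is right only about 1 time in 100.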

Data Organization and Loading

DISCUSSION : Below are the twelve (12) criteria Broman & Woo (2018) articulated for organizing data. Review the list. Which of these practices have you encountered or struggled with in working with data so far? What terms or ideas do you have questions about? Is there anything else that should be added to this list?

  1. Consistency
  2. Good Names
  3. Write Dates as YYYY-MM-DD
  4. No Empty Cells
  5. Put Just One Thing in a Cell
  6. Make it a rectangle.
  7. Data Dictionary
  8. No calculations in the raw data file.
  9. No font color or highlighting as data.
  10. Make backups
  11. Data validation to avoid errors.
  12. Saving data in plaintext.
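
To see why a few of these criteria matter once the data reach R, here is a small sketch. The dataset is entirely made up (the column names id, visit_date, and score are hypothetical) and is written as text so the example runs on its own.

```r
# A tiny, made-up plain-text CSV: a rectangle, one thing per cell,
# dates as YYYY-MM-DD, and an explicit NA instead of an empty cell.
csv_text <- "id,visit_date,score
1,2024-01-15,10
2,2024-02-03,NA
3,2024-03-22,7"

d <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Dates written as YYYY-MM-DD convert cleanly with as.Date()
d$visit_date <- as.Date(d$visit_date)

str(d)                 # check the type of each column
colSums(is.na(d))      # count missing values per column
```

With a real file you would pass the file path to read.csv() instead of text = csv_text.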

PART 2 : Describing Data with R, Statistics, and Human Language

Categorical Data

  • summary() or table() : to count frequencies of categorical data (summary() only counts if the variable is a factor)

  • as.factor() and as.numeric() : to convert data from one type to another

  • levels() or factor() : to reorder or rename factor levels

  • plot() : to graph categorical data
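
Here is a short sketch of these functions working together. The data are made up (class year for eight hypothetical students), and the object names year and year_f are just for illustration.

```r
# Made-up data: class year for eight hypothetical students
year <- c("junior", "senior", "junior", "sophomore",
          "senior", "junior", "sophomore", "junior")

table(year)                    # frequency counts (works on character data)

year_f <- as.factor(year)      # convert character data to a factor
summary(year_f)                # summary() counts frequencies for factors
levels(year_f)                 # levels are alphabetical by default

# factor() with levels = ... puts the levels in a meaningful order
year_f <- factor(year, levels = c("sophomore", "junior", "senior"))

# Assigning to levels() renames them (same order as the current levels)
levels(year_f) <- c("Sophomore", "Junior", "Senior")

as.numeric(year_f)             # the integer codes behind the factor

plot(year_f)                   # plot() on a factor draws a bar chart
```

Reordering the levels matters mostly for graphs and tables: the bars and counts appear in level order, not in the order the data were typed.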

Numeric Data

  • summary() or psych::describe() : to summarize data (describe() comes from the psych package, which you must install first)

    • mean()

    • median()

    • sd()

    • range()

    • note : if there are missing data (often!) you must explicitly tell R to remove them : mean(d$variable, na.rm = TRUE)

  • hist() or boxplot() : graphing numeric data
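
A quick sketch of the numeric tools above, using made-up scores with one missing value. The data frame d and its score column are hypothetical stand-ins for your own data.

```r
# Made-up scores with one missing value
d <- data.frame(score = c(4, 8, 15, 16, 23, 42, NA))

summary(d$score)                 # min, quartiles, mean, max, and NA count

mean(d$score)                    # returns NA because of the missing value
mean(d$score, na.rm = TRUE)      # drops the NA first
median(d$score, na.rm = TRUE)
sd(d$score, na.rm = TRUE)
range(d$score, na.rm = TRUE)     # smallest and largest values

hist(d$score)                    # histogram: shape of the distribution
boxplot(d$score)                 # boxplot: median, quartiles, outliers

# psych::describe(d$score)       # uncomment after install.packages("psych")
```

Notice that summary(), hist(), and boxplot() handle the NA on their own, while mean(), median(), sd(), and range() all need na.rm = TRUE.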

BREAK TIME & ARTICLE PRESENTATION

Using Quarto to Make and Share Nice Reports

Quarto is the successor to R Markdown, which is itself built on Markdown, and it is a powerful way to author documents that are meant for both humans and computers to read.

  • Advantages :

    • can create a document that runs R code, a presentation, or a website
    • much faster to get your code from R to something that humans can read
      • no more copy-pasting graphs or output.
      • can update graphs and output as your needs / dataset change
    • lots of features - open-source heritage and culture, but supported financially by Posit (formerly RStudio).
      • ability to format your code; render it as a website, pdf, book, etc.
      • interactive documents (Shiny; html-live; etc.)
  • Disadvantages :

    • code must be “perfect” in order to correctly render.
    • can go down formatting and feature rabbit holes that are not necessarily conducive to good science.
    • another dialect of the language you are trying to learn
      • in R : code is the default; human comments added with #s
      • in Quarto : human text is the default; you insert a code block when you want R to do something (and can then comment in that code)
    1+1 # like this
    [1] 2
  • In this class, we will work with both .R scripts and .qmd Quarto Markdown Files

    • .R Scripts for tinkering with data (in-class tutorials; initial analyses)
    • .qmd files for “final” products (Lecture notes, lab documents, your project)
  • There are many thorough guides on how to use Quarto, but honestly the official Quarto reference book is almost constantly open on my computer (in multiple tabs….sigh.) But let me know if you find another cool resource!

  • Things to do in Quarto :

    • write in human text

    • insert a code block

    • insert inline code

    • render everything with no pain and drama as a .pdf or .html file and share this with others.
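
To make the list above concrete, here is a sketch of a minimal .qmd file containing human text, one code block, and one line of inline code. It is built from an R script so the example runs on its own; the file name, title, and scores are made up. The rendering step is commented out because it assumes the Quarto command-line tool and the quarto R package are both installed.

```r
# Build a minimal .qmd file from R so the whole example is one runnable script.
fence <- strrep("`", 3)   # three backticks, the code-block delimiter

qmd_lines <- c(
  "---",
  "title: \"My First Report\"",
  "format: html",
  "---",
  "",
  "This is human text. It is the default in Quarto.",
  "",
  paste0(fence, "{r}"),
  "scores <- c(4, 8, 15, 16, 23, 42)  # made-up data",
  "hist(scores)",
  fence,
  "",
  "The mean score is `r mean(scores)`."
)

qmd_path <- file.path(tempdir(), "first-report.qmd")
writeLines(qmd_lines, qmd_path)

cat(readLines(qmd_path), sep = "\n")   # look at the file we just wrote

# To render, assuming the Quarto CLI and the quarto R package are installed:
# quarto::quarto_render(qmd_path)
```

In practice you would type the .qmd file directly in RStudio and click Render; writing it from a script here is only a way to show the pieces in one place.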

EXIT SURVEY HERE