Appendix A: R Code

Here is a list of the R code we use in this class.

A.1 Creating Variables in R

A.1.1 Numeric Variables in R

Code Description

variable <- c(#, #, #, #, etc.)

tired <- c(1,2,3,4)

variable = an object that you will define in R

<- = “assign”; tells R to save whatever comes on the right to whatever object is on the left.

c = combine : tells R to combine whatever happens in the parentheses

() = parentheses to group related terms

# = what you store in the variable; each item should be separated by a comma and space.

hist(dat$variable) For continuous variables : draws a histogram.
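
As a small sketch of the pieces above, the lines below build the tired variable from the example and inspect it. The call to length() is an addition for illustration and is not part of the list above.

```r
# Combine four numbers and assign them to an object named tired
tired <- c(1, 2, 3, 4)

# Typing the name of the object prints what is stored in it
tired

# length() counts the stored items (an extra check, not listed above)
length(tired)

# hist() draws a histogram of a numeric variable
hist(tired)
```

Here tired is a stand-alone variable, so there is no dat$ in front of it; the dat$variable form is for variables that live inside a dataset.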

A.1.2 String Variables

variable <- c("name1", "name2", "name1", etc.)

emotion <- c("sad", "happy", "sad")

variable = an object that you will define in R

<- = “assign”; tells R to save whatever comes on the right to whatever object is on the left.

c = combine : tells R to combine whatever happens in the parentheses

() = parentheses to group related terms

"name1", "name2" = what you store in the variable; each item should be wrapped in quotation marks and separated by a comma and space.

as.factor(variable)

as.factor(emotion)

as.factor() # converts a string variable into a categorical factor
variable <- as.factor(variable) # “saves” this conversion as the original variable
plot(dat$variable) For categorical variables : draws a barplot. For continuous variables : illustrates values of the variable (y-axis) as a function of their index (x-axis).
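
A minimal sketch of the string-to-factor steps above, using the emotion example. The calls to class() and levels() are additions for illustration, used here only to show what changed.

```r
# A string (character) variable; each item is wrapped in straight quotation marks
emotion <- c("sad", "happy", "sad")
class(emotion)     # "character"

# Convert to a categorical factor and save the result over the original variable
emotion <- as.factor(emotion)
class(emotion)     # "factor"
levels(emotion)    # "happy" "sad" (levels are sorted alphabetically)

# summary() of a factor counts the individuals in each level: happy = 1, sad = 2
summary(emotion)

# plot() of a factor draws a barplot of those counts
plot(emotion)
```

If the as.factor() result is not assigned back with <-, R prints the converted values but the stored variable stays a string.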

A.2 Importing and Navigating Datasets

R Command What it Does
dat <- read.csv("path/file.csv", stringsAsFactors = T) loads the data file into R (or use the “point & click method”); sets string variables to be categorical factor variables.
head(dat) looks at the first 6 rows of the data file
tail(dat) looks at the last 6 rows of the data file
nrow(dat) displays the number of rows (each row = an individual)
ncol(dat) displays the number of columns (each column = a variable)
names(dat) displays the names of the object (column names = names of variables)
dat$variable displays the variable from a dataset
dat$variable[i] displays the value in row i of the variable
dat[i, j] displays the value in row i and column j of the dataset
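
The commands in this section can be tried without a class data file. The sketch below assumes a small made-up dataset (the column names id, tired, and emotion are hypothetical), writes it to a temporary .csv file, and then loads and navigates it.

```r
# Hypothetical data, saved to a temporary file so read.csv() has something to load
example <- data.frame(
  id      = 1:8,
  tired   = c(1, 2, 3, 4, 2, 3, 5, 1),
  emotion = c("sad", "happy", "sad", "happy", "happy", "sad", "happy", "sad")
)
path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)

# Load the file; stringsAsFactors = T turns emotion into a categorical factor
dat <- read.csv(path, stringsAsFactors = T)

head(dat)      # first 6 rows
tail(dat)      # last 6 rows
nrow(dat)      # 8 rows (individuals)
ncol(dat)      # 3 columns (variables)
names(dat)     # "id" "tired" "emotion"

dat$tired      # the whole tired variable
dat$tired[2]   # row 2 of tired: 2
dat[7, 2]      # row 7, column 2 of the dataset: 5
```

With a real file, replace path with the location of the file in quotation marks.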

A.3 Working with One Variable

A.3.1 Visualizing a Variable

R Command What it Does
summary(dat) Reports descriptive statistics for all variables in the dataset.
summary(dat$variable) Reports descriptive statistics for a categorical variable (frequency / number of individuals in each level) or continuous variable (mean, range, etc.)
as.numeric(dat$variable) Makes the variable numeric (for continuous graphs)
as.factor(dat$variable) Makes the variable a categorical factor (for categorical graphs)
dat$variable <- as.factor(dat$variable) Assigns the as.factor output to the original variable. (In other words, this saves your new categorical factor variable by overwriting the old one.)
plot(dat$variable) For categorical variables : draws a barplot. For continuous variables : illustrates values of the variable (y-axis) as a function of their index (x-axis).
hist(dat$variable) For continuous variables : draws a histogram.
par(mfrow = c(i, j)) Splits your graphics window into i rows and j columns.
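
A short sketch of how these graphing commands fit together, assuming a made-up dataset with one continuous variable (tired) and one string variable (emotion). Both names are hypothetical.

```r
# Hypothetical dataset: one continuous variable and one string variable
dat <- data.frame(
  tired   = c(1, 2, 3, 4, 2, 3, 5, 1),
  emotion = c("sad", "happy", "sad", "happy", "happy", "sad", "happy", "sad")
)

# Save emotion as a categorical factor so plot() will draw a barplot
dat$emotion <- as.factor(dat$emotion)

# Split the graphics window into 1 row and 2 columns
par(mfrow = c(1, 2))
hist(dat$tired)      # left panel: histogram of the continuous variable
plot(dat$emotion)    # right panel: barplot of the categorical variable

# Go back to one graph per window
par(mfrow = c(1, 1))
```

The split stays in effect until it is changed, which is why the last line resets it to 1 row and 1 column.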

A.3.2 Descriptive Statistics for a Variable

R Command What It Does
summary(dat) Reports descriptive statistics for all variables in the dataset.
summary(dat$variable) Reports descriptive statistics for a continuous variable; reports frequencies for a categorical variable.

mean(dat$variable, na.rm = T) Reports the mean (average) of a variable; you must include the na.rm = T argument if there is missing data (otherwise R will return NA as the result).
median(dat$variable, na.rm = T) Reports the median (middle point) of a variable.
range(dat$variable, na.rm = T) Reports the lower limit and upper limit of the variable.
sd(dat$variable, na.rm = T) Reports the standard deviation of the variable.

hist(dat$variable)

abline(v = mean(dat$variable, na.rm = T))

Draws a line on a plot or histogram at specified values (e.g., this draws a vertical line at the mean of dat$variable). You can replace v with h to draw a horizontal line. We will use abline() later in the semester in a different way.
par(mfrow = c(i, j)) Splits your graphics window into i rows and j columns (replace i and j with numbers)
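
To see why na.rm = T matters, here is a minimal sketch with a made-up variable (scores is a hypothetical name) that has one missing value.

```r
# Hypothetical scores with one missing value (NA)
scores <- c(2, 4, 4, 5, NA, 9)

mean(scores)                 # NA, because one value is missing
mean(scores, na.rm = T)      # 4.8
median(scores, na.rm = T)    # 4
range(scores, na.rm = T)     # 2 9
sd(scores, na.rm = T)        # about 2.59

# Histogram with a vertical line drawn at the mean
hist(scores)
abline(v = mean(scores, na.rm = T))
```

hist() drops missing values on its own, but mean() does not, so the abline() call still needs na.rm = T.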

A.3.3 Removing Outliers from a Variable

R Command What It Does
dat$variable[dat$variable > #] <- NA One method for removing outliers: the code inside the brackets sets a rule to identify possible outliers, and then the <- NA tells R to replace those values with NA (missing data). Make sure to change the operator (e.g., > or < or ==) depending on the type of outlier you need to remove.
dat$variable[dat$variable == "baddata"] <- NA The method adapted for excluding a specific string response from a variable.
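
A small sketch of both methods, assuming a made-up dataset. The variable names (age, major) and the cutoff of 100 are hypothetical; the right rule depends on your own data.

```r
# Hypothetical dataset with one impossible age and one junk string response
dat <- data.frame(
  age   = c(19, 21, 20, 250, 22),
  major = c("psych", "baddata", "bio", "psych", "bio")
)

# Numeric rule: any age above 100 is replaced with NA
dat$age[dat$age > 100] <- NA

# String rule: the response "baddata" is replaced with NA
dat$major[dat$major == "baddata"] <- NA

dat                          # row 4 has NA for age; row 2 has NA for major
mean(dat$age, na.rm = T)     # 20.5, the mean of the four remaining ages
```

The rows themselves stay in the dataset; only the flagged values become missing, which is why later commands need na.rm = T.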

A.3.4 Defining a Likert Scale Variable

R Command What It Does
SCALE.df <- data.frame(dat$item1, dat$item2, dat$item3) Organize your items into a dataframe. Make sure that all items are keyed in the same direction. If a variable is negatively-keyed (e.g., a high response on the item means they are LOW on the variable) then you will need to reverse score it by subtracting it from the scale's minimum + maximum (e.g., for a 1-5 scale, use 6 - item).
alpha(SCALE.df) From the ‘psych’ package; evaluates the alpha reliability of your scale. The higher the alpha, the more reliable the scale. Alpha increases as the items become more related to each other and as the number of items increases. For scales with 2-3 items, an alpha around .3-.5 is expected; for scales with 8 items an alpha over .8 is considered “good”.
dat$SCALE <- rowMeans(SCALE.df) The final step; averages all the items from your dataframe into a variable.
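
The steps above can be sketched with made-up responses. This assumes three hypothetical items on a 1-5 scale, where item3 is negatively-keyed; the name item3r for the reverse-scored version is also just an illustration.

```r
# Hypothetical responses from four people on three 1-5 items
dat <- data.frame(
  item1 = c(5, 4, 2, 1),
  item2 = c(4, 5, 1, 2),
  item3 = c(1, 2, 5, 4)    # negatively-keyed: high response = LOW on the variable
)

# Reverse score item3: on a 1-5 scale, 6 - item turns 1 into 5, 2 into 4, etc.
dat$item3r <- 6 - dat$item3

# Organize the items (all keyed in the same direction now) into a dataframe
SCALE.df <- data.frame(dat$item1, dat$item2, dat$item3r)

# To check reliability, load the psych package and run alpha on the dataframe:
# library(psych)
# alpha(SCALE.df)

# Average the items into one scale score per person
dat$SCALE <- rowMeans(SCALE.df)
dat$SCALE
```

The alpha() lines are left as comments so the sketch runs without the psych package installed; remove the # signs to run them.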

A.4 Predictive Statistics

A.4.1 Linear Models

R Command What It Does
mod <- lm(DV ~ IV, data = dat) defines a linear model, predicting the DV (a variable) from the IV, and saves this model as an object.
plot(DV ~ IV, data = dat) WHEN THE IV IS CONTINUOUS : draws a scatterplot with individual DV scores on the y-axis and IV scores on the x-axis
abline(mod) draws the linear model on the above plot.
summary(mod)$r.squared the ‘easy’ way to calculate R²
plot(mod) ## CAREFUL! WARNING : returns a series of diagnostic plots for the regression analysis; keep hitting [enter] to move to the next plot, or [escape] to get out of the plots. You probably meant to use plot(DV ~ IV, data = dat).
plotmeans(DV ~ IV, data = dat, connect = F) WHEN IV IS CATEGORICAL - REQUIRES THE gplots LIBRARY : draws a plot of a linear model with a categorical IV; the predicted value of each group is a dot, with standard error bars drawn above and below to illustrate sampling error. connect = F removes the line that illustrates the slope.
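
A minimal sketch of the continuous-IV commands above, assuming a made-up dataset whose columns are literally named IV and DV. The call to coef() is an addition for illustration and is not part of the table.

```r
# Hypothetical data: five individuals with a continuous IV and DV
dat <- data.frame(
  IV = c(1, 2, 3, 4, 5),
  DV = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# Define the linear model and save it as an object
mod <- lm(DV ~ IV, data = dat)

# coef() shows the intercept and slope (an extra step, not listed above)
coef(mod)

# Scatterplot of the raw data, then the model drawn on top as a line
plot(DV ~ IV, data = dat)
abline(mod)

# Proportion of variation in the DV explained by the model
summary(mod)$r.squared
```

abline(mod) only works after plot(DV ~ IV, data = dat) has drawn a graph, so run the two lines in that order.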

A.4.2 Inferential Statistics (NHST)

R Command What It Does
summary(mod) returns the null hypothesis significance testing results
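
As a sketch of where the significance test results live, the lines below refit a model on made-up data (columns named IV and DV are hypothetical) and pull out the coefficient table. Using $coefficients is an addition for illustration; summary(mod) alone prints the same numbers.

```r
# Hypothetical data and model
dat <- data.frame(
  IV = c(1, 2, 3, 4, 5),
  DV = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
mod <- lm(DV ~ IV, data = dat)

# Full output: estimates, standard errors, t-values, and p-values
summary(mod)

# Just the coefficient table (one row for the intercept, one for the IV)
summary(mod)$coefficients

# The p-value for the slope is in the column labeled Pr(>|t|)
summary(mod)$coefficients["IV", "Pr(>|t|)"]
```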