Appendix A: R Code

Here is a list of the R code we use in this class.

A.1 Creating Variables in R

Code	Description
`variable <- c(#, #, #, #, etc.)` `tired <- c(1, 2, 3, 4)`	`variable` = an object that you will define in R. `<-` = “assign”: tells R to save whatever is on the right into the object on the left. `c` = “combine”: tells R to combine whatever is inside the parentheses. `()` = parentheses, which group related terms. `#` = a value you store in the variable; separate the items with a comma and a space.
`hist(dat$variable)`	For continuous variables : draws a histogram.

`variable <- c("name1", "name2", "name1", etc.)` `emotion <- c("sad", "happy", "sad")`	`variable` = an object that you will define in R. `<-` = “assign”: tells R to save whatever is on the right into the object on the left. `c` = “combine”: tells R to combine whatever is inside the parentheses. `()` = parentheses, which group related terms. `"name1"`, `"name2"` = the text values (strings) you store in the variable; wrap each one in straight quotation marks and separate the items with a comma and a space.
`as.factor(variable)` `as.factor(emotion)`	Converts a string variable into a categorical factor.
`variable <- as.factor(variable)`	“Saves” this conversion by overwriting the original variable.
`plot(dat$variable)`	For categorical variables : draws a barplot. For continuous variables : illustrates values of the variable (y-axis) as a function of their index (x-axis).
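
The commands in this table can be tried together in one short script. This is a minimal sketch using the made-up `tired` and `emotion` values from the table; the `class()` and `levels()` calls are added here only to check what R stored.

```r
# Made-up example values (same as in the table above)
tired <- c(1, 2, 3, 4)                # numeric variable
emotion <- c("sad", "happy", "sad")   # string (character) variable

class(emotion)                  # "character"
emotion <- as.factor(emotion)   # convert and save over the original
class(emotion)                  # "factor"
levels(emotion)                 # "happy" "sad" (levels are sorted alphabetically)
summary(emotion)                # number of responses in each level
```

Note that R only accepts straight quotation marks around strings; curly quotes pasted from a word processor will cause an error.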

R Command	What it Does
`dat <- read.csv("path/file.csv", stringsAsFactors = T)`	loads the data file into R (or use the “point & click method”); sets string variables to be categorical factor variables.
`head(dat)`	looks at the first 6 rows of the data file
`tail(dat)`	looks at the last 6 rows of the data file
`nrow(dat)`	displays the number of rows (each row = an individual)
`ncol(dat)`	displays the number of columns (each column = a variable)
`names(dat)`	displays the names of the object (column names = names of variables)
`dat$variable`	displays the variable from a dataset
`dat$variable[i]`	displays the individual row [i] from the variable
`dat[i, j]`	displays an individual row [i] and column [j] from the dataset
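
To see these commands in action without a class data file, the sketch below first writes a tiny made-up dataset to a temporary file and then loads it. The column names (`id`, `sleep`, `emotion`) and the temporary file are assumptions for illustration only; in practice you would give `read.csv()` the path to your own file.

```r
# Create a small, made-up CSV file so that read.csv() has something to load
path <- tempfile(fileext = ".csv")
writeLines(c("id,sleep,emotion",
             "1,7,sad",
             "2,5,happy",
             "3,8,sad"), path)

dat <- read.csv(path, stringsAsFactors = T)

head(dat)      # first rows of the data (here, all 3 rows)
nrow(dat)      # 3 rows (individuals)
ncol(dat)      # 3 columns (variables)
names(dat)     # "id" "sleep" "emotion"
dat$sleep      # 7 5 8
dat$sleep[2]   # 5 (second individual's value on sleep)
dat[3, 2]      # 8 (row 3, column 2)
```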

R Command	What it Does
`summary(dat)`	Reports descriptive statistics for all variables in the dataset.
`summary(dat$variable)`	Reports descriptive statistics for a categorical variable (frequency / number of individuals in each level) or continuous variable (mean, range, etc.)
`as.numeric(dat$variable)`	Makes the variable numeric (for continuous graphs)
`as.factor(dat$variable)`	Makes the variable a categorical factor (for categorical graphs)
`dat$variable <- as.factor(dat$variable)`	Assigns the as.factor output to the original variable. (In other words, this saves your new categorical factor variable by overwriting the old one.)
`plot(dat$variable)`	For categorical variables : draws a barplot. For continuous variables : illustrates values of the variable (y-axis) as a function of their index (x-axis).
`hist(dat$variable)`	For continuous variables : draws a histogram.
`par(mfrow = c(i, j))`	Splits your graphics window into i rows and j columns.
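
A short sketch of how the graphing commands fit together, using a made-up data frame (the `sleep` and `emotion` columns are hypothetical). It draws a histogram and a barplot side by side, then resets the graphics window.

```r
# Made-up data for illustration
dat <- data.frame(
  sleep   = c(7, 5, 8, 6, 7, 9),
  emotion = c("sad", "happy", "sad", "happy", "happy", "sad")
)

# Save the string variable as a categorical factor
dat$emotion <- as.factor(dat$emotion)
summary(dat$emotion)     # frequency of each level

par(mfrow = c(1, 2))     # 1 row, 2 columns of graphs
hist(dat$sleep)          # continuous variable: histogram
plot(dat$emotion)        # categorical variable: barplot
par(mfrow = c(1, 1))     # back to one graph per window

# Careful: as.numeric() on a factor returns the level codes, not the labels
as.numeric(dat$emotion)  # 2 1 2 1 1 2 (happy = 1, sad = 2)
```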

R Command	What It Does
`summary(dat)`	Reports descriptive statistics for all variables in the dataset.
`summary(dat$variable)`	Reports descriptive statistics for a continuous variable. Reports frequency for a categorical variable.
`mean(dat$variable, na.rm = T)`	Reports the mean (average) of a variable; you must include the na.rm = T argument if there is missing data (otherwise R will return NA as the result).
`median(dat$variable, na.rm = T)`	Reports the median (middle point) of a variable.
`range(dat$variable, na.rm = T)`	Reports the lower limit and upper limit of the variable.
`sd(dat$variable, na.rm = T)`	Reports the standard deviation of the variable.
`hist(dat$variable)` `abline(v = mean(dat$variable))`	Draws a line on a plot or histogram at specified values (e.g., this draws a vertical line at the mean of dat$variable). You can replace v with h to draw a horizontal line. We will use abline() later in the semester in a different way.
`par(mfrow = c(i, j))`	Splits your graphics window into i rows and j columns (replace i and j with numbers)
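
The effect of `na.rm = T` is easiest to see with a small example. This is a sketch with made-up values in a hypothetical `sleep` variable that has one missing response.

```r
# Made-up data with one missing value
dat <- data.frame(sleep = c(7, 5, NA, 8, 6))

mean(dat$sleep)                 # NA, because of the missing value
mean(dat$sleep, na.rm = T)      # 6.5
median(dat$sleep, na.rm = T)    # 6.5
range(dat$sleep, na.rm = T)     # 5 8
sd(dat$sleep, na.rm = T)        # about 1.29

hist(dat$sleep)
abline(v = mean(dat$sleep, na.rm = T))   # vertical line at the mean
```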

R Command	What It Does
`dat$variable[dat$variable > #] <- NA`	One method for removing outliers: the code inside the brackets sets a rule to identify possible outliers, and the <- NA tells R to replace those values with NA (missing). Make sure to change the operator (e.g., > or < or ==) depending on the type of outlier you need to remove.
`dat$variable[dat$variable == "baddata"] <- NA`	The same method, adapted to exclude a specific string response from a variable.
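
As an illustration, the sketch below applies both rules to a made-up data frame. The variable names and the cutoff of 24 are assumptions chosen for the example (a person cannot sleep more than 24 hours in a day); your own cutoff depends on your variable.

```r
# Made-up data: one impossible value and one junk response
dat <- data.frame(
  hours = c(7, 5, 8, 99, 6),
  mood  = c("sad", "happy", "baddata", "sad", "happy")
)

dat$hours[dat$hours > 24] <- NA         # replace impossible values with NA
dat$mood[dat$mood == "baddata"] <- NA   # replace the junk response with NA

dat$hours                     # 7 5 8 NA 6
dat$mood                      # third response is now NA
mean(dat$hours, na.rm = T)    # 6.5
```

Because the outliers are now missing values, later calculations need `na.rm = T`.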

R Command	What It Does
`SCALE.df <- data.frame(dat$item1, dat$item2, dat$item3)`	Organize your items into a dataframe. Make sure that all items are keyed in the same direction. If an item is negatively keyed (e.g., a high response on the item means the person is LOW on the variable), then you will need to reverse score it by subtracting it from the scale maximum plus the scale minimum (e.g., on a 1-5 scale: 6 - item).
`alpha(SCALE.df)`	From the ‘psych’ package; evaluates the alpha reliability of your scale. The higher the alpha, the more reliable the scale. Alpha increases as the items become more related to each other and as the number of items increases. For scales with 2-3 items, an alpha around .3-.5 is expected; for scales with 8 items an alpha over .8 is considered “good”.
`dat$SCALE <- rowMeans(SCALE.df)`	The final step; averages all the items from your dataframe into a variable.
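
The three steps can be sketched as follows, assuming three made-up items answered on a 1-5 scale where `item3` is negatively keyed. The name `item3r` for the reverse-scored item is hypothetical. The `alpha()` call is left as a comment because it needs the psych package to be installed.

```r
# Made-up responses on a 1-5 scale; item3 is negatively keyed
dat <- data.frame(
  item1 = c(4, 5, 2, 3),
  item2 = c(5, 4, 1, 3),
  item3 = c(2, 1, 5, 3)
)

# Reverse score the negatively keyed item: (5 + 1) - item
dat$item3r <- 6 - dat$item3      # 4 5 1 3

# Step 1: organize the items (all keyed in the same direction)
SCALE.df <- data.frame(dat$item1, dat$item2, dat$item3r)

# Step 2: check reliability (requires the psych package)
# library(psych)
# alpha(SCALE.df)

# Step 3: average the items into one scale score per person
dat$SCALE <- rowMeans(SCALE.df)
dat$SCALE                        # about 4.33 4.67 1.33 3.00
```

If some items have missing responses, `rowMeans()` also accepts `na.rm = T`.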

R Command	What It Does
`mod <- lm(DV ~ IV, data = dat)`	defines a linear model, predicting the DV (a variable) from the IV, and saves this model as an object.
`plot(DV ~ IV, data = dat)`	WHEN THE IV IS CONTINUOUS : draws a scatterplot with individual DV scores on the y-axis and IV scores on the x-axis
`abline(mod)`	draws the linear model on the above plot.
`summary(mod)$r.squared`	the ‘easy’ way to calculate R²
`plot(mod) ## CAREFUL!`	WARNING : returns a series of diagnostic plots for the regression analysis; keep hitting [enter] to move to the next plot, or [escape] to get out of the plots. You probably meant to use plot(DV ~ IV, data = dat).
`plotmeans(DV ~ IV, data = dat, connect = F)`	WHEN IV IS CATEGORICAL - REQUIRES THE gplots LIBRARY : draws a plot of a linear model with a categorical IV; the predicted value of each group is a dot, with standard error bars drawn above and below to illustrate sampling error. connect = F removes the line that illustrates the slope.
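
A minimal sketch of the continuous-IV workflow, using made-up data in which the DV rises by roughly 2 for every 1-unit increase in the IV. The numbers are invented for illustration; `coef()` is added here to show the intercept and slope.

```r
# Made-up data: DV increases by about 2 for each 1-unit increase in IV
dat <- data.frame(
  IV = c(1, 2, 3, 4, 5),
  DV = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

mod <- lm(DV ~ IV, data = dat)   # define and save the linear model
coef(mod)                        # intercept and slope

plot(DV ~ IV, data = dat)        # scatterplot: DV on y-axis, IV on x-axis
abline(mod)                      # add the regression line to the scatterplot

summary(mod)$r.squared           # proportion of variance in DV explained by IV

# For a categorical IV, plotmeans() from the gplots package would be used instead:
# library(gplots)
# plotmeans(DV ~ IV, data = dat, connect = F)
```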

R Command	What It Does
`summary(mod)`	returns the null hypothesis significance testing results (coefficient estimates, standard errors, t values, and p-values)
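
As a sketch of where the significance test results live, the example below fits a model to made-up data and then pulls the coefficient table out of `summary(mod)`. Using `coef()` on the summary is one convenient way to get at the table; it is shown here as an illustration rather than a required step.

```r
# Made-up data for illustration
dat <- data.frame(
  IV = c(1, 2, 3, 4, 5),
  DV = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
mod <- lm(DV ~ IV, data = dat)

summary(mod)                     # full output, including the coefficient table

results <- coef(summary(mod))    # just the coefficient table
colnames(results)                # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
results["IV", "Pr(>|t|)"]        # p-value for the slope of IV
```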