Version date: 6 December 2020
ET215: R Tutorial 2
A collection of useful commands
General guidelines for this tutorial:
● In the interest of brevity, the focus here is on providing relevant examples of R syntax, rather than giving a
full explanation of possible applications. Further discussion may be found in lecture notes, Gries (2013),
Johnson (2008), Baayen (2008), Winter (2020), or Field et al. (2012).
● The examples in this handout are intended to illustrate R syntax, and thus they are generally not full working
demonstrations (i.e., there are no real datasets to call).
● Note that in many examples, I use the assignment operator (<-) to save the output of a command, with an
arbitrary name. This step is useful so that you can easily refer back to tables—and view them without printing
hundreds of rows onscreen. (For instance, if you first name your table something like myTable, you can type
head(myTable) or View(myTable) to inspect the contents.)
● The document organization is as follows: Part I: General commands (involving data organization, etc.). Part
II: Statistical tests. Part III: Plots and visualizations.
Part I: General R commands.
I.A. LOADING PACKAGES, AND OTHER INITIALIZATIONS.
Often, R scripts will start by loading (and possibly installing) packages that will be needed in the script.
If you need to use a package which you have not previously installed, you first need the command
install.packages. For instance:
install.packages("RCurl")
As a general rule, you do not need to re-install packages you have previously used. (Re-installing doesn’t hurt
anything, but it can take time.) However, every time you close down R, packages you have loaded will close.
Thus for each new R session you begin, you need to re-load packages, as follows:
library("effects")
Another useful initialization for an R session is to indicate if you want to avoid exponential notation for very
small (or very large) numbers. Turning off exponential notation will report numbers in formats like 0.0000061,
rather than 6.1e-06. To turn off exponential notation, type this:
options(scipen = 999)
If at any point you want scientific notation back on, type this:
options(scipen = 0)
I.B. DATA TYPES
R variables can be of several types: numerical (decimal-valued), integer-valued, logical (true/false), character,
or factor. A factor is a variable (either independent or dependent) which has categorical levels.
If you want to see how R currently classifies the data types of a dataframe, type this:
str(myData)
This will list the data types of all columns. If you only need to know about one column, specify the column:
str(myData$participant)
You can change the data type using commands like as.character, as.numeric, as.factor. For
instance, suppose that the identifiers for your participants look like numbers. These labels are meaningless,
and we don’t want R to consider them on a numeric scale. So you would do the following to turn this column
into a factor:
myData$participant <- as.factor(myData$participant)
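A related pitfall: if a genuinely numeric column has accidentally been read in as a factor, calling
as.numeric() on it directly returns the internal level codes rather than the original values. A minimal sketch,
using a hypothetical column called reactionTime:
#Convert factor -> character -> numeric to recover the original values:
myData$reactionTime <- as.numeric(as.character(myData$reactionTime))
str(myData$reactionTime) #Should now report "num"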
I.C. FILE MANAGEMENT (READ/WRITE DATA).
Often it’s useful to choose a working directory, so that all files can be accessed in the same location. Find out
your current working directory by typing getwd(). You can change this to a different folder with setwd():
setwd("C:/Users/Me/Desktop/QuanProject1/) #PC locations might look like this
setwd("/Users/Me/Desktop/QuanProject1") #Mac locations might look like this
You can also set the working directory from a dropdown menu in RStudio. Look under Misc > Change Working
Directory, or Session > Set Working Directory.
If you’ve set a working directory, and the folder contains data in a file called MyData.txt, load it like this:
myDat <- read.csv('MyData.txt')
If you prefer to browse for the file instead, do this:
myDat<-read.csv(file.choose())
If you have a data table separated by something other than commas (for instance, tab characters), you can
load it like this:
myDat<-read.delim("myDat.txt", sep="\t", header = T)
The header = T argument says that the first line of the table provides the column names. If you don’t
include it, R thinks that the first row is just more data.
If you want to save a table or dataframe to your working directory, use write.csv(). Refer to the R object
you want to save, and give it a filename. For instance, if you have a dataframe called myDat, save it like this:
write.csv(myDat, "myDat.txt", row.names = FALSE)
It is not essential to include the row.names = FALSE argument, but it is helpful for avoiding the clutter of
R’s numerical row names.
You can load the file myDat.txt in R at a later time, or open it in a spreadsheet application like Excel.
For information on saving plots/image files, see the header of Part III.
I.D. ORDER/SORT A DATAFRAME.
If you want to order a dataframe by participant ID, do the following:
myDat <- myDat[order(myDat$participant), ]
This code tells R how to order the rows of the dataframe. The syntax here is not immediately intuitive, but it’s
based on a [row, column] referencing system (see Tutorial 1). Note the comma (followed by nothing) inside
the square brackets. This says that we aren’t interested in the columns.
You can also sort by two (or even more) columns. Suppose we have a column called trialNumber, which
indicates the order in which items were presented. If you want to organize the data so that the trials are
sorted for each participant, type this:
myDat <- myDat[order(myDat$participant, myDat$trialNumber), ]
This code sorts by participant, then by trialNumber within each participant group. That is, in the revised table,
we see all the data for participant 1 (listed by trial 1, 2, 3…), then all the data for participant 2 (listed by trial 1,
2, 3…), and so on.
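If you ever need a descending order, a minimal sketch (for numeric columns, you can negate the column inside
order(); decreasing = TRUE also works when there is a single sort key):
#Sort by participant, with trials in descending order within each participant:
myDat <- myDat[order(myDat$participant, -myDat$trialNumber), ]
#For a single sort key, you can instead write:
myDat <- myDat[order(myDat$trialNumber, decreasing = TRUE), ]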
I.E. SUBSET.
Use subset to select only certain rows from an existing dataframe. If the existing dataframe is called
oldDataframe, and we want to pull out a subset where someCondition is true, then the relevant syntax
is subset(oldDataframe, someCondition). For example, if exampDF contains a column called
phase, and we want to pull out only the rows where the phase is “pretest”, then the following command
will work:
pretestOnly<-subset(exampDF, phase == "pretest")
If you have a column which contains year of birth, and you want to focus only on participants born after 1989,
do this:
youngerParticipants<-subset(exampDF, yearOfBirth > 1989)
You can also state subset conditions based on something that’s not the case. “Not” is indicated with the
exclamation point (!). If you want to discard a participant with the label “s6108”, do the following:
elim6108<-subset(exampDF, participant != "s6108")
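You can also combine several conditions in a single subset() call, using & (and) or | (or). A brief sketch,
reusing the columns from the examples above (the “posttest” label is just for illustration):
#Keep only pretest rows from participants born after 1989:
pretestYounger<-subset(exampDF, phase == "pretest" & yearOfBirth > 1989)
#Keep rows from either of two phases:
preOrPost<-subset(exampDF, phase == "pretest" | phase == "posttest")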
Note that after you subset data, you often want to tidy things up by dropping factors that are no longer
included in the dataset. (Failing to take this step can result in puzzling reports from table(), which insists
on telling you about lots of zero counts.) Tidy your data like this:
pretestOnly<-droplevels(pretestOnly)
I.F. TABLE:
table() tells us how many rows in a dataframe have a certain value. Thus, if you have a “gender” column,
you can find out the gender breakdown in your dataset like this:
table(myDataframe$gender)
Note that table() should be interpreted carefully; if you ask for a gender breakdown of a dataset that
contains many entries for each participant, the resulting table will count the gender of each participant
repeatedly! Assuming you have a column with a unique “participant” label for each participant, you can
double-check your data’s structure like this:
table(myDataframe$participant)
If any of the resulting table values are higher than 1, you will need to aggregate the data before examining
participant demographics. (See Tutorial 1.)
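One simple way to reduce the table to one row per participant (a sketch; Tutorial 1 covers aggregation in more
detail) is to keep only the demographic columns and then drop duplicate rows:
#Assumes gender is constant within each participant:
demographics<-unique(myDataframe[, c("participant", "gender")])
table(demographics$gender)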
You can also use table to cross-tabulate variables against one another. In your data, if you want to know
how many men vs. how many women are assigned to each condition in an experiment, you run code like
this:
genderTable <- table(myDataframe$condition, myDataframe$gender)
A cross-tabulated table such as genderTable would be appropriate for analysis in a chi-squared test. But
again, be careful that you are not counting multiple instances of the same grouping of values.
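If you would rather see proportions than raw counts, prop.table() can be applied to a table you have already
built; a minimal sketch:
#Row-wise proportions (each condition row sums to 1):
prop.table(genderTable, margin = 1)
#Remember: chi-squared tests must be run on the raw frequency table, not on proportions.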
________________________________________________________________________
Part II: Statistical tests.
II.A. T-TEST.
Use a t-test to determine if there’s a difference in numerical values between two categories. A t-test can only
be used if the outcome variable is numerical, and the predictor variable is categorical (with exactly two levels).
Suppose we run two versions of an experiment, and we want to know if the demographics of participants are
comparable, or different. As one check, we test whether the ages of participants in Experiment 1 are the same
as the ages of participants in Experiment 2, or whether there is a statistically significant difference.
Here, the independent variable is categorical with two levels (Experiment 1 or 2), and the outcome variable is
numerical. Thus, we can use a t-test to check whether the ages are the same or different across experiments.
The data must be structured such that each participant is represented once in the table. Assume we have this
data in a table called experimentDat, with three columns: participantID, participant age
(numerical), and the experiment (categorical variable: Experiment 1 or Experiment 2).
A t-test is then performed as follows:
t.test(experimentDat$age ~ experimentDat$experiment)
The null hypothesis is that there is no difference in age between Experiment 1 and Experiment 2. The t-test
will provide a p-value; if p > .05, we do not reject the null hypothesis, and we have no evidence that participant
ages differ across the two experiments. If p < .05, we conclude that there is a statistically significant
difference between the two categories. Comparing the means by category will indicate the direction of the
difference (i.e., are participants in Experiment 1 older than those in Experiment 2, or is it the other way
around?).
● This example assumes an independent t-test. If there are meaningful comparisons of particular datapoints
across the two categories, you’ll want to do a paired t-test.
● An additional requirement of t-tests is that numerical values should be normally distributed in each of the
two predictor categories. You can check for normality via visualizations (see below) or with a Shapiro-Wilk test
(shapiro.test()). If your dataset is not normal, use a nonparametric test:
wilcox.test(experimentDat$age ~ experimentDat$experiment)
● To visualize datasets of this type: use a boxplot, or a density plot which splits distributions by category. See
Part III.C (Continuous Outcome, Categorical Predictor).
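Relating to the paired t-test note above: the paired version uses the same t.test() function, with paired =
TRUE. A minimal sketch, assuming hypothetical columns pretestScore and posttestScore, with one row per
participant:
t.test(experimentDat$pretestScore, experimentDat$posttestScore, paired = TRUE)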
II.B. CHI-SQUARED TEST.
Use a chi-squared test with cross-tabulated frequency data, which relates two CATEGORICAL variables. Each
categorical variable will have two or more levels; for simplicity, here we look at variables with exactly two
levels. Suppose you have data (called myDat) in which every row records the survey results for one
participant. Suppose one column indicates whether the participant is a Justin Bieber fan, and another column
indicates whether the participant is a vegan, like this:
participant bieberStatus diet
p1 fan nonvegan
p2 nonfan nonvegan
p3 fan vegan
. . .
Note that for both variables, the data is categorical – it has all-or-nothing values, and does not tell us, for
instance, how much of a Bieber fan the participant is. Each variable can be coded as a logical (TRUE/FALSE)
variable, or as a factor with multiple levels (fan/nonfan).
To do a chi-squared test, first make a contingency table, which will list the frequencies of participants in each
possible ‘bin’.
bieberTable<-table(myDat$bieberStatus, myDat$diet)
That table will look something like this (these numbers are completely random and made-up!):
            vegan   nonvegan
  fan          23         44
  nonfan       46         93
Then run a chi-squared test on the contingency table:
chisq.test(bieberTable)
This approach lets us check whether there is some relationship between being a Justin Bieber fan, and being a
vegan. The null hypothesis is that frequencies are randomly distributed in the contingency table (meaning that
there’s no relationship between categories). But if the p-value is less than .05, this supports the alternative
hypothesis: there is a significant deviation from a random distribution. Use data visualizations to help you
interpret the result (see barplots, assocplots in Part III).
● Be careful to check that each participant (or each item of interest) occurs only once in the dataset you use
to build the contingency table. See TABLE (Part I).
● Be sure that you only perform chi-squared tests on frequencies, not percentages or ratios.
● Check expected values for all the bins in your contingency table:
chisq.test(bieberTable)$expected. If all expected values are greater than 5, a chi-squared test is
acceptable. If at least one expected value is less than 5, you should use a Fisher Exact test:
(fisher.test(bieberTable)).
II.C. LINEAR REGRESSION.
Use linear regression to analyze datasets with a numerical outcome variable.
Predictor variables may be continuous, may be categorical, or any combination of these. You can also check
for variable interactions: testing for whether certain variables are more or less influential depending on the
values of other variables.
In the examples below, DV is the outcome variable, and IV1 and IV2 are predictor variables.
1. Use lm() for standard (fixed-effect) linear regression. Several examples are below.
onePredictorLM<-lm(DV ~ IV1, data=yourData)
summary(onePredictorLM) #View model results
twoPredictorLM<-lm(DV ~ IV1 + IV2, data=yourData)
summary(twoPredictorLM) #View model results
interactionLM<-lm(DV ~ IV1 * IV2, data=yourData)
summary(interactionLM) #View model results
For each predictor, the model summary provides you with an “Estimate” – the slope of the line that relates a
predictor variable to the outcome variable. The null hypothesis (for each predictor) is that the slope is zero.
For lm(), we reject the null hypothesis when a p-value is < .05; this provides evidence that a slope is
significantly different from zero. For a significant predictor, if the “Estimate” column is positive, this means it
increases the outcome variable, and if it’s negative, this means it decreases the outcome variable.¹
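As a concrete illustration (using R’s built-in mtcars data rather than the hypothetical variables above), here is a
small model with one numerical and one categorical predictor:
#Illustration only: predict fuel efficiency (mpg) from weight and number of cylinders.
carModel<-lm(mpg ~ wt + factor(cyl), data = mtcars)
summary(carModel) #The wt estimate is negative: heavier cars have lower mpg.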
An assumption of lm() models is that data observations are independent from one another. Strictly speaking,
this means that if you have repeated measures from the same participants, or on the same items, you should
use mixed-effects models (lmer).
2. Mixed-effects models. Use lmer if you have repeated measures (on subjects or items) in your dataset. An
example is below, which assumes that we have repeated observations on both participants and items.
library(lme4) #load the library.
library(lmerTest) #You need this package loaded to see lmer p-values.
linearMixedModel<-lmer(DV ~ IV1 + IV2 + (1|item) + (1|participant),
data=yourData)
summary(linearMixedModel) #View model results
#Try adding this code if your model will not converge:
linearMixed2<-lmer(DV ~ IV1 + IV2 + (1|item) + (1|participant),
control=lmerControl(optimizer="bobyqa"), data=yourData)
For both lm and lmer models:
● Visualizing linear regression results: use an effect plot (See Part III.E).
● Linear regression assumptions/diagnostics: see the box below II.D.
1 However, if a model includes any interactions, the direction of an effect is not easy to read directly from summary tables. (See
Gelman and Hill, Data analysis using regression, 2007, p. 55). When there are interactions, plotting a model is especially important in
order to understand what the model means.
II.D. LOGISTIC REGRESSION.
Use logistic regression to analyze datasets with a categorical dependent variable which has two possible
outcomes. As with linear regression, predictor variables may be numerical, may be categorical, or any
combination of these; variable interactions are allowed.
In the examples below, DV is the (binary) outcome variable, and IV1 and IV2 are predictor variables.
1. Use glm() for standard (fixed-effect) logistic regression. An example is below. Note that you must specify
family = "binomial" to indicate that the dependent variable is categorical.
interactionGLM<-glm(DV ~ IV1 * IV2, family = "binomial", data = yourDat)
summary(interactionGLM) #View model results
Note that a linear regression model predicts the value of the numerical dependent variable; in contrast, a
logistic regression model tells us about the probability of one outcome vs. another, under different conditions.
More precisely, the model predicts the log-odds of the outcome. (The probability, however, can be
obtained with appropriate mathematical transformations.) The null hypothesis is that a predictor’s slope is zero;
this means that a variable produces no change in the log-odds of the outcome. If p < .05, that leads us to
believe that the predictor has a nonzero influence on the probability of the outcome.
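As a small illustration of the log-odds point, plogis() converts a log-odds value into a probability, and fitted
probabilities can be requested directly from a model (a sketch, reusing the interactionGLM model above):
plogis(0) #A log-odds value of 0 corresponds to a probability of 0.5
plogis(1.5) #Approximately 0.82
predict(interactionGLM, type = "response") #Predicted probabilities for each row of the data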
2. Mixed-effects models. For logistic regression, use glmer() if you have repeated measures (on subjects or
items) in your dataset. The same syntax is used in glmer (logistic mixed-effects regression) as lmer (linear
mixed-effects regression), but remember that (as with glm) you must specify family = "binomial" for
logistic regression.
logisticMixedModel<-glmer(DV ~ IV1 + IV2 + (1|item) + (1|participant),
family = "binomial", data=yourData)
summary(logisticMixedModel) #View model results
#Try adding the following code if you get convergence errors:
logisticMixed2<-glmer(DV ~ IV1 + IV2 + (1|item) + (1|participant), family
= "binomial", control=glmerControl(optimizer="bobyqa"), data=yourData)
As with other regression models, if p < .05 for a predictor, that leads us to believe that the predictor is
statistically significant. (For logistic regression, this means it has a nonzero influence on the probability of the
outcome.)
For both glm and glmer models:
● Visualizing logistic regression results: use an effect plot (See Part III.E).
● Logistic regression assumptions/diagnostics: see the box immediately below.
REGRESSION MODEL ASSUMPTIONS
As noted above, it is important to make sure you select the right kind of regression:
● Linear regression: DV is numerical.
● Logistic regression: DV is binary, that is, categorical with two possible outcomes. These outcomes could be
anything, as long as they are coded in a two-valued, all-or-nothing fashion: TRUE vs. FALSE, HAPPY vs.
UNHAPPY, etc.
Various diagnostics are available for regression models, but for our purposes, there are three essential
assumptions to focus on.
1. Non-collinearity: Your model predictors should not be highly correlated with one another. Your model
results are unstable if you include the same predictor more than once (or include two very similar
predictors). Thus, it would be risky to have a model which includes word frequency coded two different
ways (for instance, coded categorically as HIGH/LOW, and also as a number). Make sure you know what
predictors are included in your model, and see if it makes sense to include all of them at the same time.
2. Independence of datapoints: Each datapoint should be independent of the others, unless you are using a
mixed-effects model. If you are analyzing the gross national product of different nations over time, the
datapoints are not independent: Bulgaria’s economy in 2016 is dependent upon Bulgaria’s economy in
2015. Similarly, any time you have multiple measurements from the same person at different times, or
multiple measurements on the same item, the repeated measurements are interlinked and interdependent.
If there are repeated measures, you should use a mixed-effects model.
3. Linearity (for linear regression only): In linear regression, the relationship between numerical predictors
and the DV should be linear. You do not need to worry about linear relationships for categorical
predictors. The underlying mathematics of linear regression models will, as a matter of course, use a linear
representation to link datapoints in different categories.
When checking linearity of numerical predictors, the simplest approach is to create a scatterplot of raw
data, to examine the relationship between the predictor and outcome (e.g., plot(x, y)). Look for
unexpected curves which clearly deviate from a straight line.
Alternatively, once a model has been run, you can plot fitted values vs. residuals:
plot(fitted(myModel),residuals(myModel)) (See Tutorial 1 by Winter, 2016.) You’re hoping
to see residual plots that look like random clouds, as opposed to curves or patterned clusters. However, this
approach is really only useful if all of the predictors in your model are numerical, because categorical
predictors may cause the residuals to group into bands.
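Relating to the non-collinearity point (item 1 above): one common check, sketched here on the hypothetical
twoPredictorLM model, is to inspect correlations between numerical predictors, or to compute variance
inflation factors (this requires the car package; values far above roughly 5–10 are a warning sign):
cor(yourData$IV1, yourData$IV2) #Values near 1 or -1 suggest the predictors overlap heavily
library(car)
vif(twoPredictorLM) #Variance inflation factors for the fitted model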
__________________________________________________________________________
Part III. Plots and visualizations.
NB: For all plots, to save to an image file, you need a three-step process. First use an image command such
as jpeg(), then build the plot, and finally close the plotting device. For example:
jpeg("scatterplot.jpg") #Will save the image to the specified file.
plot(iris$Sepal.Length, iris$Petal.Length) #Build plot.
dev.off() #Closes things down and builds the file.
If your plotting malfunctions somehow, you may want to type dev.off() again to clear things out, so you
can start over.
Above I chose jpeg() for its simplicity. Other options are possible (see ?jpeg).
Image files will save to your current working directory for R. (See File Management in Part I.)
III.A. CATEGORICAL-CATEGORICAL FREQUENCY RELATIONSHIPS
The following plot types allow you to visualize frequency relationships between two categorical variables. Such
situations arise when a chi-squared test is appropriate: for instance, checking whether men or women are more
likely to be smokers, or whether Justin Bieber fans are more likely to be vegans. (See Chi-squared Tests
in Part II.)
Three possible approaches are summarized below.
1. Barplot in ggplot2.
Assume you have a dataframe called myDat, in which a column called bieberFan indicates whether the
respondent is a Justin Bieber fan, and a column called diet indicates whether the respondent is a vegan. (See Chi-
squared tests above.) In this dataset, each person responding to the survey appears on one (and only one) row
of the table. To visualize the relationship between variables, you can build a nice barplot in ggplot2 as follows:
library(ggplot2)
ggplot(data=myDat,aes(x=bieberFan,fill=diet)) + geom_bar()
For this particular approach, you can also exchange the two variables, i.e., the following alternative also
works:
ggplot(data=myDat,aes(x=diet,fill=bieberFan)) + geom_bar()
The variable which follows “x=” will appear on the x-axis, and the bars will be broken down according to the
variable specified by “fill=”. In this particular case, the choice of assignment for the two variables will
depend on which visualization is more intuitive to you.
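If you would rather see the bars side by side, or scaled to proportions, geom_bar() accepts a position
argument; a brief sketch:
ggplot(data=myDat,aes(x=bieberFan,fill=diet)) + geom_bar(position = "dodge") #Side-by-side bars
ggplot(data=myDat,aes(x=bieberFan,fill=diet)) + geom_bar(position = "fill") #Each bar scaled to total 1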
2. assocplot.
In contrast with the barplot in (1), an assocplot requires that you first build a contingency table with the data.
Assume that we have this contingency table, called bieberTable. Then to create an assocplot, simply type
assocplot(bieberTable)
An assocplot helps us visualize very clearly which category combinations are over-represented or under-
represented in the data. Note that the plot alone does not tell us whether the distribution is significantly
different from a random one; for this, you need to include the results of a chi-squared test.
For more information, also see Chapter 4 of Gries (2013, p. 187 ff.).
III.B. CONTINUOUS-CONTINUOUS RELATIONSHIPS.
A scatterplot provides a visualization of the relationship between a continuous variable on the horizontal axis
(x-axis), and a continuous variable on the vertical axis (y-axis).
A basic scatterplot is specified as follows:
plot(iris$Sepal.Length, iris$Petal.Length)
Here, Sepal.Length and Petal.Length are numerical columns in a dataframe. The first argument
(Sepal.Length) is the horizontal axis, and the second argument (Petal.Length) is the vertical axis.
You can specify titles for the x-axis and y-axis, as well as a main title, as follows:
plot(iris$Sepal.Length, iris$Petal.Length, xlab = "Sepal Length", ylab
= "Petal Length", main = "AWESOME TITLE ABOUT FLOWERS")
For additional enhancements (specifying colours, adding text to a plot, and 3D versions), see the
“Scatterplots” document under the R Resource Collection on Moodle.
Note: This approach is used to plot specific (x, y) values from a dataset. This is a way to check whether a
relationship between variables is linear.
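If you also want a straight line of best fit superimposed on the scatterplot (a sketch, using the same iris
columns), abline() can draw the line from a simple lm() fit:
plot(iris$Sepal.Length, iris$Petal.Length)
abline(lm(Petal.Length ~ Sepal.Length, data = iris)) #Adds the fitted regression line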
The predictions of a linear regression model can also involve continuous variables plotted vs. continuous
variables. (These will be discussed later under Effect Plots.)
III.C. CONTINUOUS OUTCOME, CATEGORICAL PREDICTOR.
The following two approaches provide good data visualizations when your outcome variable is numerical, and
your predictor variable is categorical. This includes datasets where there are exactly two levels of the
categorical predictor (i.e., cases where a t-test might be used to test hypotheses). Here, I focus on the
example used in Part II.A, where experimentDat is the dataset, the experiment column represents a
(categorical) independent variable, and age represents a numerical dependent variable.
1. Boxplot. Assuming the designations listed above, a boxplot is made as follows:
boxplot(experimentDat$age ~ experimentDat$experiment)
A boxplot visually presents some basic information about distributions by category. For instance, the
horizontal line in each box shows the median for that category. For more on boxplots and their interpretation,
see Chapter 3 of Gries (2013, p. 126-127).
Also note that boxplots can be useful if you want to visualise outcomes for multiple, intersecting categorical
predictors. Suppose you have an experiment with RT as the numerical outcome, which you ran with two
conditions, in two countries (with both conditions being represented in the two countries). If you want to
visualise patterns for both country and condition, do the following to get four boxplots (for the 2 X 2
comparison):
boxplot(experimentDat$RT ~ experimentDat$condition + experimentDat$country)
#or, to see a different ordering of plots:
boxplot(experimentDat$RT ~ experimentDat$country + experimentDat$condition)
2. A density plot provides more detail about a numerical distribution (by category). For instance, it is easier to
see from a density plot if one of the categories has a bimodal or skewed distribution.
library(ggplot2)
ggplot(experimentDat, aes(x= age, fill= experiment)) + geom_density(alpha
= 0.3)
The “x=” argument indicates which variable is the (numerical) dependent variable. The “fill=” argument
indicates which variable is the (categorical) independent variable. The geom_density() layer tells ggplot2 that
we want a density plot; the “alpha = 0.3” argument is optional, but indicates how translucent the different
parts of the plot will be. A value of 0.3 tends to produce nice-looking plots.
For a simpler density plot, see Part III.D.
III.D. HISTOGRAMS AND DENSITY PLOTS.
Histograms and density plots are used to visualize a distribution of numerical outcomes. Another way to
describe this is that such plots show how likely different numerical outcomes are. (See Week 2 in lecture
materials.) These visualizations offer one way to determine if a distribution is normal, or whether it follows
some other pattern (for instance, bimodal).
Suppose we have a dataframe called moduleDat, with a column of exam scores called exam. In visualizing
this data, the range of possible outcomes can be broken up in various ways. If we divide the numerical scale
into discrete 'bins', the visualization is a histogram. If we want bins to be smoothed together, the visualization
is a density plot.
1. Histogram.
A basic histogram is made as follows:
hist(moduleDat$exam)
Optionally, you can specify the number of breaks (number of bins) to use in your histogram. The default
algorithm for hist (Sturges’ rule) chooses the number of bins based mainly on how many observations there are
(see ?hist). Here’s an example of one alternative:
hist(moduleDat$exam, breaks="FD")
The “FD” approach uses a particular algorithm (Freedman-Diaconis) that looks in more detail at the
distribution of values before calculating the number of bins. For additional discussion of histograms and
breaks, see Chapter 3 of Gries (2013, p. 112-113).
2. Density plot.
Make a basic density plot like this:
plot(density(moduleDat$exam))
For a more complex type of density plot (with separation of data by category), see the ggplot2 example in Part
III.C.
III.E. EFFECT PLOTS FROM REGRESSION MODELS.
A plot of a regression effect lets you visualize a relationship between a predictor variable and an outcome
variable. In most cases, you would only provide a plot of an effect if it represents a statistically significant
predictor in your regression model. The effects package allows you to plot relationships between variables
from a range of regression models – including linear regression, or logistic regression, with or without random
effects.
Suppose you have run a regression model (See Part II.C) called twoPredictorLM, and you want to plot the
predictor called IV1. First you need to load the effects library:
library(effects)
Then type the following:
plot(effect(term="IV1", mod = twoPredictorLM))
This will generate a plot which has IV1 on the horizontal axis, and the dependent variable on the vertical axis.
This syntax will work, regardless of whether the predictor variable is categorical or continuous. (The plots, of
course, will look very different depending on the types of variables represented.)
If you want to plot all predictors from a model (including any variable interactions you have included), use
allEffects. If the model is interactionLM, the command looks like this:
plot(allEffects(interactionLM), multiline = T)
The argument “multiline = T” is not strictly required. However, when variable interactions are included,
this argument makes the plots easier to understand because multiple variables are plotted together.
NB: the effects package is very flexible, but it has some idiosyncrasies. One thing to watch out for: In order to
make a plot, R must be convinced that each predictor is either a factor, or a numerical predictor. (In R
terminology, “factor” simply means that a variable is a categorical predictor.) You will thus encounter an error
if one of your predictors is classified as “logical” (TRUE or FALSE). If this happens, just tell R how to interpret
the variable. If the relevant “logical” column is yourData$trueFalseVariable, type this:
yourData$trueFalseVariable <- as.factor(yourData$trueFalseVariable)
Then re-run your regression model, and try plotting the effect again.