Assignment 2: Regression
Regression: Modelling the relationship between a response (or dependent variable) and one or
more explanatory variables (or independent variables). Linear regression is a linear approach to
modelling the relationship.
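As a minimal illustration of the definition above, the following R sketch fits a simple linear regression to simulated data. The variable names and the simulated values are assumptions made for illustration only; they are not the assignment data.

```r
# Simulated data: the names and values below are illustrative only.
set.seed(42)
n <- 200
median_income <- runif(n, min = 30, max = 120)            # explanatory variable
house_price <- 50 + 3 * median_income + rnorm(n, sd = 5)  # response variable

# Fit the linear model: response ~ explanatory variable
example_model <- lm(house_price ~ median_income)

# The estimated coefficients should be close to the true values (50 and 3)
print(coef(example_model))
```

The slope estimates how much the response changes, on average, for a one-unit change in the explanatory variable.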
Before completing the assignment, review the example R Markdown document from Tutorial 4.
NOTE: Join the spatial data at the beginning of your script; joining it at the end causes issues.
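The join mentioned in the note could be sketched as follows, using two small plain data frames in place of the real tract and census tables. The key column name tract_id and all values are hypothetical; use the identifier field that actually appears in your data.

```r
# Hypothetical stand-ins for the tract table and the census table.
# The key column name "tract_id" is an assumption, not the real field name.
tract_data <- data.frame(
  tract_id = c("001", "002", "003"),
  average_house_price = c(410000, 525000, 368000)
)
census_data <- data.frame(
  tract_id = c("003", "001", "002"),
  median_income = c(61000, 72000, 88000)
)

# Join once, at the start of the script, so every later step uses one table
hamilton_data <- merge(tract_data, census_data, by = "tract_id")

# Confirm that no tracts were dropped or duplicated by the join
stopifnot(nrow(hamilton_data) == nrow(tract_data))
print(hamilton_data)
```

If the boundaries are loaded as a spatial object, keeping that object as the first argument of the join is generally what preserves the geometry; check this against the Tutorial 4 example.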
Produce an explanatory regression model for the variation in housing costs by census tract in the
City of Hamilton, Ontario, Canada.
Hamilton Census Tract boundaries, which include the average house price and the unique
You can access the data with the following command and URL:
You will need to obtain 10 potential explanatory variables from the 2016 Census Data, available
from CHASS: http://dc2.chass.utoronto.ca.myaccess.library.utoronto.ca/census/
The assignment submission will be composed of three files.
1. An R script of your code produced during the project, with the .R file extension.
2. A CSV file of the additional input data you utilized in your model (one table).
3. Answers to the questions listed below in a PDF file.
All three files must be submitted online.
• Ensure all procedures from the lab tutorial are replicated in your work.
• Fit and test 10 linear regression models.
o Example model names: model_1, model_2, etc.
o All models should remain in the code.
o Rename your final model: final_model
• The final model must meet all assumptions, with one possible exception:
o Independent errors due to spatial autocorrelation.
▪ Validate the independent errors assumption in your model with spatial
R Script: 10 Marks
The script you submit should be fully reproducible, which means the TA should be able to run
your script without modification. The only allowable modification would be the file path for the
CSV file of your additional input variables. Review the R Script grading scale below.
The general structure of your R script should follow:
1. Data Munging:
a. Reading Data
b. Merging Data
2. Graphical Analysis Pre-Check
3. Data Transformations
4. Correlation Assessment
5. Model Fitting and model assumption assessment (10 models)
a. If one assumption is broken you can continue to the next model.
i. No need to test every assumption in that case
6. Spatial Autocorrelation Assessment
7. Spatial Autoregressive Modelling
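Steps 3 to 5 of the structure above might look roughly like the following sketch, which runs on simulated data. All column names and values are illustrative assumptions, and the Shapiro-Wilk test is only one common way to check residual normality; the tutorial may use different diagnostics.

```r
# Simulated stand-in data; all column names and values are illustrative.
set.seed(1)
n <- 150
example_data <- data.frame(
  median_income = rnorm(n, mean = 80, sd = 15),
  percent_owner_occupied = rnorm(n, mean = 60, sd = 10)
)
# Build a right-skewed price so that a log transformation is sensible
example_data$average_house_price <- exp(
  12 + 0.01 * example_data$median_income +
    0.005 * example_data$percent_owner_occupied + rnorm(n, sd = 0.1)
)

# 3. Data transformation: log-transform the skewed response
example_data$log_house_price <- log(example_data$average_house_price)

# 4. Correlation assessment among response and explanatory variables
correlation_matrix <- cor(example_data[, c(
  "log_house_price", "median_income", "percent_owner_occupied"
)])
print(round(correlation_matrix, 2))

# 5. Model fitting, then one assumption check (normality of residuals)
model_1 <- lm(log_house_price ~ median_income + percent_owner_occupied,
              data = example_data)
normality_test <- shapiro.test(residuals(model_1))
print(summary(model_1)$coefficients)
print(normality_test$p.value)
```

Each later model (model_2, model_3, and so on) would repeat step 5 with a different set of explanatory variables.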
R Script Grading:
10 / 10: The code is properly documented with comments and detailed variable names. No issues
are present in the code. A person versed in R should be able to read through the code in one
9 / 10: The code is well documented. A single error, inconsistency, poor variable name or
documentation is present. A reviewer may need to make a single check of previous code to
8 / 10: The code is documented. A couple of errors, inconsistencies, poor variable names or
documentation are present. A reviewer may need to make multiple checks of previous code to
7 / 10: The code is documented. A few errors, inconsistencies, poor variable names or
documentation are present. A reviewer needs to make multiple checks of previous code to
interpret but can understand all sections of the code.
6 / 10: The code is partially documented. Errors, inconsistencies, and poor variable names are
present. A reviewer needs to make multiple checks of previous code to interpret and may not
completely understand all sections of the code.
5 / 10: The code is sparsely documented. Many errors, inconsistencies, and poor variable names
are present. A reviewer needs to make multiple checks of previous code to interpret and does not
completely understand all sections of the code.
4 or below: Many inconsistencies in the code. Another researcher could not reproduce it
without directing many questions to the original author.
Missing assignment requirements in the code will also reduce your mark.
• Too few linear models in the code (-1 for each missing model)
• Final model is not renamed final_model (-1)
• Model Assumptions not tested (-1 for each assumption)
• Moran’s I not tested correctly (-2)
• Code will not run when tested (-3)
• Other errors will be penalized as appropriate.
To achieve a mark above 8, you will likely need to rewrite your code for clarity after you have
finished working through the assignment.
CSV File: 2 Marks
The CSV file should contain all the variables that you obtained from the Census for testing in
your model. It must contain 10 variables.
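One way to write the CSV file and confirm that it holds 10 variables is sketched below. The table contents, the column names, and the use of a temporary directory are assumptions for illustration; in your script the path would point to the file you submit.

```r
# Illustrative table: 10 hypothetical census variables for 5 tracts.
set.seed(7)
census_variables <- as.data.frame(matrix(runif(50), nrow = 5, ncol = 10))
names(census_variables) <- paste0("variable_", 1:10)
census_variables <- cbind(tract_id = sprintf("%03d", 1:5), census_variables)

# Keep the path in one variable so it is the only line a marker must edit
census_csv_path <- file.path(tempdir(), "census_variables.csv")
write.csv(census_variables, census_csv_path, row.names = FALSE)

# Read it back and count the variables, excluding the identifier column
census_check <- read.csv(census_csv_path,
                         colClasses = c(tract_id = "character"))
variable_count <- ncol(census_check) - 1
stopifnot(variable_count == 10)
```

Reading the identifier as character keeps any leading zeros, which matters if it is later used as a join key.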
Questions (32 Marks)
All figures must include a figure caption.
1. Complete the following table. (1 Mark)
in CSV File | Min | Max | Mean | Variable Description
2. Complete the following table. (2 Marks)
in CSV File | Reason why you selected the variable
3. Produce a publication quality histogram of the dependent variable (transformed if you did a
transformation). (3 Marks)
4. Write 50 words on why you did or did not transform your dependent variable based on the
assumptions of the linear regression model. (2 Marks)
5. Describe in 200 words your process of model fitting. Address the selection of variables, how
you decided to remove or add variables, and the way you assessed each assumption. (4 Marks)
6. Complete the following table. (2 Marks)
p < 0.05 (Y/N) | List Assumption(s) Violated or All assumptions met?
7. For your final linear regression model, produce a figure from the 4 plots generated by
plot(linear_model). (2 Marks)
8. Produce a publication quality figure of residuals vs fitted values for your final linear
regression model. (3 Marks)
9. Calculate Moran’s I for your residuals. Report, in 50 words, your value of Moran’s I and how
you interpret it. (3 Marks)
10. Write 150 words interpreting your final linear regression model. (4 Marks)
11. Would you require a spatial autoregressive model? Explain how you would have chosen the
model to use. (3 Marks)
12. Produce a map of a spatial autoregressive model’s residuals. (3 Marks)
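To make the Moran’s I calculation in Question 9 concrete, the sketch below computes the statistic by hand in base R. It assumes a hypothetical 5 x 5 grid in place of the census tracts, binary rook-adjacency weights that are not row-standardized, and simulated values in place of model residuals. Your submission should follow the method from the lab tutorial; this only shows what the statistic measures.

```r
# Hypothetical 5 x 5 grid standing in for census tracts (rook adjacency).
grid_size <- 5
cell_coords <- expand.grid(row = 1:grid_size, col = 1:grid_size)
n_cells <- nrow(cell_coords)

# Two cells are neighbours when they share an edge (Manhattan distance 1)
manhattan_distance <- as.matrix(dist(cell_coords, method = "manhattan"))
weights <- (manhattan_distance == 1) * 1

# Illustrative "residuals" with a smooth east-west trend plus noise
set.seed(3)
example_residuals <- cell_coords$col + rnorm(n_cells, sd = 0.5)

# Moran's I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2),
# where z are deviations from the mean and S0 is the sum of all weights
morans_i <- function(values, w) {
  z <- values - mean(values)
  (length(values) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

observed_i <- morans_i(example_residuals, weights)
expected_i <- -1 / (n_cells - 1)  # expectation under no autocorrelation
print(c(observed = observed_i, expected = expected_i))
```

An observed value well above the expected value suggests that neighbouring residuals are similar, which is the situation in which the independent errors assumption is in doubt.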