SSCI 574 – Spatial Econometrics Project 2
USC Spatial Sciences Institute © 2021
SSCI 574 Project 2 – Exploratory Spatial Data Analysis & Multiple
Linear Regression
Due Date: Friday, 3/12 @11:59 pm Pacific Time
Submit Project 2 as a Word document into the corresponding assignment link on Blackboard
Value: 7% of the course grade
Penalty for late delivery: 2-point deduction for submissions up to 4 days late; no points will be
given for submissions more than 4 days late.
The purpose of this project is for you to apply the concepts and skills you have learned to explore
the datasets and questions in spatial economics that interest you. Since you did some
preliminary research in Project 1, you should now identify available datasets that suit the
scope and scale of your area of interest and be ready to dive into some analysis.
In this project, you will identify data of your own interest in spatial economics, import it
into R, and conduct exploratory data analysis, exploratory spatial data analysis (i.e. spatial weights
and global spatial autocorrelation), and multiple linear regression.
Read through the entire document first. Next, go through the hands-on R practices that we did
in previous weeks if you have not done so, so you are familiar with the libraries, functions and
their arguments required in R to complete this project.
Learning Objectives
• To identify available spatial datasets for investigating the spatial economic topic area of
your interest
• To explore spatial autocorrelation using global Moran’s I and the Moran scatterplot
• To conduct multiple linear regression, including the pre- and post-assessments of the
datasets
• To interpret the outputs of spatial autocorrelation and multiple linear regression
Assignment Description
This project extends your topic of interest into practical exercises in spatial and
statistical analysis in R. To complete it, follow the instructions below:
1. From your chosen spatial economic research topic and variables in Project 1, identify spatial
datasets of the variable(s) for investigation in spatial analysis and import the data into R.
To start with, focus on the main variable that you are interested in. Consider the
spatial extent and unit of analysis so that the data size is not too large to manage (e.g. more
than 50 but no more than 500 units). Often your spatial location
data (e.g. county boundaries) and attributes (e.g. employment rates) will come
from different sources and need to be joined together before use. You can do the pre-processing
in Excel or ArcGIS, or merge the data in R.
For importing shapefiles, use readOGR( ) in the rgdal package. Use ?readOGR (with rgdal
loaded) or ??readOGR to open the help file in RStudio. If your data is not projected, you will
have to retrieve the geographic coordinates from polygons then use the spTransform method
in the rgdal library.
If your non-spatial attribute table contains latitude and longitude, you can use read.csv( ) or
read.table( ) to import the non-spatial data first, then make your data spatial by creating a
Spatial* object (see the Week 4 handout for how to promote the data to a spatial object).
For any remaining questions about data import, I suggest you search online
resources (e.g. https://rdocumentation.org) and post your questions or issues on the
Discussion Forum on Blackboard.
2. Explore the distribution of your imported data by conducting exploratory data analysis
(EDA) in R. Before running any statistical or spatial analysis, always examine your data first.
Run descriptive statistics (use an R function that provides at least: sample size, minimum,
mean, median, maximum, and standard deviation) and make a scatterplot, a histogram, and a
boxplot for your main variable(s). Doing all EDA here for one variable is sufficient, but
more is fine (e.g. running EDA for both variables whose association you want to
examine). Consider a transformation if the data is not normally distributed, and show its
normality after transformation.
3. Explore the spatial data you imported by conducting ESDA of your main variable(s) of
interest. You will build a spatial weights matrix, then compute global Moran’s I and draw a Moran
scatterplot. Whether you run Moran’s I using a Monte Carlo approach is your choice. Other
ESDA is also possible, e.g. kernel density estimation if you have a point dataset, but not
necessary.
4. Execute standard linear regression to investigate the association of the variables in your topic
of interest using the lm( ) function. The number of independent variables can vary, but make sure
that your final model contains only the explanatory variables whose partial
coefficients are statistically significant. If you decide to keep insignificant explanatory variable(s)
in your OLS regression, make sure you justify your decision in the report.
5. Write a report that includes the following items:
• Introduction (0.5pt): A brief description of the spatial economic topic you are interested in, the
variables you selected (including unit of analysis and spatial extent), and the sources
of your data (include the organization from which you obtained the data and its
URL if available).
• EDA (1.5pt): R code, the resulting tables/plots, and a short paragraph describing the
distribution (i.e. central tendency and dispersion) of the data and whether or not you
performed a transformation.
• ESDA (2pt): R code, the resulting display, and 1-2 paragraphs describing and
interpreting the results. Here your results should consist of the neighbor list object details, a
visualization of your spatial weights object, the Moran’s I results, and the Moran scatterplot.
Describe what each of these analysis results tells you about your data.
• Standard linear regression (2pt): R code, the results, and a paragraph that interprets the
results.
• Reflection (1pt): A short paragraph reflecting on the experience you had when working
on this project. What did you find easy? What did you find challenging? What questions
do you still have after completing the project? What adjustments might you consider,
either to the data or to the operations, to improve your experience?
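Steps 2 through 4 can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not a template for your report: the data is simulated on a hypothetical 8 x 8 lattice, the variable names (educ, unemp, extra, income) are made up, and cell2nb( ) stands in for the contiguity neighbors that you would normally build from your own polygons with poly2nb( ). The spdep package is assumed to be installed.

```r
library(spdep)

set.seed(574)  # reproducible simulated data

# Hypothetical data: an 8 x 8 lattice standing in for 64 areal units.
n_side <- 8
coords <- as.matrix(expand.grid(x = 1:n_side, y = 1:n_side))
n <- nrow(coords)
dat <- data.frame(
  educ  = 10 + 0.8 * coords[, "x"] + rnorm(n),  # west-east gradient
  unemp = rnorm(n, mean = 6, sd = 1.5),
  extra = rnorm(n)                              # unrelated by construction
)
dat$log_income <- 9 + 0.10 * dat$educ - 0.05 * dat$unemp + rnorm(n, sd = 0.1)
dat$income <- exp(dat$log_income)

# Step 2 (EDA): descriptive statistics and plots for the main variable.
describe <- function(v) {
  c(n = length(v), min = min(v), mean = mean(v),
    median = median(v), max = max(v), sd = sd(v))
}
print(round(describe(dat$income), 2))

op <- par(mfrow = c(2, 2))
hist(dat$income, main = "Histogram", xlab = "income")
boxplot(dat$income, main = "Boxplot", ylab = "income")
plot(dat$educ, dat$income, main = "Scatterplot",
     xlab = "educ", ylab = "income")
hist(dat$log_income, main = "After log transform", xlab = "log(income)")
par(op)

# Shapiro-Wilk test as one way to check normality after transformation.
print(shapiro.test(dat$log_income))

# Step 3 (ESDA): neighbors, weights, global Moran's I, Moran scatterplot.
nb <- cell2nb(n_side, n_side, type = "queen")  # queen contiguity on the grid
summary(nb)                                    # neighbor list object details
plot(nb, coords)                               # visualize the neighbor links
lw <- nb2listw(nb, style = "W")                # row-standardized weights

mi <- moran.test(dat$log_income, lw)              # analytical test
mc <- moran.mc(dat$log_income, lw, nsim = 999)    # optional Monte Carlo test
print(mi)
print(mc)
moran.plot(dat$log_income, lw, xlab = "log(income)",
           ylab = "spatially lagged log(income)")

# Step 4 (OLS): pre-assessment, full model, reduced model, post-assessment.
print(round(cor(dat[, c("educ", "unemp", "extra")]), 2))  # collinearity check

full <- lm(log_income ~ educ + unemp + extra, data = dat)
print(summary(full))

# Keep only the terms whose partial coefficients have p < 0.05.
pvals <- summary(full)$coefficients[-1, "Pr(>|t|)"]
keep <- names(pvals)[pvals < 0.05]
final <- lm(reformulate(keep, response = "log_income"), data = dat)
print(summary(final))

op <- par(mfrow = c(2, 2))
plot(final)  # residual diagnostics for the final model
par(op)
```

With your own data, replace the simulated data frame with your imported spatial object and interpret each printed result in the report rather than only pasting it.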
Deliverables
Submit a project report with the components requested above in a Word document. Include a
cover page that contains at least the information about the class number (SSCI 574), semester
(Spring 2021), project number/title and your name. Save your Project 2 report document as
Project2_[YourLastName].docx and submit it via the appropriate assignment link in Blackboard.
Additional Resources I: Data Hubs
Below is a list of commonly used data hubs for your reference. If you have a hard time finding
appropriate datasets, consider using the following sources and adopting the datasets
mentioned here for your project. USC Visualization Librarian Andy Rutkowski also
mentioned several databases and programs that contain spatial datasets at various spatial scales
that might suit your needs (guest talk in Week 6, 02/24/2021).
1. City of Los Angeles GeoHub: https://geohub.lacity.org. Datasets you may consider
using include, but are not limited to, the Los Angeles index of displacement pressure and
traffic collision or traffic accident data.
2. COVID-19 GIS Hub: https://coronavirus-resources.esri.com. If you are interested in
understanding the impact of COVID-19 on the social and economic aspects of our lives, you
might find this data hub useful. Additionally, since I want you to make a story map for the
final presentation that combines the analysis and information from all of your projects this
semester, you might also check out how Esri utilizes its ArcGIS Story Map to tell the
story of its work on COVID-19
(https://esri.com/about/newsroom/blog/gis-to-achieve-equitable-speedy-vaccine-distribution).
3. The U.S. Census Bureau’s American Community Survey 5-year Data:
https://census.gov/data/developers/data-sets/acs-5year.html. The Census Bureau not
only offers spatial data (TIGER/Line data), but also provides various socio-economic and
demographic variables that are surveyed every year at various census administrative levels
and that you can download for use.
4. IPUMS: https://ipums.org. As a part of the Institute for Social Research and Data
Innovation at the University of Minnesota, IPUMS provides census and survey data
from the U.S. and around the world. IPUMS integrates census-type data to make it
easy to study and research. For your information, you may also want to check the
‘ABOUT’ tab if you are looking for data analysis employment in the near future.
Additional Resources II: Creating a neighbor list object for point data
Assume that we want to explore a dataset that contains three columns: latitude, longitude,
and the average math score of schools in one district. We can import this data (.csv),
transform/promote it to a spatial object, and assign it the WGS84 datum.
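A minimal sketch of these three steps, assuming the sp package is installed. The school coordinates and math scores below are made-up values written to a temporary file so that the example runs on its own; the column names (latitude, longitude, math_score) are also assumptions, so substitute your own file path and column names.

```r
library(sp)

# Hypothetical school records, written to a temporary .csv so the example
# is self-contained. All values are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    latitude   = c(34.05, 34.10, 34.02, 34.15, 34.07, 34.12, 34.08, 34.03),
    longitude  = c(-118.25, -118.30, -118.40, -118.20,
                   -118.35, -118.28, -118.45, -118.22),
    math_score = c(71, 64, 80, 58, 75, 69, 83, 62)
  ),
  csv_path, row.names = FALSE
)

# 1. Import the non-spatial table.
schools <- read.csv(csv_path)

# 2. Promote it to a SpatialPointsDataFrame (x = longitude, y = latitude).
coordinates(schools) <- ~ longitude + latitude

# 3. Declare that the coordinates are geographic, on the WGS84 datum.
proj4string(schools) <- CRS("+proj=longlat +datum=WGS84")

print(class(schools))
summary(schools)
```

Note the order in the formula: longitude comes first because it is the x coordinate.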
To create an object that describes the neighbor relationship in a point dataset, consider using a
different spatial relationship when constructing the spatial weights matrix. The code here shows
you how to apply the k nearest neighbor (knn) method:
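A minimal sketch, assuming the spdep package is installed. The eight longitude/latitude pairs are hypothetical, and k = 2 is an arbitrary choice for such a small example; choose k to suit your own data.

```r
library(spdep)

# Hypothetical point locations (longitude, latitude) in decimal degrees.
coords <- cbind(
  longitude = c(-118.25, -118.30, -118.40, -118.20,
                -118.35, -118.28, -118.45, -118.22),
  latitude  = c(34.05, 34.10, 34.02, 34.15, 34.07, 34.12, 34.08, 34.03)
)

# Find the 2 nearest neighbors of each point. longlat = TRUE tells
# knearneigh() that the coordinates are geographic, so distances are
# measured as great-circle distances rather than in degrees.
knn <- knearneigh(coords, k = 2, longlat = TRUE)

print(class(knn))
print(knn$nn)  # one row per point, one column per neighbor index
```

If your points are stored in a SpatialPoints* object, coordinates( ) from the sp package returns the matrix that knearneigh( ) expects.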
The resulting neighbor object is of class ‘knn’. Convert it to the more generic neighbor class
nb with knn2nb( ), then convert nb to a listw object with nb2listw( ) to use as the
spatial weights matrix.
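A minimal sketch of the two conversions, again assuming spdep is installed and using the same hypothetical coordinates and k = 2. The row-standardized style "W" is only one possible choice of weights style.

```r
library(spdep)

# Hypothetical point locations (longitude, latitude) in decimal degrees.
coords <- cbind(
  longitude = c(-118.25, -118.30, -118.40, -118.20,
                -118.35, -118.28, -118.45, -118.22),
  latitude  = c(34.05, 34.10, 34.02, 34.15, 34.07, 34.12, 34.08, 34.03)
)

knn <- knearneigh(coords, k = 2, longlat = TRUE)  # class "knn"

# knn -> nb: the generic neighbor list class used throughout spdep.
nb <- knn2nb(knn)
summary(nb)  # neighbor list details; knn neighbors need not be symmetric

# nb -> listw: the spatial weights object passed to moran.test() and others.
lw <- nb2listw(nb, style = "W")

plot(nb, coords)  # visualize the neighbor links
```

Because every point gets exactly k neighbors, a knn-based neighbor list has no units without neighbors, but point A can be a neighbor of point B without the reverse being true.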