Assignment 3: Air Pollution Interpolation and Clustering
41 Marks
Interpolation: A method of constructing new data points within the range of a discrete set of
known data points.
Clustering: Grouping a set of objects in such a way that objects in the same group (called a
cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
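One common way to make "more similar (in some sense)" concrete is to use Euclidean distance on the (standardised) attribute values. A good clustering then leaves a small within-cluster sum of squares relative to the between-cluster sum of squares, and the two always add up to the total sum of squares. The minimal Python sketch below computes this split; the function name `within_between_ss` is hypothetical and not part of any library.

```python
import numpy as np


def within_between_ss(x, labels):
    """Split the total sum of squares of x into within- and between-cluster parts."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    grand_mean = x.mean(axis=0)
    total = float(((x - grand_mean) ** 2).sum())
    within = 0.0
    for c in np.unique(labels):
        members = x[labels == c]
        # Squared deviations of each member from its own cluster mean.
        within += float(((members - members.mean(axis=0)) ** 2).sum())
    return within, total - within  # between = total - within
```

For example, the values 0, 2, 10 and 12 split into clusters {0, 2} and {10, 12} give a within-cluster sum of squares of 4 and a between-cluster sum of squares of 100.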
You will work in groups of four, which you may select on Quercus.
Air quality is a global human health issue, and recent estimates from the Global Burden of Disease
study indicate that 7.6% of global deaths can be attributed to ambient particulate matter air
pollution (Cohen et al. 2017). These deaths are most concentrated in low- and middle-income
countries in East and South Asia; however, in the United States, ambient particulate matter air
pollution is estimated to be the sixth-highest risk factor for death, causing 18.5 deaths per
100,000 people (Cohen et al. 2017). In addition to particulate air pollution, mortality is
associated with gaseous air pollutants, including ground-level ozone and nitrogen dioxide
(Jerrett et al. 2009; Hoek et al. 2013). Health Canada estimates that 14,400 deaths annually can be
attributed to anthropogenic air pollution, including both acute and chronic mortality
(Health Canada 2017). Mortality is not the only adverse health outcome: chronic and acute exposure
to ambient air pollution is associated with negative effects on the respiratory, cardiovascular,
nervous, urinary and digestive systems (Kampa and Castanas 2008). Beyond human health, it is well
established that air pollution harms plant life. The relationship is so strong that plants are
reliable biomonitors for sulphur dioxide, ground-level ozone, and nitrogen dioxide (Cen 2015).
Although less well understood, toxic effects of air pollution on wildlife have also been
identified in past research (Newman and Schreiber 1988).
Research Problem:
You will select five contiguous states in the United States in which to conduct spatial
interpolation and spatial clustering of annual average concentrations of nitrogen dioxide (NO2),
ozone (O3) and fine particulate matter (PM2.5, particles 2.5 µm or less in diameter).
Once the groups are determined, each group will be assigned a year and one state. This state
must be included in your set of five states, and all data must come from the assigned year.
You are given a template for the final assignment, which is in the format of an academic journal
article. You are to complete this template for submission.
Major Tasks
You will create an interpolated air pollution surface for each pollutant in your study area,
using both inverse distance weighting (IDW) interpolation and kriging. Remember that the key
steps in interpolation are:
1. Select the method
2. Select initial parameters (e.g., k)
3. Fit a variogram (if kriging)
4. Cross-validate using leave-one-out cross-validation (LOOCV)
5. Iterate over parameters and variogram models
6. Predict
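The IDW side of these steps can be sketched in Python as follows. This is a minimal sketch under several assumptions: monitor coordinates are already projected to planar x/y in consistent units (latitude/longitude should be reprojected first), `k` is read as the number of nearest monitors used, and the candidate powers and `k` values are arbitrary examples. All function names are hypothetical. Kriging itself is not implemented; only the empirical semivariogram from step 3 is shown, to which a variogram model (e.g. spherical or exponential) would then be fitted, for example with R's gstat package.

```python
import numpy as np


def idw_predict(known_xy, known_z, target_xy, power=2.0, k=None):
    """Inverse distance weighted estimates at target_xy from monitors at known_xy.

    k limits each estimate to the k nearest monitors (None = use all of them).
    """
    d = np.linalg.norm(target_xy[:, None, :] - known_xy[None, :, :], axis=-1)
    preds = np.empty(len(target_xy))
    for t in range(len(target_xy)):
        order = np.argsort(d[t])[:k]  # slicing with None keeps every monitor
        dist = d[t, order]
        if dist[0] == 0.0:
            # Target sits exactly on a monitor: return that monitor's value.
            preds[t] = known_z[order[0]]
            continue
        w = 1.0 / dist**power
        preds[t] = np.sum(w * known_z[order]) / np.sum(w)
    return preds


def loocv_rmse(xy, z, power, k=None):
    """Leave-one-out RMSE: predict each monitor from all of the others."""
    errors = np.empty(len(z))
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        errors[i] = idw_predict(xy[keep], z[keep], xy[i:i + 1], power, k)[0] - z[i]
    return float(np.sqrt(np.mean(errors**2)))


def tune_idw(xy, z, powers=(1.0, 2.0, 3.0), ks=(4, 8, None)):
    """Grid search over IDW power and neighbour count; lowest LOOCV RMSE wins."""
    scores = {(p, k): loocv_rmse(xy, z, p, k) for p in powers for k in ks}
    return min(scores, key=scores.get), scores


def empirical_semivariogram(xy, z, bin_edges):
    """Classical (Matheron) estimator: mean of 0.5 * (z_i - z_j)^2 within each lag bin."""
    i, j = np.triu_indices(len(z), k=1)  # every unordered pair of monitors
    lag = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)  # NaN marks empty bins
    for b, (lo, hi) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
        in_bin = (lag >= lo) & (lag < hi)
        if in_bin.any():
            gamma[b] = half_sq[in_bin].mean()
    return gamma
```

The `(power, k)` pair with the lowest LOOCV RMSE would then be passed back to `idw_predict` together with a grid of target locations to produce the final surface (step 6). The same LOOCV loop applies to kriging, with the variogram model taking the place of the IDW parameters.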
Spatial Clustering
You will apply spatially constrained clustering, using all three pollutants, to identify regions
whose monitors are most similar in air quality.
Tools you can use for spatially constrained clustering include:
• R
  o spdep::skater
  o ClustGeo::hclustgeo (ClustGeo package)
    ▪ Semi-constrained
  o spatialcluster::scl_redcap
    ▪ Installed from GitHub:
• GeoDa (
  o Open-source GUI
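To show roughly what a tool like spdep::skater does, here is a simplified SKATER-style sketch in Python. It assumes planar monitor coordinates and links each monitor to its k nearest neighbours (spdep instead works from a neighbour list you supply). It then builds a minimum spanning tree weighted by the distance between standardised pollutant values, and repeatedly cuts the tree edge that most reduces the within-cluster sum of squares. It omits options the real function offers, such as a minimum cluster size, and all names here are hypothetical.

```python
import numpy as np


def knn_edges(coords, k):
    """Undirected edges linking each monitor to its k nearest neighbours (planar coords)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a monitor is not its own neighbour
    edges = set()
    for i in range(len(coords)):
        for j in np.argsort(d[i])[:k]:
            edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)


def minimum_spanning_forest(z, edges):
    """Kruskal's algorithm; edge cost = Euclidean distance between attribute vectors."""
    parent = list(range(len(z)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for i, j in sorted(edges, key=lambda e: np.linalg.norm(z[e[0]] - z[e[1]])):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def component_labels(n, tree):
    """Label the connected components of the forest given by the edge list tree."""
    adj = [[] for _ in range(n)]
    for i, j in tree:
        adj[i].append(j)
        adj[j].append(i)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] == -1:
            labels[start] = current
            stack = [start]
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if labels[v] == -1:
                        labels[v] = current
                        stack.append(v)
            current += 1
    return labels


def ssd(z, idx):
    """Within-group sum of squared deviations from the group mean."""
    sub = z[idx]
    return float(((sub - sub.mean(axis=0)) ** 2).sum())


def skater_sketch(values, coords, n_clusters, k=4):
    """Greedy SKATER-style pruning of the minimum spanning tree into n_clusters regions."""
    z = (values - values.mean(axis=0)) / values.std(axis=0)  # standardise each pollutant
    tree = minimum_spanning_forest(z, knn_edges(coords, k))
    labels = component_labels(len(z), tree)
    while labels.max() + 1 < n_clusters and tree:
        best, best_gain = None, -np.inf
        for edge in tree:
            # Only the component containing this edge changes when it is cut.
            comp = np.where(labels == labels[edge[0]])[0]
            split = component_labels(len(z), [e for e in tree if e != edge])
            a = comp[split[comp] == split[edge[0]]]
            b = comp[split[comp] == split[edge[1]]]
            gain = ssd(z, comp) - ssd(z, a) - ssd(z, b)
            if gain > best_gain:
                best, best_gain = edge, gain
        tree.remove(best)
        labels = component_labels(len(z), tree)
    return labels
```

Here `values` would be an array with one row per monitor and one column per pollutant (NO2, O3, PM2.5). If the k-nearest-neighbour graph is already disconnected, the sketch can return more clusters than requested.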
The U.S. Environmental Protection Agency (EPA) provides many prepared datasets. For this
assignment, the easiest to work with is the Annual Summary Data – Concentrations by Monitor:
You can view a map of the stations at this link:
You will need to use the layer selector button to add layers.
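As a possible starting point for reading the annual summary file, here is a sketch that uses only Python's standard `csv` module. It assumes that the file's column headers include "State Name", "State Code", "County Code", "Site Num", "Parameter Code", "Latitude", "Longitude" and "Arithmetic Mean". It also assumes that NO2, O3 and PM2.5 correspond to AQS parameter codes 42602, 44201 and 88101. Check both assumptions against the file you download. The file can contain several rows per monitor (for example, different instruments, sample durations, pollutant standards or event types). This sketch simply averages those rows per site, which is a simplification you should revisit. The function name `load_annual_means` is hypothetical.

```python
import csv
from collections import defaultdict

# Assumed AQS parameter codes; confirm against the "Parameter Name" column.
POLLUTANTS = {"42602": "NO2", "44201": "O3", "88101": "PM2.5"}


def load_annual_means(rows, states):
    """Average 'Arithmetic Mean' per (site, pollutant) for monitors in the chosen states.

    rows is any iterable of CSV text lines, e.g. an open file handle.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    coords = {}
    for row in csv.DictReader(rows):
        pollutant = POLLUTANTS.get(row["Parameter Code"])
        if pollutant is None or row["State Name"] not in states:
            continue
        # A site is identified by its state, county and site numbers.
        site = (row["State Code"], row["County Code"], row["Site Num"])
        totals[site, pollutant] += float(row["Arithmetic Mean"])
        counts[site, pollutant] += 1
        coords[site] = (float(row["Latitude"]), float(row["Longitude"]))
    means = {key: totals[key] / counts[key] for key in totals}
    return means, coords
```

In use, the file handle is passed in directly, e.g. `with open("annual_conc_by_monitor_YEAR.csv", newline="") as f: means, coords = load_annual_means(f, five_states)`, where the file name and `five_states` are placeholders. The latitude/longitude pairs would still need to be projected before any distance-based interpolation or clustering.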
Rubric (each section is scored at 5, 4, 3, 2 or 0 points):
Title & Abstract
  5 Points: Clear and descriptive with no errors or omissions.
  4 Points: Clear and descriptive with one error or omission.
  3 Points: Clear and descriptive with 2-4 errors or omissions.
  2 Points: Partially clear and descriptive, or with >4 errors or omissions.
  0 Points: Not clear or
Introduction
  5 Points: Exceptional introduction that grabs the reader's interest and states the topic.
  4 Points: Proficient introduction that is interesting and states the topic.
  3 Points: Basic introduction that states the topic but lacks interest or has >3 errors.
  2 Points: Basic introduction that partially states the topic but lacks interest or has >5 errors.
  0 Points: Does not introduce
Methods
  5 Points: All methods are clearly defined. Research could be reproduced, e.g. R code commented in
  4 Points: Most methods are clearly defined.
  3 Points: Many methods are
  2 Points: Some methods are
  0 Points: Unclear what work was completed.
Results
  5 Points: Results provide the reader with the necessary information to understand the work.
  4 Points: Results provide the reader with most of the information needed to understand the work.
  3 Points: Results provide the reader with some of the information needed to understand the work.
  2 Points: Results provide a broad overview but lack clarity and detail.
  0 Points: Results do not support the article.
Discussion
  5 Points: Exceptional comparisons and contrasts within and between other works.
  4 Points: comparisons and contrasts within and between other works.
  3 Points: Basic comparisons and contrasts within and between other works.
  2 Points: Few comparisons and contrasts within and between other works.
  0 Points: Discussion does not support the article.
Conclusions
  5 Points: Conclusions summarize research and are supported by
  4 Points: Conclusions partially summarize research and are supported by
  3 Points: Conclusions are not well supported by
  2 Points: Conclusions are not supported by the
Formatting
  5 Points: No formatting issues (table headings, figure captions, etc.).
  4 Points: 1-2 formatting issues (table headings, figure captions, etc.).
  3 Points: 3-4 formatting issues (table headings, figure captions, etc.).
  2 Points: 4-5 formatting issues (table headings, figure captions, etc.).
References
  5 Points: References are complete and follow all structure.
  4 Points: 1-3 issues with the
  3 Points: > 3 issues with
Overall Presentation
  5 Points: Professional University Level
  4 Points: Near University Level
  3 Points: Below University Level
  2 Points: Well Below University Level