Date: 2022-03-25

GEOG0125 Advanced Topics in Social and
Geographic Data Science (2021-2022) Coursework
Anwar Musah1,* and Stephen Law1,**
1Department of Geography, University College London, London, UK
*a.musah@ucl.ac.uk
**stephen.law@ucl.ac.uk
1 Coursework
The coursework for GEOG0125 consists of two separate tasks. The first task concerns the use of Bayesian models and the
second task concerns the use of a machine learning model.
2 Spatial Bayesian modelling task
For this part of the coursework, we would like you to select an outcome of your choice that follows a Poisson distribution.
This can be from any scientific discipline (e.g., public health, quantitative criminology, disaster reduction,
social sciences, etc.) on which you can perform geospatial analysis of aggregate data within a Bayesian framework. The
aim of this task is to introduce an interesting research problem, apply spatial and spatiotemporal Bayesian models to map
and quantify the area-level risk of an outcome, and create an interactive dashboard using the "Shiny" package in
RStudio.
The final deliverable for this task is an extended abstract of 1,500 words (excluding references) and an RShiny dashboard. Your
extended abstract should contain the following sections:
2.1 Overview
2.1.1 Background
In this section, introduce your research problem and the importance of the selected outcome, and then justify why dashboards
are an ideal surveillance tool for monitoring your chosen outcome.
2.1.2 Data and Methods
In this section, describe the data and the selected study area. You are required to use the spatial conditional
autoregressive (CAR) model for this exercise, and to provide the model formulation together with a statistical description
of each model parameter.
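As an illustration of what a model formulation might look like, the sketch below writes out one common Poisson model with an intrinsic CAR (ICAR) spatial random effect. The notation and the choice of an ICAR prior without an additional unstructured term are assumptions made here for illustration, not a form prescribed by this brief.

```latex
\begin{align*}
y_i &\sim \text{Poisson}(E_i \, \mathrm{RR}_i), \qquad i = 1, \dots, N \\
\log \mathrm{RR}_i &= \alpha + \mathbf{x}_i^{\top} \boldsymbol{\beta} + \phi_i \\
\phi_i \mid \phi_{j \neq i} &\sim \text{Normal}\left( \frac{1}{n_i} \sum_{j \sim i} \phi_j, \; \frac{\sigma_\phi^2}{n_i} \right)
\end{align*}
```

In this sketch, y_i is the observed count in area i, E_i is the expected count, RR_i is the area-specific relative risk, alpha is the intercept (the overall log-risk), beta holds the coefficients for any covariates x_i, phi_i is the spatially structured random effect, j ~ i denotes that area j is adjacent to area i, n_i is the number of neighbours of area i, and sigma_phi controls the variability of the spatial effect.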
2.1.3 Results and Discussion
In this section, you must report the key findings from the spatial CAR model. It is important that your interpretation of the
results touches on the following key points:
• The overall risk associated with the outcome.
• A descriptive interpretation of the geographical patterns of risk for the selected outcome, and whether these
risks are statistically significant.
• An interpretation of the exceedance probabilities.
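The last two points are usually computed from posterior samples. As a sketch, assume S posterior draws of the relative risk for area i, a threshold c chosen by the analyst (c = 1 is a common but not mandatory choice), and q_i denoting posterior quantiles of the relative risk; all of this notation is introduced here for illustration.

```latex
\[
\Pr(\mathrm{RR}_i > c \mid \mathbf{y}) \approx \frac{1}{S} \sum_{s=1}^{S} \mathbb{1}\left[ \mathrm{RR}_i^{(s)} > c \right]
\]
\[
\text{significantly elevated risk if } q_i^{(2.5\%)} > 1, \qquad
\text{significantly reduced risk if } q_i^{(97.5\%)} < 1
\]
```

Under this convention, an area whose 95% credible interval contains 1 would be reported as not significant.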
The discussion should relate back to the modelled outputs visualised in the dashboard. You should discuss how the dashboard
can be implemented as a tool to help inform an intervention or support policy decision-making in the context of your
selected problem.
2.1.4 References
The information provided in the background, methods and discussion sections must be supported with relevant
references.
2.2 RShiny Dashboard
When creating the RShiny dashboard, it must contain the following visual map outputs:
• Geographical distribution of the outcome using any measure of frequency (e.g., a rate per 100,000 population or a
density per km²)
• Area-specific risk estimates [i.e., relative risk (RR)]
• Significant regions determined by the 95% credible intervals
• Exceedance probabilities
2.3 Data sources
You are free to use data from outside the list provided below. You are welcome to use any data previously
used for the GEOG0114 Spatial Analysis Project. However, you will need to make sure that the selected outcome
follows a distribution suitable for the spatial CAR model.
Some examples of appropriate data sources:
1. London Data Store (https://data.london.gov.uk)
2. Consumer Data Research Centre (http://data.cdrc.ac.uk/)
3. UK Metropolitan Police (https://data.police.uk)
4. Office for National Statistics for all UK population data (https://tinyurl.com/5h8hes72)
For this coursework, you are not allowed to use the road accident data from week 8’s practical. We strongly caution against
replicating any examples from online tutorials, or any from the books recommended in lecture 8.
In keeping with the key tenets of GEOG0125, all analyses and the generation of the dashboard must be
carried out in RStudio.
A key piece of advice when analysing data for your study population: if you are not able to obtain population counts that are
disaggregated by age and sex for calculating the expected number of your outcome, you can specify "n.strata = 1" in the R code
when computing the expected number. For example:
# expected() is assumed here to come from the SpatialEpi package
expectNumber <- expected(population = d$population, cases = d$outcome, n.strata = 1)
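For reference, with a single stratum the expected number should reduce to indirect standardisation by the overall study-wide rate. A sketch, writing P_i for the population and y_i for the observed cases in area i (notation introduced here for illustration):

```latex
\[
E_i = P_i \times \frac{\sum_{k=1}^{N} y_k}{\sum_{k=1}^{N} P_k}
\]
```

In other words, each area's expected number is its population multiplied by the overall rate across all N areas in the study region.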
Note that it is not obligatory to use an entire country as a study area. You can use a sub-region from that country which is
delineated appropriately for the generation of the adjacency matrix.
2.4 Submission format
The final extended abstract should be submitted in PDF format, font size 11 or 12 points. The report should have a maximum
length of 1,500 words. The total word count includes the background, data and methods, and results and discussion. The word
count excludes the references at the end. Please note that your interpretation of results should be supported by the outputs from
the RShiny dashboard. You may use figures - the maximum allowed is 8 in total (sub-figures are allowed).
An example structure of the abstract would be:
1. Background (300 words)
2. Data and Methods (600 words)
3. Results and Discussion (600 words)
4. References
All R scripts (i.e., .Rmd or .R) used for the analyses and generation of the RShiny app in RStudio must be submitted separately as
a single ZIP file. For reproducibility, the dataset behind the RShiny application must be submitted. PLEASE MAKE SURE
THE SUBMITTED CODE FOR THE APP & DATA ARE FULLY FUNCTIONAL. Any large datasets (e.g., .SHP and
.CSV) that you require for these tasks should be uploaded separately with a link provided within the codebook. We recommend
uploading your dataset to OneDrive, creating a share link, and providing this link in your notebook.
3 Machine learning task
For the second part of this coursework, we would like you to identify an image dataset of outdoor scenes (urban or natural) to which
you can apply a machine learning research problem. Examples of such a problem are using crowd-sourced imagery to classify
urban scenes [1] (e.g., with or without shops/greenery) or using satellite imagery to predict the population density or wealth of an area [2]
(e.g., LSOA socio-economic data). The aim of this task is to be able to identify a research problem, describe its related works,
set up a research pipeline, construct a machine learning model, and report and discuss the implications of the results.
The final deliverable for this task is an extended abstract of 1,500 words (excluding references). This may seem like a lot, but it
really is not when you need to properly describe all the necessary parts of your research. Your extended abstract should contain
the following:
3.1 Overview
3.1.1 Background and Related works
In this section, describe the research problem you want to study. It is important to describe why this research problem is
important, what dataset you will be using to study this problem, and which particular machine learning approach you plan
on using. The justification is important here: why are the data and methods that you have chosen appropriate for studying the
problem? As part of the justification, you should describe and include some related works that try to address a similar research
problem.
3.1.2 Dataset
Please describe the dataset you will be using, its source, how you collected the data, and how
you prepared it. For image datasets this would involve, for instance, data cleaning, removal of invalid data, data quality
checking, data transformation and exploring the data (visualising).
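As one small example of a preparation step, the sketch below removes duplicate image paths and then makes a reproducible train/test split. The function name split_dataset, the file names, the 20% test fraction and the seed are all hypothetical choices for illustration; a real pipeline would also need the quality checks described above.

```python
import random


def split_dataset(image_paths, test_fraction=0.2, seed=42):
    """Shuffle unique image paths reproducibly and split into train and test lists."""
    # Drop exact duplicates while keeping the first occurrence of each path.
    unique_paths = list(dict.fromkeys(image_paths))
    # A seeded generator makes the split repeatable across runs.
    rng = random.Random(seed)
    rng.shuffle(unique_paths)
    n_test = int(round(len(unique_paths) * test_fraction))
    return unique_paths[n_test:], unique_paths[:n_test]


# Hypothetical file names: ten images plus one duplicated entry.
paths = [f"img_{i:03d}.jpg" for i in range(10)] + ["img_000.jpg"]
train_paths, test_paths = split_dataset(paths)
print(len(train_paths), len(test_paths))
```

Fixing the seed means the same images land in the test set every time, which keeps reported results comparable between runs.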
3.1.3 Methodology and Research Pipeline
In this section, describe the method of your research. Here, you describe in detail the machine learning model you have chosen
and the hyper-parameters of the model. Please also describe and draw a research pipeline showing the tasks you will be
conducting. It is also important to describe the details of the experiments you will be running and why you made these specific
decisions. However, you need to write this at a high level so that it does not become a process report. We would strongly
recommend having a look at how researchers have done this in academic papers that involved similar methods.
3.1.4 Results
In this section, please report the results (on both the training and test sets) of the machine learning model for the image
regression/classification task you have identified. Please also interpret the results of the model.
It is important to consider the following questions:
• How well does the model perform?
• Is your model overfitting?
• Where are the errors coming from?
• What are the implications of the research?
• What are the limitations of the research?
• What are some potential steps for the future?
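The first three questions can be approached with very simple diagnostics. The sketch below, for a classification task, compares training and test accuracy and counts which label pairs are confused. The labels, predictions and helper names (accuracy, error_breakdown) are hypothetical and only illustrate the idea; what counts as a "large" gap is a judgment call that depends on the task.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_breakdown(y_true, y_pred):
    """Count misclassifications as (true label, predicted label) pairs."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# Hypothetical labels and predictions, for illustration only.
train_true = ["shop", "shop", "green", "green", "shop", "green"]
train_pred = ["shop", "shop", "green", "green", "shop", "green"]
test_true = ["shop", "green", "green", "shop"]
test_pred = ["shop", "shop", "green", "green"]

train_acc = accuracy(train_true, train_pred)
test_acc = accuracy(test_true, test_pred)
gap = train_acc - test_acc  # a large positive gap suggests overfitting
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={gap:.2f}")
print(error_breakdown(test_true, test_pred).most_common())
```

For a regression task the same comparison applies, with an error metric such as RMSE in place of accuracy.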
3.1.5 Conclusion
The final section is to briefly conclude your report by answering your research question / explaining how your results relate to
your research aim. Make sure that the conclusion links nicely to the research problem you have introduced in the introduction.
You can also mention limitations and suggestions for future research.
3.2 Submission format
The final extended abstract should be submitted in PDF format, font size 11 or 12 points. The report should have a maximum
length of 1,500 words. The total word count includes the title, introduction, related works, data, method, results, conclusion,
captions, and excludes the bibliography at the end. The maximum number of figures is 8 in total (subfigures are allowed). An
example structure of the extended abstract could be:
1. Background and Related works
2. Data
3. Method
4. Results
5. Conclusion
6. References
Code should be submitted separately as a single ZIP file. The code can be submitted as Jupyter notebook(s) or as a set of
Python files. Any large datasets (e.g., images) that you require for these tasks should be uploaded separately with a link provided
within the codebook. We recommend uploading your dataset to OneDrive, creating a share link, and providing this link in your
notebook.
Some tips:
1. Computation is an important factor to consider when running machine learning models.
2. Hundreds and often thousands of images are sometimes required for a simple image classification task. To
reduce the need for training data, consider using pretrained models for feature extraction.
3. If the learning process still takes too long, consider using Google Colab to run the analysis. However, only move to
Google Colab once your processes run correctly on your local machine.
4. Use figures with captions when you want to elaborate a point.
5. Use tables when you want to summarise your results.
6. Remember to include in-text citations when you are using a specific model or method.
3.3 Example datasets
An example of an image dataset you could use is the scenicness dataset that was provided in the week 5 lab notebook and
week 6 lab seminar. This dataset is a subset of the original Scenic-or-not dataset as used in [3]. You are not allowed to reuse the
scenicness ratings. As such, you would need to propose a new research question using these images. Below are some example
datasets you could consider using:
• Scenic-or-not images (note: the scenicness ratings cannot be reused)
• Flickr images
• Google Earth Engine imagery
• Google Street View images
4 Submission details
You should submit both parts of the coursework as a single report through Turnitin on the course Moodle page, under the
"Assessment" tab.
Your code should be submitted on the course Moodle page as a single ZIP file for each task. Two submission links will be
available, one for each task. To be clear: this means that you will have to upload two ZIP files in
total, one containing all the code for the first task and another containing all the code for the second task. Note: Failure to
include your full code will incur a 10-point penalty.
The submission deadline is May 3rd, 2022 at noon. Further details on the submission procedures will be available on
Moodle.
4.1 Queries
A sub-channel has been created specifically for queries about the coursework. All related queries must be posted
in this sub-channel; this avoids duplicated questions and ensures that all students
benefit from any clarification that is given.
Questions seeking clarification about, for instance, the wording of the task briefs or the format of submission will be answered.
However, as this is an assessed piece of work, you may not ask questions that pertain directly to the coursework itself,
e.g. "Is analysis X the best way to answer question 1a?" For the same reason, any collaboration or discussion of the
coursework with anyone is strictly prohibited. The rules for plagiarism apply and any cases of suspected plagiarism of other
works, published or not, will be taken very seriously.
The deadline for questions is April 26th, 2022, i.e. 1 week before the submission deadline (May 3rd, 2022).
References
1. Law, S., Seresinhe, C. I., Shen, Y. & Gutierrez-Roig, M. Street-frontage-net: urban image classification using deep
convolutional neural networks. Int. J. Geogr. Inf. Sci. 34, 681–707 (2020). https://doi.org/10.1080/13658816.2018.1555832
2. Jean, N. et al. Combining satellite imagery and machine learning to predict poverty. Science 353, 790–794 (2016).
3. Seresinhe, C. I., Preis, T. & Moat, H. S. Using deep learning to quantify the beauty of outdoor places. Royal Soc. Open
Sci. 4, 170170 (2017).