Data Analysis
Term 1, Essay Two
ESSAY QUESTIONS 2020–2021 (Term 1 – Part 2)
Guidelines for Completing and Submitting POLS0010 Essay
Read the guidelines below to avoid losing marks unnecessarily.
The assessment is due on Monday 11th January at 14.00 hours. Please follow all
designated Department of Political Science submission guidelines. THESE MAY BE [...]
The guidelines are available on the Moodle page for this module. You must submit one
copy of your essay via Turnitin. The word limit is 1,500 words, excluding tables
and graphs, references, and your R script appendix (see below).
This is an assessed piece of coursework for the POLS0010 module; collaboration
and/or discussion with anyone is strictly prohibited. The rules for plagiarism apply
and any cases of suspected plagiarism of published work or the work of classmates
will be taken seriously.
The dataset for the essay can be found in the ‘Dataset’ folder on Moodle. The data
for Part A come from the Understanding Society COVID-19 study, April 2020. The
study includes all members of the main Understanding Society samples – see here
for more detail. The data are clustered and stratified probability samples of postal
addresses in the UK. A monthly COVID-19 survey was implemented as a web
survey between April and July 2020. You have been provided with the April 2020
data. In some months there were additional telephone surveys for households
without internet access.
A subset of variables from the COVID-19 study is included in your dataset. The
data are collected about individuals (e.g. age and sex).
These are real research data from the UK Data Service. The respondents agreed for
their data to be used for research and learning purposes. Before you can access
these data, you need to agree to some important terms of use.
Once you have agreed to the conditions of use, you may open up the dataset and
work on the essay questions anytime up until the submission date. There is no limit
on the number of times you may open the data files. Be sure to save your data file
and R script file.
The essay questions comprise two sections; you must complete each part of each
section.
Where appropriate, answers should be written in complete sentences; no bulleting
or outlining. Be sure to answer all parts of the questions posed and interpret output
statistically and substantively.
In Part A, you should include tabular and graphical output alongside your written
answers. Part B should be answered without any tabular or graphical output (see below).
You should include a copy of your R script as an appendix to your essay. FAILURE [...]
Your R script file should include comments indicating the question being addressed.
Your R script file should contain only the exercises/questions asked here.
All variable names are shown in italics.
You should discuss the interpretation of your results and how they relate to the
questions you were asked.
You may assume the methods you have used (e.g. linear regression) are understood
by the reader and do not need definitions, but you do need to say which techniques
you have used and why.
As this is an assessed piece of work, you may not email/ask the module tutors
questions about the essay questions.
10 points will be awarded for presentation.
This assessment is out of 100 marks and will count towards 50% of the term 1
The data file is ca_indresp_w_POLS0010.dta. You can copy this file in the usual way
from Moodle once you have agreed to the conditions of use. The variables are:
Variable name     Variable label
pidp              Respondent identification number
psu               Primary sampling unit
strata            Sampling strata
birthy            Year of birth
racel_dv          Ethnic group
bornuk_dv         Born in UK
ca_age            Age
ca_sex            Sex
ca_couple         Living with a partner
ca_hhcompa        Household composition - Aged 0-4
ca_hhcompb        Household composition - Aged 5-15
ca_hhcompc        Household composition - Aged 16-18
ca_hadsymp        Has had symptoms that could be coronavirus
ca_tested         Tested for coronavirus
ca_testresult     Result of coronavirus test
ca_hcond_cv96     No chronic health condition
ca_sclonely_cv    How often feels lonely
ca_blwork         Worked in Jan-Feb 2020
ca_furlough       Furloughed under the Coronavirus Job Retention Scheme
ca_sempgovt       Government support for self-employed
ca_keyworker      Key worker
ca_wah            Worked at home Jan-Feb 2020
ca_timechcare     Time spent on childcare or home schooling
ca_auditc1_cv     Main source of income
ca_auditc3_cv     Alcohol frequency last 4 weeks
ca_smoker         Smoker
ca_scghq1_dv      Subjective wellbeing (GHQ)
ca_gor_dv         Government Office Region
ca_hhcompa        Number of residents aged 0-4
ca_hhcompb        Number of residents aged 5-15
ca_hhcompc        Number of residents aged 16-18
ca_betaindin_xw   Survey weight
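The psu, strata and ca_betaindin_xw columns carry the sampling design. As a rough sketch only, the R code below shows one way they could be declared, assuming the survey package (the guidelines do not prescribe a package). The data frame toy and every value in it are simulated stand-ins, not the real data; only the column names come from the variable list.

```r
# Simulated stand-in for the real file: 4 strata, 10 PSUs per stratum,
# 5 respondents per PSU. Only the column names match the variable list.
set.seed(1)
n <- 200
toy <- data.frame(
  strata          = rep(1:4, each = 50),
  psu             = rep(1:40, each = 5),
  ca_betaindin_xw = runif(n, 0.5, 2),                 # made-up weights
  ca_scghq1_dv    = sample(0:36, n, replace = TRUE)   # made-up scores
)

# Weighted point estimate using base R only (ignores clustering and strata).
weighted_mean <- weighted.mean(toy$ca_scghq1_dv, toy$ca_betaindin_xw)
print(weighted_mean)

# Design-based estimate, run only if the survey package is installed
# (using this package is an assumption, not a requirement of the brief).
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(
    ids = ~psu, strata = ~strata, weights = ~ca_betaindin_xw,
    data = toy, nest = TRUE
  )
  print(survey::svymean(~ca_scghq1_dv, des))
}
```

Both calls return the same point estimate. The design object matters for the standard error, which then reflects the clustering and stratification rather than treating the sample as simple random.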
More information about these data can be found at the following link:
DOs and DON’Ts
- DON’T include raw variable names in the text or tables
- DON’T use too many decimal places, but be consistent
- DON’T include unedited R output in the main text of your essay or you will
lose marks
- DO make sure tables and figures have titles and are referenced in the text
- DO make sure your tables and figures can be understood without reading the
text
- DO make sure you have given a clear enough description of what you have
done so that the reader can reproduce any numbers/results that you present
- DO be careful how you use the terms ‘significant’ and ‘correlation’ because
they have specific meanings in social statistics.
PART A: Multiple Linear Regression (60 Points)
This question uses the ca_indresp_w_POLS0010.dta dataset. You have been asked to
write a short report on the relationship between the amount of time spent on childcare
or home-schooling during the UK’s coronavirus lockdown and psychological distress.
You should create a categorical variable of time spent on childcare or home-schooling
with up to four levels and use this variable to fit a multiple linear regression model(s)
predicting psychological distress. You may choose to add additional explanatory
variables to your model that may explain the relationship between time spent on
childcare and home schooling and psychological distress. You may choose to recode or
transform some variables in your data. You should report any decisions you take to
adjust for individual non-response in your data and to take account of any complex
survey design. These decisions should form an introduction that also includes a
description of your dataset, subsetting of respondents, descriptive statistics and your
research hypothesis. Briefly explain any limitations to your analysis in a concluding
section that also summarises your main substantive finding.
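The recoding and model-fitting steps described above might be sketched in R as follows. This is an illustration under stated assumptions, not a model answer: the data are simulated, the cut points and the name childcare_cat are hypothetical, ca_timechcare is assumed to be recorded in hours per week, and ca_scghq1_dv is assumed to be the distress outcome. Check the study documentation before relying on any of these.

```r
# Simulated data; only the column names come from the variable list.
set.seed(42)
n <- 300
toy <- data.frame(
  ca_timechcare   = rep(c(0, 5, 20, 40), times = 75),  # assumed hours per week
  ca_age          = sample(20:60, n, replace = TRUE),
  ca_betaindin_xw = runif(n, 0.5, 2)                   # made-up weights
)
toy$ca_scghq1_dv <- 10 + 0.05 * toy$ca_timechcare + rnorm(n, sd = 4)

# Four-level categorical version of time spent on childcare.
# Intervals are (-Inf,0], (0,10], (10,30], (30,Inf]; cut points are illustrative.
toy$childcare_cat <- cut(
  toy$ca_timechcare,
  breaks = c(-Inf, 0, 10, 30, Inf),
  labels = c("None", "1-10 hours", "11-30 hours", "31+ hours")
)

# Weighted multiple linear regression; "None" is the reference category.
fit <- lm(ca_scghq1_dv ~ childcare_cat + ca_age,
          data = toy, weights = ca_betaindin_xw)
coefs <- coef(fit)
print(round(coefs, 2))
```

Note that lm() with weights gives weighted point estimates, but its standard errors ignore the clustering and stratification. A design-based fit (for example survey::svyglm(), not run here) would be one way to take the complex design into account.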
PART B: Regression Interpretation (30 Points)
The model below is from a paper published in an economic geography journal on how
responsive cross-border commuters are to changes in exchange rates between home
and host countries’ currencies. Data are taken from the Swiss Earnings Structure
Survey on the number of hours worked by hourly employees in Ticino,
Switzerland. Your task is to interpret the model and write a short report on the results
and conclusions for employers in Ticino. Interpret the model statistically and
substantively. Your results section should report on descriptive and statistical
findings. Your conclusion should discuss whether the results support an effect of
changes in exchange rates on the number of hours worked by Italian cross-border
commuters in Ticino. A table containing descriptive statistics is appended.
Linear regression of labour supply response to exchange rate
              Coefficient (standard error)
ln e          0.930 (0.848)
CBC           0.802*** (0.046)
CBC * ln e    1.117*** (0.111)
N             27,155
R-squared     0.482
Controls      Yes
Notes: The dependent variable is the log of the number of hours worked per employee.
The sample includes only cross-border commuters from Italy and Swiss residents. ln e is
the log of the Euro to Swiss Franc exchange rate, CBC is a dummy variable for cross-
border commuter and CBC * ln e is an interaction term between the dummy and the log
of the exchange rate. Controls include dummies for sex, age, education level and sector
of the firm, the log of the Italian GDP, the log of the Swiss GDP and the log of
unemployment rate in Lombardy, Italy. A dummy for the period after 2007 is also
included. Robust standard errors in parentheses. The following symbols indicate
different significance levels: *** p < 0.01, ** p < 0.05, * p < 0.1.
Source: Swiss Earnings Structure Survey (2004–2012, biennial).
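For reference, the specification implied by the table and its notes can be written out as below. The notation is an assumption introduced here for illustration (subscripts i for employee and t for survey year, x for the controls, and an intercept that the table does not report); the source paper may use different symbols. Because both the outcome and the exchange rate are in logs, the derivative with respect to ln e reads as an elasticity, and the interaction term lets it differ between the two groups.

```latex
\begin{align*}
\ln(\text{hours}_{it}) &= \beta_0 + \beta_1 \ln e_t + \beta_2\,\text{CBC}_i
  + \beta_3\,(\text{CBC}_i \times \ln e_t)
  + \mathbf{x}_{it}'\boldsymbol{\gamma} + \varepsilon_{it} \\
\frac{\partial \ln(\text{hours}_{it})}{\partial \ln e_t}
  &= \beta_1 + \beta_3\,\text{CBC}_i \\
\text{CBC}_i = 0:&\quad \hat\beta_1 = 0.930 \\
\text{CBC}_i = 1:&\quad \hat\beta_1 + \hat\beta_3 = 0.930 + 1.117 = 2.047
\end{align*}
```

The table does not report the covariance between the two estimates, so a standard error for the sum cannot be computed from the table alone.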
Mean number of hours worked per month, mean age and share of women among
hourly paid employees for Swiss and Italian cross-border commuters in Ticino,
Variable            Hours worked   Age       Female
Swiss residents     76.8147        43.1426   0.6993
  SD                61.394         12.5878   0.4586
  N                 11,135         11,135    11,135
Italian residents   149.1724       40.4598   0.5039
  SD                52.4078        10.4984   0.5
  N                 16,020         16,020    16,020
Source: Swiss Earnings Structure Survey (2004–2012, biennial).
10 points are reserved for clear presentation and clarity of answers, especially in
regard to production of tabular and graphical outputs.
8-10 clear answers with outputs shown in concise format
5-7 correct answers with outputs that can be understood but cumbersome
0-4 confused answers with unclear outputs.