Data Analysis
Term 1, Part 2

ESSAY QUESTIONS 2021–2022 (Term 1 – Part 2)

Guidelines for Completing and Submitting POLS0010 Essay
Read the guidelines below to avoid losing marks unnecessarily.
The assessment is due on 10th January 2022 at 14.00 hours. Please follow all
designated Department of Political Science submission guidelines. THESE MAY
guidelines are available on the Moodle page for this module. You must submit one
copy of your essay via Turnitin. The word limit is 1,500 words, excluding tables
and graphs, references, and your R script appendix (see below).
This is an assessed piece of coursework for the POLS0010 module; collaboration
and/or discussion with anyone is strictly prohibited. The rules for plagiarism apply
and any cases of suspected plagiarism of published work or the work of classmates
will be taken seriously.
The dataset for the essay can be found in the ‘Dataset’ folder on Moodle. The data
for Part A come from the Understanding Society: Longitudinal Teaching Dataset,
Waves 1-9, 2009-2018. The study includes all members of the main Understanding
Society samples – see here for more detail. The data are clustered and stratified
probability samples of postal addresses in the UK.
These are real research data from the UK Data Service. The respondents agreed to
their data being used for research and learning purposes. Before you can access these
data, you need to agree to some important terms of use.
Once you have agreed to the conditions of use, you may open up the dataset and
work on the essay questions anytime up until the submission date. There is no limit
on the number of times you may open the data file. Be sure to save your data file
and R script file regularly.
The essay questions comprise two sections; you must complete each part of each section.
Where appropriate, answers should be written in complete sentences; no bulleting
or outlining. Be sure to answer all parts of the questions posed and interpret output
statistically and substantively.
In Part A, you should include tabular and graphical output alongside your written
answers. In Part B, you should not include any tabular or graphical output (see below).
You should include a copy of your R script as an appendix to your essay. FAILURE
Your R script file should include comments indicating the question being addressed.
Your R script file should contain only the exercises/questions asked here.
All variable names are shown in italics.
You should discuss the interpretation of your results and how they relate to the
questions you are asked.
You may assume the methods you have used (e.g. linear regression) are understood
by the reader and do not need definitions, but you do need to say which techniques
you have used and why.
As this is an assessed piece of work, you may not email/ask the module tutors
questions about the essay questions.
10 points will be awarded for presentation.
This assessment is out of 100 marks and will count towards 50% of the term 1

The data file is longitudinal_td.dta. You can copy this file in the usual way from
Moodle once you have agreed to the conditions of use. The variables you might use are listed below.

Variable name Variable label
pidp Respondent identifier
wave Interview wave
hidp Household identifier
psu Primary sampling unit
strata Sampling strata
indinus_lw_9 Sampling full-interview weight
indscus_lw_9 Sampling self-completion weight
gor_dv Government Office Region
urban_dv Urban or rural area
country Country of residence
age_dv Age
doby_dv Year of birth
sex_dv Sex
ethn_dv Ethnic group
bornuk_dv Born in UK
mstat_dv Marital status
hiqual_dv Highest qualification ever reported
sf12pcs_dv Short Form 12 (SF-12) Physical Component Summary
sf12mcs_dv Short Form 12 (SF-12) Mental Component Summary
scghq1_dv Subjective wellbeing (GHQ)
jbstat Current economic activity
tenure_dv Housing tenure
fihhmngrs_dv Gross household income

More information about these data can be found at the following link:

DOs and DON’Ts
- DON’T include raw variable names in your text or tables
- DON’T use too many decimal places, but be consistent
- DON’T include unedited R output in the main text of your essay or you will
lose marks

- DO make sure tables and figures have titles and are referenced in the text
- DO make sure your tables and figures can be understood without reading the
text
- DO make sure you have given a clear enough description of what you have
done so that the reader can reproduce any numbers/results that you present
- DO be careful how you use the terms ‘significant’ and ‘correlation’ because
they have specific meanings in social statistics.

PART A: Multiple Linear Regression (60 Points)
This question uses the longitudinal_td.dta dataset. You have been asked to write a short
report on health inequalities in the UK using Understanding Society data from wave
9. You should select one measure of health or wellbeing (e.g. SF-12 [physical or mental
component summaries] or GHQ) available at wave 9 of Understanding Society as your
outcome variable. Predict this variable using one measure of socioeconomic status (e.g.
education or income). You may choose to add additional explanatory variables to your
model that may explain the relationship between socioeconomic status and health. You
may choose to recode or transform some variables in your data. You should report any
decisions you take to adjust for survey non-response in your data and to take account
of any complex survey design. These decisions should form an introduction that also
includes a description of your dataset, selection of a complete case study sample,
sample characteristics and your research hypothesis (i.e. why you expect your measure
of socioeconomic status to be related to health). Briefly explain any limitations to your
analysis in a concluding section that also summarises your main substantive findings.

PART B: Regression Interpretation (30 Points)
The model below is from a paper published in the Journal of Ethnic and Migration
Studies on attitudes to immigration in Europe from lower income countries by skill
level and country of origin. The data are taken from the European Social Survey
(ESS). The ESS respondents were randomly assigned in each country to state to what
extent they think their country should allow migrants from one of four groups
combining skill level of migrants (professional or unskilled) and origin country
(poorer sending country within Europe or poorer sending country outside Europe) to
come to live in their country. Your task is to interpret the model estimates and report
on implications of your findings for immigration policies in European nation states.
You should report on descriptive findings first and statistical findings second. You
should discuss whether the results support an increase in high-skilled
immigration and whether this is contingent on whether immigrants are from other
European countries. A table containing descriptive statistics is appended.

Linear regression model of immigrant acceptance by country origin and skill level
                        Estimate  Std. error  t value  Pr(>|t|)
European migrant            0.17        0.01    13.02     0.000
High skilled migrant        0.60        0.01    44.95     0.000
European × high skill      −0.05        0.02   −2.535     0.011
(Intercept)                 1.10       0.009    116.7     0.000
Notes: The dependent variable is a 0–3 scale, where a higher value refers to greater
acceptance of migrants to come and live in a respondent’s country. The response options were
allow none; allow a few; allow some; or allow many to come and live here. European
migrant is a dummy variable for respondents who provided answers on migrant acceptance
from a poorer European origin country versus a poorer non-European country. High skilled
migrant is a dummy variable for respondents who provided answers on migrant acceptance of
professional versus unskilled. The model contains an interaction between these two dummy
variables.
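As a reading aid for the regression table above, the following minimal sketch shows how the intercept, the two dummy coefficients and the interaction term combine into one predicted acceptance score per migrant group. It assumes that each dummy is coded 1 for the first-named category (European origin, professional) and 0 for the second (non-European origin, unskilled), which the notes imply but do not state outright. The function name and group labels are illustrative only, and Python is used purely for the arithmetic; the same sums can be done by hand or in R.

```python
# Coefficients copied from the regression table in this brief.
INTERCEPT = 1.10       # assumed reference group: unskilled, non-European origin
B_EUROPEAN = 0.17      # European migrant dummy
B_HIGH_SKILL = 0.60    # High skilled migrant dummy
B_INTERACTION = -0.05  # European x high skill


def predicted_acceptance(european: int, high_skill: int) -> float:
    """Predicted score on the 0-3 acceptance scale for one migrant group.

    Both arguments are 0/1 dummies; the coding direction is an assumption.
    """
    return (
        INTERCEPT
        + B_EUROPEAN * european
        + B_HIGH_SKILL * high_skill
        + B_INTERACTION * european * high_skill
    )


# Hypothetical labels for the four randomly assigned groups.
GROUPS = {
    "Unskilled, non-European": (0, 0),
    "Unskilled, European": (1, 0),
    "Professional, non-European": (0, 1),
    "Professional, European": (1, 1),
}

if __name__ == "__main__":
    for label, (european, high_skill) in GROUPS.items():
        print(f"{label}: {predicted_acceptance(european, high_skill):.2f}")
```

Under these coding assumptions the four predicted scores are 1.10, 1.27, 1.70 and 1.82. The interaction coefficient enters only when both dummies equal 1, so the difference between European and non-European origin is 0.17 for unskilled migrants and 0.17 − 0.05 = 0.12 for professionals.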
Percentage of immigrant acceptance level by migrant group
Allow many 23 19 9 8
Allow some 44 44 32 26
Allow few 25 26 34 35
Allow none 9 11 25 31

10 points are reserved for clear presentation and clarity of answers, especially in
regard to production of tabular and graphical outputs.
8-10 clear answers with outputs shown in concise format
5-7 correct answers with outputs that can be understood but cumbersome
0-4 confused answers with unclear outputs.