ECOM30001/ECOM90001: Assignment 2

Date: 2023-05-10

Department of Economics
The University of Melbourne
ECOM30001/ECOM90001: Basic Econometrics
Semester 1, 2023
Assignment 2
Introduction
There are two assignments in the subject, each contributing 10% towards your final grade.
The assignments are linked together and will involve completing an econometric analysis
of a topic, chosen by you, on one of the real-world data-sets that have been provided to
you.
The principal purpose of the first assignment was to provide you with extensive feedback
on the feasibility of your proposed project. Specifically, you were required to provide:
- A description of your research question to be examined.
- Data: a description of the data used.
- Model: a description of the model to be estimated. This should include an explicit
definition of your dependent variable, as well as a list of your intended explanatory
variables.
- Analysis: a description of the proposed estimation methodology to be used, as well
as a statement of any identifying assumptions required for the methodology to be
appropriate.
The first requirement for Assignment 2 will be to re-write this section of your report,
incorporating the feedback that you have received. This might involve:
- a refinement and narrowing of your research question to become more specific and
focused.
- a more precise description of the data used, including details and the motivation
for any additional sample restrictions that you impose.
- a more precise description of your empirical model, including a discussion of the
functional form for your dependent variable (in logs or levels) and the functional
form of your conditional mean function, and whether you have included any quadratic
terms or other interactions in your model.
- a refinement of your proposed methodology, incorporating any material that we
have covered since you submitted the first assignment.
The second requirement for Assignment 2 will be to include:
- Summary Statistics: a description and interpretation of the summary statistics
associated with your chosen sample. This would involve a table of means and
standard deviations of the variables used in the analysis. A key component will
be a discussion of the characteristics of your sample.
- Results: a description and interpretation of your main results, which will need
to be presented in table(s). Your report should also include a discussion of the
‘robustness’ of your results to different modelling assumptions, if applicable, such
as functional forms, set of included explanatory variables, and/or robust/ordinary
standard errors.
- Conclusions: a discussion of your main conclusions, as well as a discussion of the
limitations of your project.
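A table of means and standard deviations can be built in a few lines of base R. The
sketch below is only illustrative: the data frame toy and its values are made up, and
the column names are assumptions chosen to resemble the house price data.

```r
# Hypothetical toy data; in practice this would be your estimation sample.
toy <- data.frame(
  price = c(650000, 820000, 1200000, 540000),
  rooms = c(2, 3, 4, 2),
  large = c(0, 1, 1, 0)
)

# One row per variable: mean, standard deviation and number of non-missing values.
summary_table <- data.frame(
  mean = sapply(toy, mean, na.rm = TRUE),
  sd   = sapply(toy, sd, na.rm = TRUE),
  n    = sapply(toy, function(x) sum(!is.na(x)))
)

print(round(summary_table, 2))
```

Note that the mean of an indicator variable such as large is simply the share of the
sample for which the indicator equals one.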
Word Limit: Although your submitted assignment will include a rewritten section of the
material submitted for Assignment 1, the 650 word limit applies only to the ‘new material’
submitted: the second section outlined above that contains your analysis, estimation
results, and conclusions.
Assignment Weighting: The principal aim of the first assignment was to provide you
with extensive feedback on your proposed research project that will be completed in
Assignment 2. Since the two assignments are linked together, I will be awarding you final
marks based on the maximum of (i) your grade in Assignment 2 only and (ii) the sum of the
grades in Assignment 1 and the second component (analysis) of Assignment 2, whichever
gives you a higher grade. This provides you with an excellent opportunity to considerably
improve your final report by incorporating your feedback from Assignment 1.
Extra Help: I will be holding a ‘drop-in’ session on Wednesdays from 10:00am - 12:00pm
(or by appointment) if you would like to obtain some specific advice with your research
project. Please feel free to drop by my office during these times.
UK Census Data
The data file uk tidy.csv provides 445,638 observations from the 2011 United Kingdom
Population Census, covering England and Wales. This file contains a ‘cleaned’ version
of a random 1% sample of the 2011 census data. Some further information about the data
file can be found here:
https://www.ons.gov.uk/census/2011census/2011censusdata/censusmicrodata/microdatateachingfile
I have already cleaned the data file so I would strongly recommend that you use the
‘cleaned’ data file uk tidy.csv, rather than downloading the raw (uncleaned) data file
from the above web-site.
The data file uk tidy.csv contains the following variables:
id = Unique individual identifier
region = Region of residence on census night
familycomposition = Family composition
gender = Self-Identified gender
age = Age in years on census night (Aged over 15)
maritalstatus = Marital status on census night
student = Currently studying?
countryofbirth = Country of birth (born in the United Kingdom?)
health = Self-Reported health
occupation = Last reported occupation (current occupation if employed)
industry = Last reported industry of employment (current industry if employed)
hours = Hours worked per week (if employed on census night)
lfstatus = Labour force status (on census night)
You will notice that the data file does not contain any continuous variables. All variables
are categorical variables. Further information on the coding of these variables can be
found in the file uk variable listing.xls. The R-script file uk tidy.R provides R code
to include the value labels in the data file.
Note that it is not recommended to include categorical variables directly in an
econometric model to be estimated. You will first need to create indicator variables
(‘dummy’ variables) for each value of the categorical variables. You will also need to
omit one of these indicator variables to avoid the ‘dummy variable trap’. Please refer to
Week 4, Lecture 2: Dummy Variables I for more details on this. The R-script file
uk tidy.R provides some sample R code to create these indicator variables from
categorical variables, using as an example the variable region. You might find the R
package fastDummies useful.
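As a rough illustration of the idea (this is not the code in uk tidy.R), the sketch below
builds indicator variables for region in base R and omits one category. The data frame
toy and its region labels are made up for the example.

```r
# Hypothetical toy data standing in for the census file; values are made up.
toy <- data.frame(
  region = c("North East", "London", "Wales", "London", "Wales", "North East")
)

# Treat region as a factor. The first level (alphabetically, "London" here)
# becomes the omitted base category.
toy$region <- factor(toy$region)

# model.matrix() creates an intercept plus one indicator per non-base level.
# Dropping the intercept column leaves K - 1 dummies for K categories,
# which avoids the dummy variable trap.
dummies <- model.matrix(~ region, data = toy)[, -1, drop = FALSE]

toy <- cbind(toy, dummies)
print(toy)
```

If you prefer fastDummies, a call along the lines of
dummy_cols(toy, select_columns = "region", remove_first_dummy = TRUE) should produce a
similar result. Whichever approach you use, state clearly in your report which category
is omitted, since all estimated coefficients are interpreted relative to it.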
Some specific issues you may want to explore:
- The potential dependent variables or outcomes are indicator variables. Should
you use a Linear Probability Model (LPM) or a Probit model? Your report should
include a discussion of the main advantages and disadvantages associated with your
modelling choice.
- The errors in the Linear Probability Model (LPM) are inherently heteroskedastic,
so you will need to use robust standard errors.
- Be careful with your interpretation of marginal effects when using the Probit
model.
- All of the possible outcomes or explanatory variables are categorical variables.
However, you may want to explore interactions between these variables. For example, you
may want to allow for a differing effect of country of birth on your outcome, by gender.
In this case, you could interact (multiply) these variables together in your model
and then test whether these interaction terms are statistically significant.
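The points above can be sketched in base R as follows. Everything here is an assumption
made for illustration: the data are simulated, and the variable names employed, ukborn
and female are hypothetical indicators, not columns of the census file. The robust
(HC1) standard errors are computed by hand so that no extra packages are required.

```r
set.seed(1)
n <- 2000

# Simulated (made-up) indicators; names are hypothetical.
female <- rbinom(n, 1, 0.5)
ukborn <- rbinom(n, 1, 0.8)
p_true <- pnorm(-0.2 + 0.4 * ukborn - 0.3 * female + 0.2 * ukborn * female)
toy <- data.frame(employed = rbinom(n, 1, p_true), female, ukborn)

# Linear Probability Model with an interaction between ukborn and female.
lpm <- lm(employed ~ ukborn * female, data = toy)

# Heteroskedasticity-robust (HC1) covariance matrix: (X'X)^-1 X'diag(e^2)X (X'X)^-1,
# scaled by n / (n - k).
X <- model.matrix(lpm)
e <- residuals(lpm)
k <- ncol(X)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)          # each row of X is multiplied by its residual
V_hc1 <- (n / (n - k)) * bread %*% meat %*% bread
robust_se <- sqrt(diag(V_hc1))
names(robust_se) <- colnames(X)

lpm_table <- cbind(
  estimate  = coef(lpm),
  ols_se    = sqrt(diag(vcov(lpm))),
  robust_se = robust_se,
  robust_t  = coef(lpm) / robust_se
)
print(round(lpm_table, 4))

# Probit model with the same specification.
probit <- glm(employed ~ ukborn * female,
              family = binomial(link = "probit"), data = toy)

# Probit coefficients are not marginal effects. For an indicator regressor,
# one option is the average difference in predicted probabilities when the
# indicator is switched from 0 to 1, holding other variables at observed values.
p1 <- predict(probit, newdata = transform(toy, ukborn = 1), type = "response")
p0 <- predict(probit, newdata = transform(toy, ukborn = 0), type = "response")
ame_ukborn <- mean(p1 - p0)
print(ame_ukborn)
```

The row for ukborn:female in lpm_table gives a robust t-statistic for the interaction
term. In the Probit model the coefficient on the interaction term should not be read
directly as an effect on the probability, which is why the sketch compares predicted
probabilities instead.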
Melbourne House Prices
The data file houseprices.csv contains data on the selling prices of 4,238 houses sold in
Melbourne during the period April 2016 to March 2018. You have already seen some of
this data since a subset of this data was used in Tutorial 2. The data file houseprices.csv
contains the following variables:
year = Year of Sale
month = Month of Sale
day = Day of Sale
price = Selling Price, in dollars
rooms = Number of Rooms
bedroom = Number of Bedrooms
bathroom = Number of Bathrooms
car = Number of Car Spaces
buildingarea = Building Area of Property, in square metres
landsize = Landsize of Property, in square metres
large = 1 if landsize ≥ 650 metres squared, 0 otherwise
yearbuilt = Year Property was built
distance = Distance from Melbourne C.B.D., in kilometres
propertycount = Number of properties in postcode
regionname = Region Location of Property
You will notice that the variable regionname is a categorical variable. Further information
on the coding of this variable can be found in the file houseprices variable listing.xls.
The R-script file houseprices tidy.R provides R code to include the value labels for this
variable in the data file.
As noted above, it is not recommended to include categorical variables directly into an
econometric model to be estimated using the method of Ordinary Least Squares (OLS).
You will first need to create indicator variables (‘dummy’ variables) for each value of the
categorical variables. You will also need to omit one of these indicator variables to avoid
the ‘dummy variable trap’. Please refer to Week 4, Lecture 2: Dummy Variables I for
more details on this. You might find the R package fastDummies useful.
Some specific issues you may want to explore:
- The data contain a variable indicating the year of sale and also a variable indicating
the month of sale. While it is feasible to create dummy variables for the year of
sale, there may not be much variation in prices over the two years of data. However,
it might also be worthwhile creating a set of season (or quarter of sale) dummy
variables to explore or control for seasonal variation in prices. Please review
Question 2 in Tutorial 5.
- Heteroskedasticity is likely an issue in this data so you may want to explore different
ways of detecting and addressing this: (1) White test for heteroskedasticity; (2)
Huber-White (robust) standard errors that correct for heteroskedasticity of an unknown
form; or (3) Feasible Generalised Least Squares (FGLS).
- Should your dependent variable be expressed in levels or in natural logarithms?
Your report should provide a brief discussion motivating your particular choice of
functional form for your dependent variable and the advantages of your choice.
Remember that the interpretation of the marginal effects will change, depending
on whether your outcome is expressed in levels or natural logarithms.
- Model Specification: You may want to explore different functional forms for your
continuous explanatory variables, such as quadratic functions for age, building area,
and/or distance. Alternatively, you may also wish to explore interactions between
the variables. For example, you may want to allow for a differing effect of distance from
the C.B.D. on prices, by region. In this case, you could interact (multiply) these
variables together in your model and then test whether these interaction terms are
statistically significant.
- Multicollinearity: The variables rooms and buildingarea likely both identify the
variation in prices associated with the size of the house. You may want to explore
whether multicollinearity is an issue when both variables are included in your model.
Of course, in applied work there is always a trade-off between including as
many variables as feasible to avoid the omitted variable problem and, at the
same time, avoiding the multicollinearity problem.
- Omitted Variable Bias: You may want to consider the likely impact or direction
of the bias on your estimated effect of interest when there are important variables
excluded from your empirical model. While the limited number of variables in the
data preclude any feasible solutions to this issue, you may want to acknowledge the
possibility of omitted variable bias.
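Several of the issues listed above can be explored with a few lines of base R. The sketch
below is an illustration under stated assumptions, not a recommended specification: the
data are simulated, the three region labels are invented, and only the column names
mimic those in houseprices.csv. The heteroskedasticity check shown is a simplified
variant of the White test that regresses the squared residuals on the fitted values and
their squares.

```r
set.seed(42)
n <- 500

# Simulated (made-up) data; column names mimic houseprices.csv, values do not.
toy <- data.frame(
  month        = sample(1:12, n, replace = TRUE),
  distance     = runif(n, 1, 40),
  buildingarea = runif(n, 60, 350),
  regionname   = sample(c("Northern", "Southern", "Eastern"), n, replace = TRUE)
)
toy$rooms <- pmax(1, round(toy$buildingarea / 45 + rnorm(n, 0, 0.7)))
toy$price <- exp(13 + 0.004 * toy$buildingarea - 0.02 * toy$distance +
                   rnorm(n, 0, 0.3 + 0.005 * toy$distance))

# Quarter-of-sale factor built from month; lm() omits the first level (Q1).
toy$quarter <- factor((toy$month - 1) %/% 3 + 1, levels = 1:4,
                      labels = paste0("Q", 1:4))

# Log-price model with a quadratic in distance, with and without
# distance-by-region interactions.
restricted <- lm(log(price) ~ distance + I(distance^2) + buildingarea + rooms +
                   quarter + regionname, data = toy)
unrestricted <- lm(log(price) ~ distance * regionname + I(distance^2) +
                     buildingarea + rooms + quarter, data = toy)

# Joint F-test of the interaction terms. This version assumes homoskedastic
# errors; a robust Wald test would be preferable if that assumption fails.
f_test <- anova(restricted, unrestricted)
p_interaction <- f_test[["Pr(>F)"]][2]

# Simplified White-type test: LM = n * R^2 from the auxiliary regression,
# compared with a chi-squared distribution with 2 degrees of freedom.
e2 <- residuals(unrestricted)^2
yhat <- fitted(unrestricted)
aux <- lm(e2 ~ yhat + I(yhat^2))
lm_stat <- n * summary(aux)$r.squared
p_white <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

# Variance inflation factor for rooms: 1 / (1 - R^2) from regressing rooms
# on the other explanatory variables of the restricted model.
aux_rooms <- lm(rooms ~ distance + I(distance^2) + buildingarea + quarter +
                  regionname, data = toy)
vif_rooms <- 1 / (1 - summary(aux_rooms)$r.squared)

print(c(p_interaction = p_interaction, p_white = p_white, vif_rooms = vif_rooms))
```

With log(price) as the dependent variable, a coefficient on a continuous regressor is
read approximately as a proportional change in price for a one-unit change in that
regressor, rather than a change in dollars.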