Date: 2023-11-02
SCHOOL OF ECONOMICS
ECO-7000A: Econometric Methods
ECO-7009A: Financial Econometrics
Take-home Assignment - Autumn Semester 2023
This piece of work should be submitted to the Summative Assessment folder on Blackboard no later
than 3pm on Monday 6 November 2023 (week 7). It accounts for 40% of the overall mark for the module.
The exercise is divided into many parts. Parts marked with an asterisk (*) are the most important.
The file HOUSE_2023 contains data on 2,335 residential properties traded in Norwich between January 2017
and October 2023. The variables are:
house: Property number
price: Sale price in thousands of pounds
beds: Number of bedrooms
baths: Number of bathrooms
recs: Number of recreation rooms
garages: Number of garages
type: 1 if empty plot of land
      2 if flat
      3 if bungalow
      4 if chalet
      5 if terraced house
      6 if end-terraced house
      7 if semi-detached house
      8 if detached house
pcode: 1 if post code is NR1 (South and East Central Norwich)
       2 if NR2 (West Central Norwich)
       3 if NR3 (North Central Norwich)
       4 if NR4 (South-West Norwich)
       5 if NR5 (West Norwich)
       6 if NR6 (North Norwich)
       7 if NR7 (East Norwich)
       8 if NR8 (North-West Norwich)
sqm: Internal area in square metres.
dg: One if property has double glazing; zero otherwise
solar: One if property has solar panels; zero otherwise
loft: One if property has loft insulation; zero otherwise
gsize: Size of garden in square metres
poll: Air pollution at property (measured in millionths of a gram of particulate matter
per cubic metre of air)
noise: Level of traffic noise at property (measured in decibels, dB)
age: Age of property in years
month: Month of transaction: 1 if Jan 2017;
                             2 if Feb 2017;
                             ...
                             82 if Oct 2023.
(a) Compare mean and median price (sum price, detail), and obtain a histogram of price (hist
price). Describe the distribution of property prices in Norwich. What does the comparison of mean
and median tell us about the nature of the distribution?
(b) Find the mean price for each of the eight postcodes (table pcode, stat(mean price)). Rank
the eight postcodes by mean property price.
(c) Estimate a regression model using the OLS estimator, with price as the dependent variable, with sqm,
beds, baths, recs and garages as quantitative explanatory variables, and with a set of dummy
variables for type, using “empty plot” as the base case. (To introduce the type dummies, you just need
to include i.type in the regress command.) [1] Present the results in a table.
[1] Dummy variables are explanatory variables that take on only two values, 0 and 1. Dummy variables will be covered in lectures soon.
(d) Explain why one of the eight type dummies must be omitted in order for estimation to be possible.
(e) Evaluate how well the model in part (c) fits the data. That is, quote and interpret R²; quote the
F-statistic for overall significance, conduct an F-test for overall significance, and interpret the result.
(f) Does the intercept parameter have a meaningful interpretation in the model of (c)?
(g) Number of bedrooms and number of recreation rooms both appear to have a negative effect on price.
Does this have a logical explanation?
(h) Using economic concepts where appropriate, interpret each of the other coefficients in the model of (c).
Briefly indicate which are significantly different from zero.
(i) Still using the results from the model of (c), report a 95% confidence interval for the slope parameter
associated with “sqm”. Interpret this interval estimate.
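As a rough illustration of the arithmetic behind an interval estimate, the sketch below computes estimate ± critical value × standard error. It assumes a normal critical value, which is very close to the t critical value when the residual degrees of freedom are in the thousands, as they would be with 2,335 observations. The function name and the numbers are hypothetical and are not taken from the data.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Return (lower, upper) for a coefficient using a normal critical value.

    Assumption: the residual degrees of freedom are large enough that the
    normal and t critical values nearly coincide.
    """
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = critical * std_error
    return estimate - half_width, estimate + half_width


# Hypothetical slope and standard error, NOT estimates from HOUSE_2023.
lower, upper = confidence_interval(estimate=2.0, std_error=0.1)
print(f"{lower:.3f} to {upper:.3f}")
```

Stata's regress output reports the interval based on the t distribution directly, so a calculation like this is only a cross-check.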
(j) Extend the model of part (c) to include a set of postcode dummies, with NR1 as the “base case” (just
add i.pcode to the regress command). Report the regression results.
(k) Using the formula for the F-test given in the lecture notes, conduct an F-test for the significance of
the postcode dummies, in order to assess the importance of location in price determination. (This is a
test of the model of (c) as a restricted version of the model of (j)).
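The lecture-notes formula is not reproduced in this document. A common form of the F-test for a set of restrictions compares the residual sums of squares (RSS) of the restricted and unrestricted models, and the sketch below assumes that form. The function name and the numbers are hypothetical. In Stata, the RSS and residual degrees of freedom are stored in e(rss) and e(df_r) after regress.

```python
def f_statistic(rss_restricted, rss_unrestricted, num_restrictions, df_unrestricted):
    """F = [(RSS_R - RSS_U) / q] / [RSS_U / df_U].

    q is the number of restrictions; df_U is the residual degrees of
    freedom of the unrestricted model.
    """
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator


# Hypothetical values, NOT taken from HOUSE_2023.
# Adding i.pcode with NR1 as base introduces seven dummies, so q = 7.
example_f = f_statistic(
    rss_restricted=1200.0,
    rss_unrestricted=1000.0,
    num_restrictions=7,
    df_unrestricted=2000,
)
print(round(example_f, 2))
```

The statistic is then compared with the critical value of the F distribution with q and df_U degrees of freedom.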
(l)* Draw up a ranking of the eight postcodes, based on ceteris paribus price comparisons (i.e. based on
your regression results). Does your answer contradict your answer to (b)? If so, why?
(m) Using the model with postcode dummies (part j), predict the price of a terraced house in East Norwich,
with 2 bedrooms, 1 bathroom, 1 recreation room, no garage and an internal area of 60 square metres.
(Give the answer in pounds.) Predict the price of a detached house in North-West Norwich, with 5
bedrooms, 2 bathrooms, 3 recreation rooms, two garages and an internal area of 200 square metres.
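A prediction from a model with dummies is the intercept plus each coefficient multiplied by the property's value for that variable, with only the dummies for the relevant type and postcode switched on. The sketch below shows this arithmetic and the conversion from thousands of pounds to pounds. The coefficient values and the names type5 and pcode7 are hypothetical placeholders, not estimates from the data.

```python
def predict_price_pounds(coefs, features):
    """Intercept plus sum of coefficient * value, converted from thousands to pounds."""
    price_thousands = coefs["_cons"] + sum(
        coefs[name] * value for name, value in features.items()
    )
    return price_thousands * 1000


# Hypothetical coefficients in thousands of pounds, NOT regression output.
coefs = {
    "_cons": 20.0, "sqm": 2.0, "beds": -1.0, "baths": 5.0,
    "recs": -2.0, "garages": 8.0, "type5": 30.0, "pcode7": 10.0,
}

# Terraced house (type 5) in East Norwich (pcode 7), as described in part (m).
terraced_nr7 = {
    "sqm": 60, "beds": 2, "baths": 1, "recs": 1,
    "garages": 0, "type5": 1, "pcode7": 1,
}

print(predict_price_pounds(coefs, terraced_nr7))
```

Dummies for the base categories (empty plot, NR1) never appear in the sum, because their effect is absorbed in the intercept.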
(n) Which of the 2,335 properties appears to have been the best in terms of “value for money”, and which
worst? (Hint: Look at the residuals).
FOR PARTS (o)-(r), DO NOT PROVIDE TABLES OF RESULTS; JUST FOCUS ON THE PARTS OF THE
RESULTS THAT ARE RELEVANT IN ANSWERING THE QUESTIONS.
(o)* Add the variables age and age-squared to the model of (j). (To do this, you need to add
c.age##c.age to the regress command.) Test the individual and joint significance of age and
age-squared. Plot predicted price against age, with other variables set to means (to do this, you need the
two commands: margins, at(age=(0(20)200)), followed by marginsplot). There are two
economic arguments for why age affects property price: the depreciation effect (the value of a property
declines as it gets older); and the vintage effect (older properties attract a premium). Are you finding
evidence of the depreciation effect, or the vintage effect, or both?
(p) Add the variable month to the model of (o), and interpret the coefficient of month. Why is this
information useful to a homeowner?
(q)* Instead of adding just month (as in p), try adding c.month##c.month. Does this improve the model?
Using margins and marginsplot (see part o), obtain a plot of predicted price for month between 0
and 100. Can you use this model to identify the month in which property prices reached (or will reach)
a maximum in Norwich?
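With a quadratic in month, predicted price has a turning point where the derivative with respect to month is zero, that is, at month = -b1 / (2 × b2), where b1 and b2 are the coefficients on month and month-squared. It is a maximum only if b2 is negative. The sketch below shows this calculation and maps a month number back to a calendar date using the coding in the variable list (1 = Jan 2017). The function names and coefficient values are hypothetical.

```python
def turning_point(b_linear, b_squared):
    """Month at which b_linear*m + b_squared*m**2 has zero slope."""
    if b_squared == 0:
        raise ValueError("no turning point: the squared term is zero")
    return -b_linear / (2 * b_squared)


def month_to_calendar(month_number):
    """Convert the data set's month code (1 = Jan 2017) to (year, month)."""
    year = 2017 + (month_number - 1) // 12
    month = (month_number - 1) % 12 + 1
    return year, month


# Hypothetical coefficients, NOT estimates from HOUSE_2023.
b1, b2 = 1.5, -0.015625
peak = turning_point(b1, b2)
print(peak, "is a maximum" if b2 < 0 else "is a minimum")
print(month_to_calendar(round(peak)))
```

A turning point beyond month 82 lies outside the sample period, so it would be an extrapolation rather than an observed peak.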
(r) Test for heteroskedasticity in the model of (q) using the command hettest age, fstat. Should we
be adjusting for heteroskedasticity? Suggest reasons why the variable age is likely to cause
heteroskedasticity in the present situation.
(s)* Other variables are available in the data set. Experiment with these by adding them in different
combinations (with squared terms and interaction terms where appropriate) to the model (you should
continue to include the variables you have previously used). REPORT ONLY ONE COMPLETE SET
OF RESULTS; THIS SET OF RESULTS SHOULD BE FROM YOUR MOST PREFERRED
SPECIFICATION. Make sure you explain why it is your most preferred specification. Interpret the
coefficients of the variables you have added (there is no need to interpret coefficients of variables
previously used).
Hints for part (s):
• The relationship between price and garden size is almost certainly non-linear, with the marginal
value of an additional square metre falling as garden size rises. For this reason it is inappropriate
to use gsize itself as an explanatory variable. Try log(gsize). Take care in interpreting the
coefficient.
• Can anyone get the adjusted R² above 0.97?
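On the first hint: when log(gsize) enters a model in which price is in levels, the coefficient b gives the change in price per unit change in log(gsize), so a 1% increase in garden size changes predicted price by roughly b/100 (in thousands of pounds). The sketch below, with a hypothetical coefficient, shows this and the falling marginal value of extra garden area. The log is undefined for a garden size of zero, so whether any properties have gsize equal to zero is something to check in the data.

```python
import math


def price_change(b_log_gsize, gsize_from, gsize_to):
    """Change in predicted price (thousands of pounds) when gsize changes."""
    if gsize_from <= 0 or gsize_to <= 0:
        raise ValueError("log(gsize) is undefined for non-positive garden sizes")
    return b_log_gsize * (math.log(gsize_to) - math.log(gsize_from))


# Hypothetical coefficient on log(gsize), NOT an estimate from the data.
b = 10.0

# A 1% increase in garden size: approximately b / 100.
print(price_change(b, 100, 101))

# Doubling is worth the same whether from 100 to 200 or from 200 to 400,
# so each extra square metre is worth less as the garden gets larger.
print(price_change(b, 100, 200), price_change(b, 200, 400))
```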

