Date: 2022-04-26
Final exam EBA 35302 autumn
The exam contains 6 main blocks and there is a total of 100 points available. Given the current Covid-19
situation and exam format, the assignment deliberately contains many questions. Do your best to answer
as many of them as possible. Write short and precise answers rather than long answers. You do
not need to provide a reference list.
The answer paper must be written and prepared individually. Collaboration with others is not
permitted and is considered cheating. All answer papers are automatically subject to plagiarism control.
Students may also be called in for an oral consultation as additional verification of an answer paper.
Good luck!
1. (10 points) True or false
Read the statements below. Only answer whether they are true or false.
(a) (1 point) Using cross-validation routines is always better than relying on simple information
criteria such as BIC or AIC.
(b) (1 point) The p-value reflects the probability that a parameter is zero.
(c) (1 point) More data observations will by construction make the standard errors of the
estimated parameters smaller.
(d) (1 point) The Ridge estimator incorporates regularization, but does not allow for variable
selection.
(e) (1 point) You can use the OLS estimator for binary outcome data, but it is generally better
to use the LASSO.
(f) (1 point) The RMSE is typically used as a measure for out-of-sample forecasting accuracy.
(g) (1 point) The LASSO estimator is the best linear unbiased estimator you can use.
(h) (1 point) When you do not know the true data generating process you can construct confidence
intervals around your point estimates using Bootstrapping routines.
(i) (1 point) When the penalization weight is infinitely large the LASSO estimator is identical
to the OLS estimator.
(j) (1 point) The scale of the variables you use in an OLS regression does affect the parameter
estimates, but not the significance level.
2. (20 points) Scoring rules and uncertainty
(a) (5 points) Make a drawing with explanations illustrating how to perform out-of-sample
forecasting evaluation with time series data. Give at least two reasons why using out-of-sample
validation might be a good idea.
(b) (3 points) You are working with binary outcome variables and have used the ROC curve to
evaluate your model. Explain the elements of the confusion matrix, and illustrate what an
optimal ROC curve would look like. What would the ROC curve look like if the predicted
outcomes were purely random?
(c) (3 points) A high $R^2$, estimated in-sample, might not lead to good out-of-sample performance.
But does a model that predicts well out-of-sample have a good in-sample fit? Why/Why not?
(d) (3 points) Exemplify, with equations, the estimators for the out-of-sample bias and Root Mean
Squared Forecast Error. Explain clearly what the elements in the equations are. Explain, in
words, what these scoring rules actually measure.
(e) (3 points) Explain the bias-variance trade-off. If you want to estimate the elasticity of demand
for a product you are selling, would you be more concerned about biased estimates or about
high variance?
(f) (3 points) What is the purpose of Monte Carlo algorithms? Explain, in generic terms, how
you would perform a Monte Carlo experiment based on a linear regression model.
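A minimal sketch of the kind of experiment question (f) refers to, under assumptions that are not part of the exam: a simple regression y = 1 + 2x + e with standard normal x and e, 100 observations per sample, and 2000 replications. The helper names `ols_slope` and `monte_carlo_slopes` are hypothetical. The idea is to fix a known data generating process, draw many artificial samples from it, re-estimate the model on each one, and study the distribution of the estimates.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def monte_carlo_slopes(true_beta=2.0, n_obs=100, n_reps=2000, seed=0):
    """Simulate y = 1 + true_beta * x + e repeatedly and collect the OLS slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [1.0 + true_beta * xi + rng.gauss(0, 1) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

slopes = monte_carlo_slopes()
mc_mean = statistics.fmean(slopes)   # close to true_beta if the estimator is unbiased
mc_sd = statistics.stdev(slopes)     # simulated sampling variability of the slope
print(f"mean slope: {mc_mean:.3f}, sd across replications: {mc_sd:.3f}")
```

Changing `n_obs` or the error distribution and rerunning shows how the sampling distribution of the estimator responds to the design.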
3. (15 points) Time series data
(a) (3 points) Exemplify, with an equation, how we can conceptually think about the different
components of time series data. Make figures illustrating these components.
(b) (3 points) Explain the meaning of stationarity. In terms of the figures you drew in the previous
question, which of the components are stationary/non-stationary?
(c) (3 points) Write down the equation for an AR(2) process. Consider now an AR(1): what determines
whether this process is stationary or not? Given that the autoregressive parameter is 0.9 and $y_t = 10$,
what is $E(y_{t+2} \mid y_t)$? What do we call the process if the autoregressive parameter equals 1?
(d) (3 points) Assume that you have some time series data that is trending and that you are not
willing to assume that the trend is linear. Explain how you can, by regression techniques,
remove a non-linear trend from the raw data.
(e) (3 points) If the data actually follows a Random Walk, what happens if you detrend it using
a linear trend specification?
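As a rough illustration of the stationarity questions in this block, the sketch below simulates many paths of a zero-mean AR(1) process and compares the variance across paths at an early and a late date. The autoregressive parameters (0.5 and 1.0), the standard normal shocks, the path length and the helper names are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_ar1(phi, n_steps, rng, y0=0.0):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal shocks."""
    path, y = [], y0
    for _ in range(n_steps):
        y = phi * y + rng.gauss(0, 1)
        path.append(y)
    return path

def variances_across_paths(phi, time_indices, n_paths=1000, n_steps=200, seed=1):
    """Variance of y at selected dates, computed across independent simulated paths."""
    rng = random.Random(seed)
    paths = [simulate_ar1(phi, n_steps, rng) for _ in range(n_paths)]
    return {t: statistics.variance([p[t] for p in paths]) for t in time_indices}

ar_var = variances_across_paths(0.5, [49, 199])   # |phi| < 1
rw_var = variances_across_paths(1.0, [49, 199])   # phi = 1, a random walk

print("AR(1), phi = 0.5:", ar_var)
print("phi = 1.0:       ", rw_var)
```

With |phi| < 1 the variance settles at roughly the same level at both dates, while with phi = 1 it keeps growing with time.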
4. (15 points) Dimension reduction and regularization
You are a summer intern at Norges Bank Investment Management (NBIM). This is one of the world's
largest funds, and its main office is in Oslo. It is important for the company to predict future stock
market developments. At your department you have access to daily return series from over 1000
international companies covering more than two decades. Your boss is interested in figuring out the
co-movement in these return series and how this co-movement might be related to developments
in other markets, such as oil prices and interest rates.
(a) (1.5 points) Mention at least two methods you can use to say something about the co-movement
in the data.
(b) (3 points) Explain Principal Component Analysis (PCA), and why a regression like
$f_{1,t} = \beta \, \Delta \text{oilPrice}_t + u_t$ can say something about how the return series relate to oil market
developments. (Here $f_{1,t}$ is the first factor estimate from PCA using all the return series.)
How can you check how much variance is captured by each principal component? In what
situations would this regression be a really bad idea?
(c) (1.5 points) Explain two approaches you can use to interpret the results from Principal
Component Analysis.
(d) (3 points) Assume now that your boss wants to learn something about the linear relationship
between the fund's overall return and the return on the 1000 companies in your dataset. What
is the dependent variable in this question, and what method would you use to estimate this
relationship: OLS or LASSO? Explain your answer.
(e) (3 points) In terms of LASSO, describe the role of the regularization parameter. What methods
can you use to determine the optimal degree of shrinkage?
(f) (3 points) What important transformations do you need to apply to your data prior to running
PCA or LASSO?
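To make the objects in this block concrete, here is a small sketch of a first principal component computed on synthetic data. Everything in it is an assumption made for illustration: five artificial "return" series driven by one common factor and measured in very different units, hypothetical helper names, and power iteration on the correlation matrix as a simple stand-in for a full eigendecomposition.

```python
import random
import statistics

def standardize(columns):
    """Demean each series and scale it to unit sample standard deviation."""
    out = []
    for col in columns:
        m, s = statistics.fmean(col), statistics.stdev(col)
        out.append([(v - m) / s for v in col])
    return out

def correlation_matrix(std_columns):
    """Sample correlation matrix of already standardized series."""
    n = len(std_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in std_columns]
            for ci in std_columns]

def leading_eigenpair(matrix, n_iter=200):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    k = len(matrix)
    vec = [1.0 / k ** 0.5] * k
    for _ in range(n_iter):
        new = [sum(row[j] * vec[j] for j in range(k)) for row in matrix]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
    eigval = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k))
    return eigval, vec

rng = random.Random(2)
n_obs = 500
factor = [rng.gauss(0, 1) for _ in range(n_obs)]
scales = [1.0, 10.0, 100.0, 0.1, 5.0]  # deliberately different units per series
returns = [[s * (f + 0.5 * rng.gauss(0, 1)) for f in factor] for s in scales]

z = standardize(returns)
eigval, loadings = leading_eigenpair(correlation_matrix(z))
share_pc1 = eigval / len(z)  # eigenvalues of a correlation matrix sum to the number of series
first_factor = [sum(w * col[t] for w, col in zip(loadings, z)) for t in range(n_obs)]
print(f"share of variance captured by the first component: {share_pc1:.2f}")
```

The series `first_factor` plays the role of the first factor estimate that a follow-up regression on another market variable would use as its dependent variable.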
5. (20 points) Instrumental variable methods
You are a summer intern at one of the major importers of fruits and vegetables in Norway. During
the last decade, your firm has been campaigning actively to make guacamole an essential part
of the traditional Norwegian Taco Friday. To make enough guacamole for a family, one needs
multiple avocados — a fruit that has become widely popular also for other uses the last couple
of years. Avocado markets have been volatile lately due to poor harvests and multiple economic
downturns, both globally and in Norway. As the effects of climate change and income inequality
are expected to become more pronounced in the coming years, your firm wants to know how price-
sensitive Norwegian consumers are in order to make revenue forecasts for the next fiscal years. From
microeconomics class last year, you remember that this number X is called the price elasticity of
demand and is interpreted as follows: if the price changes by 1%, the quantity demanded changes by X%.
You also remember that when prices increase, quantities demanded tend to decrease. So typically,
the elasticity is negative.
Your internship mentor, who has been at the firm for almost 25 years and worked his way up the
corporate ladder from hauling banana cases, has handed you a flash drive containing an extract
from the inventory system with monthly data for quantities and import prices on avocados from
the previous three years. With it is also a Microsoft Office Excel workbook with a spreadsheet
containing an example of a previous analysis. You are tasked with performing the same analysis on
the most recent data sample. That analysis entails estimating the following linear regression model
$$\ln Q_t = \beta_0 + \beta_1 P_t + e_t \qquad (1)$$
where $P_t$ is the avocado price at time $t$ and $Q_t$ is the quantity at time $t$. You do as you are told.
(a) (5 points) Discuss (briefly) how to interpret the $\beta_1$ parameter. What data transformation do
you need to apply to give $\beta_1$ a percentage-change interpretation?
(b) (5 points) Regardless of data transformations, provide your mentor with an explanation for
why $\beta_1$ is not the price elasticity of demand. You can use terms such as simultaneity bias, as well
as drawings, to support your argument.
You have access to a time series on the precipitation in San Diego county in Southern California.
San Diego is a major avocado producing region and the key supplier for your firm. You also have
a friend working for Statistics Norway over the summer. She has provided you with a time series
of the median disposable income across Norwegian households. You realise that both of these time
series have the potential to be used as instrumental variables for this market.
(c) (2.5 points) Explain in general terms the two important instrument validity assumptions. Can
you test any of these assumptions?
(d) (2.5 points) In relation to the case above, explain which instrumental variable you would prefer
for the estimation of the price elasticity of demand. Discuss whether or not this instrument
satisfies the instrument validity assumptions.
You denote median Norwegian household income as $HHI_t$ and precipitation in San Diego as $RAIN_t$.
You estimate the following reduced forms and 1st stage regressions with the two different
instruments. Standard errors are in parentheses and computed assuming that error terms are
heteroscedastic.
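Reduced form HHI:  $\ln \hat{Q}_t = \pi_0 + \underset{(0.09)}{0.34}\, HHI_t, \quad R^2 = 0.06$
Reduced form RAIN: $\ln \hat{Q}_t = \pi_0 + \underset{(0.05)}{0.55}\, RAIN_t, \quad R^2 = 0.04$
1st stage HHI:     $\ln \hat{P}_t = \gamma_0 + \underset{(1.2)}{3.08}\, HHI_t, \quad R^2 = 0.16$
1st stage RAIN:    $\ln \hat{P}_t = \gamma_0 - \underset{(3.00)}{4.00}\, RAIN_t, \quad R^2 = 0.01$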
(e) (2.5 points) Compute the estimated price elasticity of demand using the instrumental variable
approach described above. Give the elasticity an interpretation. How price-sensitive are
Norwegians with respect to avocados? Note that you should only use the equations for the instrument that
you believe to be the most suitable one.
(f) (2.5 points) Given the estimates listed above, how would you evaluate the relevance of your
chosen instrument?
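The link between reduced form, first stage and instrumental variable estimates can be sketched on synthetic data. The numbers below are not the exam's estimates: the data generating process (a true elasticity of -1.5, a generic instrument `z`, and a log price that responds to the demand shock `u`) and the helper `ols_slope` are assumptions made purely for illustration. With a single instrument, the IV estimate equals the reduced-form slope divided by the first-stage slope.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(3)
n = 5000
true_elasticity = -1.5                       # assumed, for the simulation only
z = [rng.gauss(0, 1) for _ in range(n)]      # hypothetical instrument
u = [rng.gauss(0, 1) for _ in range(n)]      # unobserved demand shock
# Log price depends on the instrument AND on the demand shock, so it is endogenous.
ln_p = [0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
ln_q = [true_elasticity * p + ui for p, ui in zip(ln_p, u)]

first_stage = ols_slope(z, ln_p)             # effect of z on log price
reduced_form = ols_slope(z, ln_q)            # effect of z on log quantity
iv_estimate = reduced_form / first_stage     # ratio of the two slopes
ols_estimate = ols_slope(ln_p, ln_q)         # contaminated by the shared shock u

print(f"OLS: {ols_estimate:.2f}, IV: {iv_estimate:.2f}, truth: {true_elasticity}")
```

In this simulated setting OLS is pulled toward zero because price and the demand shock move together, while the ratio estimate stays close to the assumed elasticity.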
6. (20 points) Causal inference
(a) (5 points) Assume that you have done a quasi-experiment and applied the difference-in-
difference approach to estimate the effect of treatment. Make an illustration showing the
intuition for the difference-in-difference approach. Explain the difference between the
treatment and control group and the crucial parallel trends assumption used in this approach.
(b) (5 points) Specify appropriate linear regression model(s) that can provide estimate(s) of the
causal effect of interest in a difference-in-difference setting. Clearly pinpoint the parameter(s)
that will represent the causal effect in your model(s).
(c) (5 points) Consider the statement: “With observational data causal inference will always rely
on one or many assumptions. With experimental data fewer assumptions are needed.” Is this
statement true or false? Discuss your answer.
(d) (5 points) Discuss why a method like two-stage least squares (2SLS) can be considered both
a predictive and causal method. Why does a model that predicts well out-of-sample not
necessarily capture the causal relationship between the predictor and the outcome of interest?
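For the difference-in-difference setting in this block, a minimal numerical sketch on synthetic data may help fix ideas. The group and period effects, the assumed treatment effect of 3.0 and the cell size of 1000 observations are all illustrative assumptions. In this two-group, two-period design, the difference of the two before/after changes equals the coefficient on the interaction term in a regression of the outcome on a treatment dummy, a post-period dummy and their product.

```python
import random
import statistics

rng = random.Random(4)
true_effect = 3.0  # assumed treatment effect, for the simulation only

# Outcomes by (treated, post) cell: a group gap, a common time effect,
# and the treatment effect only for the treated group after treatment.
cells = {}
for treated in (0, 1):
    for post in (0, 1):
        cells[(treated, post)] = [
            1.0 + 2.0 * treated + 0.5 * post
            + true_effect * treated * post + rng.gauss(0, 1)
            for _ in range(1000)
        ]

means = {key: statistics.fmean(values) for key, values in cells.items()}
change_treated = means[(1, 1)] - means[(1, 0)]   # time effect + treatment effect
change_control = means[(0, 1)] - means[(0, 0)]   # time effect only
did_estimate = change_treated - change_control   # difference of the two differences

print(f"difference-in-difference estimate: {did_estimate:.2f}")
```

The simulation builds in a common time effect for both groups, which is exactly what the parallel trends assumption requires of real data.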
[End of exam]