ECO 480 Stata Assignment 1
Date: 2021-10-15
ECO 480 Computer Assignment 1

ECO 480 Econometrics 1
Computer Assignment 1
Due Tuesday, 10/19/2021

Instruction: You have three weeks
to complete this assignment. No late work will be accepted and all the
files must be submitted via UBLearns. Make sure you upload it to the
correct submission slot because no credit will be given for incorrect
submissions. I do not accept email submissions. Important: It is
extremely important to write a clean, well-commented program for
transparency and replication purposes in all empirical work. You should
always be able to reproduce your results from raw data to support your
claims. There are 3 items to hand in: (1) Typed write-up (i.e.,
word-file) answering the assigned questions, reporting your results, and
interpreting your findings; if the question asks for graphs or tables,
these must be in the word-file in an organized manner with your
interpretation, (2) do-file (i.e., program file), and (3) log-file
(i.e., output file that shows the results). You MUST use Stata. For
questions involving data analysis, you will NOT get any credit if you do
not provide a program code and the output. You may not use Excel. Do
not submit any undigested log-file that contains errors. Put all your
answers in the word file and do NOT say “please see log-file (or
do-file) for answers.” You will not receive any credit for answers that
are not stated in the word file.

1. The following table describes
variables from heightwage.dta we will use in this question. Persico,
Postlewaite, and Silverman (2004) analyzed data from the National
Longitudinal Survey of Youth 1979 cohort to assess the relationship
between height and wages for white men who were between 14 and 22 years
old in 1979. This data set consists of answers from individuals who were
asked questions in various years between 1979 and 1996.

Variable Name   Description
wage96          Hourly wages (in dollars) in 1996
height85        Adult height: height (in inches) measured in 1985
height81        Adolescent height: height (in inches) measured in 1981
athlets         Participation in high school athletics (1=yes, 0=no)
clubnum         Number of club memberships in high school, excluding athletics and academic/vocational clubs
siblings        Number of siblings
age             Age in 1996
male            Male (1=yes, 0=no)

a. Estimate two regression models: one in which adult wages are regressed on adult height for all respondents, and another in which adult wages are regressed on adult height and adolescent height for all respondents.
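A minimal Stata sketch of a short-versus-long regression comparison. It runs on Stata's bundled auto.dta as a stand-in rather than on heightwage.dta (an assumption made so the code runs anywhere; price, weight, and length are not assignment variables). It illustrates the omitted variable bias identity: the short-regression slope equals the long-regression slope plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one.

```stata
* Stand-in data: Stata's bundled auto.dta, NOT heightwage.dta
sysuse auto, clear

* Short regression: length is omitted
regress price weight
scalar b_short = _b[weight]

* Long regression: adds length, which is correlated with weight
regress price weight length
scalar b_long = _b[weight]
scalar g_len  = _b[length]

* Auxiliary regression of the omitted variable on the included one
regress length weight
scalar delta = _b[weight]

* Omitted variable bias identity: b_short = b_long + g_len * delta
display "short slope:           " b_short
display "long slope + bias term: " (b_long + g_len*delta)
```

With the assignment data, the same pattern would apply with wage96, height85, and height81 in place of the stand-in variables.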
Discuss differences across the two models. Explain why the coefficient on adult height changed. [Hint: Think in terms of omitted variable bias.]
b. Calculate heteroskedasticity-robust standard errors. What has changed? Discuss whether calculating heteroskedasticity-robust standard errors is more appropriate than calculating homoskedasticity-only standard errors.
c. Assess the
multicollinearity of the two height variables using (1) a plot, and (2)
an auxiliary regression (i.e., a regression that is not directly the one
of interest but yields information helpful in analyzing the equation we
really care about). Run the plot once without a jitter subcommand and
once with it, and choose the more informative of the two plots.
d. Notice that IQ is omitted from the model. Is this a problem? Why or why not?
e. Notice that eye color is omitted from the model. Is this a problem? Why or why not?
f. Estimate a regression model with adult wages as the dependent variable and adult height, adolescent height, and a dummy variable for males as the independent variables. Does controlling for gender affect the results?
g. Generate a female dummy variable. Estimate a model with both a male dummy variable and a female dummy variable. What happens? Why?
h. Re-estimate the model from part (f) separately for males and females. Do these results differ from the model in which male was included as a dummy variable? Why or why not?
i. Every observation is categorized into one of four regions based on where the subjects lived in 1996. The four regions are Northeast (norest96), Midwest (norcen96), South (south96), and West (west96). Add dummy variables for regions to control for regional effects. Which regional variables would you include? Explain.
j. Estimate regression model (f) with male and regional dummy variables. First exclude West, then interpret the coefficient of each regional dummy variable.
k. Re-estimate the model in part (j), but exclude South instead. Interpret the coefficient of each regional dummy variable.
l. You’re
the boss! Use the data in this file to estimate a model that you think
sheds light on an interesting relationship. The specification decisions
include whether to limit the sample and what variables to include.
Report only a single additional specification. Describe in no more than
two paragraphs why this is an interesting way to assess the data. [Hint:
Decide on which control variable you would like to include in your
model and explain.]

2. Do cell phones
distract drivers and cause accidents? Worried that this is happening,
many states have passed legislation to reduce distracted driving.
Fourteen states now have laws making handheld cell phone use while
driving illegal, and 44 states have banned texting while driving. This
problem looks more closely at the relationship between cell phones and
traffic fatalities. The following table describes the variables in the
dataset Cellphone_2012.dta.

Variable Name        Description
year                 Year
states               State name
state_numeric        State name (numeric representation of state)
numberofdeaths       Number of traffic deaths
cell_subscription    Number of cell phone subscriptions (in thousands)
population           Population within a state
total_miles_driven   Total miles driven within a state for that year (in millions of miles)

a. While we don’t know how many people are using
their phones while driving, we can find the number of cell phone
subscriptions in a state (in thousands). Estimate a simple regression
model with traffic deaths as the dependent variable and number of cell
phone subscriptions as the independent variable. Briefly discuss the
results. Do you suspect that the estimated coefficient on cell phone subscriptions suffers from omitted variable bias? If so, what would be the direction of bias? Discuss.
b. Add population to the model. What happens to the coefficient on cell phone subscriptions? Why? [Hint: Discuss in terms of omitted variable bias and the direction of bias.]
c. Add total miles driven to the model in addition to population. What happens to the coefficient on cell phone subscriptions? Why? [Hint: Discuss in terms of omitted variable bias and the direction of bias.]

3. For this question,
we will use the dataset Growth we used together in class, which contains
data on average growth rates from 1960 through 1995 for 65 countries,
along with variables that are potentially related to growth. This
dataset is already posted in the data folder. A detailed description is
given in Growth Description.pdf. As we did in class, we will examine the
relationship between growth and trade.
a. (5) As we saw in class, the regression function that includes Malta is steeper than the regression function that excludes Malta. Do some brief research on Malta and explain why the Malta trade share is so large. Based on your research, state whether Malta should be included in or excluded from the analysis and explain your reasoning.
b. (5) Exclude the data for Malta and run a regression of Growth on TradeShare. Interpret the slope coefficient.
c. (5) Is the estimated regression slope statistically significant at the 1%, 5%, or 10% level?
d. (5) Manually calculate the t-statistic. Show your work.
e. (5) Manually construct 99%, 95%, and 90% confidence intervals and interpret these intervals. Show your work.
f. (10) Instead of a two-sided alternative hypothesis, suppose that you change your alternative hypothesis to $\beta_1 > 0$. How would your t-statistic, p-value, and conclusion change?

4. Real data
can be messy because they are not produced by stainless steel robots in
sterilized rooms. Therefore, one of the crucial first steps for any
econometric analysis is to know and understand our data. A useful first
step toward understanding data is to review sample size, mean, standard
deviation, and minimum and maximum for each variable (i.e., basic
summary statistics). Plotting the data is also useful for identifying patterns and anomalies.
a. The dataset donut.dta contains data on eating donuts (measured as average weekly donut consumption) and weights to understand whether eating donuts causes health problems. However, there is one catch: each of the variables has an error. Identify the error in each variable and briefly explain.
b. I proposed the following regression model to characterize the causal effect of eating donuts on weight:
$\text{weight}_i = \beta_0 + \beta_1 \text{donuts}_i + u_i$
Is least squares assumption #1 satisfied? State the assumption and explain why or why not. Also discuss the consequences if least squares assumption #1 is not satisfied.
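The first-pass data checks described at the start of question 4 can be sketched in Stata as follows. This is an illustration on Stata's bundled auto.dta, used as a stand-in because the variable names in donut.dta are not listed here; the cutoff of 40 is an arbitrary illustrative threshold, not a rule.

```stata
* Stand-in data: Stata's bundled auto.dta, NOT donut.dta
sysuse auto, clear

* Sample size, mean, standard deviation, minimum, and maximum
summarize price mpg weight rep78

* Count missing values; summarize drops them from the reported N
count if missing(rep78)

* List observations outside a plausible range (cutoff is illustrative)
list make mpg if mpg > 40 & !missing(mpg)

* Plot the distribution and the joint pattern to spot anomalies
histogram mpg, frequency
scatter price weight
```

A minimum or maximum that is impossible for the variable (for example, a negative count) or a point far from the rest of the scatter is the kind of anomaly these checks are meant to surface.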