ECO 480 Stata Assignment 1
Date: 2021-10-15
ECO 480 Computer Assignment 1

ECO 480 Econometrics 1
Computer Assignment 1
Due Tuesday, 10/19/2021

Instructions: You have three weeks to complete this assignment. No late work will be accepted, and all files must be submitted via UBLearns. Make sure you upload your work to the correct submission slot, because no credit will be given for incorrect submissions. I do not accept email submissions.

Important: It is extremely important to write a clean, well-commented program for transparency and replication purposes in all empirical work. You should always be able to reproduce your results from the raw data to support your claims.

There are 3 items to hand in:
(1) Typed write-up (i.e., Word file) answering the assigned questions, reporting your results, and interpreting your findings; if a question asks for graphs or tables, these must be in the Word file in an organized manner with your interpretation,
(2) do-file (i.e., program file), and
(3) log-file (i.e., output file that shows the results).

You MUST use Stata. For questions involving data analysis, you will NOT get any credit if you do not provide the program code and the output. You may not use Excel. Do not submit an undigested log-file that contains errors. Put all your answers in the Word file and do NOT say “please see log-file (or do-file) for answers.” You will not receive any credit for answers that are not stated in the Word file.

1. The following table describes the variables from heightwage.dta that we will use in this question. Persico, Postlewaite, and Silverman (2004) analyzed data from the National Longitudinal Survey of Youth 1979 cohort to assess the relationship between height and wages for white men who were between 14 and 22 years old in 1979. This data set consists of answers from individuals who were asked questions in various years between 1979 and 1996.

Variable Name   Description
wage96          Hourly wages (in dollars) in 1996
height85        Adult height: height (in inches) measured in 1985
height81        Adolescent height: height (in inches) measured in 1981
athlets         Participation in high school athletics (1=yes, 0=no)
clubnum         Number of club memberships in high school, excluding athletics and academic/vocational clubs
siblings        Number of siblings
age             Age in 1996
male            Male (1=yes, 0=no)

a. Estimate two regression models: one in which adult wages are regressed on adult height for all respondents, and another in which adult wages are regressed on adult height and adolescent height for all respondents. Discuss the differences across the two models. Explain why the coefficient on adult height changed. [Hint: Think in terms of omitted variable bias.]

b. Calculate heteroskedasticity-robust standard errors. What has changed? Discuss whether heteroskedasticity-robust standard errors are more appropriate than homoskedasticity-only standard errors.

c. Assess the multicollinearity of the two height variables using (1) a plot, and (2) an auxiliary regression (i.e., a regression that is not directly the one of interest but yields information helpful in analyzing the equation we really care about). Run the plot once without a jitter subcommand and once with it, and choose the more informative of the two plots.

d. Notice that IQ is omitted from the model. Is this a problem? Why or why not?

e. Notice that eye color is omitted from the model. Is this a problem? Why or why not?

f. Estimate a regression model with adult wages as the dependent variable and adult height, adolescent height, and a dummy variable for males as the independent variables. Does controlling for gender affect the results?

g. Generate a female dummy variable. Estimate a model with both a male dummy variable and a female dummy variable. What happens? Why?

h. Re-estimate the model from part (f) separately for males and females. Do these results differ from the model in which male was included as a dummy variable? Why or why not?

i. Every observation is categorized into one of four regions based on where the subjects lived in 1996. The four regions are Northeast (norest96), Midwest (norcen96), South (south96), and West (west96). Add dummy variables for regions to control for regional effects. Which regional variables would you include? Explain.

j. Estimate the regression model from part (f) with male and regional dummy variables. First exclude West, then interpret the coefficient on each regional dummy variable.

k. Re-estimate the model in part (j), but exclude South instead. Interpret the coefficient on each regional dummy variable.

l. You’re the boss! Use the data in this file to estimate a model that you think sheds light on an interesting relationship. The specification decisions include whether to limit the sample and what variables to include. Report only a single additional specification. Describe in no more than two paragraphs why this is an interesting way to assess the data. [Hint: Decide which control variable you would like to include in your model and explain.]

2. Do cell phones distract drivers and cause accidents? Worried that this is happening, many states have passed legislation to reduce distracted driving. Fourteen states now have laws making handheld cell phone use while driving illegal, and 44 states have banned texting while driving. This problem looks more closely at the relationship between cell phones and traffic fatalities. The following table describes the variables in the dataset Cellphone_2012.dta.

Variable Name        Description
year                 Year
states               State name
state_numeric        State name (numeric representation of state)
numberofdeaths       Number of traffic deaths
cell_subscription    Number of cell phone subscriptions (in thousands)
population           Population within a state
total_miles_driven   Total miles driven within a state for that year (in millions of miles)

a. While we don’t know how many people are using their phones while driving, we can find the number of cell phone subscriptions in a state (in thousands). Estimate a simple regression model with traffic deaths as the dependent variable and number of cell phone subscriptions as the independent variable. Briefly discuss the results. Do you suspect that the estimate on cell phone subscriptions suffers from omitted variable bias? If so, what would be the direction of the bias? Discuss.

b. Add population to the model. What happens to the coefficient on cell phone subscriptions? Why? [Hint: Discuss in terms of omitted variable bias and the direction of bias.]

c. Add total miles driven to the model in addition to population. What happens to the coefficient on cell phone subscriptions? Why? [Hint: Discuss in terms of omitted variable bias and the direction of bias.]

3. For this question, we will use the dataset Growth that we used together in class, which contains data on average growth rates from 1960 through 1995 for 65 countries, along with variables that are potentially related to growth. This dataset is already posted in the data folder. A detailed description is given in Growth Description.pdf. As we did in class, we will examine the relationship between growth and trade.

a. (5) As we saw in class, the regression function that includes Malta is steeper than the regression function that excludes Malta. Do some brief research on Malta and explain why the Malta trade share is so large. Based on your research, state whether Malta should be included in or excluded from the analysis and explain your reasoning.

b. (5) Exclude the data for Malta and run a regression of Growth on TradeShare. Interpret the slope coefficient.

c. (5) Is the estimated regression slope statistically significant at the 1%, 5%, or 10% level?

d. (5) Manually calculate the t-statistic. Show your work.

e. (5) Manually construct 99%, 95%, and 90% confidence intervals and interpret these intervals. Show your work.

f. (10) Instead of a two-sided alternative hypothesis, suppose that you change your alternative hypothesis to the one-sided alternative $\beta_1 > 0$. How would your t-statistic, p-value, and conclusion change?

4. Real data can be messy because they are not produced by stainless steel robots in sterilized rooms. Therefore, one of the crucial first steps in any econometric analysis is to know and understand our data. A useful first step toward understanding data is to review the sample size, mean, standard deviation, and minimum and maximum for each variable (i.e., basic summary statistics). Plotting the data is also useful for identifying patterns and anomalies.

a. The dataset donut.dta contains data on eating donuts (measured as average weekly donut consumption) and weights to understand whether eating donuts causes health problems. However, there is one catch: each variable has an error. Identify the errors in each variable and briefly explain.

b. I proposed the following regression model to characterize the causal effect of eating donuts on weight:

$Weight_i = \beta_0 + \beta_1 Donuts_i + u_i$

Is least squares assumption #1 satisfied? State the least squares assumption and explain why or why not. Also discuss what the consequence would be if least squares assumption #1 is not satisfied.
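
For the manual calculations in questions 3(d) through 3(f), the arithmetic can be sketched as follows. This is only a minimal illustration, written in Python rather than Stata. The coefficient and standard error are made-up placeholders (not estimates from the Growth data), the helper names are hypothetical, and the sketch assumes large-sample standard normal critical values. Stata's regress output is based on the t distribution, so its p-values and intervals will differ slightly.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used as a large-sample approximation


def t_statistic(beta_hat, se, beta_null=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (beta_hat - beta_null) / se


def confidence_interval(beta_hat, se, level):
    """Two-sided interval: estimate +/- critical value * standard error."""
    critical = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return beta_hat - critical * se, beta_hat + critical * se


def p_values(t):
    """Return (two-sided p-value, p-value for the alternative beta_1 > 0)."""
    two_sided = 2 * (1 - STD_NORMAL.cdf(abs(t)))
    right_tailed = 1 - STD_NORMAL.cdf(t)
    return two_sided, right_tailed


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT results from the Growth dataset.
    beta_hat, se = 2.0, 0.8
    t = t_statistic(beta_hat, se)
    two_sided, right_tailed = p_values(t)
    print(f"t = {t:.3f}, two-sided p = {two_sided:.4f}, one-sided p = {right_tailed:.4f}")
    for level in (0.90, 0.95, 0.99):
        low, high = confidence_interval(beta_hat, se, level)
        print(f"{level:.0%} CI: [{low:.3f}, {high:.3f}]")
```

The t-statistic is computed the same way under either alternative; only the p-value calculation changes between the two-sided and one-sided cases.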