PAM 4101: Causal Reasoning and Policy Evaluation I
Fall 2021, Problem Set 1
Due September 16, 2021 at 11:59pm ET

Linear regressions

Note: if the answer calls for a calculation, please show your work.

You are working with a dataset containing information about a random sample of babies born in
the U.S. The dataset contains information about each baby’s birth weight in pounds ($Weight_i$)
and the mother’s age in years ($Age_i$). The average age of mothers in the dataset is 28 years old.
You run the following regression of birth weight on maternal age:
$$Weight_i = \beta_0 + \beta_1 Age_i + u_i$$

The estimated coefficients are $\hat{\beta}_0 = 5.5$ and $\hat{\beta}_1 = 0.07$.

1. State in words what the value of the slope coefficient, $\hat{\beta}_1$, represents.



2. What is the average birth weight of a baby in the sample?
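One fact does the work here. When a regression includes an intercept, the OLS fitted line passes through the point of sample means, so mean birth weight equals $\hat{\beta}_0 + \hat{\beta}_1 \cdot \overline{Age}$. The sketch below first checks that property on a small made-up dataset. The ages and weights are hypothetical, and `ols_bivariate` is an illustrative helper. It then applies the identity to the estimates given above.

```python
def ols_bivariate(x, y):
    """Return (intercept, slope) from an OLS regression of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical toy data (not from the problem set): maternal ages and birth weights.
ages = [22, 25, 28, 31, 34]
weights = [6.9, 7.4, 7.2, 7.9, 7.8]
b0, b1 = ols_bivariate(ages, weights)
mean_age = sum(ages) / len(ages)
mean_weight = sum(weights) / len(weights)
# The fitted line evaluated at the mean of x reproduces the mean of y.
assert abs((b0 + b1 * mean_age) - mean_weight) < 1e-9

# Same identity with the problem's estimates: intercept 5.5, slope 0.07, mean age 28.
avg_birth_weight = 5.5 + 0.07 * 28
print(round(avg_birth_weight, 2))  # 7.46
```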



Suppose the dataset also contains an indicator for whether the mother made regular prenatal care
visits prior to the child’s birth ($Prenatal_i = 1$) or not ($Prenatal_i = 0$). You run the following
regression of weight on maternal age and the prenatal care indicator:
$$Weight_i = \beta_0 + \beta_1 Age_i + \beta_2 Prenatal_i + u_i$$

The estimated coefficients are $\hat{\beta}_0 = 5$, $\hat{\beta}_1 = 0.07$, and $\hat{\beta}_2 = 1$.

3. Given that the coefficient on maternal age ($\hat{\beta}_1$) is unchanged from the previous
regression, what can you say about the relationship between maternal age and whether
the mother made regular prenatal care visits prior to the child’s birth in the sample?
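The omitted-variable formula links the two regressions. Let $\tilde{\beta}_1$ be the age slope from the short regression, which omits prenatal care, and let $\hat{\delta}_1$ be the slope from an auxiliary regression of the prenatal indicator on age. For OLS, $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \hat{\delta}_1$ holds exactly in any sample. If $\tilde{\beta}_1 = \hat{\beta}_1$ and $\hat{\beta}_2 \neq 0$, then $\hat{\delta}_1$ must be zero.

The sketch below checks that identity numerically. The toy data are hypothetical, and `solve` and `ols` are illustrative helpers: a small normal-equations solver, not a library API.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """OLS coefficients for y on an intercept plus the given regressor columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)

# Hypothetical toy data, chosen only to illustrate the algebra.
age = [20, 24, 27, 30, 33, 36]
prenatal = [0, 1, 0, 1, 1, 1]
weight = [6.5, 7.6, 6.9, 7.8, 8.1, 8.0]

short_b1 = ols([age], weight)[1]                   # age slope, prenatal omitted
_, long_b1, long_b2 = ols([age, prenatal], weight)  # age and prenatal included
delta1 = ols([age], prenatal)[1]                   # auxiliary regression: prenatal on age

# Omitted-variable identity: short slope = long slope + (prenatal coefficient) * delta1.
assert abs(short_b1 - (long_b1 + long_b2 * delta1)) < 1e-9
```

Plugging in the problem's numbers gives $0.07 = 0.07 + 1 \cdot \hat{\delta}_1$, which forces $\hat{\delta}_1 = 0$.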



4. What share of babies in the sample had mothers who made regular prenatal care visits?
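Both regressions include an intercept, so each fitted line passes through the sample means, and both imply the same mean birth weight. The mean of a 0/1 indicator is the share of observations equal to 1. Setting the two implied means equal, $5.5 + 0.07 \cdot 28 = 5 + 0.07 \cdot 28 + 1 \cdot \overline{Prenatal}$, and solving gives the share. A minimal sketch of that arithmetic (the variable names are illustrative):

```python
# Estimates from the two regressions in the problem set.
mean_age = 28
short_b0, short_b1 = 5.5, 0.07               # Weight on Age
long_b0, long_b1, long_b2 = 5.0, 0.07, 1.0   # Weight on Age and Prenatal

# Both fitted lines pass through the sample means, so they imply the same mean weight.
mean_weight = short_b0 + short_b1 * mean_age
# The mean of a 0/1 indicator is the share of observations equal to 1.
share_prenatal = (mean_weight - long_b0 - long_b1 * mean_age) / long_b2
print(round(share_prenatal, 2))  # 0.5
```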




Suppose the dataset also contains an indicator for whether the mother was a smoker ($Smoker_i = 1$) or not ($Smoker_i = 0$). You run the following regression of birth weight on maternal age, a
smoker indicator, and the interaction of maternal age and the smoker indicator:
$$Weight_i = \beta_0 + \beta_1 Age_i + \beta_2 Smoker_i + \beta_3 (Age_i \times Smoker_i) + u_i$$

The estimated coefficients on maternal age and the interaction are, respectively, $\hat{\beta}_1 = 0.08$
and $\hat{\beta}_3 = -0.02$. Furthermore, the estimated standard errors for $\hat{\beta}_1$ and $\hat{\beta}_3$ are, respectively, $SE(\hat{\beta}_1) = 0.04$ and $SE(\hat{\beta}_3) = 0.01$.

5. By how much, on average, does a baby’s birth weight change when a non-smoking mother is
one year older? By how much does it change when a smoking mother is one year older?
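In the interaction model, the predicted change in birth weight from a one-year increase in maternal age is $\hat{\beta}_1 + \hat{\beta}_3 \cdot Smoker_i$. That is $\hat{\beta}_1$ for non-smokers and $\hat{\beta}_1 + \hat{\beta}_3$ for smokers. The coefficient on the smoker indicator itself is not needed. A minimal sketch of the arithmetic (the function name is illustrative):

```python
def age_slope(b1, b3, smoker):
    """Change in predicted birth weight (pounds) for a one-year increase in maternal age."""
    return b1 + b3 * smoker

b1, b3 = 0.08, -0.02
nonsmoker_slope = age_slope(b1, b3, smoker=0)
smoker_slope = age_slope(b1, b3, smoker=1)
print(nonsmoker_slope, round(smoker_slope, 2))  # 0.08 0.06
```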



6. The null hypothesis is that the relationship between a baby’s birth weight and maternal
age in the population is the same for mothers who are smokers and mothers who are
non-smokers ($H_0: \beta_3 = 0$). What is the value of the test statistic ($t$)? Assuming that the
t-distribution has 100 degrees of freedom, what is the p-value from a two-sided test of the
above null hypothesis?¹
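The test statistic is $t = (\hat{\beta}_3 - 0) / SE(\hat{\beta}_3)$. The two-sided p-value is $2P(T > |t|)$ for a t-distribution with 100 degrees of freedom.

As a cross-check on the applet, the sketch below integrates the Student t density numerically with Simpson's rule, using only the standard library. This is a swapped-in numerical method, not what the applet does internally. With SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` is the usual one-line route. The helper names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, n=2000):
    """2 * P(T > |t_stat|), computed as 1 - 2 * (integral of the density over [0, |t_stat|])."""
    a = abs(t_stat)
    h = a / n  # Simpson's rule needs an even number of intervals n
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

b3, se_b3 = -0.02, 0.01
t_stat = (b3 - 0) / se_b3
p_value = two_sided_p(t_stat, df=100)
print(t_stat, round(p_value, 3))
```

With 100 degrees of freedom, the t-distribution's tails are only slightly heavier than the normal's. The p-value for $|t| = 2$ should therefore land just under 0.05.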



7. Would $SE(\hat{\beta}_1)$ be higher or lower if the variance of maternal age ($Age_i$) in the sample
were higher? Why?



Finally, suppose the dataset also contains a mother’s annual income in thousands of dollars
($Income_i$). You run a regression of the natural log of birth weight on maternal age and the
natural log of a mother’s annual income in thousands of dollars:
$$\log(Weight_i) = \beta_0 + \beta_1 Age_i + \beta_2 \log(Income_i) + u_i$$

The estimated coefficient on the log of maternal income is $\hat{\beta}_2 = 0.8$.

8. What is the interpretation of $\hat{\beta}_2$?
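In a log-log specification, the slope is an elasticity. Holding maternal age fixed, a 1% increase in income is associated with approximately a $\hat{\beta}_2$% change in birth weight. The exact implied change for a $p$% income increase is $(1 + p/100)^{\hat{\beta}_2} - 1$. The sketch compares the exact figure with the approximation; the function name is illustrative.

```python
def pct_change_in_weight(pct_change_in_income, elasticity=0.8):
    """Exact % change in predicted birth weight implied by a log-log model, age held fixed."""
    return 100 * ((1 + pct_change_in_income / 100) ** elasticity - 1)

for pct in (1, 10):
    # Exact implied change versus the elasticity approximation (elasticity * pct).
    print(pct, round(pct_change_in_weight(pct), 3), 0.8 * pct)
```

For small income changes the two are nearly identical. The approximation drifts slightly as the change grows.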





¹ You can look up p-values here: http://homepage.divms.uiowa.edu/~mbognar/applets/t.html. Where it says “ν =”,
enter the degrees of freedom. Where it says “x =”, enter the value of the test statistic. Choose “2P(X > |x|)” from
the drop-down list for a two-sided test.

Data analysis

For these questions, you will be using the dataset “ncrp_mi_2004.dta”. This is the National
Corrections Reporting Program (NCRP) dataset on all people who entered prison in Michigan in
2004.

Please turn in the responses to these questions (as a PDF), the Stata log file, and the Stata do file.
The Stata do file must be commented to make it possible for someone to follow along with what
you are doing; for an example of commented code, please see the linear regression simulation
code uploaded to Canvas (e.g., “PAM4101_linear_regression_bivariate.do”).

1. How many variables are in this dataset?



2. How many observations are in the dataset?



3. How many variables (and which ones) have missing observations?



4. How many variables (and which ones) are numeric, rather than string-based?



5. Report the median and mean admission age of people who entered prison in Michigan in
2004 to one decimal place.
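In Stata, `summarize ageadmission, detail` reports both the mean and the 50th percentile (the median). The Python sketch below shows the same two statistics on a handful of hypothetical ages, not values from the dataset. It drops missing values first, as Stata's `summarize` does. With an even number of values, the median averages the two middle observations.

```python
import statistics

# Hypothetical admission ages (NOT taken from ncrp_mi_2004.dta); None marks a missing value.
raw_ages = [19, 23, None, 23, 27, 31, 34, 38, 45]
# Stata's summarize ignores missing values, so drop them before computing the statistics.
ages = [a for a in raw_ages if a is not None]
median_age = statistics.median(ages)
mean_age = statistics.mean(ages)
print(f"median = {median_age:.1f}, mean = {mean_age:.1f}")  # median = 29.0, mean = 30.0
```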



6. Imagine that you did not have the ageadmission variable in this dataset. Construct a
variable containing a person’s approximate admission age. (Hint: everyone entered prison
in 2004 in this dataset.) How do the median and mean of your new variable compare to
the median and mean of the true admission age?
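If the dataset records a birth year (the name `birthyear` here is hypothetical), the Stata construction would be along the lines of `generate approx_age = 2004 - birthyear`. That approximation never undershoots. A person admitted before their 2004 birthday is one year younger than 2004 minus their birth year, so the approximation overshoots by one year in that case and is exact otherwise.

The sketch illustrates this with hypothetical birth and admission dates; the real dataset's variables may differ. `exact_age` is an illustrative helper.

```python
from datetime import date
import statistics

def exact_age(birth, on):
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

# Hypothetical records (birth date, admission date in 2004); not from the NCRP file.
records = [
    (date(1980, 3, 14), date(2004, 6, 1)),
    (date(1975, 11, 2), date(2004, 2, 20)),
    (date(1986, 7, 30), date(2004, 7, 30)),
    (date(1969, 12, 31), date(2004, 12, 30)),
]
true_age = [exact_age(b, a) for b, a in records]
approx_age = [2004 - b.year for b, _ in records]  # uses the birth year only

# The approximation never undershoots and is off by at most one year.
assert all(0 <= ap - t <= 1 for ap, t in zip(approx_age, true_age))
print(statistics.mean(true_age), statistics.mean(approx_age))
```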

7. Using the totalmaxlength variable, which reports a person’s maximum sentence length in months,
how much longer, on average, are sentences for people whose offense was violence-related² than for
people whose offense was not, and what is the value of the test statistic for this difference calculated
using heteroskedasticity-robust standard errors?
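Regressing sentence length on a 0/1 violent-offense indicator gives a slope equal to the difference in group means. In Stata this would be something like `regress totalmaxlength violent, vce(robust)`, where `violent` is a hypothetical indicator you would construct from the offense variable. Stata's `vce(robust)` applies the HC1 small-sample scaling $n/(n-k)$.

The sketch computes the slope and its HC1 standard error by hand on hypothetical data. `robust_slope` is an illustrative helper, not a library function.

```python
import math

def robust_slope(x, y):
    """OLS slope of y on x (with intercept) and its HC1 heteroskedasticity-robust standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xt = [xi - mx for xi in x]  # demeaned regressor
    sxx = sum(v * v for v in xt)
    slope = sum(v * (yi - my) for v, yi in zip(xt, y)) / sxx
    resid = [(yi - my) - slope * v for v, yi in zip(xt, y)]
    meat = sum(v * v * r * r for v, r in zip(xt, resid))
    hc1_var = n / (n - 2) * meat / sxx ** 2  # HC1 scaling with k = 2 coefficients
    return slope, math.sqrt(hc1_var)

# Hypothetical toy data: sentence lengths in months and a 0/1 violent-offense indicator.
violent = [1, 1, 1, 0, 0, 0, 0, 0]
months = [120, 240, 180, 36, 48, 24, 60, 30]

diff, se = robust_slope(violent, months)
mean_v = sum(m for m, v in zip(months, violent) if v) / sum(violent)
mean_nv = sum(m for m, v in zip(months, violent) if not v) / (len(violent) - sum(violent))
assert abs(diff - (mean_v - mean_nv)) < 1e-9  # slope on a dummy = difference in group means
print(round(diff, 1), round(diff / se, 2))
```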



² Violence-related offenses in this dataset include Assault, Murder Homicide, Rape, and Robbery.

8. How much longer, on average, are sentences for women whose offense was violence-related
compared to women whose offense was not violence-related, and what is the value of the test
statistic for this difference calculated using heteroskedasticity-robust standard errors? Can you
reject the null that women with violent and non-violent offenses have the same average sentence length?
Use a regression with an interaction term to obtain this.
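One convenient setup interacts the violent indicator with a male indicator, so that women are the base group. The coefficient on the violent indicator is then the violent/non-violent gap among women, and its robust t-statistic can be read off directly. In Stata this might look like `regress totalmaxlength i.male##i.violent, vce(robust)`, where `male` and `violent` are hypothetical indicators you would construct.

The sketch checks on hypothetical data that this coefficient equals the difference in women's group means. Because the model is saturated (one coefficient per sex-by-offense cell), the HC1 standard error of that coefficient reduces to a two-cell formula, which is used here instead of a full sandwich matrix. `solve`, `ols`, `cell`, and `pop_var` are illustrative helpers.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """OLS coefficients for y on an intercept plus the given regressor columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)

# Hypothetical toy data (NOT from the NCRP file): sex, violent-offense indicator, sentence months.
male = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
violent = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
months = [96, 150, 30, 42, 24, 200, 160, 48, 36, 60]
male_x_violent = [m * v for m, v in zip(male, violent)]

# Coefficients: intercept, male, violent, male x violent. With women as the base group,
# the coefficient on violent is the violent/non-violent gap among women.
_, _, b_violent, _ = ols([male, violent, male_x_violent], months)

def cell(v):
    """Sentence lengths for women with violent indicator equal to v."""
    return [y for y, m, vi in zip(months, male, violent) if m == 0 and vi == v]

women_v, women_nv = cell(1), cell(0)
gap = sum(women_v) / len(women_v) - sum(women_nv) / len(women_nv)
assert abs(b_violent - gap) < 1e-8

def pop_var(values):
    """Variance with an n divisor (residual variance within a cell of a saturated model)."""
    mu = sum(values) / len(values)
    return sum((y - mu) ** 2 for y in values) / len(values)

# Saturated model: HC1 SE of b_violent = sqrt(n/(n-k) * (var_v/n_v + var_nv/n_nv)), k = 4.
n, k = len(months), 4
se = math.sqrt(n / (n - k) * (pop_var(women_v) / len(women_v)
                              + pop_var(women_nv) / len(women_nv)))
print(round(gap, 1), round(gap / se, 2))
```

Making women the base group means no linear combination of coefficients is needed for this question. If men were the base instead, the women's gap would be the sum of the violent and interaction coefficients, and Stata's `lincom` would be the tool for its standard error.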