Household Finance
Problem Set #3
Due: May 16, 2021 by 11:59PM
Instructions: PLEASE READ CAREFULLY
• For Question 1, you will need to create an account on IPUMS-USA by going to the
following link: https://usa.ipums.org/usa/ Go to “Log In” at the top right, and “Create
an account.” Do this immediately because it may take some time for your
account to be approved.
• Your answers should be electronically submitted on Canvas before midnight on May 16,
2021. To make it easier for us to grade, submit your answers as separate *.pdf files for
each question, indicating Q1, Q2, Q3 in the filename. Canvas will accept multiple files,
and there is no need to zip your files or put your name in the filename.
• Use the Discussion Board. If you have any questions, please post them to the “Discussions”
section on Canvas.
• For each question, you will need to use a statistics package of your choice (like Stata
or R). Include your code and/or log file with your submission (saved as *.pdf) that
clearly indicates the code ran without error. The log file should include your name and
a time-stamp of the date the code was run. Log files will substitute for submitting
code if the log file prints out all the code, in addition to the output. For example, for
question 3, you should submit two separate files, one titled “Q3 answer.pdf” and one
titled “Q3 log.pdf.”
In Stata, the best way to log is to type the following at the top of your Stata *.do file:
cap log close
log using "Q3 log", replace
• Answers submitted without accompanying Excel/log/code file will receive 0 credit for
that question.
• You may consult with your classmates, but you must submit your own independent
answers and code/log files. Your answers and code should indicate that YOU are the
author, and when we look at the file properties the file should say it was created by you.
Under no circumstances should you email each other your answers or code. Students
found to be submitting the work of others, or to have shared their code or answers
(both the supply and demand side), will face academic discipline.
• At least 1 point will be awarded just for submitting your problem set according to these
instructions.
1 Using the Census to Estimate the Returns to Education by Major
After creating an account on IPUMS-USA, navigate to “SELECT DATA,” and add the
following variables to your “cart”:
PERSON → DEMOGRAPHIC → AGE
PERSON → DEMOGRAPHIC → SEX
PERSON → EDUCATION → EDUC
PERSON → EDUCATION → DEGFIELD
PERSON → INCOME → INCWAGE
HOUSEHOLD → ECONOMIC CHARACTERISTIC → OWNERSHP
When you are done adding variables, navigate to “SELECT SAMPLES.” Select the 2018
ACS sample only. Be careful, because IPUMS will default to selecting many years, and the
file will be quite large. Click “SUBMIT SAMPLE SELECTIONS” to close this window.
Next, go to “VIEW CART,” and click “CREATE DATA EXTRACT.”
Under “OPTIONS” → “SELECT CASES,” select cases ages 22+, and click “SUBMIT.”
You are now ready to “SUBMIT EXTRACT.” Relax for approximately 10 minutes while
your extract is prepared. (You may need to manually refresh the IPUMS “DOWNLOAD
OR REVISE EXTRACTS” page.)
Download the dataset (the .DAT file) as well as the STATA or R Command File, depending
on which statistics package you are using. You will need to unzip the *.dat file, and put
both the raw *.dat and Command Files in the same folder. In your statistics package, cd to
the directory where you saved the files, and run the Command File. You are now ready to
analyze Census data!
(a) Let’s prepare wages for analysis.
Generate a log wage variable (where log refers to the natural logarithm) and the variables
age2 and age3. In Stata, this code is given by:
gen log_wages = log(incwage)
gen age2 = age^2
gen age3 = age^3
Now, you are ready to run a regression. Calculate the returns to schooling by running a
regression of log wages on an indicator variable for each value of educ, as well as age, age2
and age3. Hint: In Stata, this is very easy if you use the factor variable notation. The code
would be
regress log_wages age age2 age3 i.educ
where i.educ will automatically create indicators for each value education takes on. Note
that one category will be automatically omitted by Stata. In this case, this will be the first
value that “educ” takes on, which represents no schooling. Your effects will be relative to
this omitted group.
Run this regression pooling men and women, and then separately for men and women.
Report your results and describe them.
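To make the factor-variable idea in part (a) concrete, here is a minimal Python sketch of what an expansion like i.educ does: one 0/1 column per level, with the lowest level left out as the base category. The function name make_indicators and the toy educ codes are hypothetical and only illustrate the mechanics; Stata does this for you.

```python
def make_indicators(values):
    """Return (kept_levels, rows): one 0/1 column per level except the lowest."""
    levels = sorted(set(values))
    kept = levels[1:]  # the lowest level is the omitted base category
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical educ codes for five people (toy data, not taken from IPUMS).
educ = [0, 6, 10, 6, 11]
kept_levels, dummies = make_indicators(educ)

for code, row in zip(educ, dummies):
    print(code, row)
```

Each coefficient on one of these columns is read relative to the omitted level, which is why the person with the base code has a row of all zeros.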
(b) Create an indicator for any homeownership. In Stata, this is accomplished by:
gen ownhome_indicator = ownershp == 1
Run an OLS regression of this indicator on an indicator variable for each value of educ,
together with age, age2 and age3 (i.e., “a cubic in age”). Also run the comparable logit
regression. Compare the results.
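When comparing the two regressions in part (b), keep in mind that OLS and logit coefficients are on different scales. The Python sketch below illustrates this under strong simplifying assumptions: made-up toy data and a single 0/1 regressor, a special case in which both estimates have closed forms (the OLS slope is the difference in ownership rates, and the logit slope is the log odds ratio). The function name and data are hypothetical, and this is not a substitute for running the regressions with the full set of controls.

```python
import math


def lpm_and_logit_slopes(x, y):
    """Slopes for a single 0/1 regressor x and a 0/1 outcome y.

    Assumes both groups have outcome rates strictly between 0 and 1.
    """
    ones = [yi for xi, yi in zip(x, y) if xi == 1]
    zeros = [yi for xi, yi in zip(x, y) if xi == 0]
    p1 = sum(ones) / len(ones)    # outcome rate when x == 1
    p0 = sum(zeros) / len(zeros)  # outcome rate when x == 0
    lpm_slope = p1 - p0           # OLS: change in probability
    logit_slope = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))  # log odds ratio
    return lpm_slope, logit_slope


# Toy data (hypothetical): 1 = has the characteristic, 1 = owns a home.
group = [1, 1, 1, 1, 0, 0, 0, 0]
owns = [1, 1, 1, 0, 1, 0, 0, 0]
lpm_slope, logit_slope = lpm_and_logit_slopes(group, owns)
print(lpm_slope, logit_slope)
```

The OLS slope is a change in probability, while the logit slope is a change in log odds, so the raw numbers are not directly comparable.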
(c) Now, let’s see how returns differ across the field of degree. This regression will only make
sense for people who have a college degree, so first keep only people who have at least a
bachelor’s degree. In Stata:
keep if educ >= 10
Run a regression of log wages on a cubic in age and an indicator variable for each value of
the variable degfield.
Which degree fields have the highest returns? Which degree fields have the lowest? How do
returns differ across men and women?
2 Using the Survey of Consumer Finances (SCF) to Measure the Effects of
Student Loans on Home Ownership
In this question, you will run a series of weighted regressions using the SCF.
Read “SCF 2016 processed” into your favorite statistics package. This file is a lightly
processed version of the original SCF data for 2016. If you are curious how I processed
the data, please see my files on Box.
Note, because the SCF is not a random sample of the population but oversamples wealthy
households, it is very important to account for this when running regressions! In Stata,
sampling weights are applied in a regression according to the following syntax:
regress y x1 x2 [pw = weight]
where y is a dependent variable of interest, x1 is the first independent variable, x2 is another
independent variable, and weight is a variable containing the sampling weights. Again,
make sure to use weights when running your regressions below!
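To see why the weights matter for the point estimates, here is a minimal Python sketch of weighted least squares with one regressor. The data, weights, and function names are hypothetical toy values, not SCF data, and the sketch only reproduces the coefficient (it says nothing about standard errors).

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_ols_slope(x, y, w):
    """Slope from a weighted least squares regression of y on x and a constant."""
    mx = weighted_mean(x, w)
    my = weighted_mean(y, w)
    cov = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    var = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    return cov / var


# Toy data (hypothetical): x = has a loan, y = owns a home, w = sampling weight.
x = [0, 0, 0, 1, 1, 1]
y = [1, 1, 0, 0, 1, 0]
w = [1, 1, 2, 3, 1, 1]

weighted_slope = weighted_ols_slope(x, y, w)
unweighted_slope = weighted_ols_slope(x, y, [1] * len(x))
print(weighted_slope, unweighted_slope)
```

With a single 0/1 regressor, the weighted slope equals the difference in weighted means between the two groups, so households with larger weights pull the estimate toward their outcomes.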
(a) The variable balance_studentloan reports student loan balances. Construct a new
indicator variable that takes on the value 1 if a consumer unit carries a positive student
loan balance and 0 otherwise.
The variable ownhome_indicator is an indicator variable for owning a home that I
already constructed for you. Run a regression of ownhome_indicator on your newly
created indicator for having a student loan balance. What do you find?
(b) Use the age variable to restrict the sample to 25-39 year olds and rerun your regression
restricting to this age group. How do your results compare?
(c) Continue to focus on 25-39 year olds. Add the following controls to your regression:
high_school, college, grad_school, white, numberkids, log_income, log_income2
How do your results compare?
(d) The regressions above report the effect of having a student loan for the average student
loan balance. We can also use variation in the amount of student loan balance that
an individual carries. Construct a new variable that captures the student loan
debt-to-income ratio by taking balance_studentloan and dividing it by income. So that
this variable does not contain any large outliers from people with very low income or
very high debt, “topcode” your new independent variable by replacing all values above
5 with 5.
Replace the indicator variable for student loans in part (c) with your new independent
variable measuring the debt-to-income ratio and rerun your regression. Describe your
results.
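The topcoding step in part (d) can be sketched in a few lines of Python. The function name and example values are hypothetical, and returning None when income is missing or not positive is an assumption of this sketch, not something the problem set specifies.

```python
def debt_to_income(balance, income, cap=5.0):
    """Debt-to-income ratio, topcoded at cap.

    Assumption of this sketch: the ratio is treated as missing (None)
    when income is missing or not positive.
    """
    if income is None or income <= 0:
        return None
    return min(balance / income, cap)


# Hypothetical examples: a typical ratio, an outlier that is capped, zero income.
typical = debt_to_income(10000, 50000)
capped = debt_to_income(60000, 10000)
undefined = debt_to_income(1000, 0)
print(typical, capped, undefined)
```

If you topcode in Stata with a condition such as x > 5, remember that Stata treats missing values as larger than any number, so you may want to exclude missing values explicitly.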
3 Consumption versus Income Inequality
Read in the file “CGK replication” to your statistics package. This is annual data from the
consumer expenditure survey used in the paper “Consumption Inequality and the Frequency
of Purchases” by Olivier Coibion, Yuriy Gorodnichenko and Dmitri Koustas.
(a) The variable fincbtax is income before taxes. (Note that it is missing for 2004 and 2005.)
Calculate and plot the cross-sectional standard deviation (a measure of inequality) of
yearly income for each year. Describe this figure.
(b) The variable nondurables_estimation_sample is non-durables spending. Calculate
and plot the standard deviation of non-durables spending for each year. Describe this
figure and compare it to the figure for income.
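The per-year calculation in parts (a) and (b) amounts to grouping observations by year and taking a standard deviation within each group. Here is a minimal Python sketch with made-up numbers; the function name and data are hypothetical, the calculation is unweighted, and it uses the sample standard deviation (dividing by n - 1). Years in which the variable is entirely missing simply drop out.

```python
from collections import defaultdict
from statistics import stdev


def sd_by_year(records):
    """records: iterable of (year, value) pairs; a value of None means missing."""
    by_year = defaultdict(list)
    for year, value in records:
        if value is not None:
            by_year[year].append(value)
    # stdev needs at least two observations, so thinner years are skipped.
    return {year: stdev(vals) for year, vals in sorted(by_year.items()) if len(vals) >= 2}


# Toy data (hypothetical): the variable is missing for every household in 2004.
records = [
    (2003, 10.0), (2003, 30.0),
    (2004, None), (2004, None),
    (2006, 20.0), (2006, 20.0), (2006, 50.0),
]
result = sd_by_year(records)
print(result)
```

The resulting year-to-value mapping is what you would then plot as a time series.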
(c) The variable educ_ref reports the education of the household reference person. The
way this variable is coded changes over time. Before 1996, a value of 5 or 6 for educ_ref
indicated a college degree, while from 1996 onward a value of 15, 16 or 17 indicated a
college degree (1996 is the cutoff used in the code below). The Stata code to recode this
variable is as follows.
gen college = 0
replace college = 1 if (educ_ref>=5 & educ_ref<=6 & year<1996) | ///
(educ_ref>=15 & educ_ref<=17 & year>=1996)
Plot the standard deviation of nondurables consumption separately for people with and
without a college degree. What do you find? These results are only for nondurable
goods. If you add in durable goods to this analysis, how might inequality change?
(Hint: think back to Q1, Part (b), above.)
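For readers working outside Stata, the recode in part (c) can be written as a small function. This Python sketch follows the cutoffs in the Stata code (year < 1996 versus year >= 1996); the function name is hypothetical.

```python
def is_college(educ_ref, year):
    """Return 1 if the reference person has a college degree, else 0.

    Before 1996 the college codes are 5 and 6; from 1996 onward they are 15 to 17.
    """
    if year < 1996:
        return 1 if 5 <= educ_ref <= 6 else 0
    return 1 if 15 <= educ_ref <= 17 else 0


# Hypothetical (educ_ref, year) pairs to check both coding regimes.
examples = [(5, 1990), (15, 1990), (16, 2000), (6, 2000), (15, 1996)]
flags = [is_college(code, year) for code, year in examples]
print(flags)
```

Checking a few values on both sides of the cutoff is a quick way to confirm that the same code is not counted as a degree under the wrong regime.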