Date: 2025-05-05

EMET8005 Assignment
Instructions
The assignment is due 12 noon on Monday 12 May 2025. Late submissions will
receive a mark of 0 unless an extension has been granted before the deadline, as per the course
outline.
The assignment can be completed individually or in groups of up to three students. Either
way, your submission must be all your own or your group’s own original work. To submit, you
need first to create and register a (possibly one-person) group using the ‘Assignment group
selection’ link in the Week 10 block on Wattle. A group makes a single submission. Multiple
submissions of the same or similar reports will be considered plagiarism.
Your report should be uploaded to Wattle using the link provided. The upload link has
three tabs for uploading the report, the do file and the log file separately. Your report should
be typed and the file should be in either Word or pdf format. Part of the assignment is to
present results ‘professionally’. This means that there should be no Stata commands or Stata
output in the main text. Extract the information you need from the Stata output, and create
well-formatted tables and figures similar to those you see in textbooks and journal articles. The do file
must be annotated with explanatory comments, so that it is clear what results are sought, and
it must run without syntax errors (assuming the data file is in the current working directory).
There is no strict word limit, but, all else being equal, a clear and concise writing style
may attract higher marks. We anticipate most reports will be between 600 and 1400 words
(excluding tables).
If you have any questions about the assignment, please email us. There is no penalty for
asking.
Do questions (a), (b) and (c) and two of the remaining questions.
Children and household saving and spending behaviour
This assignment will explore how the saving and spending behaviour of households in China
varies with the gender and the age of their children.
The information available in English about the data source seems to be limited. It appears
the data are representative of 29 provinces. The sampling design is complicated. The population
of households is stratified by province. Within each province, the primary sampling units are
counties (including county-level cities and districts) and the counties are stratified according
to GDP and urbanisation. Some counties are randomly selected within each stratum. Within
each county, a small number of household clusters are randomly selected.
Download the file as2025.dta from Wattle. This data extract is limited to households whose
survey respondent is between the ages of 18 and 60 years, that have 1 or 2 unmarried children
under the age of 30, and that have strictly positive income and assets. The variables have
brief explanatory labels. Some abbreviations are used: HH household; SR survey respondent (ie
the person in the household who answers the questions); FB first-born child; LB last-born child
(for households with 2 children). Note that first-born and last-born are based on the children
who are living with their parents at the time of the survey, and children who have moved out
are not included. Beware that the FB and the LB are the same person for households with 1
child.
The dataset has several variables for family type. (Most households consist of a single
family, so we can use the terms more or less interchangeably.) The variable famA has 16 types
based on the number of children, their gender and their child/adult status: 1 vs 2 children, FB
male or female, FB over vs under age 18, LB male or female, LB over vs under age 18. The
value labels for famA are of the form xy, where x represents the FB child and y the LB child,
and x and y are each one of: f for a female child under 18, m for a male child under 18, F for
a female child aged 18 or over, and M for a male child aged 18 or over. (Hint: Use tabulate famA to
see the value labels.) The variable famB ignores the characteristics of the second child, and
famC further ignores the characteristics of children under 18.
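One way to see why famA has exactly 16 types: with 2 children there are 4 × 4 = 16 possible letter pairs, but the LB is never older than the FB, so the 4 pairs with an under-18 FB and an adult LB cannot occur. That leaves 12 two-child types, plus 4 one-child types. The Python sketch below (for illustration only; the assignment itself is done in Stata) decodes labels and enumerates the types. The helper names decode_fam_label and possible_types are hypothetical, and treating a one-child label as a single letter is an assumption about the label format, which tabulate famA will reveal.

```python
from itertools import product

# Letter codes as described in the text: (gender, aged 18 or over?)
CODES = {
    "f": ("female", False),
    "m": ("male", False),
    "F": ("female", True),
    "M": ("male", True),
}


def decode_fam_label(label):
    """Return [(gender, is_adult), ...]: FB first, then LB if present.

    ASSUMPTION: two-child labels have two letters (FB then LB) and
    one-child labels are a single letter; check the real labels in Stata.
    """
    if len(label) not in (1, 2):
        raise ValueError(f"unexpected label: {label!r}")
    return [CODES[ch] for ch in label]


def possible_types():
    """Enumerate the family types allowed by birth order."""
    types = list(CODES)  # one-child households: 4 types
    for fb, lb in product(CODES, repeat=2):
        fb_adult, lb_adult = CODES[fb][1], CODES[lb][1]
        if lb_adult and not fb_adult:
            continue  # an adult LB with an under-18 FB is impossible
        types.append(fb + lb)
    return types


print(decode_fam_label("Mf"))
print(len(possible_types()))
```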
We will use these data to investigate two findings about household behaviour from a team
of Chinese researchers:
• Compared with households where the FB is an adult female, households with a FB adult
male child save a higher proportion of the household’s income.
• Compared with households where the FB is an adult male, households with a FB adult
female child spend more on education.
There is also a third finding, namely that the gender of young (ie under 18) children does not
matter much for household saving and spending behaviour.
The main outcome variables of interest for us are the household’s overall saving rate, savr1,
and the household’s expenditure on education, eduexp. The main explanatory variable is the
family type, but of course other control variables may be used.
There are several issues we need to be aware of up front:
• Roughly 40% of households have negative saving. The distributions of saving rates are
extremely skewed with long left tails. It is unclear how much of this is real and how much
is measurement error. The original researchers left-censored the saving rates at −2. The
censoring rate is roughly similar across famB family types.
• The distribution of education expenditure is extremely skewed with a long right tail, and
roughly 10% of households do not spend on education. The original researchers chose
to analyse logex = ln(eduexp + 1) instead of eduexp. Given the distribution of eduexp,
this is essentially the same as left-censoring eduexp at 1 RMB. The censoring rate varies
substantially across famB family types.
• Note that some households may have education expenditure for someone who doesn’t
live in the household (eg at boarding school or university) and, vice versa, the education
expenditure for someone enrolled in education may be paid by someone not living in the
household (eg by extended family).
• The variable for household education expenditure is actually the aggregate of expenditure
on education and entertainment (presumably books, music, movies, bars, toys, artworks,
sports, etc). The original researchers did not explain or comment on this issue.
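To see why ln(eduexp + 1) is close to left-censoring eduexp at 1 RMB and then taking logs: both equal 0 when eduexp = 0, and for eduexp = x ≥ 1 they differ by ln(1 + 1/x), which is below 0.01 once x is at least 100 RMB. A short Python check with illustrative values (not data); the function names are hypothetical:

```python
import math


def logex(eduexp):
    """The transformation used by the original researchers: ln(eduexp + 1)."""
    return math.log(eduexp + 1)


def log_censored_at_1(eduexp):
    """Left-censor eduexp at 1 RMB, then take the natural log."""
    return math.log(max(eduexp, 1))


# Compare the two transformations at a few illustrative expenditure levels (RMB).
for x in [0, 1, 10, 100, 1000]:
    gap = logex(x) - log_censored_at_1(x)
    print(f"eduexp={x:>5}: ln(x+1)={logex(x):.4f}, "
          f"ln(max(x,1))={log_censored_at_1(x):.4f}, gap={gap:.5f}")
```

The gap is only noticeable for expenditures of a few RMB, which is why the approximation depends on the distribution of eduexp.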
The data file comes with a household sampling weight, based on inverse sampling probabilities.
Sampling weights should be used, or regressions should include county dummies (or
controls for income and urbanisation as well as province dummies) to counter the distortion
from stratification. Standard errors should be clustered at the county level (or higher).
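As a sketch of what the weighted regressions compute (the notation is ours, not the course's): household i has weight w_i = 1/π_i, where π_i is its overall inclusion probability, which under a multistage design is the product of the selection probabilities at each stage. With regressors x_i, outcome y_i, G counties and residuals ê_i = y_i − x_i'β̂, WLS and its county-clustered sandwich variance are:

```latex
\begin{aligned}
w_i &= 1/\pi_i, \\
\hat\beta_{\mathrm{WLS}} &= \Big(\sum_{i=1}^{N} w_i x_i x_i'\Big)^{-1} \sum_{i=1}^{N} w_i x_i y_i, \\
\widehat{V}(\hat\beta_{\mathrm{WLS}}) &= c\, A^{-1} \Big(\sum_{g=1}^{G} u_g u_g'\Big) A^{-1},
\quad A = \sum_{i=1}^{N} w_i x_i x_i', \quad u_g = \sum_{i \in g} w_i x_i \hat e_i .
\end{aligned}
```

Setting w_i = 1 gives OLS. The factor c is a finite-sample adjustment; for Stata's regress with clustered standard errors it is c = G/(G − 1) × (N − 1)/(N − K), where K is the number of regressors.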
(a) All good researchers begin by thoroughly examining the properties of their data, even
if their final reports include only a table with summary statistics. Here you should include
more detailed information so that you can receive marks for this work.
For each variable, report whether it is binary, categorical or continuous. (Treat a variable
as continuous if the mean and standard deviation have practical meaning.) For dummies
and categorical variables, report the number of categories, the number of observations in
the smallest cell, the number of missing values, and anything else you find interesting.
For continuous variables, report the unit of measurement, the mean, the standard deviation,
the range, the number of missing observations, and anything else you find interesting.
Create histograms for the variables savr1, logex and res_age. Add commentary
where relevant.
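The statistics requested in (a) are simple to compute; a minimal Python sketch for a single variable, assuming missing values are stored as None. The function name describe is hypothetical. The standard deviation uses the n − 1 divisor, as Stata's summarize does for unweighted data.

```python
import statistics
from collections import Counter


def describe(values):
    """Summary statistics for one variable; None marks a missing value.

    For binary/categorical variables, n_categories and smallest_cell are
    the relevant entries; for continuous variables, mean, sd, min and max.
    """
    observed = [v for v in values if v is not None]
    cells = Counter(observed)
    return {
        "n": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),  # sample SD, n - 1 divisor
        "min": min(observed),
        "max": max(observed),
        "n_categories": len(cells),
        "smallest_cell": min(cells.values()),
    }


print(describe([0, 1, 1, None, 1, 0]))  # toy binary variable, not real data
```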
(b) Create a table of estimation results (coefficients, standard errors and anything else useful
for your discussion) from regressing savr1 on the simplified family type variable famB
and other control variables. Consider 2 estimation methods, WLS and OLS, and 4 model
specifications with an increasing set of regressors: first, only famB; second, famB and
county dummies; third, further add controls for household characteristics logincome,
logassets, nchild, and rural; and finally, add controls for the respondent’s characteristics
res_age, res_age2, res_male, res_yos, res_health, res_married and res_job. For each
regression, also compute estimates of the difference in the conditional average saving
rate between households where the FB is an adult male vs the FB is an adult female, and
add them to the table. (Hint: You can use the lincom command.) The table should have
8 columns for the 4 model specifications with 2 estimation methods each. As mentioned,
cluster all standard errors at the county level. For simplicity, you do not need to report
results for the control variables other than famB unless you notice something worthwhile
commenting on. (Hint: The areg and reghdfe commands can be useful for reducing
uninteresting output.)
Discuss the results in your table. In particular, do you think the results support the conclusion
that households whose FB is an adult male save more than households whose FB is an adult
female? How much more?
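The difference asked for in (b) is a linear combination of two coefficients. If β_M and β_F are the famB coefficients for an adult male FB and an adult female FB (both relative to the same base category), lincom reports β̂_M − β̂_F with standard error sqrt(Var(β̂_M) + Var(β̂_F) − 2 Cov(β̂_M, β̂_F)), taken from the clustered covariance matrix. If one of the two is the base category, its coefficient, variance and covariances are zero. A minimal Python sketch of that computation; lincom_diff is a hypothetical name and the numbers are toy values, not estimates from as2025.dta:

```python
import math


def lincom_diff(b, V, i, j, crit=1.96):
    """Estimate b[i] - b[j], its standard error and a confidence interval.

    b: coefficient vector; V: its (clustered) covariance matrix.
    Var(b_i - b_j) = V[i][i] + V[j][j] - 2 * V[i][j].
    crit = 1.96 is a normal approximation; after clustering, Stata uses a
    t distribution with G - 1 degrees of freedom (G = number of clusters).
    """
    est = b[i] - b[j]
    se = math.sqrt(V[i][i] + V[j][j] - 2 * V[i][j])
    return est, se, (est - crit * se, est + crit * se)


# Toy inputs: index 0 = adult male FB, index 1 = adult female FB.
b = [0.10, 0.04]
V = [[0.0004, 0.0001],
     [0.0001, 0.0009]]
print(lincom_diff(b, V, 0, 1))
```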
(c) Repeat the analysis in (b) for logex instead of savr1. In your discussion, explain whether
you think the results support the conclusion that households with an adult female FB spend
more on education than households with an adult male FB, and if so, how much more.
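For the "how much more?" question, a difference d in the conditional mean of logex corresponds to a ratio exp(d) of the conditional geometric means of eduexp + 1, ie a percentage difference of 100 × (exp(d) − 1), which is close to 100 × d only when d is small. A Python sketch; the helper name is hypothetical and the d values are illustrative:

```python
import math


def log_gap_to_percent(d):
    """Percentage difference in the geometric mean of (eduexp + 1) implied by a log gap d."""
    return 100 * (math.exp(d) - 1)


for d in [0.05, 0.10, 0.25, 0.50]:
    print(f"log gap {d:.2f}: exact {log_gap_to_percent(d):.1f}%, "
          f"approximation 100*d = {100 * d:.1f}%")
```

Because the share of households with zero education expenditure varies across famB types, part of any gap in logex may reflect whether households spend at all rather than how much they spend.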
(d) The variable savr2 excludes the expenditure on education (and entertainment) from
consumption when computing the household’s saving rate. Examine how the conditional
mean of savr2 varies across family types and discuss whether the findings are consistent
with the results for savr1 and logex.
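It can help to write down how the two saving rates relate. Assuming (this is not stated in the data description) that savr1 = (Y − C)/Y and savr2 = (Y − (C − E))/Y, with income Y, consumption C and education (and entertainment) expenditure E, then savr2 = savr1 + E/Y. The gap in savr2 between adult-male-FB and adult-female-FB households therefore equals the savr1 gap plus the difference in E/Y between the two groups, so if adult-female-FB households spend a larger share of income on education, the savr2 gap should be smaller than the savr1 gap. The identity fails for any household whose saving rate has been censored. A Python sketch with a hypothetical function and toy numbers:

```python
def saving_rates(income, consumption, eduexp):
    """Return (savr1, savr2) under ASSUMED definitions (not from the data docs).

    savr1 = (income - consumption) / income
    savr2 = (income - (consumption - eduexp)) / income  # education excluded
    """
    savr1 = (income - consumption) / income
    savr2 = (income - (consumption - eduexp)) / income
    return savr1, savr2


# Toy household: income 100, consumption 120 (of which 30 on education/entertainment).
s1, s2 = saving_rates(100.0, 120.0, 30.0)
print(s1, s2, s2 - s1)  # s2 - s1 equals eduexp / income, up to rounding
```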
(e) The variables fbenr and fbyos relate to the FB’s past and current involvement with
education. Restricting the sample to households with at least 1 adult child, examine how
their conditional means vary across family types and discuss whether the findings are
consistent with the results for logex.
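Conditional means by family type can be computed as weighted averages within groups, using the sampling weight. A minimal Python sketch; the field names fam, wt and n_adult are hypothetical stand-ins (check the actual variable names in as2025.dta), and the rows are toy values:

```python
from collections import defaultdict


def weighted_group_means(rows, outcome, group, weight, keep=lambda r: True):
    """Weighted mean of `outcome` within each value of `group`.

    Rows failing `keep`, or with a missing (None) outcome, are skipped.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for r in rows:
        if not keep(r) or r[outcome] is None:
            continue
        num[r[group]] += r[weight] * r[outcome]
        den[r[group]] += r[weight]
    return {g: num[g] / den[g] for g in num}


rows = [  # toy data, not from as2025.dta
    {"fam": "M", "fbyos": 12, "wt": 1.0, "n_adult": 1},
    {"fam": "M", "fbyos": 16, "wt": 3.0, "n_adult": 1},
    {"fam": "F", "fbyos": 14, "wt": 2.0, "n_adult": 2},
    {"fam": "F", "fbyos": None, "wt": 1.0, "n_adult": 1},
    {"fam": "m", "fbyos": 9, "wt": 1.0, "n_adult": 0},
]
# Restrict to households with at least one adult child, as in (e).
print(weighted_group_means(rows, "fbyos", "fam", "wt",
                           keep=lambda r: r["n_adult"] >= 1))
```

Standard errors for these means should still account for the clustered design, which a regression of the outcome on family-type dummies with county-clustered errors would provide.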
(f) Investigate whether it would be reasonable to simplify the model specification by restricting
the saving or spending behaviour of households with a young male FB and households with a
young female FB to be the same.
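One way to frame (f): let β_m and β_f be the coefficients on the categories for a young (under-18) male FB and a young female FB. The restriction is a single linear hypothesis, and its Wald statistic is the square of the t statistic that lincom would report for β_m − β_f:

```latex
\begin{aligned}
&H_0:\ \beta_m = \beta_f \quad \text{vs} \quad H_1:\ \beta_m \neq \beta_f, \\
&W = \frac{(\hat\beta_m - \hat\beta_f)^2}
{\widehat{\mathrm{Var}}(\hat\beta_m) + \widehat{\mathrm{Var}}(\hat\beta_f) - 2\,\widehat{\mathrm{Cov}}(\hat\beta_m, \hat\beta_f)} .
\end{aligned}
```

With county-clustered errors, Stata's test command reports this as an F(1, G − 1) statistic. If the family-type variable contains several such pairs (eg separately by number of children), the equalities should be tested jointly. Failing to reject is consistent with, but does not prove, the restriction; the restricted model is obtained by merging the two categories.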
(g) Investigate whether the average saving behaviour and/or spending on education is the
same or different for families with 1 and 2 children. For example, try analysing the
subsamples separately and/or adding appropriate controls to the regressions for the full
(combined) sample.
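For the combined-sample approach in (g), one option (a sketch, not the only specification) is to interact every regressor with a two-child indicator D_i:

```latex
y_i = x_i'\beta + D_i\, x_i'\delta + e_i,
\qquad D_i = \mathbf{1}\{\text{household } i \text{ has 2 children}\}.
```

Here x_i contains a constant, the family-type dummies and any controls. With every regressor interacted, (W)LS gives the same point estimates as separate regressions: β̂ for one-child households and β̂ + δ̂ for two-child households. Testing δ = 0 jointly is a Chow-type test of equal coefficients, and testing only the elements of δ on the family-type dummies asks whether the family-type pattern differs by number of children. Clustered standard errors can differ slightly between the pooled and split approaches, because counties contain both types of household.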
(h) Investigate the sensitivity of the analysis and the conclusions to the left-censoring of
savr1. For example, try comparing the results with different degrees of censoring, or try
trimming/truncating the sample instead of censoring it.
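The difference between censoring and trimming can be seen on a few toy saving rates (not data). Censoring at a threshold c keeps every household but replaces values below c with c; trimming (truncating the sample) drops those households, which changes the population the estimates describe. The helper names censor and trim are hypothetical:

```python
import statistics


def censor(values, lower):
    """Left-censor: values below `lower` are set to `lower`; sample size unchanged."""
    return [max(v, lower) for v in values]


def trim(values, lower):
    """Trim: observations below `lower` are dropped."""
    return [v for v in values if v >= lower]


savr = [0.30, 0.10, -0.50, -1.5, -4.0, -12.0]  # toy saving rates
for c in (-1.0, -2.0, -5.0):
    kept = trim(savr, c)
    print(f"c={c}: censored mean={statistics.mean(censor(savr, c)):.3f}, "
          f"trimmed mean={statistics.mean(kept):.3f}, n kept={len(kept)}")
```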
(i) Investigate and discuss an interesting issue of your own choosing.
