EMET8005 Assignment Instructions

The assignment is due at 12 noon on Monday 12 May 2025. Late submissions will receive a mark of 0 unless an extension has been granted before the deadline, as per the course outline.

The assignment can be completed individually or in groups of up to three students. Either way, your submission must be all your own or your group’s own original work. To submit, you first need to create and register a (possibly one-person) group using the ‘Assignment group selection’ link in the Week 10 block on Wattle. A group makes a single submission. Multiple submissions of the same or similar reports will be considered plagiarism.

Your report should be uploaded to Wattle using the link provided. The upload link has three tabs for uploading the report, the do file and the log file separately. Your report should be typed and the file should be in either Word or pdf format.

Part of the assignment is to present results ‘professionally’. This means that there should be no Stata commands or Stata output in the main text. Extract the information you need from the Stata output, and create nice tables and figures similar to those you see in textbooks and journal articles. The do file must be annotated with explanatory comments, so that it is clear what results are sought, and it must run without syntax errors (assuming the data file is in the current working directory).

There is no strict word limit but, all else being equal, a clear and concise writing style may attract higher marks. We anticipate most reports will be between 600 and 1400 words (excluding tables). If you have any questions about the assignment, please email us. There is no penalty for asking.

Do questions (a), (b) and (c) and two of the remaining questions.

Children and household saving and spending behaviour

This assignment will explore how households’ saving and spending behaviour in China varies with the gender and the age of children.
The information available in English about the data source seems to be limited. It appears the data are representative of 29 provinces. The sampling design is complicated. The population of households is stratified by province. Within each province, the primary sampling units are counties (including county-level cities and districts), and the counties are stratified according to GDP and urbanisation. Some counties are randomly selected within each stratum. Within each county, a small number of household clusters are randomly selected.

Download the file as2025.dta from Wattle. This data extract is limited to households whose survey respondent is between the ages of 18 and 60 years, that have 1 or 2 unmarried children under the age of 30, and that have strictly positive income and assets. The variables have brief explanatory labels. Some abbreviations are used: HH, household; SR, survey respondent (ie the person in the household who answers the questions); FB, first-born child; LB, last-born child (for households with 2 children). Note that first-born and last-born are defined with respect to the children who are living with their parents at the time of the survey; children who have moved out are not included. Beware that the FB and the LB are the same person for households with 1 child.

The dataset has several variables for family type. (Most households consist of a single family, so we can use the terms more or less interchangeably.) The variable famA has 16 types based on the number of children, their gender and their child/adult status: 1 vs 2 children, FB male or female, FB over vs under age 18, LB male or female, LB over vs under age 18. The value labels for famA are of the form xy, where x represents the FB child and y the LB child, and where x and y are each one of: f for a female child under 18, m for a male child under 18, F for a female child aged 18 or over, and M for a male child aged 18 or over. (Hint: Use tabulate famA to see the value labels.)
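As a quick check on the count of 16 famA types, the following Python sketch enumerates the feasible combinations of child statuses. It assumes (this is not stated above) that the FB is never younger than the LB, so a two-child household cannot have an under-18 FB together with an adult LB. It only counts types and makes no claim about the exact label strings used for one-child households.

```python
from itertools import product

# Child status codes from the famA value labels:
# f/m = female/male under 18, F/M = female/male aged 18 or over.
STATUSES = ["f", "m", "F", "M"]
ADULTS = {"F", "M"}

# One-child households: the FB is also the LB, so one status describes the family.
one_child = list(STATUSES)

# Two-child households: FB status x LB status, dropping combinations where
# the FB is under 18 but the LB is an adult.
# Assumption: the FB is never younger than the LB.
two_children = [
    fb + lb
    for fb, lb in product(STATUSES, repeat=2)
    if not (fb not in ADULTS and lb in ADULTS)
]

print(len(one_child), len(two_children), len(one_child) + len(two_children))
# prints: 4 12 16
```

Under that assumption, 4 one-child types plus 12 two-child types give the 16 famA types described above.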
The variable famB ignores the characteristics of the second child, and famC further ignores the characteristics of children under 18.

We will use these data to investigate two findings about household behaviour from a team of Chinese researchers:
• Compared with households where the FB is an adult female, households where the FB is an adult male save a higher proportion of the household’s income.
• Compared with households where the FB is an adult male, households where the FB is an adult female spend more on education.
There is also a third finding, namely that the gender of young (ie under 18) children doesn’t matter much for household saving and spending behaviour.

The main outcome variables of interest for us are the household’s overall saving rate, savr1, and the household’s expenditure on education, eduexp. The main explanatory variable is the family type, but of course other control variables may be used.

There are several issues we need to be aware of up front:
• Roughly 40% of households have negative saving. The distributions of saving rates are extremely skewed, with long left tails. It is unclear how much of this is real and how much is measurement error. The original researchers left-censored the saving rates at −2 (ie saving rates below −2 were set to −2). The censoring rate is roughly similar across famB family types.
• The distribution of education expenditure is extremely skewed, with a long right tail, and roughly 10% of households do not spend on education. The original researchers chose to analyse logex = ln(eduexp + 1) instead of eduexp. Given the distribution of eduexp, this is essentially the same as left-censoring eduexp at 1 RMB and then taking logs. The censoring rate varies substantially across famB family types.
• Note that some households may have education expenditure for someone who doesn’t live in the household (eg at boarding school or university) and, vice versa, the education expenditure for someone enrolled in education may be paid by someone not living in the household (eg by extended family).
• The variable for household education expenditure is actually the aggregate of expenditure on education and entertainment (presumably books, music, movies, bars, toys, artworks, sports, etc). The original researchers did not explain or comment on this issue.

The data file comes with a household sampling weight, based on inverse sampling probabilities. Sampling weights should be used, or regressions should include county dummies (or controls for income and urbanisation as well as province dummies), to counter the distortion from stratification. Standard errors should be clustered at the county level (or higher).

(a) All good researchers will begin by thoroughly examining the properties of their data, even if their final reports include only a table with summary statistics. Here we ask you to report more detailed information so that you can get marks for your work. For each variable, report whether it is binary, categorical or continuous. (Treat a variable as continuous if the mean and standard deviation have practical meaning.) For dummies and categorical variables, report the number of categories, the number of observations in the smallest cell, the number of missing values, and anything else you find interesting. For continuous variables, report the unit of measurement, the mean, the standard deviation, the range, the number of missing observations, and anything else you find interesting. Create histograms for the variables savr1, logex and res_age. Add commentary where relevant.

(b) Create a table of estimation results (coefficients, standard errors and anything else useful for your discussion) from regressing savr1 on the simplified family type variable famB and other control variables.
Consider 2 estimation methods, WLS and OLS, and 4 model specifications with an increasing set of regressors: first, only famB; second, famB and county dummies; third, further add controls for the household characteristics logincome, logassets, nchild and rural; and finally, add controls for the respondent’s characteristics res_age, res_age2, res_male, res_yos, res_health, res_married and res_job. For each regression, also compute estimates of the difference in the conditional average saving rate between households where the FB is an adult male and households where the FB is an adult female, and add them to the table. (Hint: You can use the lincom command.) The table should have 8 columns: the 4 model specifications, each estimated by the 2 methods. As mentioned, cluster all standard errors at the county level. For simplicity, you do not need to report results for the control variables other than famB unless you notice something worth commenting on. (Hint: The areg and reghdfe commands can be useful for reducing uninteresting output.)

Discuss the results in your table. In particular, do you think the results support a conclusion that households with an adult male FB save more than households with an adult female FB? How much more?

(c) Repeat the analysis in (b) for logex instead of savr1. In your discussion, explain whether you think the results support a conclusion that households with an adult female FB spend more on education than households with an adult male FB. How much more?

(d) The variable savr2 excludes the expenditure on education (and entertainment) from consumption when computing the household’s saving rate. Examine how the conditional mean of savr2 varies across family types, and discuss whether the findings are consistent with the results for savr1 and logex.

(e) The variables fbenr and fbyos relate to the FB’s past and current involvement with education.
Restricting the sample to households with at least 1 adult child, examine how their conditional means vary across family types, and discuss whether the findings are consistent with the results for logex.

(f) Investigate whether it would be ok to simplify the model specification by restricting the saving or spending behaviour of households with a young male FB and households with a young female FB to be the same.

(g) Investigate whether the average saving behaviour and/or spending on education is the same or different for families with 1 and 2 children. For example, try analysing the subsamples separately and/or adding appropriate controls in the regressions for the full (combined) sample.

(h) Investigate the sensitivity of the analysis and the conclusions to the left-censoring of savr1. For example, try comparing the results with different degrees of censoring, or try trimming/truncating the sample instead of censoring it.

(i) Investigate and discuss an interesting issue of your own choosing.
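For part (h), the difference between censoring and trimming can be illustrated with a small Python sketch on made-up numbers. The saving rates below are hypothetical and are not taken from as2025.dta, and the cut-offs −1, −2 and −5 are arbitrary illustrations. Censoring keeps every household but replaces saving rates below the cut-off with the cut-off itself, whereas trimming drops those households, so the two approaches change the mean and the sample size in different ways. Your actual analysis belongs in the Stata do file; this only shows the mechanics.

```python
def censor_left(values, lower):
    """Left-censor at `lower`: values below the cut-off are set to the cut-off."""
    return [max(v, lower) for v in values]

def trim_left(values, lower):
    """Trim at `lower`: observations below the cut-off are dropped."""
    return [v for v in values if v >= lower]

def mean(values):
    return sum(values) / len(values)

# Hypothetical saving rates with a long left tail (illustration only).
savr = [0.40, 0.25, 0.10, -0.30, -1.50, -3.00, -8.00]

for lower in [-1.0, -2.0, -5.0]:
    censored = censor_left(savr, lower)
    trimmed = trim_left(savr, lower)
    # Share of observations below the cut-off (censored or dropped).
    share = sum(v < lower for v in savr) / len(savr)
    print(
        f"cut-off {lower:5.1f}: {share:.0%} affected | "
        f"censored mean {mean(censored):6.3f} (n={len(censored)}) | "
        f"trimmed mean {mean(trimmed):6.3f} (n={len(trimmed)})"
    )
```

Because trimming removes households, it can also change the mix of family types in the estimation sample, whereas censoring keeps the sample intact and only caps the influence of extreme negative saving rates.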