ECOM30002/90002 Econometrics 2, Semester 2, 2025
Capstone Project
Assignment 1 (Proposal)

1 Overview

The capstone project combines knowledge in econometrics and economics with practical data analysis skills. You will use the 2018-19 wave of the “Burkina Faso Enquête Harmonisée sur le Conditions de Vie des Ménages (Harmonized Survey on Household Living Standards)” dataset to develop a unique research question and undertake empirical analyses across three Capstone Project Assignments, of which this is the first.

• The capstone project is a group project.
  – Groups must consist of 3 students, all from the same tutorial. Groups remain the same for all three assignments of the capstone project throughout the semester.
  – Students may form their groups independently and register their group with their tutor during the tutorials in Weeks 1–3.
  – During the Week 3 tutorial, the tutor will assign any remaining students who are not already in a group to groups.
• This assignment (Assignment 1 – Proposal) is worth 5% towards the final mark and has two components:
  1. Oral defence: Takes place in person during Week 4 (18–22 August). Details are in section 2.1 below.
  2. Proposal report: The report must be submitted online through the LMS assignment portal and is due Friday, 29 August, 11:59 p.m. Late submissions will not be accepted. Details are in sections 2.2 and 2.3 below.
• Marks:
  – Each group member will receive equal marks.
  – The oral defence is marked between 0.4 and 1, where 1 is awarded for a satisfactory oral defence.
  – The submitted proposal report is marked between 0 and 5.
  – The total marks for “Assignment 1 – Proposal” are obtained as “total marks” = “marks oral defence” × “marks submitted proposal report”.
  – For groups that do not book or show up to the oral defence, 0.4 marks will be given for the oral defence.
  – If not all members are able to attend the oral defence, 0.4 marks will be awarded for the oral defence unless the group submits a signed statement of cooperation (template available on the LMS) to “Project proposal oral defence (statement)” on Canvas.

2 Assignment tasks

2.1 Oral defence

The oral defence is a required component of the first capstone assessment and forms an essential part of the feedback process that helps you towards a successful proposal submission and capstone project delivery.

• Groups book a 10-minute oral defence via the LMS. Details on how to book will be announced through the LMS.
• Penalties may be imposed for being late and/or exceeding the time limit.
• All group members attend the booked oral defence.
• During the oral defence, the group members show a written full-length draft of their proposal report and discuss it with the capstone tutor. The group presents their proposed research question (or questions) and the variables they have identified as suitable to answer the question. The group shows and discusses some descriptive statistics and at least one regression result for these variables, and answers questions regarding the preparation of the analysis and the results produced.

2.2 Written assessment: Research proposal report

• Download the data set following the instructions in section 3.
• Develop an interesting research question: Identify variables within the “Burkina Faso Enquête Harmonisée sur le Conditions de Vie des Ménages (Harmonized Survey on Household Living Standards)” dataset that are related and have a plausible causal link. This will form the basis of your research question. Briefly motivate your research question based on economic arguments and previous literature.
• Pin down the key causal relationship: Construct a causal linear regression model.
  Clearly define your outcome (dependent) variable and primary explanatory (independent) variable, explaining the expected direction and nature of their relationship. Provide a rationale for the hypothesized causal relationship.
• Selecting key variables: Choose an outcome/dependent variable that reflects the impact or effect you are interested in examining. Identify a treatment/independent variable that represents the cause or intervention you believe affects the outcome variable. Include a few additional regressors that can control for other factors influencing the relationship between your primary independent and dependent variables, enhancing the robustness of your findings. Remember, the selection of variables should be driven by your research question. This approach aids in the clarity and precision of your analysis.
• You may choose up to two dependent variables that are relevant for your research question, with one key explanatory variable and up to four further control variables relating to each dependent variable. If you have two dependent variables, they should reflect the same causal relationship of interest.
• Present a table of descriptive statistics for the selected variables. The table should show the mean, standard deviation, minimum, and maximum for each variable. Briefly interpret the statistics.
• Present a table of regression results for the selected model and variables. No discussion of these results is needed.
• Report submission guidelines and formal requirements are detailed in section 2.3.

2.3 Proposal report submission

• The proposal report submission is due Friday 29 August at 23:59 (11:59 PM).
• The report must be submitted online as a single PDF file.
• The proposal report must start with a cover page that contains a preliminary title and the names and student numbers of all group members.
• Reports need to use a font size of 12, line spacing of 1.5, and margins of at least 2 cm.
• Maximum word count: 200 words.
  Tables and graphs are exempt from the word count. Penalties may be imposed for exceeding the word count by 50% or more. There are no penalties for reports under the word limit.
• Any data analysis has to be performed in R/RStudio, and the R code used to produce all results in the report must be submitted as an appendix. R code does not count towards the word count limit. The R code should include clear comments explaining the purpose of major steps.
• You are required to keep a copy of your submission after it has been submitted.

3 The dataset

• Overview of the data
  The World Bank owns a library of survey datasets, many of which are freely available for public, non-commercial use. For this capstone project, you will use one of these datasets, the “Burkina Faso Enquête Harmonisée sur le Conditions de Vie des Ménages (Harmonized Survey on Household Living Standards) 2018-2019”, to formulate your own research question and conduct empirical analyses.
  This is a nationally representative survey that interviewed 7,010 households and as many as 45,612 individuals in Burkina Faso during 2018–2019. Surveyed topics include living conditions, health, education, food consumption, and other characteristics. The full survey dataset contains community-, household-, and individual-level data.
  You may initially focus on the individual-level datasets to explore variables you may be interested in.
• Downloading the data
  To access the data, download the “Burkina Faso Enquête Harmonisée sur le Conditions de Vie des Ménages (Harmonized Survey on Household Living Standards) 2018-2019” from the World Bank Microdata Library at https://microdata.worldbank.org/index.php/catalog/4290/get-microdata. You will need to sign up for a free account. The steps to sign up are as follows:
  – Click on the “Get Microdata” tab of the page to download the data, accepting the terms and conditions first.
  – Click the “Register” button.
  – Fill in the user registration information. Once you register, a confirmation email will be sent to the address you provided.
  – Log in.
  – Return to the page of the dataset and click on “Get Microdata”.
  – You will be redirected to a data use application, where you only need to enter a brief description of your intended use of the data. A short sentence, such as “Use for a university project,” should suffice.
  – Accepting the terms and conditions and submitting the application will redirect you to the data files.
  Apart from the link to download the data, the webpage contains a description of the survey, a description of the data, and other documentation such as the questionnaires used in the survey, which will be indispensable for working on your project. Key information on the webpage:
  – Variable descriptions: DATA DESCRIPTION > data file
  – Questionnaires: DOCUMENTATION > Questionnaires
  – Data files: GET MICRODATA
  Note that since the survey was conducted in French, the documentation does not provide English translations of the questions corresponding to each variable. However, the automatic translation functions available in most browsers, such as Google Chrome, Safari and Microsoft Edge, can be used for quick translations.
• Data formats
  There are several data formats available for download. You may use the csv format as in the tutorials. Alternatively, the data is available in Stata format (dta), which can be imported into R via the RStudio navigation menu by selecting File > Import Dataset > From Stata. This method attaches descriptive labels to the variables, potentially facilitating data exploration.
• Data files
  The download package contains 50 data files (and a further zip file with more data files about consumption) with varying levels of granularity (i.e., community, household, or individual-level).
  To use information from different sections, you need to merge the corresponding data files in R into one single data frame that you can use for your econometric analysis. See ?merge in R.
• Merging data frames: Example
  Across datasets, each individual can be identified using their cluster (grappe), household (menage), and individual (s01q00a) identifiers. Combining these generates a unique ID for each individual, which is required to merge data files. Below is sample R code for merging data files:

  Load the stringr package, which provides str_pad(), and the data files:
      library(stringr)
      df1 <- read.csv("s03_me_bfa2018.csv")
      df2 <- read.csv("s04_me_bfa2018.csv")

  Create an individual ID (iid) for df1 and df2 by concatenating the grappe, menage, and s01q00a variables, zero-padding menage to 3 digits and s01q00a to 2 digits. The result is a character variable, which can be converted into a factor with as.factor() if required:
      df1$iid <- paste0(df1$grappe,
                        str_pad(df1$menage, 3, pad = "0"),
                        str_pad(df1$s01q00a, 2, pad = "0"))
      df2$iid <- paste0(df2$grappe,
                        str_pad(df2$menage, 3, pad = "0"),
                        str_pad(df2$s01q00a, 2, pad = "0"))

  Merge the data files by iid:
      df_merged <- merge(df1, df2, by = "iid")
• Further considerations on the data
  – You are not limited to variables from two data files. By repeating the merge procedure, you can merge any number of them.
  – Be sure to read the documentation carefully to understand how each variable is represented (e.g., missing values may be indicated as “NA”, “9999”, or merely left blank). Binary variables may also be stored as 1 or 2 as opposed to 0 or 1.

4 Further instructions for working with the data

• Ensure all your regressions and descriptive statistics are obtained using the same estimation sample, a practice known as “complete case analysis”. This means all regressions and descriptive statistics should include the same number of observations. Due to missing values in many variables (e.g., from non-response or irrelevance), the estimation sample size may vary based on the variables included in a regression.
  After determining which variables you will use, select your estimation sample for the entire assignment based on the subset of observations with non-missing values across these variables. Functions like na.omit(), subset() or complete.cases() in R can assist in this process.
• When selecting variables for your analysis, you may need to transform them to better fit your analysis. This could involve converting a categorical variable into dummy variables, applying logarithmic transformations to continuous variables, etc. If you transform any variables, explain why. Also, check for and exclude unreasonable values (e.g., “999” for non-response or blank values). If observations are excluded, justify your reasoning. Avoid using variables with very low response rates to maintain an estimation sample size of at least 300 observations.
• Use clear and interpretable names for variables in your text and tables, such as “birth year” instead of codenames like s01q03c. Mention the original codename only once, briefly, when you first mention the variable in your text.
• Number tables and provide descriptive titles, ensuring they are self-explanatory, free of unnecessary information, and formatted simply. Tables should be understandable on their own; 2–4 decimal places are typically sufficient for interpretation. Include any additional notes for clarification below the table. For examples of well-formatted tables, refer to the American Economic Review. R packages such as stargazer can generate publication-quality tables; see the cheatsheet at https://www.jakeruss.com/cheatsheets/stargazer/ for a quick start.

5 Suggested reading

The following chapters from the subject’s recommended references contain helpful general information on carrying out an empirical project and thinking critically about research studies:

Wooldridge, Jeffrey M.
(2019), “How to carry out an empirical project,” Chapter 19, Introductory Econometrics: A Modern Approach, 7th Edition, Cengage.

Stock, James H., and Mark W. Watson (2015), “Assessing Studies Based on Multiple Regression,” Chapter 9, Introduction to Econometrics, 3rd Edition, Pearson.
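
Illustration: preparing an estimation sample

The data preparation described in section 4 (recoding missing-value codes, converting a 1/2 binary into a 0/1 dummy, fixing one complete-case sample, and using it for both the descriptive statistics and the regression) can be sketched in base R as follows. This is only a minimal sketch on made-up data, not a prescribed workflow: the variable names (wage, educ_years, female_code), the use of 9999 as a non-response code, and the 1 = male / 2 = female coding are assumptions for illustration and must be checked against the survey documentation for the variables you actually use.

```r
# Toy data standing in for a merged survey data frame (all values are made up).
df <- data.frame(
  wage        = c(1200, 9999, 800, 1500, NA, 950, 1100, 700), # 9999: assumed non-response code
  educ_years  = c(6, 10, NA, 12, 8, 4, 9, 3),
  female_code = c(1, 2, 2, 1, 2, 1, 2, 2)                     # assumed coding: 1 = male, 2 = female
)

# Recode the assumed non-response code to NA (%in% treats existing NAs safely).
df$wage[df$wage %in% 9999] <- NA

# Convert the 1/2 coding into a 0/1 dummy (1 = female).
df$female <- as.integer(df$female_code == 2)

# Log-transform the outcome; log() is only meaningful for positive values.
df$log_wage <- log(df$wage)

# Fix the estimation sample once: keep only rows with no missing value
# in any variable used in the analysis.
vars <- c("log_wage", "educ_years", "female")
est  <- df[complete.cases(df[, vars]), vars]

# Descriptive statistics (mean, sd, min, max) computed on the estimation sample.
desc <- t(sapply(est, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}))
print(round(desc, 3))

# Regression on the same sample, so the number of observations matches the table above.
fit <- lm(log_wage ~ educ_years + female, data = est)
stopifnot(nobs(fit) == nrow(est))
print(summary(fit))
```

Building est once and passing it to every later step is what keeps the number of observations identical across tables. The fit object could then be handed to a table package such as stargazer for formatting.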