Date: 2024-04-22

A4 – Descriptive research
ECON7030
• Please find the Assignment Instructions HERE.
• Please find the Marking Rubric HERE
SUBMISSION CHECKLIST
1. Upload a PDF version of the report (state the title of your replication paper and your SID on the
title page).
2. Upload your do file.
3. Upload your data file (it is okay to provide a link to your data file).
Key Tasks
• Key tasks for A4
1. Step 1: Request data (OR data extract) for your selected OP (I assume all of you have already completed this
step).
2. Step 2: Clean the extracted data and create the required variables. Refer to Canvas module ‘Project
resources’ for resources on data mining/descriptive research.
3. Step 3: Describe the data following A4 guidelines.
4. Step 4: Create a table of summary statistics similar to the table of summary statistics reported in your OP.
5. Step 5: Discuss the results.
So, what are you expected to do to implement the key tasks?
1. Be ready with the Master File (raw data: HILDA, AEMO, DHS, IPUMSI, IPUMS CPS, IPUMS US).
2. Know your data.
3. Clean and process the data before you start constructing your variables. Otherwise, your results cannot be
trusted.
4. Check summary statistics and the distribution of each variable so that you can identify odd patterns or
abnormalities in the data.
5. Transform the data where relevant, e.g., take the log of income, wages, or expenditure, and note the
resulting distribution.
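As a rough illustration of points 4 and 5, the sketch below computes basic summary statistics for a variable and for its log. The assignment itself is done in Stata; Python is used here only to show the logic. The income values and the names `incomes` and `describe` are hypothetical, not taken from any of the datasets listed above.

```python
import math
import statistics

# Hypothetical weekly incomes (illustrative values, not from a real dataset).
incomes = [250.0, 400.0, 520.0, 610.0, 800.0, 1200.0, 9500.0]


def describe(values):
    """Return count, mean, median, standard deviation, min and max."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# The log is only defined for strictly positive values, so keep only those.
log_incomes = [math.log(x) for x in incomes if x > 0]

raw_stats = describe(incomes)
log_stats = describe(log_incomes)

# A mean well above the median suggests right skew; compare both sets.
print(raw_stats)
print(log_stats)
```

Comparing the mean with the median before and after the transformation is one quick way to see whether taking logs has made the distribution less skewed.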
Know your data
Ask yourself the following questions:
• What are the types of attributes or fields that make up your data? E.g., numeric vs string
variables in Stata.
• Which attributes are discrete, and which are continuous-valued?
• What do the data look like?
• How are the values distributed?
• Are there ways we can visualize the data to get a better sense of it all?
• Can we spot any outliers?
• Source: Han, J, Kamber, M, & Pei, J, 2011, “Getting to Know Your Data,” in Data Mining: Concepts and Techniques, Elsevier Science &
Technology, United States.
Data Processing: Note the sample restrictions
E.g., analysis is restricted to HH heads only.
Clean the data
“Real-world data tend to be incomplete, noisy and inconsistent” (Han, Kamber & Pei
2011, pp. ). Unless you clean and preprocess the data, your results will
NOT be reliable.
At the least:
• Identify and fill in missing values (if relevant)
• Identify outliers (abnormal values) to smooth out noise, e.g., a BMI of 0 or an age
of -5 years.
• Deal with observations that are not in the universe (e.g., questions not
asked) and with observations that have a ‘Don’t know’ response, etc.
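The checks above can be sketched as follows. This is a minimal illustration in Python (the assignment itself uses Stata), and it makes several assumptions: the records, the field names `age` and `bmi`, and the plausibility bounds in `BOUNDS` are all hypothetical. Setting an impossible value to missing is only one possible treatment; in your own data you may prefer to inspect and correct such cases.

```python
# Hypothetical records; None marks a value that is already missing.
records = [
    {"id": 1, "age": 34, "bmi": 24.1},
    {"id": 2, "age": -5, "bmi": 27.3},    # impossible age
    {"id": 3, "age": 51, "bmi": 0.0},     # impossible BMI
    {"id": 4, "age": None, "bmi": 22.8},  # age not recorded
]

# Assumed plausibility bounds; choose bounds that suit your own data.
BOUNDS = {"age": (0, 120), "bmi": (10.0, 80.0)}


def clean(record):
    """Return a copy with out-of-range values set to None (missing)."""
    cleaned = dict(record)
    for field, (low, high) in BOUNDS.items():
        value = cleaned[field]
        if value is not None and not (low <= value <= high):
            cleaned[field] = None
    return cleaned


cleaned_records = [clean(r) for r in records]

# Count missing values per variable after cleaning.
missing_counts = {
    field: sum(1 for r in cleaned_records if r[field] is None)
    for field in BOUNDS
}
print(missing_counts)
```

Working on a copy keeps the raw records untouched, so the cleaned and original values can still be compared afterwards.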
Recode variables where relevant
• E.g., convert responses such as DK (‘Don’t know’) to missing where relevant.
• Or collapse a variable with 12 different codes into fewer
categories (e.g., four), or into a dummy variable with only two
categories (0 and 1).
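A minimal sketch of both recoding steps, written in Python purely to illustrate the logic (the assignment itself uses Stata). Everything specific here is an assumption: the special codes in `MISSING_CODES`, the 12-to-4 mapping in `COLLAPSE`, and the choice of category 4 as the dummy's target are hypothetical and must be replaced by the coding in your own data dictionary.

```python
# Assumed special codes standing for "Don't know" or "Not asked".
MISSING_CODES = {-1, -2}

# Hypothetical mapping from 12 detailed codes to 4 broad categories.
COLLAPSE = {
    1: 1, 2: 1, 3: 1,
    4: 2, 5: 2, 6: 2,
    7: 3, 8: 3, 9: 3,
    10: 4, 11: 4, 12: 4,
}


def recode(code):
    """Map a detailed code to a broad category, or None if missing."""
    if code is None or code in MISSING_CODES:
        return None
    # An unexpected code raises KeyError, which surfaces data problems early.
    return COLLAPSE[code]


def to_dummy(category, target=4):
    """Return 1 if category equals target, 0 otherwise, None if missing."""
    if category is None:
        return None
    return 1 if category == target else 0


raw_codes = [1, 5, 12, -1, 9, -2, 10]
categories = [recode(c) for c in raw_codes]
dummies = [to_dummy(c) for c in categories]
print(categories)
print(dummies)
```

Missing values are carried through to the dummy rather than coded as 0, so that a ‘Don’t know’ response is not silently counted as belonging to the reference group.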
Resources
• Visit Canvas Module: Data mining & descriptive research.
• Go to the page: The value of descriptive research
• Visit Stata Master Class Resources.
Browse ALL resources ☺