QM2 Assignment 3 Sem 1 - 2025

Brief

Having reviewed your earlier carbon footprint analysis, your client CityScape Insights is impressed with your work. Your preliminary investigation raised new questions about what factors drive individual carbon footprints in urban environments. The client now requests a deeper statistical analysis: a multiple regression model to explain differences in carbon footprint among individuals, based on the characteristics collected in the dataset. Your findings will help guide sustainable urban planning and identify key intervention points.

Prepare your response in a report format targeting the Chief Data Scientist. It should present clear technical detail but also include a summary of key insights in plain English.

Use the same data provided to you in Assignment 1 (CityScape_Dataset.csv). Do not modify or clean the data unless specifically required by the analysis; we wish to understand real-world variation as it exists.

Word limit: 1000 words (excluding tables, graphs, appendices, and bibliography).

The following page contains a set of tasks to help formulate your final report.

Task

You will be regressing Carbon Footprint (kg CO₂) against the following explanatory variables:

• Age
• Gender
• Mode of Transport
• Work Hours
• Shopping Hours
• Entertainment Hours
• Home Energy Consumption (kWh)
• Charging Station Usage

1. Specify the regression model you intend to estimate. Use appropriate dummy variables for categorical predictors like Gender and Mode of Transport.

2. Before estimating, prepare a table indicating the expected relationships between each independent variable and carbon footprint. What signs do you anticipate for the coefficients and why?

3. Calculate Pearson correlations between each numeric independent variable and the dependent variable. Present these in a table and comment on any correlations that contradict your expectations.

4. Estimate the multiple regression model and present the output in a clean, readable table. If you encounter perfect multicollinearity, take steps to fix it. Once you have your final regression:
   a. Provide an interpretation of R².
   b. Use an F-test to assess the overall validity of the model at the 5% level.
   c. Provide an interpretation of each of the estimated coefficients. Do the signs of the coefficients match your expectations? At the 5% level of significance, test whether each of the variables makes a significant contribution to the model.

5. Conduct an analysis to determine if any problems exist (check the relevant regression assumptions). This would include:
   a. Checking the normality assumption and commenting on any consequences,
   b. Checking the homoskedasticity assumption and commenting on any consequences,
   c. Checking if imperfect multicollinearity might be a concern and commenting on any consequences,
   d. Running a Ramsey Regression Equation Specification Error Test (RESET) and, if your model is mis-specified, discussing why your model might be mis-specified (you may, if needed, want to visualize bivariate relationships to understand what the underlying issues might be).
   Identify any problems, but you do not need to re-estimate your model to fix these.

6. Show, with a single example, how you can use your model to generate individual and mean predictions respectively (for a particular citizen and the subpopulation that citizen belongs to). In doing so, ensure you provide a point and interval prediction for each case. You may choose any reasonable values for your independent variables to facilitate these calculations.

7. Summarize the findings of your analysis, discussing any key relationships, and discuss how the results can inform CityScape’s sustainability initiatives.
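
As an illustration of the mechanics behind Tasks 1, 3 and 4, the following is a minimal sketch, not a prescribed method. It shows one way to build dummy variables with an omitted baseline category (which is what avoids perfect multicollinearity when the model includes an intercept) and how a Pearson correlation is computed. The function names `dummy_encode` and `pearson`, the category labels, and all numbers are hypothetical and are not taken from CityScape_Dataset.csv.

```python
from math import sqrt


def dummy_encode(values, baseline):
    """Return one 0/1 column per category, omitting the baseline.

    Dropping one category per categorical variable avoids the
    dummy-variable trap (perfect multicollinearity with the intercept).
    """
    levels = sorted(set(values) - {baseline})
    return {f"is_{lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy values for illustration only (assumed, not real data).
transport = ["Car", "Bus", "Bike", "Car", "Bus"]
dummies = dummy_encode(transport, baseline="Car")  # "Car" is the reference group

energy = [10.0, 20.0, 30.0, 40.0, 50.0]      # assumed kWh values
footprint = [12.0, 22.0, 32.0, 42.0, 52.0]   # assumed kg CO2 values
r = pearson(energy, footprint)
```

With this encoding, each dummy coefficient is read as the difference in expected carbon footprint relative to the omitted baseline category, holding the other variables constant. Which category to use as the baseline is a modelling choice left to the analyst.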