QM2 Assignment 2 Sem 1 – 2025

Brief – Subgroup Analysis

Some weeks ago, you were hired to consult CityScape, a smart city consultancy interested in analysing citizen behaviour and sustainability trends. In your previous analysis, you explored whether the typical carbon footprint per person falls below the sustainability benchmark of 25 kg CO₂ per day.

Now, the chief data analyst at CityScape has asked you to extend your analysis by focusing on the impact of daily transport choices on citizens’ carbon footprints. Your new task is to complete a technical report that includes the following components:

1. Subgroup Analysis of Carbon Footprint

Calculate and present point and interval estimates (i.e., confidence intervals) for both the mean and standard deviation of carbon footprint for a set of subgroups of the ‘Mode of Transport’ variable.
▪ Include an interpretation of these confidence intervals.
▪ Describe any patterns or insights you observe when comparing carbon footprints across different transport modes (e.g., EV, Public Transport, Bicycle, Walking, Private Car).

2. Does Average Carbon Footprint Statistically Differ by Transport Mode?

Conduct a hypothesis test to determine whether there are statistically significant differences in the mean carbon footprint across the different transport modes. Specifically:
• Is there a difference in mean carbon footprint between groups defined by mode of transport (e.g., EV, Public Transport, Bicycle, Walking, Private Car)?
  o Conduct a parametric hypothesis test¹ to evaluate whether mean carbon footprint differs across these groups.
• If the result above is statistically significant, follow up with pairwise comparisons:
  o Conduct a series of independent two-sample t-tests² comparing carbon footprints across transport mode pairs.
  o Apply a Bonferroni correction (see below) to control for Type I error due to multiple comparisons.

Be sure to report assumptions, interpret results clearly, and explain the implications for urban sustainability strategy.

¹ The Chief Data Scientist has reminded you that the ANOVA style test is robust to departures from normality, especially where the sample size is this large. So, you do not have to assess normality in this report. However, you will need to determine if an ANOVA F-test or the Welch F-test is more appropriate. To do so you may assess the relevant sub-sample statistics as well as a Levene's test (see tutorial 6) and explain your conclusion.
² You may carry through whatever assumptions you have made in footnote 1 above.

The Bonferroni correction

The Bonferroni correction is a method used to adjust the significance level (alpha level) of statistical tests when multiple comparisons are made simultaneously. In situations where you're conducting multiple hypothesis tests simultaneously, there's an increased chance of obtaining at least one false positive result (Type I error) simply due to chance. The Bonferroni correction adjusts the threshold for statistical significance to account for this increased risk. It divides the desired significance level by the number of comparisons to reduce the chance of false positives.

For example, if you're conducting 5 tests and want a 0.05 overall Type I error rate, each test's significance level would be adjusted to 0.05/5 = 0.01 (so alpha is now 0.01 after the adjustment, and you can compare your p-value to this directly). It is a conservative method but helps control the overall Type I error rate.

PLEASE READ THE FOLLOWING CAREFULLY.

You will need to re-form your groups for Assignment 2. Further instructions are provided on the CANVAS page.

In this assignment:
• Make sure you read the whole assignment.
• Use the same dataset provided in assignment 1.
• Use a level of significance of 0.05 for all calculations.
  In applying the Bonferroni correction above, start with a significance level of 0.05 and divide this value by the number of t-tests you are conducting.
• You do not need to manually calculate statistics in this assignment. Do all calculations in R using relevant packages and functions. For hypothesis tests please use the p-value approach.
• State assumptions and list all necessary steps and decision rules. Present analysis in language suitable for a technical audience (the chief data scientist) but also provide a non-technical summary of results.
• Make sure to present clean tables and format text neatly in keeping with what would be suitable for your target audience (in this case your consultancy client).
• When deciding what to do, remember that this is an assessment, and you are encouraged to show off all your relevant learnings from topics introduced in this course. Use R for all calculations and provide all code used in your appendix.

Your task will be evaluated on a scale of TEN points, with an equal distribution of weight for data setup, preliminary analysis, test selection, test execution, and overall presentation.

The TOTAL WORD LIMIT for your answers is 1000 WORDS excluding tables and graphs. You are not required to use up the whole word limit. You may include any code used in the Appendix. There is no word limit for the Appendix or Bibliography.

Feel free to ask clarifying questions on EdDiscussion.

GOOD LUCK!
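
As a rough illustration of the Bonferroni adjustment described in this brief, the sketch below counts the pairwise t-tests and computes the adjusted significance level in base R. It assumes exactly the five transport modes listed as examples in the brief; the `modes` vector and the variable names are illustrative only, and the real number of tests depends on the subgroups present in your dataset.

```r
# Assumption: the five transport modes given as examples in the brief.
modes <- c("EV", "Public Transport", "Bicycle", "Walking", "Private Car")

# Each unordered pair of modes corresponds to one two-sample t-test.
mode_pairs <- combn(modes, 2)   # 2 x (number of pairs) character matrix
n_tests <- ncol(mode_pairs)

# Bonferroni: divide the overall significance level by the number of tests.
alpha_overall <- 0.05
alpha_adjusted <- alpha_overall / n_tests

n_tests         # 10
alpha_adjusted  # 0.005
```

Under this assumption, each pairwise p-value would be compared against 0.005 rather than 0.05.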