Date: 2023-05-16

QBUS2810
Statistical Modelling for Business
Semester 1, 2023
Group Assignment
This group assignment is worth 20% of your final result in the unit, of
which 5% is your team’s assessment of your contribution and the rest is the
marker’s assessment of your team submission.
The deadline is Wednesday of Week 13, May 24 by 11:59pm. Submission is
on Canvas via Turnitin.
This assignment must be completed in your Canvas group.
Maximum Length: There is no maximum page length for this assignment. If you
have something interesting and worthwhile to include, then please do so without
worrying about a page limit. However, irrelevant or overly long-winded material
will reduce your overall mark. As a guideline, in previous runs of this class the
typical report had between 20 and 25 pages, excluding Python code.
Notes on Marking:
• The assignment will initially be marked out of 55.
• Up to an additional five (5) marks will be awarded based on the overall
presentation quality of your report. Thus, you will receive a total mark for this
assignment out of 60. You will lose some of these 5 presentation marks for poor,
inefficient, unclear and/or unprofessional presentation. You will be rewarded for
professional, efficient and clear presentation methods. I expect your final report
to be done in a professional editing package and to be submitted in PDF only.
HTML exports of Jupyter notebooks are not suitable.
• You must use Python for this assignment. You are being assessed on how
well you can use Python to complete the assignment tasks. NB: You can use
Excel for simple data manipulations and clean-up; but Python is better at these
tasks too! All plots and statistical output in the assignment must have been
produced in Python, though you can of course make nicer tables in a text editor
to include in your assignment. Please include an appendix in your assignment
that contains the Python code your group used to produce ALL outputs in your
assignment. A heavy penalty will apply if the Python code is not supplied (or
the code supplied does not run or work when the marker tries to run it).
Pre-analysis instructions for data:
Please include the Python code from the Jupyter notebook file “grp assnt gendata.ipynb”
in your Jupyter notebook file to input and clean the data. Collect the student ID
numbers for the members of your group and then add these numbers together. Input the
result into the Python code where instructed. Run the subsequent code to generate
two datasets: “train” and “test”. Most analysis you do will only use the “train”
dataset. Any forecasting your group does will only use the “test” dataset. The purpose
of these commands is to ensure that each group receives different randomly selected
datasets for training and testing purposes. Two other Python scripts are included
in case you need them: forward selection.py and backword selection.py.
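The idea behind summing the student IDs can be illustrated with a minimal sketch. This is not the code in the supplied notebook, which is not reproduced here; the student IDs, the function name `split_rows` and the 20% test fraction are all hypothetical, chosen only to show how a group-specific seed yields a reproducible but group-specific split.

```python
import random

# Hypothetical student ID numbers; replace with your own group's IDs.
student_ids = [500000001, 500000002, 500000003]
seed = sum(student_ids)  # the single number the instructions ask you to enter


def split_rows(rows, seed, test_fraction=0.2):
    """Shuffle rows reproducibly and return (train, test).

    test_fraction is an assumed value, not taken from the assignment.
    """
    rng = random.Random(seed)  # same seed gives the same shuffle on every run
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for the cleaned vehicle records
train, test = split_rows(rows, seed)
print(len(train), len(test))
```

Because the seed is fixed by the group's IDs, re-running the notebook reproduces the same “train” and “test” sets, while a group with a different ID sum draws a different split.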
Business problem:
The US Department of Energy Office of Energy Efficiency runs the website
www.fueleconomy.gov, which is the official source for fuel economy information for
consumers and organisations in the US. The US government is interested in understanding
the drivers of fuel economy in a large range of vehicles for private consumer,
organisational and government use in the US. In particular, they are very interested in
the effect of a variable called engine displacement, which is the total volume of all the
cylinders in an engine, on fuel economy in vehicles. They wish to build a model that can
accurately predict the level of fuel economy for the cars in their database, so they can
improve their understanding and communicate this, and also make better recommendations,
on their website. Your group has been commissioned to research and analyse the data
provided and then report back to the Department of Energy Office of Energy Efficiency,
principally regarding the major goals they are interested in.
Data and Description:
Please see the file Fueleconomy.pdf for information on the variables and data collected.
The data used here are from a wide range of cars manufactured in the years 1984-2023
and are available at https://www.fueleconomy.gov/feg/ws/index.shtml. The
dataset at this site is in the file “vehicles.csv”. Please see Fueleconomy.pdf for
descriptions of the variables in the study and for more information. The measure of fuel
economy to be used is the average miles per gallon (MPG) achieved over various tested
journeys for each car, labelled comb08 in the dataset.
Goals and primary questions:
There are three primary goals that the Department of Energy Office of Energy
Efficiency would like your group to focus on:
(a) Understand the relationship between fuel economy and primarily engine
displacement, as well as that between fuel economy and any other useful
explanatory variables;
(b) Develop a causal model for fuel economy, that includes engine displacement;
(c) Develop an optimal model for predicting fuel economy.
The focus is on vehicles that use only a single fuel, either petrol or diesel;
cars powered by electricity or gas (solely or as hybrids) are not to be
considered in your analysis. Only cars made in the years 1984-2021 should be included.
As in many real data sets, there are many extraneous variables here, including other
potential response variables, none of which are suitable to be included as explanatory
variables in any predictive or causal models for fuel economy. These include several
variables to do with electric or gas or hybrid cars, and many others, all of which should
be ignored. This is done in “grp assnt gendata.ipynb”.
Tasks:
1. Conduct a suitable exploratory analysis on this dataset that is relevant to the goals
of this study (5 marks).
2. Analyse the relationship between MPG and displacement and test the significance
of this relationship using an SLR. Include a discussion of whether the assumptions of
your analysis and test could hold for this data and whether and how strongly the data
actually fits the model. (5 marks)
3. Discuss which variables in the dataset could be causing omitted variable bias in
your analysis in task 2, and justify clearly why you think that. Include these omitted
variables, together with displacement, in an MLR model, without any transformations
or interactions or nonlinear effects; then fit the model. Again, test for a relationship
between MPG and displacement. Also include a discussion of whether the assumptions
of your test could hold for this data and whether and how well the data actually fits
the model. Also discuss the level and sources of multi-collinearity present and whether
you think this is problematic, or not, and why; and if so, problematic for what? (10
marks)
4. Conduct a variable and model selection exercise, including at least two potential
interaction effects as potential predictors and also at least two transformations/nonlinear
effects on regressors and/or response variable. You must properly motivate and
discuss your choices here. Then, report a summary of the comparison of fit over at least
4 different models/transformations/variable sets that you tried, all while forcing
displacement to stay in the model in some form. The goal is to find an optimal model that
is highly accurate, but also parsimonious, to predict and explain MPG. Finally, fully
report and give diagnostics on the final optimal model, as well as briefly discussing any
collinearity issues it may have. Also, if there are any nonlinear effects in this model,
clearly discuss and illustrate their effects on MPG. (15 marks)
5. Discuss your results and conclusions regarding the overall goals of this study, in
light of the results from your overall analysis of the “train” dataset. Be technical but
clear here. Also, include a prediction of the change in MPG that would result if
displacement is increased by one unit, using at least the optimal model so far. (5 marks)
6. Using (at least) the 2 best model specifications considered so far (and any others you
think relevant), generate forecast predictions in the “test” dataset for MPG. Present
a summary table, and suitable plot(s), of the forecasts and their accuracy for these
models, using the forecast measures RMSE, MAD and forecast R². Re-discuss your
results and conclusions regarding the overall goals of this study, in light of these results
and your overall analysis. Be technical but clear here. (10 marks)
7. Write a final report, in as close to plain English as is practical and possible, that
discusses and summarises your analysis above and gives conclusions on the overall
goals of this study. Address the report to, and write it at a level appropriate for,
the Department of Energy Office of Energy Efficiency, who may not be that savvy in
business analytics. Include in your report a recommendation for what the Department
should spend money on in order to increase the efficiency of transport, plus any suggestions
for future studies they should do to better achieve the goals they have. (5 marks)
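As an illustration of the quantities named in tasks 2 and 6, the following is a minimal sketch using only the Python standard library. The data are made-up toy numbers, not values from vehicles.csv, and the function names `fit_slr` and `forecast_measures` are hypothetical. It assumes that MAD means the mean absolute forecast error and that forecast R² is computed as 1 - SSE/SST, with SST taken around the mean of the “test” responses; check both against the definitions used in the unit. In practice a library such as statsmodels would typically be used to fit the models and report the test statistics.

```python
import math


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, se_b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope under the homoskedasticity assumption.
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, se_b1


def forecast_measures(actual, predicted):
    """RMSE, MAD and forecast R^2 (assumed definitions, see the text above)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    sse = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(sse / n),
        "MAD": sum(abs(e) for e in errors) / n,
        "R2": 1 - sse / sst,
    }


# Toy numbers standing in for displacement (x) and comb08 (y).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [38.0, 31.0, 27.0, 20.0, 14.0]
test_x = [1.5, 2.5, 3.5]
test_y = [35.0, 29.0, 23.0]

b0, b1, se_b1 = fit_slr(train_x, train_y)
t_stat = b1 / se_b1  # compare with a t distribution on n - 2 degrees of freedom

predictions = [b0 + b1 * xi for xi in test_x]
measures = forecast_measures(test_y, predictions)
print(b0, b1, t_stat, measures)
```

The model is fitted on the training data only and the accuracy measures are computed on the held-out test data only, which mirrors the train/test separation the assignment requires.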