Date: 2022-12-02
CRICOS Provider Code 00098G

COMM1190: DATA, INSIGHTS, AND DECISION
TERM 3 2022
FINAL EXAM PRACTICE QUESTIONS
QUESTION 1
You have been brought in as a Data Science consultant on a court case. A chemical company
has been found negligent after a chemical spill at one of their plants. All that remains in the
court case is to decide on the extent of the damages for which the company is liable. One way
the court has been deciding on this amount is to look at the impact the spill has had on the
value of houses located near to the chemical plant where the spill occurred.
As the expert witness, you have been asked to evaluate some alternative strategies to
estimate the impact on housing prices (price). Strategy A involves taking a sample of sales
that occurred after the spill where the houses are classified as either being close to the plant
or not. This feature was designated by a variable near that was equal to 1 if the house was
deemed to be close to the chemical plant and zero otherwise. Then a regression analysis is
performed using the following model (MA):
MA: $\text{price} = \beta_0 + \beta_1\,\text{near} + u$.
Strategy B involves taking a sample of sales for houses near to the plant but where some
sales occurred before the spill and some after. The variable after is equal to 1 if the house
was sold after the spill and zero if the sale was before. Then a regression analysis is
performed using the following model (MB):
MB: $\text{price} = \beta_0 + \beta_1\,\text{after} + u$.
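Both models regress price on a single 0/1 variable. As a minimal sketch of what such a regression measures, the Python below fits ordinary least squares on an intercept and one dummy and compares the estimates with the two group means. The prices are hypothetical values invented for illustration, and `ols_dummy` is an illustrative helper, not part of the exam material.

```python
from statistics import mean


def ols_dummy(y, d):
    """OLS of y on an intercept and a single 0/1 regressor d."""
    y_bar, d_bar = mean(y), mean(d)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sxy / sxx
    intercept = y_bar - slope * d_bar
    return intercept, slope


# Hypothetical sale prices in $1000 (invented for illustration only).
price = [120.0, 135.0, 140.0, 90.0, 95.0, 100.0]
near = [0, 0, 0, 1, 1, 1]

b0, b1 = ols_dummy(price, near)
mean_far = mean(p for p, d in zip(price, near) if d == 0)
mean_near = mean(p for p, d in zip(price, near) if d == 1)

# The intercept is the mean of the group with dummy = 0, and the slope
# is the difference between the two group means.
print(f"b0 = {b0:.2f}, mean(near = 0) = {mean_far:.2f}")
print(f"b1 = {b1:.2f}, difference in means = {mean_near - mean_far:.2f}")
```

The same code applies to MB if `near` is replaced by `after`. In either case the slope is a raw difference in average prices between two groups, which is worth keeping in mind when evaluating the strategies.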
Part A.
Explain A and B as strategies to estimate the impact of the chemical spill and critically
evaluate each of them. Is either preferable to the other?
Part B.
Suggest an alternative regression model that is preferable to MA given that you only have data
from after the spill. Does this address all the criticisms of Strategy A that you outlined in
Part A?
Part C.
Models MA and MB are estimated using housing data, and the results are given below. How do
you interpret these results? (Note that price is expressed in $1000.)
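MA: $\widehat{\text{price}} = 131.9\,(4.0) - 40.0\,(7.6)\,\text{near}$
$n = 142,\ R^2 = .165$ (standard errors in parentheses)
MB: $\widehat{\text{price}} = 63.7\,(5.9) + 28.3\,(9.1)\,\text{after}$
$n = 96,\ R^2 = .094$ (standard errors in parentheses)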
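When reading output of this kind, it can help to turn each slope estimate into a t-statistic and an approximate 95% confidence interval. The sketch below does this for the reported slopes, assuming the figures in parentheses are standard errors. As a check on that assumption, it also computes the R² implied by each t-statistic in a simple regression, t² / (t² + n − 2), which comes out close to the reported values. The function names are illustrative.

```python
def t_stat(coef, se):
    """t-statistic for H0: coefficient = 0."""
    return coef / se


def conf_int_95(coef, se, crit=1.96):
    """Approximate 95% confidence interval (normal critical value)."""
    return coef - crit * se, coef + crit * se


def implied_r2(t, n):
    """R^2 implied by the slope t-statistic in a one-regressor model."""
    return t ** 2 / (t ** 2 + (n - 2))


# Slope estimates and assumed standard errors taken from the results above.
t_near = t_stat(-40.0, 7.6)
t_after = t_stat(28.3, 9.1)

ci_near = conf_int_95(-40.0, 7.6)
ci_after = conf_int_95(28.3, 9.1)

r2_a = implied_r2(t_near, n=142)
r2_b = implied_r2(t_after, n=96)

print(f"MA: t = {t_near:.2f}, 95% CI = ({ci_near[0]:.1f}, {ci_near[1]:.1f}), implied R^2 = {r2_a:.3f}")
print(f"MB: t = {t_after:.2f}, 95% CI = ({ci_after[0]:.1f}, {ci_after[1]:.1f}), implied R^2 = {r2_b:.3f}")
```

Small gaps between the implied and reported R² are expected because the published coefficients and standard errors are rounded.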
Part D.
Suppose you have sales both near and not near to the plant as well as sales before and after
the spill. Suggest an alternative strategy to estimate the effect of the chemical spill on housing
prices that is preferable to both MA and MB.
Word Limit: 800 words for entire question (i.e., all subparts).
QUESTION 2
Imagine you work for a large department store, which highly values customer service. The
following chart shows how customers contact the customer service centres.
You begin to discuss the chart with your manager. Immediately, she raises the following query:
“I want to see the overall trends, but it is difficult to see with all the seasonal spikes in the time
series. I’d like a simpler view into the trend.” You decide to create some charts to address your
manager’s query.
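One common way to get a simpler view of a trend in a seasonal series is to smooth it with a moving average whose window equals the seasonal period. The sketch below is only an illustration of that idea: it assumes monthly data with a 12-month season, and the `contacts` series is hypothetical (a linear trend plus a December spike), not the contents of “customer_service.xlsx”.

```python
def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical monthly contact counts: upward trend plus a December spike.
contacts = [100 + 2 * m + (50 if m % 12 == 11 else 0) for m in range(36)]

trend = moving_average(contacts, window=12)
print(trend[:3])
```

Because every 12-month window contains exactly one December, the spike is spread evenly across the window and the smoothed series rises steadily. A chart of `trend` (or of annual totals) would then show direction without the seasonal peaks.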
Part A.
Using the four frameworks typology, identify the type of chart you would use to address the
query and explain why.
Part B.
Sketch two alternative charts for the query. For each chart, provide a brief explanation of your
design choices. To sketch the chart, you can use any tool you want (e.g., you can use a
software tool like Infogram, Excel, or R). Alternatively, you can sketch the chart using pencils,
pens or markers on paper, then take a picture of the charts and paste them into your solutions
document. You can access the underlying data “customer_service.xlsx” on Ed.
Part C.
Evaluate your two charts and explain which you would select to further develop to present to
your manager.
Word Limit: 500 words for entire question (i.e., all subparts).
QUESTION 3
As part of a preventive health program, we are interested in building a model for predicting
diabetes among women. We have access to the following information for each woman:
- diabetes: Yes or No for diabetic.
- npreg: number of pregnancies.
- glu: plasma glucose concentration
- bmi: body mass index
- ped: diabetes pedigree function
- age: age in years.
We have split the available data into a training dataset with 332 cases and a test dataset with
200 cases.
With reference to the above case, please answer the following questions:
Part A.
We have run a logistic regression to predict diabetes using npreg, glu, bmi, ped, and age.
The R output from this logistic regression is as follows.
Based on the output, write down the mathematical equation of the logistic regression
associated with this R output.
(2 marks)
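For reference, a logistic regression with these five predictors has the general form below, where p is the probability that diabetes = Yes. The symbols β0 to β5 are placeholders: the numerical values have to be read from the R output, which is not reproduced here.

```latex
\log\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1\,\text{npreg} + \beta_2\,\text{glu}
  + \beta_3\,\text{bmi} + \beta_4\,\text{ped} + \beta_5\,\text{age},
\qquad
p = \Pr(\text{diabetes} = \text{Yes})
```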
Part B.
Based on the output in Q3 Part A, provide an interpretation of the coefficients associated with
bmi and npreg.
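Logistic regression coefficients are on the log-odds scale, so a frequent way to interpret one is to exponentiate it into an odds ratio. The sketch below illustrates this with HYPOTHETICAL coefficients chosen only for demonstration. They are not the values from the R output, and the helper names are illustrative.

```python
import math


def predicted_probability(intercept, coefs, x):
    """Probability from a logistic model: 1 / (1 + exp(-(b0 + sum(b_j * x_j))))."""
    z = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a `delta`-unit increase in a predictor."""
    return math.exp(coef * delta)


# HYPOTHETICAL coefficients, for illustration only (not the exam's R output).
intercept = -9.0
coefs = {"npreg": 0.10, "glu": 0.03, "bmi": 0.08, "ped": 1.0, "age": 0.03}

# Hypothetical woman, and the same woman with bmi one unit higher.
woman = {"npreg": 2, "glu": 120.0, "bmi": 30.0, "ped": 0.4, "age": 35}
heavier = dict(woman, bmi=woman["bmi"] + 1)

p_base = predicted_probability(intercept, coefs, woman)
p_plus = predicted_probability(intercept, coefs, heavier)
odds_base = p_base / (1 - p_base)
odds_plus = p_plus / (1 - p_plus)

# Holding the other predictors fixed, the odds are multiplied by exp(b_bmi).
print(f"odds ratio from predictions: {odds_plus / odds_base:.4f}")
print(f"exp(b_bmi):                  {odds_ratio(coefs['bmi']):.4f}")
```

The two printed numbers agree, which is the sense in which a coefficient describes the change in the odds of diabetes for a one-unit increase in a predictor, with the others held constant.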
Part C.
Based on the output in Q3 Part A, identify the predictors that have a statistically
significant relationship with the response and justify your answer.
QUESTION 4
As an alternative to the logistic regression in Q3, we have also fitted the classification tree
below:
Part A.
Based on this decision tree, would you predict that a woman with the following characteristics
has diabetes: npreg=0, glu=140, bmi=47.9, ped=0.259, age=26? Justify your answer.
Part B.
Provide TWO facts to demonstrate the consistency of results from the classification tree above
with the findings reported in Q3 Part A.
QUESTION 5
City X government plans to roll out a crime forecasting tool to predict crime hotspots using 20
years of crime data. The tool focuses on using several types of data sources:
* Historical records: the crime data recorded by police.
* Demographic, geographical, and socio-economic data: information about age, gender, home
address, marital status, and average household income.
* Human mobility data: information about mobile usage.
* Social media data: information (e.g., locations, time, keywords) about tweets and Facebook
posts.
The government is aware that the project could be subject to several ethical risks. They have
invited an ethics advisory team to help them evaluate the data ethics issues of the project. As
a member of the advisory team, you are required to:
a) Identify and explain the ethical issues involved in key data analytics stages i.e.,
data collection, data analysis, and data communication using the tool.
[max 200 words]
(10 marks)
b) Provide your suggestions on how to prevent or mitigate the ethical issues you
identified in part a).
[max 200 words]
(10 marks)

