MSCI521: Statistics and Descriptive Analytics | Department of Management Science
MSCI521: Statistics and Descriptive Analytics
Coursework
Guidelines
• Answer both questions. Equal marks are available for each question. The % marks indicated
for sections of questions are approximate.
• You can include any graphs and outputs that you find relevant to the problem in the report,
but make sure that they are properly referenced and interpreted in the text. Remember that it
is up to you to interpret what a figure or table shows, not for the marker to infer.
• It is also important to explain how you reached each conclusion. For example, you
should explain why you think that there is or there isn’t an effect in the model. Simply stating
“there is” or “there is not” is not acceptable.
• Do not include any appendices (they will be discarded during the marking).
• You must submit an electronic version of your report through Moodle along with a
coursework declaration form.
• Do not include your name anywhere in the work, but please do include your library ID!
• Your report must be between 2000 and 3000 words and should not exceed 20 pages.
Question 1 – 50 marks
An analyst has collected a sample of 534 respondents, measuring the wages of people in the UK and
other information about them:
• wage – wage in GBP per hour,
• education – number of years of education, starting from elementary school (across all
programmes),
• experience – number of years of work experience (which is calculated as
),
• age – age in years,
• ethnicity – categorical variable, indicating whether the respondent is Caucasian or of another
ethnicity,
• region – categorical variable, showing whether the respondent is from the South or not,
• gender – gender of the respondent,
• occupation – categorical variable, indicating the occupation of the person, which can be:
o worker – tradesperson or assembly line worker,
o technical – technical or professional worker,
o services – service worker,
o office – office and clerical worker,
o sales – sales worker,
o management – management and administration;
• sector – the respondent's sector of work, which can be manufacturing, construction or other,
• union – nominal variable, indicating whether there is a union related to the job.
To study the impact of these variables on wage, you need to build a regression model. Complete the
following steps to fulfil the task:
1. Analyse the data and explain what you observe, and discuss its possible causes. [25% mark]
2. Based on (1) and your understanding of the problem, propose an appropriate regression
formulation for the problem. Explain all the transformations that you propose (if any), and
which variables should be included and why. [30% mark]
3. Carry out regression diagnostics on the model from (2) and fix any problems you find. Explain what
you do and why [20% mark]:
a. Are there any apparent issues in the residuals (any patterns)?
b. Does the variance of the error appear to be constant?
c. Do the errors appear to be normally distributed?
d. Are there outliers?
4. Assuming that the standard regression assumptions hold, use your model from (3) to answer
the following questions: [25% mark]
a. How do the years of experience impact the wage?
b. What is the meaning of the intercept in your model?
c. What is the average effect of the presence of a union on the wage?
d. What is the interpretation of the 99% confidence interval for the education
parameter?
e. What is the impact of age on wage?
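How a slope is read in step 4 depends on whether wage enters the model in levels or in logarithms. The brief does not prescribe a transformation. Purely as an illustration, and assuming a hypothetical log-linear specification in which log(wage) is the response, a coefficient could be converted into a percentage effect along the lines of the Python sketch below. The function name `pct_effect` and the coefficient value used are illustrative assumptions, not results from the data.

```python
import math


def pct_effect(beta: float, delta: float = 1.0) -> float:
    """Percentage change in the response for a `delta`-unit change in a regressor.

    Assumes a log-linear model, log(y) = b0 + beta * x + ..., so that changing x
    by `delta` multiplies y by exp(beta * delta), all other regressors held fixed.
    """
    return (math.exp(beta * delta) - 1.0) * 100.0


# Hypothetical coefficient, chosen only to show the calculation.
beta_experience = 0.05
one_year = pct_effect(beta_experience)         # effect of one extra year
ten_years = pct_effect(beta_experience, 10.0)  # effect of ten extra years
print(f"1 year: {one_year:.2f}%, 10 years: {ten_years:.2f}%")
```

For a dummy variable such as union, the same conversion applies with `delta = 1`. If wage is modelled in levels instead, the coefficient is read directly as a change in GBP per hour and no conversion is needed.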
Question 2 – 50 marks
In order to determine the main factors influencing the price of cars, a company collected a sample of
82 cars with 27 variables, measuring different characteristics:
• Manufacturer;
• Model;
• Type – categorical variable with levels "Small", "Sporty", "Compact", "Midsize", "Large" and
"Van";
• Price – price of a standard version of a car (in $1,000);
• Min.Price – price for a basic version (in $1,000);
• Max.Price – price for “a premium version” (in $1,000);
• MPG.city – fuel consumption in the city, miles per US gallon;
• MPG.highway – fuel consumption on the highway, miles per US gallon;
• AirBags – airbags fitted as standard: none, driver only, or driver & passenger;
• DriveTrain – rear wheel, front wheel or 4WD;
• Cylinders – number of cylinders;
• EngineSize – engine size in litres;
• Horsepower – maximum horsepower;
• RPM – revs per minute at maximum horsepower;
• Rev.per.mile – engine revolutions per mile (in highest gear);
• Man.trans.avail – whether a manual transmission version of the car is available;
• Fuel.tank.capacity – fuel tank capacity in US gallons;
• Passengers – passenger capacity in persons;
• Length – length in inches;
• Wheelbase – wheelbase in inches;
• Width – width in inches;
• Turn.circle – U-turn space in feet;
• Rear.seat.room – rear seat room in inches;
• Luggage.room – luggage capacity in cubic feet;
• Weight – weight in pounds;
• Origin – whether the company is of non-USA or USA origin;
• Make – combination of Manufacturer and Model.
The company is not sure whether all the variables are useful, so you will need to select the
useful ones. They want you to explain what impacts the price of a standard version of a car. They are
also interested in insights about the prices for some specific cars, but they are risk-averse and want to
be sure about those insights at a high confidence level of 99%.
You will need to complete the following steps in order to help the company:
1. Undertake a careful preliminary analysis of the data. Explain which variables seem most
useful, providing your reasons. Describe the relationships that you find, their possible causes
and their practical implications. Explain how you could incorporate these into a regression
model, including variable transformations that you think might be suitable. [30% mark]
2. Construct an appropriate model to predict the price. Explain how you have come up with that
model. Conduct regression diagnostics to see whether it can be improved further [20%
mark].
3. Construct an alternative regression model (e.g. with different sets of explanatory variables
and / or with different meaningful transformations of some of the variables). Explain why you
have proposed the second model and what idea you want to check using it. Select the best
model from (2) and (3), justifying your choice. [25% mark]
4. Using your preferred model, answer the following questions of the company, explaining how
you have obtained the values (i.e. explain the logic for getting the values; do not paste R
code): [25% mark]
a. How does the horsepower influence the price of cars? Give an interpretation of the
relevant parameter in the model.
b. Do airbags impact the price of cars? If yes, what is the expected effect of installing
airbags for both driver and passengers on price?
c. What would be the expected price for a hypothetical Ford 240 car that is midsize,
achieves 22 mpg in the city and 29 mpg on the highway, has airbags for the driver only,
front drivetrain, a 6-cylinder engine of size 2.3, 140 horsepower, 5400 RPM, 2380
revolutions per mile, manual transmission, a 16-gallon tank, 4 passengers, length of
182, wheelbase of 103, width of 68, turn circle of 39, rear seat room of 27.5, luggage
room of 14, weight of 2970 and is manufactured in the USA?
d. What is the 99% interval for the expected price for the car described in (c)?
e. What would be the lowest and the highest values of price for the car described in (c)
in 99% of the cases?
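Parts (d) and (e) ask for two different intervals: (d) concerns the uncertainty about the expected (mean) price, while (e) concerns the range in which an individual price would fall, which also includes the error variance and is therefore wider. The Python sketch below illustrates the distinction for a simple regression with a single regressor. It is a minimal illustration under stated assumptions, not the required model: the numbers are made up and are not the coursework data, the helper names `simple_ols` and `intervals` are hypothetical, and the critical value 3.355 is the 0.995 quantile of Student's t with 8 degrees of freedom, taken from a t-table.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns estimates and summary terms."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard error, n - 2 degrees of freedom
    return b0, b1, s, x_bar, sxx


def intervals(x, y, x_new, t_crit):
    """Confidence interval for the mean and prediction interval at x_new."""
    b0, b1, s, x_bar, sxx = simple_ols(x, y)
    n = len(x)
    fit = b0 + b1 * x_new
    leverage = 1.0 / n + (x_new - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)        # uncertainty of the fitted line only
    se_pred = s * math.sqrt(1.0 + leverage)  # adds the variance of a new observation
    return {
        "fit": fit,
        "confidence": (fit - t_crit * se_mean, fit + t_crit * se_mean),
        "prediction": (fit - t_crit * se_pred, fit + t_crit * se_pred),
    }


# Made-up illustrative numbers, NOT taken from the coursework data.
horsepower = [60, 80, 100, 120, 140, 160, 180, 200, 220, 240]
price = [8.1, 10.4, 12.9, 15.2, 17.8, 19.5, 23.1, 24.8, 28.3, 30.2]

# n = 10 observations, so n - 2 = 8 degrees of freedom; 3.355 is the two-sided 99% t value.
result = intervals(horsepower, price, x_new=140, t_crit=3.355)
print(result)
```

With several explanatory variables the same logic holds, but the leverage term is computed from the full design matrix, which statistical software does automatically.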






















































































































































































































































