R代写｜Assignment代写 - regression analysis
1. (24 points)
In recent times, several professionals in health-care analytics are trying to understand the key
factors that might dictate the chances of someone getting an infection while admitted at a
hospital. 113 patients were randomly sampled, and initially, to predict InfectionRisk (Y) [higher
values ⇒ higher risk and we are not extrapolating in any of the questions below], two potential
cause variables were examined:
❼ Stay: the number of days the patient remained in hospital.
❼ XRay: the number of XRays the patient received.
Fig 1. describes this regression output. Call this Model F1. The PRESS value for this model
a. (7 pts) ❼ (2 points) Check at α = 0.05 whether at least one of these two variables would be useful in
explaining variation in the response. (Need the null and alternate hypotheses, the p-value,
and the decision).
❼ (3 points) Check at α = 0.05 whether (holding Stay constant) an increase in XRay frequency leads to an increase in InfectionRisk. (Need the null and alternate hypotheses, the
p-value, and the decision).
❼ (2 points) John will have to stay for 10 days and will have 20 XRays. Estimate through a
95% interval (a range of numbers) his InfectionRisk.
b. (8 pts) Model F1 was next generalized to include a second order interaction term:
Stay2XRay. Fig 2 summarizes that fit. Call this model F2. The PRESS value for this model
❼ (2 points) Observe that we are only explaining about 38% of the total response variation. So I concluded that the variables involved must be bad at α = 0.1. Am I right?
❼ (4 points) Sam (a friend of John from part (a)) will also have to stay for 10 days, but will
have 21 XRays. Who has a higher InfectionRisk? And by how much?
❼ (2 points) Had you calculated the interval estimate from Model F2, would it be wider or
narrower than what you found in (a)? Why?
c. (4 pts) Next, a computer glitch deleted all the information on the cause variables
Stay an XRay and results from Models F1 and F2 (i.e., Figs 1 and 2) are lost too. A new person
who’ll stay for 10 days and will have 15 XRays wants to estimate his InfectionRisk. What will
it be? Also, find R2 R2a
for this new model. (Hint: You may use Fig 3.)
d. (5 pts) Hospitals often cite lack of funding as the reason behind alarming InfectionRisks. A regression line was made connecting InfectionRisk (Y) to funding (X) (in millions of
Yˆi = 2.1 1.5Xi (1)
(Data on funding are confidential. A person who had access to those made this line) The standard
error corresponding to the slope estimate was 2. Test at α = 0.10 whether an increase in funding
leads to a significant decrease in InfectionRisks. You may assume pt(-1.5,111)=0.068227, pt(-
1.5,112)=0.068214, pt(-0.75,111)=0.2274207, pt(-0.75,112)=0.2274137.
2. (36 points) (The story from question 1 gets continued here) Later, it was discovered that
several other potential cause variables might also affect the chances of getting infected. Here’s
a larger list:
❼ Stay: Length of stay at the hospital, in days.
❼ Age: The age of the patient, in years.
❼ XRay: The frequency of XRay administration.
❼ Beds: Number of available beds at the hospital.
❼ Census: Total number of patients admitted.
❼ Nurses: Number of trained nurses available.
❼ Facilities: Number of facilities available at the hospital.
a. (3 pts) Within one minute, you’ll need to choose a few of these variables as future
higher-order and/or interaction term candidates. You can use a computer that does one regression in one second. Would you implement a step-wise search or an all-possible combination (i.e.,
a brute-force) search? Justify.
b. (9 pts) A stepwise selection search among these variables generated Fig 4. while a
brute-force search generated Figs 5-9.
❼ (2 points) How many first order non-interaction regressions did the stepwise algorithm
actually do for this dataset?
❼ (3 points) If financial constraints force us to choose only two cause variables, which combination would you choose? Use both stepwise and brute force to answer this and check
whether we have agreement between the two methods.
❼ (4 points) Next, imagine we have no restrictions on the size of the model. Use one of the
two methods to propose good cause-combinations under each of the following scenarios:
– If I were to explain the maximum proportion of variation in the response after penalizing for model complexity.
– If I were to estimate the chances of a friend (that is going to be admitted) getting
c. (9 pts) “Stay”, “XRay”, and “Facilities” were ultimately chosen from the previous
variable screening steps. Next, the hospitals were classified according to their locations and two
indicator variables X4 and X5 were brought in according to the following table:
East Coast 0 0
Central America 0 1
West Coast 1 0
A first-order non-interaction model was constructed with these five variables. The fit is shown
in Fig. 10. Call this Model Q1. The PRESS value for this model is 125.1996.
❼ (3 points) Compare F2 and Q1.
❼ (3 points) Using Q1, estimate the increase in InfectionRisk if we choose a hospital in a)
the “East Coast”, b) the “West Coast”, and c) “Central America”.
❼ (3 points) Based on Fig. 10, at α = 0.05, is there a need to differentiate between hospitals
in the regions i) “East Coast” and “West Coast” ii) “East Coast” and “Central America”?
d. (15 pts)
Model Q1 was reduced next according to the classification:
East Coast/Central America 0
West Coast 1
and an interaction between “Stay” and X6 was conjectured to generate Model Q2, with a PRESS
= 123.1746. Fig. 11 describes that fit.
❼ (2 points) Test at α = 0.2 whether the interaction term is significant.
❼ (3 points) A person is considering moving from an “East Coast/Central America” hospital
to one in the “West Coast”. Regardless of the hospital he chooses, he estimates his
length of “Stay” to be 10 days (we are not extrapolating) . By how much is this person’s
“RiskofInfection” likely to change?
❼ (4 points) Use Model Q2 to discover a blind spot, if any, and interpret it.
❼ (3 points) A new person who has to stay admitted for 10 days, will need 10 XRays, wants
to choose a hospital in the “East Coast” with 30 “Facilities” (we are not extrapolating)
comes in. This person was not used to derive any of Models F1, F2, Q1, Q2. Using one
of these four models, predict (through a point estimate) the InfectionRisk value of such a
❼ (3 points) Find R2
Jack for the model you chose in the previous part, out of the four available.