R代写|Assignment代写 - regression analysis
1. (24 points) In recent times, several professionals in health-care analytics are trying to understand the key factors that might dictate the chances of someone getting an infection while admitted at a hospital. 113 patients were randomly sampled, and initially, to predict InfectionRisk (Y) [higher values ⇒ higher risk and we are not extrapolating in any of the questions below], two potential cause variables were examined: ❼ Stay: the number of days the patient remained in hospital. ❼ XRay: the number of XRays the patient received. Fig 1. describes this regression output. Call this Model F1. The PRESS value for this model was 137.53. a. (7 pts) ❼ (2 points) Check at α = 0.05 whether at least one of these two variables would be useful in explaining variation in the response. (Need the null and alternate hypotheses, the p-value, and the decision). ❼ (3 points) Check at α = 0.05 whether (holding Stay constant) an increase in XRay fre￾quency leads to an increase in InfectionRisk. (Need the null and alternate hypotheses, the p-value, and the decision). ❼ (2 points) John will have to stay for 10 days and will have 20 XRays. Estimate through a 95% interval (a range of numbers) his InfectionRisk. 3 b. (8 pts) Model F1 was next generalized to include a second order interaction term: Stay2XRay. Fig 2 summarizes that fit. Call this model F2. The PRESS value for this model was 133.78. ❼ (2 points) Observe that we are only explaining about 38% of the total response varia￾tion. So I concluded that the variables involved must be bad at α = 0.1. Am I right? Justify/criticize. ❼ (4 points) Sam (a friend of John from part (a)) will also have to stay for 10 days, but will have 21 XRays. Who has a higher InfectionRisk? And by how much? ❼ (2 points) Had you calculated the interval estimate from Model F2, would it be wider or narrower than what you found in (a)? Why? 4 c. (4 pts) Next, a computer glitch deleted all the information on the cause variables Stay an XRay and results from Models F1 and F2 (i.e., Figs 1 and 2) are lost too. A new person who’ll stay for 10 days and will have 15 XRays wants to estimate his InfectionRisk. What will it be? Also, find R2 R2a for this new model. (Hint: You may use Fig 3.) d. (5 pts) Hospitals often cite lack of funding as the reason behind alarming Infection￾Risks. A regression line was made connecting InfectionRisk (Y) to funding (X) (in millions of dollars): Yˆi = 2.1 1.5Xi (1) (Data on funding are confidential. A person who had access to those made this line) The standard error corresponding to the slope estimate was 2. Test at α = 0.10 whether an increase in funding leads to a significant decrease in InfectionRisks. You may assume pt(-1.5,111)=0.068227, pt(- 1.5,112)=0.068214, pt(-0.75,111)=0.2274207, pt(-0.75,112)=0.2274137. 5 2. (36 points) (The story from question 1 gets continued here) Later, it was discovered that several other potential cause variables might also affect the chances of getting infected. Here’s a larger list: ❼ Stay: Length of stay at the hospital, in days. ❼ Age: The age of the patient, in years. ❼ XRay: The frequency of XRay administration. ❼ Beds: Number of available beds at the hospital. ❼ Census: Total number of patients admitted. ❼ Nurses: Number of trained nurses available. ❼ Facilities: Number of facilities available at the hospital. a. (3 pts) Within one minute, you’ll need to choose a few of these variables as future higher-order and/or interaction term candidates. You can use a computer that does one regres￾sion in one second. Would you implement a step-wise search or an all-possible combination (i.e., a brute-force) search? Justify. 6 b. (9 pts) A stepwise selection search among these variables generated Fig 4. while a brute-force search generated Figs 5-9. ❼ (2 points) How many first order non-interaction regressions did the stepwise algorithm actually do for this dataset? ❼ (3 points) If financial constraints force us to choose only two cause variables, which com￾bination would you choose? Use both stepwise and brute force to answer this and check whether we have agreement between the two methods. ❼ (4 points) Next, imagine we have no restrictions on the size of the model. Use one of the two methods to propose good cause-combinations under each of the following scenarios: – If I were to explain the maximum proportion of variation in the response after penal￾izing for model complexity. – If I were to estimate the chances of a friend (that is going to be admitted) getting infected. 7 c. (9 pts) “Stay”, “XRay”, and “Facilities” were ultimately chosen from the previous variable screening steps. Next, the hospitals were classified according to their locations and two indicator variables X4 and X5 were brought in according to the following table: X4 X5 East Coast 0 0 Central America 0 1 West Coast 1 0 A first-order non-interaction model was constructed with these five variables. The fit is shown in Fig. 10. Call this Model Q1. The PRESS value for this model is 125.1996. ❼ (3 points) Compare F2 and Q1. ❼ (3 points) Using Q1, estimate the increase in InfectionRisk if we choose a hospital in a) the “East Coast”, b) the “West Coast”, and c) “Central America”. ❼ (3 points) Based on Fig. 10, at α = 0.05, is there a need to differentiate between hospitals in the regions i) “East Coast” and “West Coast” ii) “East Coast” and “Central America”? 8 d. (15 pts) Model Q1 was reduced next according to the classification: X6 East Coast/Central America 0 West Coast 1 and an interaction between “Stay” and X6 was conjectured to generate Model Q2, with a PRESS = 123.1746. Fig. 11 describes that fit. ❼ (2 points) Test at α = 0.2 whether the interaction term is significant. ❼ (3 points) A person is considering moving from an “East Coast/Central America” hospital to one in the “West Coast”. Regardless of the hospital he chooses, he estimates his length of “Stay” to be 10 days (we are not extrapolating) . By how much is this person’s “RiskofInfection” likely to change? ❼ (4 points) Use Model Q2 to discover a blind spot, if any, and interpret it. ❼ (3 points) A new person who has to stay admitted for 10 days, will need 10 XRays, wants to choose a hospital in the “East Coast” with 30 “Facilities” (we are not extrapolating) comes in. This person was not used to derive any of Models F1, F2, Q1, Q2. Using one of these four models, predict (through a point estimate) the InfectionRisk value of such a person. ❼ (3 points) Find R2 Jack for the model you chose in the previous part, out of the four available.