R ghostwriting - STAT 3022 | 学霸联盟

Date: 2021-11-17

STAT 3022 MIDTERM 2 PRACTICE SOLUTION FALL 2021
Multiple Choice
1. A dataset of size 21 is used to fit a simple linear regression model. Below are the
Cook's distance values associated with the observations. What can we say about
the Cook’s Distance of these observations? Please select all that apply. Please
use 1 as the threshold for the Cook’s Distance.

A. Observation 1 has the larger Cook’s Distance

B. Observation 21 has the larger Cook’s Distance

C. Observation 21 is influential

D. Observation 21 is not influential

2. Here are two models that are fitted using the same sample of observations:
 Model1: Y ~ 1 + X + X^2
 Model2: Y ~ 1 + X
Which of the following statements are true?

A. Model1 has a larger R^2

B. Model2 has a larger R^2

C. Model1 fits the population (from which the sample was drawn) better

D. Model2 fits the population (from which the sample was drawn) better

3. There are three variables X1, X2, and Y. Using the same dataset, we fit two models:
 Model1: Y ~ 1 + X1
 Model2: Y ~ 1 + X1 + X2
Which of the following statements is/are correct? Please select all that apply.

A. The estimated intercepts in Model1 and Model2 will always be identical

B. Model1 will have a larger RSS than Model2

C. The estimated slope of X1 in Model1 and Model2 will always have the same sign

D. The estimated slope of X1 in Model1 and Model2 will always have opposite
signs

4. Suppose the correlation between X1 and X2 is -0.9. What can we say about the
regression model Y ~ X1 + X2? Please select all that apply.

A. There is collinearity in the regression model.

B. The estimated value of the slope of X1 may be very different from the true value
of the slope.

C. The variance of the slope estimator of X1 is increased.

D. The variance of the slope estimator of X2 is decreased.
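To make the effect of a -0.9 correlation concrete, here is a minimal sketch of the variance inflation factor (VIF) for a model with exactly two predictors. It relies on the standard result that, with only X1 and X2, each predictor's VIF is 1 / (1 - r^2), where r is their correlation. The function name `vif_two_predictors` is a hypothetical helper, not something from the exam.

```python
# Variance inflation factor for a two-predictor model.
# With only X1 and X2, regressing one on the other gives R^2 = r^2,
# so both predictors share the same VIF = 1 / (1 - r^2).

def vif_two_predictors(r: float) -> float:
    """Return the VIF shared by both predictors when their correlation is r."""
    return 1.0 / (1.0 - r ** 2)

r = -0.9  # correlation between X1 and X2, taken from the question
vif = vif_two_predictors(r)
print(f"VIF = {vif:.2f}")  # 1 / (1 - 0.81), about 5.26
```

Under this sketch, the sampling variance of both slope estimators is inflated by a factor of about 5.26 compared with uncorrelated predictors of the same spread, so neither variance is decreased.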

5. What is the purpose of centering the predictor variable in a polynomial regression
model? Please select the best answer.

A. To reduce the collinearity among the polynomial terms

B. To reduce the sample mean of the predictor variable

C. To reduce the sample variance of the predictor variable

D. To reduce the RSS of the regression model
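The effect of centering on collinearity can be checked numerically. The sketch below uses made-up predictor values 1 through 10 (an assumption for illustration only) and compares the correlation between X and X^2 before and after subtracting the sample mean. The `pearson` helper is written by hand so the example needs only the standard library.

```python
from math import sqrt

def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical predictor values, chosen only for illustration.
x = [float(i) for i in range(1, 11)]
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]

raw_corr = pearson(x, [v ** 2 for v in x])
centered_corr = pearson(x_centered, [v ** 2 for v in x_centered])

print(f"corr(X, X^2) before centering: {raw_corr:.3f}")
print(f"corr(X, X^2) after centering:  {centered_corr:.3f}")
```

For these symmetric values the centered correlation is zero. With skewed data it would typically be reduced rather than eliminated. Centering also leaves the sample variance of the predictor and the RSS of the fitted polynomial unchanged.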

Part 1
Instruction: Show your work for full credit
Let us examine a data set that explores the relationship between total
monthly earnings (MonthlyEarnings) and a number of variables on an
interval scale (i.e. numeric quantities) that may influence monthly earnings,
including each person's IQ (IQ), a measure of knowledge of their
job (Knowledge), years of education (YearsEdu), years of experience
(YearsExperience), and years at current job (Tenure).

The data set also includes dummy variables that may explain monthly
earnings, including whether or not the person is black / African American
(Black), whether or not the person lives in a Southern U.S. state (South),
and whether or not the person lives in an urban area (Urban).
We fit this regression in R and obtained the following output:

## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -451.0098   121.3752  -3.716 0.000215 ***
## IQ                 2.5966     0.9963   2.606 0.009301 **
## Knowledge          6.5545     1.8142   3.613 0.000319 ***
## YearsEdu          47.6530     7.1378   6.676 4.22e-11 ***
## YearsExperience   12.4833     3.1746   3.932 9.04e-05 ***
## Tenure             6.2910     2.4049   2.616 0.009043 **
## Black           -110.6660    39.2222  -2.822 0.004882 **
## South            -50.8222    25.7903  -1.971 0.049068 *
## Urban            155.4316    26.4621   5.874 5.94e-09 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 356.7 on 926 degrees of freedom
## Multiple R-squared: 0.2285, Adjusted R-squared: 0.2219
## F-statistic: 34.29 on 8 and 926 DF, p-value: < 2.2e-16

1. Interpret the coefficient -110.6660 associated with Black [5 pts]
Holding all the other explanatory variables in the model constant
(IQ, job knowledge, years of education, years of experience, tenure,
and location), the model estimates that black / African American
people earn on average $110.67 less per month than non-black
people.

2. Write the estimated regression equation that represents the relationship between
Monthly Earnings and all predictors used. [5 pts]
Estimated MonthlyEarnings = -451.0098 + 2.5966 IQ + 6.5545 Knowledge
+ 47.6530 YearsEdu + 12.4833 YearsExperience + 6.2910 Tenure
- 110.6660 Black - 50.8222 South + 155.4316 Urban

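3. Write the estimated regression equation that represents the relationship
between Monthly Earnings and Years of Education, Years of Experience, IQ,
Tenure, and Knowledge for a black worker who lives in an urban area
in a Southern state. [10 pts]

Set Black = 1, Urban = 1, and South = 1, and fold the three dummy
coefficients into the intercept:
-451.0098 - 110.6660 + 155.4316 - 50.8222 = -457.0664

Estimated MonthlyEarnings = -457.0664 + 2.5966 IQ + 6.5545 Knowledge
+ 47.6530 YearsEdu + 12.4833 YearsExperience + 6.2910 Tenure

(If only Black = 1 and Urban = 1 are substituted, the intercept is
-406.2442 and the term -50.8222 South remains in the equation.)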
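The intercept arithmetic for a specific group can be checked with a short script. This is a minimal sketch: the coefficients are copied from the R output above, and `fold_dummies` is a hypothetical helper that substitutes fixed dummy values and adds their contribution to the intercept.

```python
# Coefficients copied from the R summary output in Part 1.
coef = {
    "(Intercept)": -451.0098,
    "IQ": 2.5966,
    "Knowledge": 6.5545,
    "YearsEdu": 47.6530,
    "YearsExperience": 12.4833,
    "Tenure": 6.2910,
    "Black": -110.6660,
    "South": -50.8222,
    "Urban": 155.4316,
}

def fold_dummies(coef, fixed):
    """Plug fixed dummy values into the model.

    Returns the adjusted intercept and the slopes of the variables
    that were not fixed.
    """
    intercept = coef["(Intercept)"] + sum(coef[k] * v for k, v in fixed.items())
    slopes = {k: v for k, v in coef.items()
              if k != "(Intercept)" and k not in fixed}
    return intercept, slopes

# Black, urban worker, with South left in the equation as a variable.
b0_black_urban, _ = fold_dummies(coef, {"Black": 1, "Urban": 1})
# Black, urban worker living in a Southern state (all three dummies fixed).
b0_all, slopes = fold_dummies(coef, {"Black": 1, "Urban": 1, "South": 1})

print(round(b0_black_urban, 4))  # -406.2442
print(round(b0_all, 4))          # -457.0664
print(sorted(slopes))            # the five numeric predictors remain
```

Fixing South = 1 in addition to Black = 1 and Urban = 1 moves the -50.8222 term into the intercept, which is the difference between the two printed values.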

Part 2: Show your work for extra credit
A regression analysis was applied in order to determine the relationship
between a dependent variable and 8 independent variables. The following
information was obtained from the regression analysis.

R Square = 0.80
SSR = 4,280
Total number of observations n = 56

1. Fill in the blanks in the following ANOVA table. (4.5 pts)
2. Is the model significant at alpha = 0.05? Why or why not? (1.5 pts)
3. Show that the coefficient of determination, R^2, is 0.80 (2 pts)
4. Compute the adjusted R^2 (2 pts)

Source of    Degrees      Sum of    Mean
Variation    of Freedom   Squares   Squares     F
Regression   8            4,280     535         23.5
Error        47           1,070     22.76596
Total        55           5,350

2. Is the model significant at alpha = 0.05? Why or why not? (1.5 pts)
Yes. F = 23.5 on 8 and 47 degrees of freedom far exceeds the 5%
critical value, which is roughly 2.1, so the model is significant.
Equivalently, in R, pf(23.5, 8, 47, lower.tail=FALSE) is essentially 0,
well below 0.05.
3. Show that the coefficient of determination, R^2, is 0.80 (2 pts)
R^2 = SSR/SST = 4280/5350 = 0.80
4. Compute the adjusted R^2 (2 pts)
Adjusted R^2 = 1 - ((n - 1)/(n - p - 1)) * (1 - R^2)
= 1 - ((55/47) * (1 - 0.8)) = 0.766
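The ANOVA quantities can be reproduced from the three given facts (R^2 = 0.80, SSR = 4,280, n = 56) plus the number of predictors, p = 8. The following is a sketch using only the standard definitions: SST = SSR / R^2, error degrees of freedom n - p - 1, and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1). The variable names are illustrative.

```python
# Given information from the problem statement.
n = 56        # observations
p = 8         # independent variables
r2 = 0.80     # R Square
ssr = 4280.0  # regression sum of squares

# Sums of squares: R^2 = SSR / SST, and SST = SSR + SSE.
sst = ssr / r2
sse = sst - ssr

# Degrees of freedom.
df_reg = p
df_err = n - p - 1
df_tot = n - 1

# Mean squares and the overall F statistic.
msr = ssr / df_reg
mse = sse / df_err
f_stat = msr / mse

# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r2 = 1 - (1 - r2) * df_tot / df_err

print(f"SST = {sst:.0f}, SSE = {sse:.0f}")
print(f"df: regression = {df_reg}, error = {df_err}, total = {df_tot}")
print(f"MSR = {msr:.0f}, MSE = {mse:.5f}, F = {f_stat:.1f}")
print(f"adjusted R^2 = {adj_r2:.3f}")
```

The p-value is not computed here because the Python standard library has no F distribution. The R call pf(23.5, 8, 47, lower.tail=FALSE) from the solution covers that step.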

Part 3
The Hilton Hotel chain is developing a regression model to predict the operating
margin of each of its franchises (operating margin is defined as the ratio of net
profit to total revenue). Hilton plans to use this model to help identify profitable
locations to build new hotels. You run a linear regression based on 100 hotels
operated by Hilton, where the dependent variable is operating margin, and the
independent variables are the number of hotel rooms within 1 mile of the hotel,
and the amount of office space (in thousands of square feet) within 1 mile of the
hotel.

Dependent Variable: MARGIN
Independent Variables: ROOMS, OFFICE

Regression Statistics
R      R Square   Adj.RSqr   Std.Err.   #Cases   #Missing   Deg.Free   t(2.5%,97)
0.67   0.45       0.44       8.40       100      0          97         1.985

Summary Table
Variable    Coeff.    Std.Err.   t Stat   P-value   Lower95%   Upper95%
Intercept   53.983    5.178      10.425   0.000     43.705     64.261
ROOMS       -0.0073   0.0013     -5.615   0.000     -0.010     -0.005
OFFICE      0.0216    0.0176     1.227    0.223     -0.013     0.057

(a) State the multiple regression prediction line equation.

(b) Check whether each of the independent variables is significantly
related to operating margin using p-values at α = 0.05.

(c) Check whether each of the independent variables is significantly
related to operating margin using confidence intervals.

(d) Two possible sites are being considered for a new hotel. Site A is
near an office complex with 400 thousand square feet of office
space, but there is a competing hotel chain within 1 mile that has
2,000 hotel rooms. Site B is more remote and has only 50 thousand
square feet of office space nearby, but also less hotel competition
with only 300 rooms nearby. Which hotel site has a higher predicted
operating margin according to this regression?
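The checks asked for in (b) through (d) can be sketched as follows. All numbers are copied from the regression summary table. The dictionary layout and the `predict_margin` helper are hypothetical names introduced for illustration, and the predictions are in whatever units MARGIN was recorded in.

```python
# Values copied from the regression summary table.
coef = {"Intercept": 53.983, "ROOMS": -0.0073, "OFFICE": 0.0216}
p_value = {"ROOMS": 0.000, "OFFICE": 0.223}
ci95 = {"ROOMS": (-0.010, -0.005), "OFFICE": (-0.013, 0.057)}
alpha = 0.05

# (b) and (c): a predictor is significant at the 5% level when its
# p-value is below alpha, or equivalently when its 95% CI excludes 0.
significant = {}
for name in ("ROOMS", "OFFICE"):
    by_p = p_value[name] < alpha
    lo, hi = ci95[name]
    by_ci = not (lo <= 0 <= hi)
    significant[name] = (by_p, by_ci)
    print(f"{name}: significant by p-value = {by_p}, by CI = {by_ci}")

# (d): plug each site's values into the fitted equation.
def predict_margin(rooms, office):
    """Predicted operating margin; office is in thousands of square feet."""
    return coef["Intercept"] + coef["ROOMS"] * rooms + coef["OFFICE"] * office

site_a = predict_margin(rooms=2000, office=400)
site_b = predict_margin(rooms=300, office=50)
print(f"Site A: {site_a:.3f}")  # 53.983 - 14.600 + 8.640 = 48.023
print(f"Site B: {site_b:.3f}")  # 53.983 - 2.190 + 1.080 = 52.873
print("Higher predicted margin:", "A" if site_a > site_b else "B")
```

In this sketch, ROOMS is significant under both criteria and OFFICE is not, and Site B has the higher predicted operating margin. The OFFICE term still enters the prediction even though its coefficient is not significantly different from zero.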
