Cars
STAT-S301
SP 22
Question Set 1
1. Get to know your scientific question (Module 1)
(a) Identify the variable of interest.
(b) Identify the population(s) and sample(s).
(c) Identify the parameter(s) and statistic(s).
(d) What is the scientific question? Is this Descriptive Statistics or Inferential Statistics?
2. Get to know your data (Module 1)
(a) Identify the types of your data: nominal data, ordinal data or quantitative data.
(b) Identify the types of your data: time series data or cross-sectional data.
(c) Identify the source of your data: primary data or secondary data. Do you think the data is
reliable? Are there possible issues with your data?
3. Calculate descriptive statistics in Excel (Module 3)
(a) Calculate the statistics for your variable of interest, such as sample mean (x̄), median, mode,
variance (s²), and standard deviation (s).
(b) Use your qualitative variable to split your full data into two subsets: one set per category on
your qualitative variable (Hint: The categorical variable has two categories/labels). Calculate
the above statistics for each subset to compare.
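As a cross-check on the Excel output for item 3, the same statistics can be sketched in Python. This is only an illustration under stated assumptions: the records and the category labels "Car" and "Truck" are hypothetical values, not the course data set, and the standard-library `statistics` module stands in for Excel (its `variance` and `stdev` use the n - 1 denominator, like Excel's VAR.S and STDEV.S).

```python
import statistics

# Hypothetical (vehicle type, highway MPG) records, for illustration only.
records = [
    ("Car", 28), ("Car", 31), ("Truck", 25), ("Car", 34),
    ("Car", 31), ("Truck", 29), ("Truck", 25), ("Car", 31),
]

def describe(values):
    """Return the descriptive statistics asked for in item 3(a)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }

# Statistics for the full data set.
full = describe([mpg for _, mpg in records])

# One subset per category of the qualitative variable, as in item 3(b).
by_type = {}
for vehicle_type, mpg in records:
    by_type.setdefault(vehicle_type, []).append(mpg)
subsets = {label: describe(values) for label, values in by_type.items()}

print(full)
print(subsets)
```

Comparing `subsets["Car"]` with `subsets["Truck"]` mirrors the side-by-side comparison requested in 3(b).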
4. Display your data with charts and graphs in Excel (Module 2 + M5 / M6 / M10)
(a) Construct displays that best describe ONLY your qualitative variable (e.g. bar chart, pie
chart), and describe the distribution. (Hint: How do we display qualitative data? Which of
your variables are qualitative? Don't forget to describe what you display.)
i. Use this analysis to write a "made-up" Binomial Problem. (State the full problem and the
necessary values using (a) above, and let n = 25.)
(b) Construct displays that best describe your variable of interest and describe its distribution.
(Use frequency distribution tables, histograms, etc.)
i. Use the empirical rule to check normality
ii. Discuss normality, symmetry and skewness.
(c) Construct displays that best describe the relationship/association between two quantitative
variables (the variable of interest as the dependent variable, y, and another quantitative
variable as the independent variable, x); and describe the relationship. (M10)
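The empirical-rule check in item 4(b) can be sketched as a simple count. The data values below are hypothetical, and the helper name `share_within` is made up for this illustration. The idea is to compare the observed share of observations within 1, 2, and 3 sample standard deviations of the mean against roughly 68%, 95%, and 99.7%.

```python
import statistics

def share_within(values, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)

# Hypothetical highway MPG values, for illustration only.
highway_mpg = [22, 25, 27, 28, 29, 30, 30, 31, 31, 32, 33, 35, 38]

# Shares the empirical rule predicts for a bell-shaped distribution.
expected = {1: 0.68, 2: 0.95, 3: 0.997}

for k, target in expected.items():
    observed = share_within(highway_mpg, k)
    print(f"within {k} SD: observed {observed:.3f}, empirical rule about {target}")
```

Large gaps between the observed and expected shares suggest the distribution is not approximately normal. With a small sample, the comparison is only a rough guide.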
Question Set 2
1. Construct a confidence interval for a population mean (Module 8.2. (Review M7/M8.1 if needed,
but M8.2 is what you need to calculate/construct intervals.))
(a) Do you need to make assumptions in order to perform the procedure of constructing a
confidence interval? If so, what assumptions need to be made? If not, why?
(b) Construct a confidence interval for the average highway MPG.
i. Should you use a z-interval or a t-interval? Why?
ii. Compute the necessary sample statistics for constructing a confidence interval.
iii. Find the margin of error of the confidence interval at confidence levels of 91% and 96%,
respectively.
iv. Calculate these two confidence intervals.
(c) Someone believes that the average highway MPG is 31. Does the sample support the claim?
Explain if you have different conclusions using the above two confidence intervals. (You must
discuss in terms of accuracy and precision.)
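The margin-of-error arithmetic in item 1 can be sketched as follows. This is a z-based sketch only, because Python's standard library has normal quantiles but no t quantiles. For a t-interval, the critical value would instead come from a t distribution with n - 1 degrees of freedom (for example Excel's T.INV.2T). The sample summary values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, level):
    """Return (lower, upper, margin) for a z-based confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    margin = z * sd / sqrt(n)                      # margin of error
    return mean - margin, mean + margin, margin

# Hypothetical sample summary (mean, standard deviation, size), for illustration only.
x_bar, s, n = 30.0, 5.0, 100

for level in (0.91, 0.96):
    lower, upper, margin = z_interval(x_bar, s, n, level)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f}), margin of error {margin:.2f}")
```

The higher confidence level gives the larger margin of error, so the 96% interval is wider than the 91% interval. That is the trade-off between accuracy and precision raised in 1(c).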
2. Conduct a hypothesis test for a population mean (Module 9.2. (Review M7/M9.1 if needed, but
M9.2 is what you need to calculate/construct the HT.))
(a) Do you need to make assumptions in order to perform the procedure of conducting a hypothesis
test? If so, what assumptions need to be made? If not, why?
(b) Using α = 0.01, perform a hypothesis test to determine if the average highway MPG is higher
than 30.
i. Write down the hypotheses.
ii. Calculate the test statistic, critical values and p-value.
iii. Describe your decision of the test and make a conclusion based on the context.
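The mechanics of the upper-tail test in item 2(b) can be sketched as below. The sample summary is hypothetical, and the sketch uses the normal (z) approximation because the standard library has no t distribution. A t-test would use the same test statistic, with the critical value and p-value taken from a t distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(x_bar, s, n, mu0, alpha):
    """Test H0: mu = mu0 against Ha: mu > mu0 using the normal approximation."""
    test_stat = (x_bar - mu0) / (s / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)  # reject H0 if test_stat > critical
    p_value = 1 - NormalDist().cdf(test_stat)   # area to the right of test_stat
    return test_stat, critical, p_value, p_value < alpha

# Hypothetical sample summary, for illustration only.
stat, critical, p_value, reject = upper_tail_z_test(
    x_bar=31.5, s=5.0, n=100, mu0=30.0, alpha=0.01
)
print(f"test statistic {stat:.2f}, critical value {critical:.2f}, p-value {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

The critical-value rule and the p-value rule always lead to the same decision, so either can be used to justify the conclusion in 2(b)iii.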
Question Set 3
1. Building a Simple Linear Regression Model: Preprocess. (Module 10/M11 Covariance,
Correlation, & Coefficient of Determination + Simple Linear Regression. Review M10
Cov, Corr, & Coef. of Det. for (a), (b), (c); M11 SLR for (d).)
(a) Identify all quantitative variables from the dataset.
(b) Construct a Scatter Plot to show the relationship between Highway MPG (y) and each
independent variable. Calculate the sample correlation coefficients for all pairs. Describe the
association.
(c) Which pair has the strongest linear association?
(d) Write down the general formula for the Simple Linear Regression Model between y and x.
(Write the formula using the general parameter notation β0 and β1. What should be capitalized or
lowercase? What should be added, if anything?)
2. Describe the linear relationship between Highway MPG (y) and the variable you answered in 1(c)
(above) as x. (Module 10/M11 SLR. M11 SLR is good for (a), (b) and (c). For (c) you can also
look at M10 Cov, Corr, & Coef. of Det.)
(a) Calculate the slope and y-intercept of the least squares regression line using Excel. Write
down the linear equation.
(b) Interpret the regression slope.
(c) What percentage of the total variation in y can be explained by this independent variable x?
3. Use the regression model to predict Highway MPG (y). (Module 11 SLR)
(a) What is the predicted highway MPG with 3.5 ______? (Fill in the blank with the units and
name of the independent variable you chose.)
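The calculations in items 2 and 3 above (slope, y-intercept, share of variation explained, and a prediction) can be sketched with the textbook least squares formulas. The data are hypothetical, and treating x as displacement in liters is only an assumption for the example, since the choice of x depends on the answer to 1(c). In Excel, SLOPE, INTERCEPT, CORREL, and RSQ return the corresponding values.

```python
from math import sqrt

def least_squares(x, y):
    """Return slope, intercept, correlation r, and r squared for y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r, r ** 2

# Hypothetical data: x might be displacement (liters), y highway MPG.
x = [1.5, 2.0, 2.5, 3.0, 3.5]
y = [38, 34, 33, 29, 26]

b1, b0, r, r2 = least_squares(x, y)
predicted = b0 + b1 * 3.5  # point prediction at x = 3.5

print(f"y-hat = {b0:.2f} + ({b1:.2f}) x, r = {r:.3f}, r^2 = {r2:.3f}")
print(f"predicted y at x = 3.5: {predicted:.2f}")
```

In simple linear regression, r squared is the coefficient of determination. Multiplying it by 100 gives the percentage of the total variation in y explained by x.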
4. Is there a linear relationship between y and x? (Module 11 SLR)
(a) Test the significance of the slope of the regression equation. Use α = 0.01.
i. Write down the hypotheses.
ii. What is the p-value?
iii. Describe your decision.
(b) Develop a 94% confidence interval for the population slope. Does this confidence interval
include 0?
(c) State your conclusion. (Hint: You may need to re-calculate the regression analysis:
Data → Data Analysis → Regression → Confidence level.)
5. Check the assumptions for regression analysis. Make necessary plots in Excel to justify and
include them in your answers. (Module 11 SLR)
(a) Is the relationship between the dependent and independent variables linear? Which plot
should you check?
(b) Do the residuals exhibit some pattern across values for the independent variable? Which plot
should you check?
(c) Is the variation of the dependent variable the same across all values of the independent
variable? Which plot should you check?
(d) Do the residuals follow the normal probability distribution? Which plot should you check?
(e) Conclusion: Are the results from the regression analysis reliable?
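The quantities behind the diagnostic plots in item 5 can be sketched without a plotting library. The data are hypothetical. The helper names are made up for this illustration, and the plotting positions (i + 0.5)/n are one common convention among several. The residuals paired with x give the residual plot, and the ordered residuals paired with standard normal quantiles give the normal probability plot.

```python
from statistics import NormalDist, mean

def fit_line(x, y):
    """Least squares slope and intercept of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residuals(x, y):
    """Observed y minus fitted y for each observation."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def normal_plot_points(res):
    """Pair each ordered residual with a standard normal quantile."""
    n = len(res)
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, sorted(res)))

# Hypothetical data, for illustration only.
x = [1.5, 2.0, 2.5, 3.0, 3.5]
y = [38, 34, 33, 29, 26]

res = residuals(x, y)
residual_plot = list(zip(x, res))       # points for the residual plot
normal_plot = normal_plot_points(res)   # points for the normal probability plot

print(residual_plot)
print(normal_plot)
```

Residuals scattered around zero with no curve and no funnel shape support the linearity and equal-variance assumptions. Normal plot points that fall close to a straight line support the normality assumption.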
Question Set 4 (M11 MLR)
1. Model 1: Develop a multiple regression model to predict the Highway MPG (y) using all the
other variables of interest as listed above. (Round all numerical answers to two decimal places as
needed.)
(a) Identify qualitative variable(s) from the list of variables of interest, if there are any, and create
a dummy variable in Excel. (Note: use Excel function =IF() and use alphabetical order
to assign values 0 and 1)
(b) Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the
regression equation for Model 1. (Enter in Excel the confidence level given in question 1(e).
Note: Excel requires that the independent variables be located in adjacent columns)
(c) Explain the variation of the dependent variable after accounting for the effects of the other
independent variables:
i. What percentage of total variation in the Highway MPG (y) can be explained by Model 1?
ii. What is the value of the adjusted multiple coefficient of determination, R²_A?
(d) Is the overall regression model significant using α = 0.01? State the hypotheses and your
conclusion.
(e) Which independent variables are significant predictors using α = 0.185 or confidence level
81.5%? Which are not significant? (After accounting for the effects of the other independent
variables)
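The dummy-variable coding from item 1(a) can be sketched as below. The sketch assumes that "alphabetical order" means the alphabetically first label is coded 0 and the second is coded 1. The label "Car" is a hypothetical second category, and the helper name `make_dummy` is made up for this illustration. Under the same assumptions, the Excel counterpart would be along the lines of =IF(A2="Truck",1,0), where the cell reference A2 is hypothetical.

```python
def make_dummy(labels):
    """Code a two-category variable as 0/1, with 0 for the alphabetically first label."""
    categories = sorted(set(labels))
    if len(categories) != 2:
        raise ValueError("expected exactly two categories")
    return [categories.index(label) for label in labels]

# Hypothetical vehicle types, for illustration only.
vehicle_type = ["Truck", "Car", "Car", "Truck", "Car"]
dummy = make_dummy(vehicle_type)
print(dummy)  # [1, 0, 0, 1, 0]
```

The category coded 0 is the baseline, so the coefficient of the dummy variable is read as the difference relative to that baseline.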
2. Develop a second multiple regression model (Model 2) using ONE step of the “backward
elimination method”. (Remember: variables should be removed one at a time, and the regression
analysis, i.e. coefficients, R², p-values, etc., must be re-calculated at each step.) (Round all numerical
answers to two decimal places as needed.)
(a) Which variable should you remove from Model 1? Why?
(b) Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the
regression equation for Model 2. (Enter in Excel the confidence level given in question 2(e).
Note: Excel requires that the independent variables be located in adjacent columns)
(c) Explaining the variation of the dependent variable:
i. What percentage of total variation in the Highway MPG (y) can be explained by Model 2?
How does this compare with the percentage you obtained with Model 1?
ii. What is the value of the adjusted multiple coefficient of determination, R²_A? How does this
compare with the one you obtained with Model 1?
(d) Is the overall regression model (Model 2) significant using α = 0.04?
(e) Are all the independent variables in Model 2 significant predictors using α = 0.08 or confidence
level 92% after accounting for the effects of the other independent variables?
(f) Prediction:
i. Is Model 2 better than Model 1?
ii. Predict the highway MPG (y) with Vehicle Type = Truck; Weight (pounds) = 8465; Displacement
(liters) = 3.9; Horsepower() = 424; Number of Cylinders() = 10 using “the best” model
(between Model 1 and Model 2). NOTE: you may or may not need to use all given values.
(g) Interpret regression coefficients.
i. Interpret the coefficient of Vehicle Type (Truck).
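For the Model 1 versus Model 2 comparison in item 2(c), the adjusted coefficient of determination can be recomputed from R², the sample size n, and the number of independent variables k with the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below uses hypothetical values for R², n, and k. Excel's regression output reports the same quantity as "Adjusted R Square".

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values, for illustration only.
model_1 = adjusted_r2(0.80, n=50, k=4)  # model with four independent variables
model_2 = adjusted_r2(0.79, n=50, k=3)  # model after removing one variable

print(f"Model 1 adjusted R^2: {model_1:.4f}")
print(f"Model 2 adjusted R^2: {model_2:.4f}")
```

Removing a variable never raises R², but the adjusted value can go either way because it penalizes the number of independent variables. That is why the adjusted value is the one used to compare models of different sizes.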
3. Check the assumptions for regression analysis for the model you have chosen. Make necessary
plots in Excel to justify.
(a) Is the relationship between the dependent and independent variables linear?
(b) Do the residuals exhibit some patterns across values of the independent variables?
(c) Are the variations of the dependent variable the same across all values of the independent
variables?
(d) Do the residuals follow the normal probability distribution?
(e) Conclusion: Are the results from the regression analysis reliable?