STAR534 - Homework 7
This is a more open-ended assignment than some of the previous assignments. In general, use good data
analysis practices. For example, when you are required to fit a GAM, decide whether the default
smoothing parameters seem appropriate. If not, change those settings and refit your model. Turn in your final
model and describe the settings you used. In your answers, don’t provide a diary of all the things you tried
that didn’t work. Instead, write a succinct summary describing your final model for each problem.
Recall problem 2 from Homework 2: You are the head data scientist for a bank. Your goal is to predict
whether or not the bank should loan money to an applicant.
The data are based on the article: Min Li, Amy Mickel & Stanley Taylor (2018) “Should This Loan be
Approved or Denied?”: A Large Dataset with Class Assignment Guidelines, Journal of Statistics Education,
26:1, 55-66, https://doi.org/10.1080/10691898.2018.1434342. Li et al. (2018) explain various
aspects of the data.
Your goal is to build a model to predict the loan status. Loan status is the MIS_Status variable. For the
training data, recode this variable so that loans that were paid off are coded 1 and 0 otherwise. For the test
data, this column has already been re-coded. Use the test data only for evaluating predictive performance.
Please don’t use the test data for any model building as that’s cheating.
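The recoding step can be sketched as follows. This is a minimal illustration on a toy data frame, not the course's actual file: the data frame name train and the label strings "P I F" (paid in full) and "CHGOFF" (charged off) are assumptions, so check the labels in your own copy of the data before reusing it.

```r
# Toy stand-in for the training data; the real file is not shown in this handout.
# Assumption: paid-off loans are labeled "P I F" and charged-off loans "CHGOFF".
train <- data.frame(
  MIS_Status = c("P I F", "CHGOFF", "P I F", "", NA),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for blank and NA entries, so they are coded 0 ("otherwise").
# You may prefer to drop rows with a missing status instead.
train$MIS_Status <- as.integer(train$MIS_Status %in% "P I F")

# Check the recoded counts before fitting any model.
table(train$MIS_Status)
```

Using %in% rather than == keeps missing values from propagating as NA into the response.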
I recommend that you do some exploratory data analysis to re-familiarize yourself with these data. Use the
training data for fitting the models (problems 1-3) and the test data for predictive performance (problems 4-5).
1. Fit a logistic regression model to predict loan status using predictors 26-31. Provide the summary table for your
model.
2. Fit a GAM to these data using smoothing splines. Note: you’ll need to think about this one a
bit. You can’t just run the command gam(y~.).
a. Report the summary table of the results. Write a few sentences comparing the GAM results to
the results from the logistic regression model.
b. Plot the model results (e.g., plot(my.gam)). Discuss the plots: are the smoothed results interpretable?
How do you interpret the plots for the binary predictors?
3. Fit a third type of model to these data. Use whatever model you think is appropriate. The model may
be one we considered in this class or some other model. Report the results in a way that is appropriate
for your model (table and/or plots).
4. For the test data, compare the predictive performance of the models.
a. What is an appropriate measure of predictive performance for these data, and why did you choose it?
b. Report a table of the results.
c. Which model provides the best predictions?
5. Now consider the larger dataset. Fit a model with more of the predictors. Try to improve on the
predictive performance of your best model from question 4. Describe your final model using plots
and/or a table, as appropriate.
6. What did you learn from this assignment? If your answer is ‘nothing’, then go back and redo problem 3
by fitting a model that is new to you.