ECON6113/STAT6113: Term Project Guidelines

Tom Mayock

Fall 2020

The standards and requirements set forth in these guidelines may be modified at any time by the course

instructor. Notice of such changes will be by announcement in class and/or by email.

Due Date: December 9, 2020 at 5:00PM.

NO LATE ASSIGNMENTS WILL BE ACCEPTED.

ALL ASSIGNMENTS MUST BE SUBMITTED VIA THE COURSE CANVAS SITE.

In order for your assignment to be graded, you must

1. Submit your Stata do-file and Stata log file to Canvas.

2. Submit your written project report to Canvas.

3. Submit your written project report to the plagiarism detection software via Canvas.

Overview

30 percent of your grade in this course will depend on the completion of an empirical project.

I will provide you with the data for this project, which will be extracted from the Freddie Mac

Single Family Loan-Level Dataset. The goal of the project is to develop a “scorecard” to predict

mortgage defaults. The analysis for the project must be conducted in Stata for you to receive

a grade. Furthermore, THE PROJECT MUST BE COMPLETED INDEPENDENTLY. After the

code (or “do-file” in Stata parlance) for the project is submitted, I will test the performance of

your model on an out-of-time sample. The student who builds the model that exhibits the best

performance, as measured by the Kolmogorov-Smirnov statistic on the out-of-time sample, will have 5

additional points added to her/his final course grade. For example, if your total course grade

based on all other assignments is an 85% and you build the best mortgage scorecard, your final

course grade will be a 90%.

Objective

One of the goals of this course is to develop students’ facility to analyze data to study economic

problems using econometric methods. To that end, you will complete an empirical project that

demonstrates your ability to work with statistical software and data and interpret the results of

econometric models.

Your “client” for this project is a mortgage lender that wishes to engage in risk-based pricing

for its mortgage loans. The first step in establishing risk-based pricing is the construction of a

mortgage “scorecard” that predicts the probability that a loan defaults. You will develop such a

scorecard using data that I provide you derived from the Freddie Mac Single Family Loan-Level

Dataset; the full version of this data “covers approximately 22.19 million fixed-rate mortgages

originated between January 1, 1999 and June 30, 2015 [that were purchased or guaranteed by

Freddie Mac].” I will be providing you with a small random sample of loans from a particular

origination cohort. You are tasked with developing an econometric model that predicts the

probability that a given loan defaults, where a default is defined as becoming 60 days past due or

entering foreclosure at any point within 4 years of origination. In what follows a “good” account

is one that does not default, whereas a “bad” account is one that does default.

The Model

Let $D_i$ be an indicator variable that takes a value of one if loan $i$ defaults within 4 years of

origination and is zero otherwise. This variable is called “BAD_OVER_48_60” in the data file.

Additionally, let $X_i$ be a vector of variables that describe the characteristics of the borrower and

the loan that are available at the time of origination. For the project, you will use the data that I

provide you (the “development data”) to estimate

$$\Pr[D_i = 1 \mid X_i] = G(X_i \beta) \qquad (1)$$

Let $\hat{\beta}$ denote the vector of estimated parameters for Equation 1. After you estimate $\hat{\beta}$, your model

will be evaluated on its ability to distinguish between “good” and “bad” accounts as measured

by the Kolmogorov-Smirnov (KS) statistic calculated on an out-of-time sample (the “OOT data”).

The student that builds the model with the highest KS statistic on the OOT sample will have 5

points added to her or his final grade.

Since you will all be working with the same development data, variation in the performance of

the models between students will be driven entirely by differences in how you build your models.

In building your model, you may need to create new variables, such as interaction variables

and transformations. All of your data cleaning and model specification decisions, however,

must be explained and justified in your written report.

When building your model, the User Guide for the mortgage data will be of critical importance;

this guide can be found in the “Files-Term Project-Documentation” folder on the course Canvas

page. This file provides an overview of the loan-level data, describes the file layout, and defines

all of the variables that are included in the data.

Data Cleaning

While Freddie Mac has cleaned up the source files significantly, no dataset is perfect. As an

econometrician it is your responsibility to make sure that the data that you are using is correct;

using data that is riddled with errors can have a seriously detrimental impact on your analysis.

As they say: “garbage in, garbage out.”

To make sure that you are working with a “clean” sample, you will likely want to remove some

observations from the data that appear to have been coded incorrectly or that are missing key

data elements. Some suggestions for cleaning your data are listed below.

• How frequently is a variable missing in the data? Will including the variable result in you

losing a large fraction of the overall sample?

• Plot a histogram of all of the variables you are considering for use in your model. Can

you identify any observations that are likely data entry errors? If so, you should consider

removing those observations from the data.

• When merging datasets, always verify that the merge was performed correctly.

• Calculate the summary statistics for all of the variables that you are considering for use in

your models. Do the means, minima, and maxima “make sense?” For example, economists

often work with variables that must logically be positive (e.g., prices) or bounded (e.g.,

fractions that must be between 0 and 1). If your data do not conform to expectations, you

should inspect the data more closely to understand what is going on. Note that in some

databases, values that appear to be bizarre (e.g., -99999 for a price) actually have a specific

meaning that can be gleaned from the codebook.

• ALWAYS READ THE CODEBOOK! In our case, the codebook for the Freddie Mac data is

called the “User Guide.” I cannot stress this enough. The easiest way to run into trouble

when building an econometric model is to just start throwing variables into a model before

you have any understanding of what those variables actually measure.
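The checks in the list above can be sketched in Stata as follows. This is only an illustration on simulated data: the variable names (id, dti, fico) and the use of 999 as a "not available" code are assumptions made for the example, so confirm the actual names and codes in the User Guide before adapting it.

```stata
* Simulated stand-in for the project data. The variable names (id, dti, fico)
* and the 999 "not available" code are hypothetical; check the User Guide.
clear
set seed 6113
set obs 1000
generate long id = _n
generate dti  = round(runiform(5, 60))
generate fico = round(rnormal(720, 50))
replace dti = 999 in 1/25

* Do the minima and maxima make sense? The 999s stand out in the maximum.
summarize dti fico, detail

* A histogram makes implausible values easy to spot.
histogram dti, frequency

* Recode the sentinel value to Stata's missing value.
mvdecode dti, mv(999)

* How often is each candidate variable missing?
misstable summarize dti fico

* Verify a merge: split off one variable, merge it back, and require that
* every record matches (the command stops with an error otherwise).
tempfile scores
preserve
keep id fico
save `scores'
restore
drop fico
merge 1:1 id using `scores', assert(match)
drop _merge

* Apply a filter, recording how many observations it removes.
count if missing(dti)
drop if missing(dti)
```

Whatever filters you end up applying, note the counts they remove so that you can report them in the Data Description section.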

Data Sampling

Before building your model, you must use random sampling to split your data into two distinct

pieces: a development sample (70%) and a holdout sample (30%). The development sample is

the set of observations that will be used to build your model, and the holdout sample is the set

of observations on which you are to test the performance of your model.
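One way to construct the split is sketched below on simulated data. The identifier id and the flag name dev are hypothetical names chosen for the example. The seed and the sort on a unique identifier matter because the random draws depend on the order of the observations; without them the split changes from run to run.

```stata
* Simulated stand-in data; id is a hypothetical unique loan identifier.
clear
set obs 1000
generate long id = _n

* Fix the seed and the sort order so the split is reproducible on every run.
set seed 20201209
sort id
generate double u = runiform()

* Rank loans by the random draw and flag the first 70% as development.
sort u id
generate byte dev = (_n <= round(0.70 * _N))
label define devlbl 1 "Development" 0 "Holdout"
label values dev devlbl
sort id
tabulate dev
```

Ranking on the draw gives an exact 70/30 split; comparing the draw to 0.70 directly would give a split that is only approximately 70/30.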

Model Specification

You are free to specify your model as you please. I have, however, included some things you may

want to consider below when specifying your final model.

• The final specification of the model should be based on sound statistical reasoning. Since

you are building a predictive model, this means that you should only include variables in

your models that improve the models’ ability to differentiate between defaulting and

non-defaulting loans. You can identify these variables through traditional hypothesis testing,

an analysis of the model’s overall predictive performance, or some combination of the two

approaches.

• You should think about how to incorporate non-linearities and interactions into your model

to improve model performance.
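As an illustration of the two suggestions above, the sketch below adds a quadratic term and an interaction using Stata's factor-variable notation and then tests whether the added terms belong in the model. The data are simulated and the regressors dti and fico are hypothetical names; only BAD_OVER_48_60 comes from the project data.

```stata
* Simulated data; dti and fico are hypothetical regressor names.
clear
set seed 6113
set obs 5000
generate dti  = runiform(10, 60)
generate fico = rnormal(720, 50)
generate double xb = -3 + 0.0008*dti^2 - 0.012*(fico - 720)
generate byte BAD_OVER_48_60 = runiform() < invlogit(xb)

* Baseline: linear in both regressors.
logit BAD_OVER_48_60 c.dti c.fico
estimates store linear

* Add a quadratic in DTI and a DTI x FICO interaction.
logit BAD_OVER_48_60 c.dti##c.dti c.fico c.dti#c.fico
estimates store flexible

* Wald test of the added terms, then a likelihood-ratio test against the baseline.
testparm c.dti#c.dti c.dti#c.fico
lrtest linear flexible
```

Writing the terms as c.dti##c.dti rather than generating a separate squared variable lets later commands such as margins recognise that the two terms move together.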

The Report

To receive credit for the project, you must submit a report via Canvas that contains the following

elements. The report should be written in clear, concise English, as if it were aimed at a

professional audience. DO NOT SIMPLY TURN IN A LIST OF BULLET POINTS.

Data Description

In this portion of the report, you should describe the data that you are using to estimate your

model. Detailed information on the nature of the data can be found in the User Guide. If you

imposed any filters to remove likely outliers, those filters should be described in detail in this

section. For example, if you dropped observations with debt-to-income ratios (DTIs) in excess of 90 because these are

likely data entry errors, you should state this exclusion in this part of the paper.

The Data Description section should include a table with summary statistics (such as the mean,

median, range, and standard deviation) for any of the variables that you include in your final

model. Please note that this section should only contain information on the variables that you

used to build and test your models; you do not need to discuss variables that are included in the

mortgage data that you do not utilize in your analysis.

You must use random sampling to split your data into two distinct pieces: a development sample

(70%) and a holdout sample (30%). The development sample is the set of observations that will

be used to build your model, and the holdout sample is the set of observations on which you

are to test the performance of your model. Describe how these samples were constructed in this

portion of the report.

Model Description

In this portion of the report, you should write down your model in mathematical notation. For

example, if $D_i$ is the default indicator and $X_i$ is the vector of regressors, you should write down

your default model as

$$\Pr[D_i = 1 \mid X_i] = G(X_i \beta)$$

For this project, you will use the logistic link function for $G(\cdot)$.
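For reference, the logistic link and what it implies for the model can be written out as below. The symbols $z$ (a generic argument of the link) and $x_{ik}$ (the $k$-th element of the regressor vector, assumed here to be continuous and to enter the index linearly) are notation introduced for this illustration.

```latex
G(z) = \frac{1}{1 + e^{-z}}
\quad\Longrightarrow\quad
\Pr[D_i = 1 \mid X_i] = \frac{1}{1 + e^{-X_i \beta}},
\qquad
\ln\!\left( \frac{\Pr[D_i = 1 \mid X_i]}{1 - \Pr[D_i = 1 \mid X_i]} \right) = X_i \beta ,
\qquad
\frac{\partial \Pr[D_i = 1 \mid X_i]}{\partial x_{ik}}
  = \beta_k \, G(X_i \beta) \left[ 1 - G(X_i \beta) \right] .
```

The coefficients are therefore effects on the log-odds. The effect on the probability has the same sign as $\beta_k$, but its size depends on where the loan sits on the logistic curve.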

Lastly, and perhaps most importantly, this section should clearly state what your model is

designed to do, define the dependent variable, and describe your expectations for the

relationship between each of the variables and mortgage default. For example, if you include DTI

in the model, you need to explain whether or not you expect DTI to increase default risk and

why you expect such a relationship to hold.

Commentary on Initial Model Specification

Discuss in this section how you initially specified your model. Was your specification informed

by economic theory? Do you have expectations for the signs of any of the variables?

Next, discuss any initial testing that you conducted to arrive at your final specification. For example, if you

initially included DTI in your model but found that it was statistically insignificant, state this in

the report.

Final Model Output and Commentary

You must include a regression table that includes the estimated coefficients, standard errors,

and corresponding levels of statistical significance for your final specification. Below that table,

you should discuss and interpret your results. For example, if you included DTI in the model

because you expected higher DTIs to be associated with higher default rates, you should discuss

whether the final model results were consistent with expectations. In this section you should

emphasize the ceteris paribus interpretation of the regression coefficients. Turning back to the

DTI example, if you have a positive and statistically significant coefficient on the DTI term, you

should emphasize that, all else equal, borrowers with higher DTIs are more likely to default. In

this section you also must perform a marginal effects analysis and discuss the impact of the

variables in your model on the probability – not the log-odds – of default.
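A marginal effects analysis of this kind might look like the sketch below, which uses simulated data and the hypothetical regressors dti and fico. After logit, the margins command works on the predicted probability by default, so its output is in probability units rather than log-odds.

```stata
* Simulated data; dti and fico are hypothetical regressor names.
clear
set seed 6113
set obs 5000
generate dti  = runiform(10, 60)
generate fico = rnormal(720, 50)
generate byte BAD_OVER_48_60 = runiform() < invlogit(-4 + 0.05*dti - 0.012*(fico - 720))

logit BAD_OVER_48_60 c.dti c.fico

* Average marginal effects on Pr(default), not on the log-odds.
margins, dydx(dti fico)

* Average predicted probability of default at selected DTI values.
margins, at(dti=(20(10)50))
```

The first margins call reports, for each regressor, the average change in the default probability for a one-unit increase. The second shows how the predicted probability moves as DTI rises from 20 to 50 with the other regressors left at their observed values.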

Model Performance Testing

After developing your final model using the development data, you must perform an analysis of

out-of-sample performance and compare this against in-sample performance. For this analysis,

the “score” is simply the predicted probability that a loan defaults based on your model. This

analysis must contain the following elements.

• An assessment of the discriminatory power of your model based on the KS statistic. If s

denotes the score, then the KS statistic is defined as

$$KS \equiv \max_{s} \left( F(s \mid B) - F(s \mid G) \right)$$

where $F(s \mid B)$ and $F(s \mid G)$ are the cumulative distribution functions for the scores

conditional on the account being bad and good, respectively. Calculate KS using your development

data, and then calculate KS using your holdout data. Interpret the results. Does the

ability of your model to differentiate between good and bad accounts appear to be stable

out of sample?

• An assessment of the model’s predictive accuracy. To conduct this assessment, first partition

your scores into ten groups based on the deciles of the distribution of the predicted probability of default (PD). Within each decile,

sum over all of the predicted PDs in the decile to calculate the expected number of bads

within the decile. Calculate the expected number of good accounts similarly by summing

over the values of 1 − PD of the accounts that are within the decile. Formally, the expected bad

calculation can be written as

$$\hat{B}_d = \sum_{i=1}^{N} PD_i \, \mathbf{1}_{id}$$

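| Band | Range | Expected Bad ($\hat{B}_d$) | Actual Bad ($B_d$) | Expected Good ($\hat{G}_d$) | Actual Good ($G_d$) |
|---|---|---|---|---|---|
| 1 | $(0, p_1]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i1}$ | $B_1$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i1}$ | $N_1 - B_1$ |
| 2 | $(p_1, p_2]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i2}$ | $B_2$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i2}$ | $N_2 - B_2$ |
| 3 | $(p_2, p_3]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i3}$ | $B_3$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i3}$ | $N_3 - B_3$ |
| 4 | $(p_3, p_4]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i4}$ | $B_4$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i4}$ | $N_4 - B_4$ |
| 5 | $(p_4, p_5]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i5}$ | $B_5$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i5}$ | $N_5 - B_5$ |
| 6 | $(p_5, p_6]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i6}$ | $B_6$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i6}$ | $N_6 - B_6$ |
| 7 | $(p_6, p_7]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i7}$ | $B_7$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i7}$ | $N_7 - B_7$ |
| 8 | $(p_7, p_8]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i8}$ | $B_8$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i8}$ | $N_8 - B_8$ |
| 9 | $(p_8, p_9]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i9}$ | $B_9$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i9}$ | $N_9 - B_9$ |
| 10 | $(p_9, 1]$ | $\sum_{i=1}^{N} PD_i \mathbf{1}_{i,10}$ | $B_{10}$ | $\sum_{i=1}^{N} (1 - PD_i) \mathbf{1}_{i,10}$ | $N_{10} - B_{10}$ |

Table 1: An “Expected-Versus-Actual” Table.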

where $PD_i$ is the estimated probability of default for account $i$, $N$ denotes the total number

of accounts in the sample, and $\mathbf{1}_{id}$ is an indicator variable that is equal to 1 if account $i$ is

included in decile or “band” $d$.

The number of expected goods in decile $d$ can be written as

$$\hat{G}_d = \sum_{i=1}^{N} (1 - PD_i) \, \mathbf{1}_{id}$$

• Construct a table that compares the expected goods (Ĝd) and actual goods (Gd) and the

expected bads (B̂d) and actual bads (Bd) for each of the score bands. Repeat this calculation

for the development sample and the holdout sample. Table 1 is a template for these

calculations. Based on these results, how well does your model predict default within the

bands? Is the accuracy of these predictions stable out-of-sample?
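The performance tests above can be sketched in Stata as follows. The data are simulated, and the names dti, fico, dev, pd, and band are hypothetical. Two choices in the sketch are assumptions rather than requirements of these guidelines: the KS value is taken from the combined statistic reported by ksmirnov, which is the largest absolute gap between the two score distributions, and the score bands are computed as PD deciles separately within each sample.

```stata
* Simulated data; dti, fico, dev, pd, and band are hypothetical names.
clear
set seed 6113
set obs 10000
generate dti  = runiform(10, 60)
generate fico = rnormal(720, 50)
generate byte BAD_OVER_48_60 = runiform() < invlogit(-4 + 0.05*dti - 0.012*(fico - 720))
generate byte dev = runiform() <= 0.70

* Fit on the development sample only, then score every loan.
logit BAD_OVER_48_60 c.dti c.fico if dev == 1
predict double pd, pr

* KS: largest gap between the score distributions of bads and goods, by sample.
ksmirnov pd if dev == 1, by(BAD_OVER_48_60)
scalar ks_dev = r(D)
ksmirnov pd if dev == 0, by(BAD_OVER_48_60)
scalar ks_hold = r(D)
display "KS development = " %6.4f ks_dev "   KS holdout = " %6.4f ks_hold

* Score bands: PD deciles computed separately within each sample.
generate byte band = .
forvalues s = 0/1 {
    xtile band`s' = pd if dev == `s', nquantiles(10)
    replace band = band`s' if dev == `s'
    drop band`s'
}

* Expected counts are sums of PD and 1 - PD; actual counts are sums of outcomes.
generate double exp_bad  = pd
generate double exp_good = 1 - pd
generate double act_bad  = BAD_OVER_48_60
generate double act_good = 1 - BAD_OVER_48_60
collapse (sum) exp_bad act_bad exp_good act_good, by(dev band)
list dev band exp_bad act_bad exp_good act_good, sepby(dev) noobs
```

Because a logit with an intercept matches total expected bads to total actual bads in the estimation sample, the development rows are informative band by band rather than in total. The holdout rows are not constrained in this way, so they give the more demanding check.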

Code Submission

A key portion of the project is running your code on an out-of-time data sample. IF YOUR CODE

DOES NOT RUN ON THIS OUT-OF-TIME SAMPLE, YOU WILL NOT RECEIVE CREDIT

FOR THE PROJECT. To ensure that your code can be run on this out-of-time sample, your code must

be written in a manner that satisfies the requirements listed below. You will find sample code in

the “Files-Term-Project-StataCode” folder entitled “Basic_Regression_OutOfSample_LOGIT_W_TESTS.do”

that you can use to structure your code so that it conforms with these expectations.

• At the beginning of your do-file, define the directory that contains the data using the following

local macro syntax: local data_dir "ZZZ", where ZZZ is the directory that contains

the data for the project. When I run your code on the out-of-time data, this directory will

be swapped to the directory on my computer where the out-of-time data resides.

• All variable transformations and other data cleaning steps must be performed by calling a

separate do-file from within your main do-file. This do-file must be named “data_cleaning.do”.

To properly score the observations in the out-of-time data, the same variable transformations

and data cleaning steps that were used in the construction of the initial model must

also be applied to the out-of-time observations. When I run your code to score the

out-of-time data, “data_cleaning.do” will be called to perform the necessary steps.
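A minimal skeleton for the top of a main do-file that follows the two requirements above is sketched here. The data file name project_data.dta is a placeholder invented for the example, and placing data_cleaning.do in the working directory is an assumption; the sample do-file on Canvas remains the reference for the expected structure.

```stata
* Main do-file (sketch). "ZZZ" is the placeholder from these guidelines and
* project_data.dta is a hypothetical file name; replace both with your own.
local data_dir "ZZZ"

* Load the data from the directory defined above.
use "`data_dir'/project_data.dta", clear

* All cleaning and variable construction lives in the separate do-file,
* assumed here to sit in the current working directory.
do "data_cleaning.do"

* Model estimation and performance testing follow from here.
```

Local macros are private to the do-file that defines them, so data_cleaning.do cannot see data_dir. Writing the cleaning file so that it only operates on the data already in memory avoids that problem.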

Plagiarism Detection

As a condition of taking this course, all required papers may be subject to submission for textual

similarity review to Turnitin.com via Canvas for the detection of plagiarism. All submitted

papers will be included as source documents in the Turnitin.com reference database solely for

the purpose of detecting plagiarism of such papers. No student papers will be submitted to

Turnitin.com without a student’s written consent and permission. If a student does not provide

such written consent and permission, the instructor may: (i) require a short reflection paper on

research methodology; (ii) require a draft bibliography prior to submission of the final paper; or

(iii) require the cover page and first cited page of each reference source to be photocopied and

submitted with the final paper.

Grading

The grade that you receive for your term project will depend on 4 components: content (20%),

execution (20%), interpretation (30%), and writing (30%). The term project grading rubric, which

can be found on Canvas in the “Files-Term Project” folder, contains detailed information on how

scores for each of these components are determined.
