ECON 2300 Introductory Econometrics
Lecture 2: Linear Regression with One Regressor
Rodney Strachan
July 2023
Overview of the topic:
▶ In this lecture we will learn how to investigate the relationship between
two variables (X and Y )
▶ A simple relationship is a straight line
▶ A line has a slope and an intercept
▶ We will learn how to estimate the slope and intercept and conduct
inference on these quantities
Outline:
▶ The population linear regression model (LRM) for i = 1, . . . , n
Yi = β0 + β1Xi + ui
▶ The ordinary least squares (OLS) estimator and the sample regression
line for i = 1, . . . , n
Yi = β̂0 + β̂1Xi + ûi
Ŷi = β̂0 + β̂1Xi
ûi = Yi − Ŷi
▶ Measures of fit of the sample regression
▶ The least squares assumptions
▶ The sampling distribution of the OLS estimator
Linear Regression
▶ Linear regression lets us estimate the slope of the population regression
line.
▶ The slope of the population regression line is the expected effect on Y of
a unit change in X .
▶ Ultimately our aim is to estimate the causal effect on Y of a unit change
in X – but for now, just think of the problem of fitting a straight line to data
on two variables, Y and X .
Linear Regression
▶ The problem of statistical inference for linear regression is, at a general
level, the same as for estimation of the mean or of the differences
between two means.
▶ Statistical, or econometric, inference about the slope entails:
▶ Estimation:
How should we draw a line through the data to estimate the population
slope? Answer: ordinary least squares (OLS).
What are advantages and disadvantages of OLS?
▶ Hypothesis testing:
How to test if the slope is zero?
▶ Confidence intervals:
How to construct a confidence interval for the slope?
The Linear Regression Model (SW Section 4.1)
Does the number of students in a class affect how well the students learn?
▶ The population regression line:
E[TestScorei | STRi] = β0 + β1 STRi
▶ β1 = slope of population regression line
= change in test score for a unit change in student-teacher ratio (STR)
▶ Why are β0 and β1 “population” parameters?
▶ We would like to know the population value of β1.
▶ We don’t know β1, so must estimate it using data.
The Population Linear Regression Model
Consider
Yi = β0 + β1Xi + ui
for i = 1, . . . , n
▶ We have n observations, (Xi, Yi), i = 1, . . . , n.
▶ X is the independent variable or regressor or right-hand-side variable
▶ Y is the dependent variable or left-hand-side variable
▶ β0 = intercept
▶ β1 = slope
▶ ui = the regression error
▶ The regression error consists of omitted factors. In general, these
omitted factors are other factors that influence Y , other than the variable
X . The regression error also includes error in the measurement of Y .
The population regression model in a picture
▶ Observations on Y and X (n = 7); the population regression line; and
the regression error (the “error term”):
The Ordinary Least Squares Estimator (SW Section 4.2)
▶ How can we estimate β0 and β1 from data?
▶ Recall that the least squares estimator of µY is the sample mean Ȳ, which solves
    min_m Σᵢ₌₁ⁿ (Yi − m)²
  (see the quick numerical check at the end of this slide)
▶ By analogy, we will focus on the least squares (“ordinary least squares”
or “OLS”) estimator of the unknown parameters β0 and β1. The OLS
estimator solves,
    min_{b0, b1} Σᵢ₌₁ⁿ [Yi − (b0 + b1Xi)]²
▶ In fact, we estimate the conditional expectation function E [Y |X ] under
the assumption that E [Y |X ] = β0 + β1X
Mechanics of OLS
▶ The population regression line:
E[TestScore | STR] = β0 + β1 STR
Mechanics of OLS
▶ The OLS estimator minimizes the average squared difference between
the actual values of Yi and the prediction (“predicted value”, Ŷi ) based
on the estimated line.
▶ This minimization problem can be solved using calculus (Appendix 4.2).
▶ The result is the OLS estimators of β0 and β1.
OLS estimator, predicted values, and residuals
▶ The OLS estimators are
    β̂1 = Σᵢ₌₁ⁿ (Xi − X̄)(Yi − Ȳ) / Σᵢ₌₁ⁿ (Xi − X̄)²
    β̂0 = Ȳ − β̂1 X̄
▶ The OLS predicted (fitted) values Ŷi and residuals ûi are
Ŷi = β̂0 + β̂1Xi
ûi = Yi − Ŷi
▶ The estimated intercept, β̂0, and slope, β̂1, and residuals ûi are
computed from a sample of n observations (Xi, Yi), i = 1, . . . , n.
▶ These are estimates of the unknown population parameters β0 and β1.
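▶ The following R sketch (added here; the data are simulated, not the test-score data) applies these formulas directly and checks them against R's built-in lm():
  set.seed(1)
  n <- 100
  X <- rnorm(n, mean = 20, sd = 2)       # simulated regressor
  Y <- 700 - 2 * X + rnorm(n, sd = 10)   # "true" beta0 = 700, beta1 = -2

  beta1_hat <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
  beta0_hat <- mean(Y) - beta1_hat * mean(X)

  c(beta0_hat, beta1_hat)
  coef(lm(Y ~ X))                        # matches the two numbers above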
OLS regression: R output
T̂estScore = 698.93 − 2.28 × STR
We will discuss the rest of this output later.
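▶ A sketch of how output like this is typically produced in R. (Added here; the use of the AER package's CASchools data and these variable constructions is an assumption, not something shown on the slide.)
  # install.packages("AER")             # the California school data ship with AER
  library(AER)
  data("CASchools")
  CASchools$STR   <- CASchools$students / CASchools$teachers   # student-teacher ratio
  CASchools$score <- (CASchools$read + CASchools$math) / 2     # average test score

  fit <- lm(score ~ STR, data = CASchools)
  summary(fit)                           # intercept ≈ 698.93, slope ≈ -2.28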
Predicted values & residuals
▶ One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and TestScore = 657.8
predicted value: 698.9 − 2.28 × 19.33 = 654.8
residual: 657.8 − 654.8 = 3.0
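▶ The same arithmetic in R (an added sketch; it reuses the estimated coefficients quoted above):
  y_hat <- 698.9 - 2.28 * 19.33   # predicted value, about 654.8
  657.8 - y_hat                   # residual, about 3.0
  # For every district at once: fitted(fit) and residuals(fit), with fit from the previous sketch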
Measures of Fit (SW Section 4.3)
How good is this estimated line? How well does this line fit the data?
▶ Two regression statistics provide complementary measures of how well
the regression line “fits” or explains the data:
▶ The regression R2 measures the fraction of the variance of Y that is
explained by X ; it is unit free and ranges between zero (no fit) and one
(perfect fit)
▶ The standard error of the regression (SER) measures the magnitude of
a typical regression residual in the units of Y .
Regression R²
▶ The sample variance of Yi is (1/n) Σᵢ₌₁ⁿ (Yi − Ȳ)².
The sample variance of Ŷi is (1/n) Σᵢ₌₁ⁿ (Ŷi − Ȳ)², where in fact the sample mean of the Ŷi equals Ȳ.
R² is simply the ratio of those two sample variances.
▶ Formally, we define R² as follows (two equivalent definitions):
    R² := ESS / TSS = Σᵢ₌₁ⁿ (Ŷi − Ȳ)² / Σᵢ₌₁ⁿ (Yi − Ȳ)²,  where ESS = Explained Sum of Squares and TSS = Total Sum of Squares
    R² := 1 − RSS / TSS = 1 − Σᵢ₌₁ⁿ ûi² / Σᵢ₌₁ⁿ (Yi − Ȳ)²,  where RSS = Residual Sum of Squares
▶ R² = 0 ⇐⇒ ESS = 0 and R² = 1 ⇐⇒ ESS = TSS. Also, 0 ≤ R² ≤ 1.
▶ For regression with a single X,
R² = the square of the sample correlation coefficient between X and Y
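▶ A small R sketch (added; the data are simulated) verifying that the two definitions agree and equal the squared sample correlation:
  set.seed(1)
  X <- rnorm(100, mean = 20, sd = 2)
  Y <- 700 - 2 * X + rnorm(100, sd = 10)
  fit   <- lm(Y ~ X)
  y_hat <- fitted(fit)
  u_hat <- residuals(fit)

  TSS <- sum((Y - mean(Y))^2)       # total sum of squares
  ESS <- sum((y_hat - mean(Y))^2)   # explained sum of squares
  RSS <- sum(u_hat^2)               # residual sum of squares

  ESS / TSS                         # R^2, first definition
  1 - RSS / TSS                     # R^2, second definition (same number)
  cor(X, Y)^2                       # squared correlation: same again
  summary(fit)$r.squared            # R's own computation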
The Standard Error of the Regression (SER)
▶ The SER measures the spread of the distribution of u. The SER is
(almost) the sample standard deviation of the OLS residuals:
    SER := √[ (1/(n − 2)) Σᵢ₌₁ⁿ ûi² ]
▶ The SER:
▶ has the units of ui , which are the units of Yi
▶ measures the average “size” of the OLS residual (the average “mistake”
made by the OLS regression line)
▶ The root mean squared error (RMSE) is closely related to the SER:
    RMSE := √[ (1/n) Σᵢ₌₁ⁿ ûi² ]
▶ When n is large, SER ≈ RMSE.¹
¹ Here, n − 2 is the degrees of freedom – we need to subtract 2 because there are two parameters to estimate. For details, see SW Section 18.4.
Example of the R² and the SER
▶ T̂estScore = 698.9 − 2.28 × STR,  R² = 0.05,  SER ≈ RMSE = 18.6
▶ STR explains only a small fraction of the variation in test scores.
▶ Does this make sense?
▶ Does this mean the STR is unimportant in a policy sense?
▶ Note that RMSE can be computed in R
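▶ A sketch in R (added; it assumes the CASchools data as prepared in the earlier sketch):
  fit_ca <- lm(score ~ STR, data = CASchools)
  u_hat  <- residuals(fit_ca)
  n      <- length(u_hat)
  sqrt(sum(u_hat^2) / (n - 2))   # SER: divides by n - 2 (≈ 18.6 on these data)
  sqrt(mean(u_hat^2))            # RMSE: divides by n
  sigma(fit_ca)                  # R reports the SER as the "residual standard error"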
Least Squares Assumptions (SW Section 4.4)
▶ What, in a precise sense, are the properties of the sampling distribution
of the OLS estimator? When will it be unbiased? What is its variance?
▶ To answer these questions, we need to make some assumptions about
how Y and X are related to each other, and about how they are collected
(the sampling scheme)
▶ These assumptions – there are three – are known as the Least Squares
Assumptions.
Least Squares Assumptions (SW Section 4.4)
Yi = β0 + β1Xi + ui , i = 1, . . . , n
LSA 1. The conditional distribution of u given X has mean zero, that is,
E(u|X = x) = 0.
▶ This implies that OLS estimators are unbiased
LSA 2. (Xi ,Yi), i = 1, · · · , n, are i.i.d.
▶ This is true if (X ,Y ) are collected by simple random sampling
▶ This delivers the sampling distribution of β̂0 and β̂1
LSA 3. Large outliers in X and/or Y are rare.
▶ Technically, X and Y have finite fourth moments:
0 < E(Xi⁴) < ∞ and 0 < E(Yi⁴) < ∞
▶ Outliers can result in meaningless values of β̂1
Least squares assumption #1: E(u|X = x) = 0.
For any given value of X , the mean of u is zero. Example,
TestScorei = β0 + β1STRi + ui ,
where ui = other factors
▶ What are some of these “other factors”?
▶ teacher quality of district i ,
▶ quality of textbook,
▶ proportion of Native English speakers,
▶ wealth, income, or additional resources
▶ Is “E(u|X = x) = 0” plausible for these other factors?
Least squares assumption #1: E(u|X = x) = 0 (continued)
▶ A benchmark for thinking about this assumption is to consider an ideal
randomized controlled experiment:
▶ X is randomly assigned (students randomly assigned to different size
classes; patients randomly assigned to medical treatments).
Randomization is done by computer – using no information about the
individual.
▶ Because X is assigned randomly, all other individual characteristics –
the things that make up u – are distributed independently of X , so u and
X are independent
▶ Thus, in an ideal randomized controlled experiment, E(u|X = x) = 0
(that is, LSA #1 holds)
▶ In actual experiments, or with observational data, we will need to think
hard about whether E(u|X = x) = 0 holds.
Least squares assumption #2: (Xi, Yi), i = 1, · · · , n, are i.i.d.
▶ This arises automatically if the entity (individual, district) is sampled by
simple random sampling:
▶ The entities are selected from the same population, so (Xi ,Yi ) are
identically distributed for all i = 1, . . . , n.
▶ The entities are selected at random, so the values of (X ,Y ) for different
entities are independently distributed.
▶ The main place we will encounter non-i.i.d. sampling is when data are
recorded over time for the same entity (panel data and time series data)
– we will deal with that complication when we cover panel data.
OLS can be sensitive to an outlier:
▶ Is the lone point an outlier in X or Y?
▶ In practice, outliers are often data glitches (coding or recording
problems). Sometimes they are observations that really shouldn’t be in
your data set. Plot your data before running regressions!
Least squares assumption #3: Large outliers are rare
Technical statement: E(X⁴) < ∞ and E(Y⁴) < ∞
▶ A large outlier is an extreme value of X or Y
▶ On a technical level, if X and Y are bounded, then they have finite fourth
moments. (Standardized test scores automatically satisfy this; STR,
family income, etc. satisfy this too.)
▶ The substance of this assumption is that a large outlier can strongly
influence the results – so we need to rule out large outliers.
▶ Look at your data! If you have a large outlier, is it a typo? Does it belong
in your data set? Why is it an outlier?
The Sampling Distribution of the OLS Estimator (SW Section 4.5)
The OLS estimator is computed from a sample of data. A different sample
yields a different value of β̂1. This is the source of the “sampling uncertainty”
of β̂1. We want to:
▶ quantify the sampling uncertainty around β̂1,
▶ use β̂1 to test hypotheses such as β1 = 0, and
▶ construct a confidence interval for β1
▶ All these require figuring out the sampling distribution of the OLS
estimator.
Sampling Distribution of β̂1
▶ We can show that β̂1 is unbiased, i.e., E[β̂1] = β1. Similarly for β̂0.
▶ We do not derive V(β̂1), as it requires some tedious algebra, and there is no need to memorize its formula. Here, we just emphasize two aspects of V(β̂1).
▶ First, V(β̂1) is inversely proportional to n, just like V(Ȳ). Combined with E[β̂1] = β1, this suggests that β̂1 →ᵖ β1, i.e., β̂1 is consistent: as the sample size grows, β̂1 gets closer and closer to β1.
▶ Second, V(β̂1) is inversely proportional to the variance of X; see the graphs below.
Sampling Distribution of β̂1
[Two scatterplots with fitted lines: low X variation ⇒ low precision of β̂1; high X variation ⇒ high precision of β̂1]
▶ Intuitively, if there is more variation in X , then there is more information
in the data that you can use to fit the regression line.
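▶ A simulation sketch in R (added; all settings are made up) showing both effects: the sampling spread of β̂1 falls as n grows and as the variance of X grows.
  sim_beta1 <- function(n, sd_x, reps = 2000) {
    replicate(reps, {
      X <- rnorm(n, mean = 20, sd = sd_x)
      Y <- 700 - 2 * X + rnorm(n, sd = 10)
      coef(lm(Y ~ X))[2]                # slope estimate from one sample
    })
  }
  set.seed(2)
  sd(sim_beta1(n = 50,  sd_x = 1))      # baseline sampling spread
  sd(sim_beta1(n = 200, sd_x = 1))      # larger n        -> smaller spread
  sd(sim_beta1(n = 50,  sd_x = 3))      # more X variation -> smaller spread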
Sampling Distribution of β̂1
▶ The exact sampling distribution is complicated – it depends on the
population distribution of (Y ,X ) – but when n is large we get some
simple (and good) approximations:
▶ Let SE(β̂1) be the standard error (SE) of β̂1, i.e., a consistent estimator for the standard deviation of β̂1, which is √V(β̂1).
▶ Then, it turns out that
    (β̂1 − β1) / SE(β̂1)  ~  N(0, 1)  (approximately, in large samples)
▶ Using this approximate distribution, we can conduct statistical inference on β1 – hypothesis testing and confidence intervals ⇒ Chapter 5.
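▶ A final simulation sketch (added; the design is made up): across repeated samples, (β̂1 − β1)/SE(β̂1) behaves like a draw from N(0, 1).
  set.seed(42)
  z <- replicate(5000, {
    X <- rnorm(200, mean = 20, sd = 2)
    Y <- 700 - 2 * X + rnorm(200, sd = 10)
    est <- summary(lm(Y ~ X))$coefficients
    (est["X", "Estimate"] - (-2)) / est["X", "Std. Error"]   # studentized slope
  })
  c(mean(z), sd(z))               # roughly 0 and 1
  quantile(z, c(0.025, 0.975))    # roughly -1.96 and 1.96, as for N(0, 1)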