ECMT1020 Introduction to Econometrics Week 3, 2023S1
Lecture 3: Simple Regression Analysis
Instructor: Ye Lu
Please read Chapter 1 of the textbook.
Contents
1 The simple linear regression model
2 The fitted regression model
2.1 Criteria to fit the model
2.2 Ordinary least squares (OLS) regression
2.3 Two algebraic results
2.4 The goodness of fit: R²
3 Interpretation of a regression equation
3.1 Changes in the units of measurement
3.2 Demeaning
4 Exercises
1 The simple linear regression model
Let X and Y be two random variables. We hypothesize the relationship between Y and X
is of the form
Y = β1 + β2X + u (1)
where
• Y is called the dependent variable or regressand ;
• X is called the independent variable, explanatory variable, or regressor ;
• β1 and β2 are fixed numbers which are unknown;
• u is called the disturbance term or error term, and it is also a random variable. The
reasons why a disturbance term exists include:
- omission of other explanatory variables;
- aggregation of variables;
- model misspecification;
- functional misspecification;
- measurement error.
The hypothesized mathematical relationship between Y and X given in (1) is known as the
regression model, and β1 and β2 are regarded as the (unknown) parameters of the regression
model.
Note that in the regression model, Y and X are observable, while the disturbance term
u is not observable. Assume that we collect a random sample of n observations for both
random variables X and Y . We denote our observations as
a sample of X : X1, X2, . . . , Xn,
a sample of Y : Y1, Y2, . . . , Yn.
Then the regression model (1) written in terms of (pre-)sampled random variables is
Yi = β1 + β2Xi + ui, i = 1, . . . , n. (2)
In fact, more assumptions on the distributional properties of the random variables in the regression model (1) or (2) are necessary to make sure that the parameters β1 and β2 (i) exist, (ii) can be uniquely identified, and (iii) have meaningful interpretations. This discussion will come in the next lecture.
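To make the model concrete, here is a minimal Python sketch (an illustration only; the parameter values are hypothetical and the course software is Stata) that simulates a sample from model (2). In practice we observe only the Xi and Yi, never the disturbances ui or the parameters β1 and β2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration
beta1, beta2 = 2.0, 0.5
n = 100

X = rng.uniform(0, 10, size=n)   # a sample of the regressor
u = rng.normal(0, 1, size=n)     # unobservable disturbance term
Y = beta1 + beta2 * X + u        # model (2): Y_i = beta1 + beta2*X_i + u_i
```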
2 The fitted regression model
Given n observations of Y and X, a researcher is then asked to fit the relationship between
Y and X specified in a regression model (1) or (2). Practically, this means we need to
‘estimate’ the values of the unknown parameters β1 and β2 using our data. Suppose we
(based on certain rules which will be made clear soon) decide that we will
• use some number b1 as an estimate for β1, and
• use some number b2 as an estimate for β2.
Then the fitted regression model is written as
Yˆ = b1 + b2X or Yˆi = b1 + b2Xi. (3)
Note that variables with and without a "hat" are completely different! You need to be very careful to follow the notation and understand when to put a hat on a variable and when not to.
Please compare the ‘fitted model’ (3) with the ‘true model’ in (1) or (2), and notice
1. The difference between Yi and the fitted value Yˆi:
Yi = β1 + β2Xi + ui
Yˆi = b1 + b2Xi
2. The difference between the disturbance term ui and the so-called residual uˆi := Yi− Yˆi:
ui = Yi − β1 − β2Xi
uˆi = Yi − b1 − b2Xi
3. The difference between the two decompositions of the dependent variable:
theoretical: $Y_i = \beta_1 + \beta_2 X_i + u_i$
operational: $Y_i = \underbrace{b_1 + b_2 X_i}_{\hat Y_i} + \hat u_i$
2.1 Criteria to fit the model
A natural question you may now ask is: how to find b1 and b2? Figure 1 gives a visual
illustration of how the realized values of a sample may suggest a “fitted line” where
• b1 is the intercept of the line, and
• b2 is the slope of the line.
Given such a fitted line, the realized residual uˆi is clearly the deviation of the realized Yi
from the value on the line corresponding to the realized value of Xi.
Figure 1: A fitted line (Figure 1.2 in the textbook)
The next smart question is: what values of b1 and b2 give the "best" fitted line? Well,
to answer this question, we need to first devise a criterion by which we can judge how good a
potential fitted line is. In general, we may want the overall distance between the observations
of Y and the fitted line to be as small as possible. What quantity should we use to measure
such overall distance?
• Does it make sense to look at the sum of the residuals, i.e., $\sum_{i=1}^{n} \hat u_i$?
• How about the sum of the absolute values of the residuals, i.e., $\sum_{i=1}^{n} |\hat u_i|$?
• How about the sum of the squared residuals, i.e., $\sum_{i=1}^{n} \hat u_i^2$?
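To see why the raw sum of residuals is a poor criterion, note that large positive and negative residuals cancel out, so even a very poor line can achieve a sum of (near) zero. A minimal Python sketch with hypothetical numbers (Python is used here only for illustration):

```python
import numpy as np

# Tiny hypothetical data set
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 3.0, 5.0, 6.0])

def criteria(b1, b2):
    """Return (sum, sum of absolute values, sum of squares) of the residuals."""
    resid = Y - b1 - b2 * X
    return resid.sum(), np.abs(resid).sum(), (resid ** 2).sum()

# Both lines have a residual sum of zero, but only the absolute-value
# and squared criteria reveal that the second line fits badly:
print(criteria(0.5, 1.4))   # a close-fitting line
print(criteria(4.0, 0.0))   # a flat line through the mean of Y
```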
2.2 Ordinary least squares (OLS) regression
The most popular criterion for determining the best fitted line is known as the ordinary least squares (OLS) criterion, which requires choosing b1 and b2 to minimize the RSS, where RSS is the abbreviation of residual sum of squares (or, equivalently, sum of squared residuals):
$$RSS = \sum_{i=1}^{n} \hat u_i^2.$$
The regression analysis using the OLS criterion to estimate the unknown parameters in the
regression model is known as the OLS regression.
In the following, we show how to estimate the unknown parameters β1 and β2 based on the OLS criterion, or, in other words, how to obtain the OLS estimators of β1 and β2, denoted as β̂1 and β̂2.[1]

[1] Note that we might want to write $\hat\beta_1^{OLS}$ and $\hat\beta_2^{OLS}$, with the superscript 'OLS', to differentiate the OLS estimators from estimators obtained using other criteria. But since the OLS estimators are the only estimators we are concerned with at present, we suppress the superscript for expositional simplicity. Later, when we consider other estimators, we shall add the superscript back.
First, note that given the observations Yi and Xi and some tentative choices b1 and b2,
$$\begin{aligned}
RSS = \sum_{i=1}^{n}\hat u_i^2 &= \sum_{i=1}^{n}(Y_i - b_1 - b_2 X_i)^2 \\
&= \sum_{i=1}^{n}\left(Y_i^2 + b_1^2 + b_2^2 X_i^2 - 2b_1 Y_i - 2b_2 X_i Y_i + 2b_1 b_2 X_i\right) \\
&= \sum_{i=1}^{n} Y_i^2 + n b_1^2 + b_2^2\sum_{i=1}^{n} X_i^2 - 2b_1\sum_{i=1}^{n} Y_i - 2b_2\sum_{i=1}^{n} X_i Y_i + 2b_1 b_2\sum_{i=1}^{n} X_i.
\end{aligned}$$
Next, since the choice variables here are b1 and b2, let’s consider RSS as a function of b1
and b2, and write
$$RSS(b_1, b_2) = n b_1^2 - 2b_1\sum_{i=1}^{n} Y_i + 2b_1 b_2\sum_{i=1}^{n} X_i - 2b_2\sum_{i=1}^{n} X_i Y_i + b_2^2\sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} Y_i^2. \quad (4)$$
Now our problem boils down to a typical problem of minimizing a function of two arguments (input variables): we want to find the particular values of b1 and b2 at which RSS(b1, b2) defined in (4) takes its minimum value.[2] That is, the OLS estimators for β1 and β2 are given by
$$(\hat\beta_1, \hat\beta_2) = \underset{(b_1, b_2)}{\arg\min}\; RSS(b_1, b_2).$$

[2] Note that in this minimization X1, . . . , Xn and Y1, . . . , Yn are treated as given constants, not as choice variables.
The "first-order conditions" below help us solve for the optimal values of b1 and b2 that give the minimum of RSS(b1, b2):
$$\frac{\partial RSS(b_1,b_2)}{\partial b_1}\bigg|_{b_1=\hat\beta_1,\, b_2=\hat\beta_2} = 0 \quad\text{and}\quad \frac{\partial RSS(b_1,b_2)}{\partial b_2}\bigg|_{b_1=\hat\beta_1,\, b_2=\hat\beta_2} = 0. \quad (5)$$
To derive the explicit form of these conditions, we need to take the partial derivatives of RSS(b1, b2) with respect to b1 and b2 separately:
$$\frac{\partial RSS(b_1,b_2)}{\partial b_1} = 2nb_1 - 2\sum_{i=1}^{n} Y_i + 2b_2\sum_{i=1}^{n} X_i,$$
$$\frac{\partial RSS(b_1,b_2)}{\partial b_2} = 2b_1\sum_{i=1}^{n} X_i - 2\sum_{i=1}^{n} X_i Y_i + 2b_2\sum_{i=1}^{n} X_i^2.$$
Then the first-order conditions in (5) imply
$$n\hat\beta_1 - \sum_{i=1}^{n} Y_i + \hat\beta_2\sum_{i=1}^{n} X_i = 0 \quad (6)$$
$$\hat\beta_1\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} X_i Y_i + \hat\beta_2\sum_{i=1}^{n} X_i^2 = 0 \quad (7)$$
which form a system of two equations in two unknowns (β̂1 and β̂2). Solving this system of equations yields[3]
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X \quad (8)$$
$$\hat\beta_2 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar X\bar Y}{\sum_{i=1}^{n} X_i^2 - n\bar X^2} = \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_i-\bar X)^2}, \quad (9)$$
where $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$.

[3] The full derivation can be found on pages 93–94 of the textbook. Also, although we omit the process here, to verify that β̂1 and β̂2 indeed minimize RSS(b1, b2) we also need to check the second-order conditions, which require the Hessian matrix of second-order derivatives to be positive definite.
Given the OLS estimators βˆ1 and βˆ2, the fitted regression model is written as
Yˆi = βˆ1 + βˆ2Xi,
and the fitted residuals are
uˆi = Yi − Yˆi = Yi − βˆ1 − βˆ2Xi.
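Formulas (8) and (9) translate directly into a few lines of code. The following Python sketch (an illustration, not the Stata workflow used in this course) implements the two formulas and checks them on simulated data with known, hypothetical parameters:

```python
import numpy as np

def ols_fit(X, Y):
    """OLS estimators from formulas (8) and (9)."""
    Xbar, Ybar = X.mean(), Y.mean()
    b2 = ((X - Xbar) * (Y - Ybar)).sum() / ((X - Xbar) ** 2).sum()  # (9)
    b1 = Ybar - b2 * Xbar                                           # (8)
    return b1, b2

# Check on simulated data with hypothetical true parameters
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 200)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 200)
b1_hat, b2_hat = ols_fit(X, Y)
print(b1_hat, b2_hat)   # should be close to 2.0 and 0.5
```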
Lastly, as practice of the above procedure for solving the OLS estimator(s), consider the simpler case in which there is no intercept term (β1 = 0) in the regression model:
$$Y_i = \beta_2 X_i + u_i.$$
In this case, write RSS as a function of b2 only and solve for the OLS estimator β̂2 that minimizes the RSS. See pages 96–97 of the textbook.
2.3 Two algebraic results
Just to repeat: our true/population model with unknown population parameters β1 and β2
is
Yi = β1 + β2Xi + ui,
and the fitted model with OLS estimators βˆ1 and βˆ2 is
Yˆi = βˆ1 + βˆ2Xi,
with the fitted residuals given by
uˆi = Yi − Yˆi = Yi − βˆ1 − βˆ2Xi. (10)
We can prove two purely mechanical results using simple algebra:
1. The sample mean of the residuals is always zero:
$$\bar{\hat u} := \frac{1}{n}\sum_{i=1}^{n} \hat u_i = 0, \quad (11)$$
which immediately implies that the sum of the residuals is always zero:
$$\sum_{i=1}^{n} \hat u_i = 0, \quad (12)$$
and
$$\bar{\hat Y} := \frac{1}{n}\sum_{i=1}^{n} \hat Y_i \overset{(11)}{=} \frac{1}{n}\sum_{i=1}^{n} \hat Y_i + \frac{1}{n}\sum_{i=1}^{n} \hat u_i = \frac{1}{n}\sum_{i=1}^{n} (\hat Y_i + \hat u_i) \overset{(10)}{=} \frac{1}{n}\sum_{i=1}^{n} Y_i =: \bar Y. \quad (13)$$
2. The sum of the products of Xi and ûi is always zero:
$$\sum_{i=1}^{n} X_i \hat u_i = 0, \quad (14)$$
which, together with (11), implies that the sample covariance of X and û is also always zero:
$$\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)(\hat u_i - \bar{\hat u}) = 0. \quad (15)$$
This also implies that the sample correlation coefficient of X and û is zero (assuming the denominator is nonzero).
Let's prove (11) by first looking at
$$\sum_{i=1}^{n} \hat u_i \overset{(10)}{=} \sum_{i=1}^{n} (Y_i - \hat\beta_1 - \hat\beta_2 X_i) = \sum_{i=1}^{n} Y_i - n\hat\beta_1 - \hat\beta_2 \sum_{i=1}^{n} X_i,$$
and then dividing both sides by n to get
$$\frac{1}{n}\sum_{i=1}^{n} \hat u_i = \underbrace{\frac{1}{n}\sum_{i=1}^{n} Y_i}_{\bar Y} - \hat\beta_1 - \hat\beta_2 \underbrace{\frac{1}{n}\sum_{i=1}^{n} X_i}_{\bar X} = \bar Y - \hat\beta_2 \bar X - \hat\beta_1 \overset{(8)}{=} \hat\beta_1 - \hat\beta_1 = 0.$$ Done.
Next, we prove (14):
$$\sum_{i=1}^{n} X_i \hat u_i \overset{(10)}{=} \sum_{i=1}^{n} X_i (Y_i - \hat\beta_1 - \hat\beta_2 X_i) = \sum_{i=1}^{n} X_i Y_i - \hat\beta_1 \sum_{i=1}^{n} X_i - \hat\beta_2 \sum_{i=1}^{n} X_i^2 \overset{(7)}{=} 0.$$ Done.
Lastly, to see how we derive (15) from (14) and (11), note that
$$\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)(\hat u_i - \bar{\hat u}) \overset{(11)}{=} \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)\hat u_i = \frac{1}{n}\sum_{i=1}^{n} X_i\hat u_i - \bar X \underbrace{\frac{1}{n}\sum_{i=1}^{n} \hat u_i}_{=0} = \frac{1}{n}\sum_{i=1}^{n} X_i\hat u_i \overset{(14)}{=} 0.$$ Done. (Scaling by 1/(n − 1) instead of 1/n clearly does not change the result.)
Exercise: Can you prove
$$\sum_{i=1}^{n} \hat Y_i \hat u_i = 0 \quad (16)$$
as a consequence of (12) and (14), and then demonstrate that the sample covariance of Ŷ and û is zero?
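Both mechanical results (and the exercise above) are easy to confirm numerically for any data set. A minimal Python sketch; np.polyfit with degree 1 performs exactly the least-squares fit of Section 2.2:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, 50)
Y = 1.0 + 2.0 * X + rng.normal(0, 3, 50)

b2, b1 = np.polyfit(X, Y, 1)    # least-squares line, equivalent to (8)-(9)
Y_hat = b1 + b2 * X             # fitted values
u_hat = Y - Y_hat               # fitted residuals

print(u_hat.sum())              # (12): ~0 up to floating-point error
print((X * u_hat).sum())        # (14): ~0
print((Y_hat * u_hat).sum())    # (16): ~0
print(Y_hat.mean() - Y.mean())  # (13): ~0
```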
2.4 The goodness of fit: R²
We have seen that regression analysis decomposes the dependent variable into a fitted-value component and a residual component:
$$Y_i = \hat Y_i + \hat u_i. \quad (17)$$
How to quantify the so-called “goodness of fit”?
• Recalling the discussion in Sections 2.1 and 2.2, the residual sum of squares (RSS) can certainly be considered a measure of the goodness of fit: the smaller the RSS, the better the model fit.
• Here we introduce a 'variance analysis' of regression models by looking at the variance decomposition of the fitted model. The goal is to measure how much variance/variation in the observations of the dependent variable can be explained by the fitted model Ŷi = b1 + b2Xi. The more variation explained by the fitted model, the better the model fit.
We first look at the left-hand side of (17) and define
$$TSS := \sum_{i=1}^{n} (Y_i - \bar Y)^2,$$
which is (n − 1) times the sample variance of Y. We call it the total sum of squares (TSS), as it is the sum of the squared deviations of the sample observations Yi about the sample mean. We sometimes say that TSS characterizes the 'total variation' in the sampled dependent variable.
How about the two terms on the right-hand side of (17)? We define analogously
$$ESS := \sum_{i=1}^{n} \left(\hat Y_i - \bar{\hat Y}\right)^2 \overset{(13)}{=} \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2,$$
$$\sum_{i=1}^{n} (\hat u_i - \bar{\hat u})^2 \overset{(11)}{=} \sum_{i=1}^{n} \hat u_i^2 =: RSS,$$
where ESS stands for the sum of squares 'explained' by the fitted model and RSS is exactly the residual (or 'unexplained') sum of squares defined in Section 2.2. In fact, it is not difficult to show that
$$TSS = ESS + RSS. \quad (18)$$
We can try:
$$\begin{aligned}
TSS &= \sum_{i=1}^{n}(Y_i-\bar Y)^2 \\
&= \sum_{i=1}^{n}(\underbrace{Y_i-\hat Y_i}_{\hat u_i} + \hat Y_i-\bar Y)^2 \qquad \text{(subtract and add the same quantity)} \\
&= \sum_{i=1}^{n}(\hat u_i + \hat Y_i-\bar Y)^2 \\
&= \sum_{i=1}^{n}\left[\hat u_i^2 + (\hat Y_i-\bar Y)^2 + 2\hat u_i(\hat Y_i-\bar Y)\right] \\
&= \underbrace{\sum_{i=1}^{n}\hat u_i^2}_{RSS} + \underbrace{\sum_{i=1}^{n}(\hat Y_i-\bar Y)^2}_{ESS} + 2\underbrace{\sum_{i=1}^{n}\hat u_i(\hat Y_i-\bar Y)}_{=0},
\end{aligned}$$
where the last term is zero because
$$\sum_{i=1}^{n} \hat u_i(\hat Y_i - \bar Y) = \sum_{i=1}^{n} \hat Y_i\hat u_i - \bar Y\sum_{i=1}^{n} \hat u_i \overset{(16),(12)}{=} 0. \quad (19)$$
So, we are done.
A well-known measure for the goodness of fit of a regression model is R², defined as
$$R^2 := \frac{ESS}{TSS},$$
and it measures the proportion of the total sum of squares (total variation) explained by the fitted model. Clearly, the larger R² is, the better the goodness of fit.
Figure 2 shows an example of variance analysis with only four observations. Can you
calculate the R2 of this fitted regression based on the information in the table?
Figure 2: Example of the analysis of variance (Table 1.5 in the textbook)

A few more properties of R²:
• 0 ≤ R² ≤ 1.
• R² = (TSS − RSS)/TSS = 1 − RSS/TSS → minimizing the RSS is equivalent to maximizing R².
• R² = ρ̂²_{YŶ}: R² is equal to the squared sample correlation coefficient of Y and Ŷ.[4]

[4] Note that in (1.81) of the textbook (p. 110), the sample correlation coefficient of Y and Ŷ is denoted r_{Y,Ŷ}. Here I follow the notation used in the previous lecture and also in (R.59) of the textbook (p. 34).
To show R² = ρ̂²_{YŶ}, note that
$$\hat\rho_{Y\hat Y} := \frac{\hat\sigma_{Y\hat Y}}{\hat\sigma_Y \hat\sigma_{\hat Y}} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar Y)\left(\hat Y_i-\bar{\hat Y}\right)}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar Y)^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\hat Y_i-\bar{\hat Y}\right)^2}} \overset{(13)}{=} \frac{\sum_{i=1}^{n}(Y_i-\bar Y)(\hat Y_i-\bar Y)}{\sqrt{\sum_{i=1}^{n}(Y_i-\bar Y)^2}\,\sqrt{\sum_{i=1}^{n}(\hat Y_i-\bar Y)^2}} = \frac{\sum_{i=1}^{n}(Y_i-\bar Y)(\hat Y_i-\bar Y)}{\sqrt{TSS}\,\sqrt{ESS}},$$
where in the numerator
$$\sum_{i=1}^{n}(Y_i-\bar Y)(\hat Y_i-\bar Y) = \sum_{i=1}^{n}(\underbrace{Y_i-\hat Y_i}_{\hat u_i} + \hat Y_i-\bar Y)(\hat Y_i-\bar Y) = \underbrace{\sum_{i=1}^{n}\hat u_i(\hat Y_i-\bar Y)}_{=0 \text{ by } (19)} + \sum_{i=1}^{n}(\hat Y_i-\bar Y)^2 = \sum_{i=1}^{n}(\hat Y_i-\bar Y)^2 = ESS.$$
Therefore, we have
$$\hat\rho_{Y\hat Y} = \frac{ESS}{\sqrt{TSS}\,\sqrt{ESS}} = \sqrt{\frac{ESS}{TSS}} = \sqrt{R^2},$$
or, equivalently, R² = ρ̂²_{YŶ}.
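The decomposition (18) and the identity R² = ρ̂²_{YŶ} can both be verified numerically in a few lines. A minimal Python sketch (again using np.polyfit for the least-squares fit, on hypothetical simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, 100)
Y = 1.0 + 2.0 * X + rng.normal(0, 4, 100)

b2, b1 = np.polyfit(X, Y, 1)    # least-squares fit
Y_hat = b1 + b2 * X
u_hat = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()
ESS = ((Y_hat - Y.mean()) ** 2).sum()
RSS = (u_hat ** 2).sum()

print(TSS, ESS + RSS)                        # (18): the two agree
R2 = ESS / TSS
print(R2, np.corrcoef(Y, Y_hat)[0, 1] ** 2)  # R^2 equals the squared correlation
```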
3 Interpretation of a regression equation
Now let's consider a real-life example of a fitted regression (or regression equation), with an application to labor economics: understanding the returns to education. The upper panel of Figure 3 shows a sample of 500 observations on
• Y = EARNINGS: hourly earnings, measured in dollars;
• X = S: schooling, measured by the highest grade completed;
from a survey data set, EAWE (Educational Attainment and Wage Equations; see the data set description in Appendix B, pp. 565–566, of the textbook). The hypothesized true model is
EARNINGS = β1 + β2 S + u.
We have learned how the OLS estimators of the unknown parameters β1 and β2 are
constructed using sample observations (see (8) and (9)). In practice, statistical software can help us generate the results based on these formulas and conduct further analysis. The
lower panel of Figure 3 shows how to use Stata, the statistical software we will use in this
course, to fit the model. In particular, the OLS estimators for β1 and β2 are listed under
Coef., standing for ‘coefficients’, in the lower half of the Stata output table.
Figure 3: A simple wage regression (Figure 1.7 and Table 1.2 in the textbook)
Based on the Stata output, we know that β̂1 = 0.76 and β̂2 = 1.27 (rounded to 2 decimal places). Then, we can write the fitted regression equation as
$$\widehat{EARNINGS} = \hat\beta_1 + \hat\beta_2 S = 0.76 + 1.27\,S.$$
How to interpret this result?
• On β̂2 = 1.27: hourly earnings are expected to increase by 1.27 dollars for every extra year of schooling.
• On β̂1 = 0.76: an individual with no schooling (S = 0) would be expected to have hourly earnings of 0.76 dollars. → This sounds too low to be reasonable. Why? → S = 0 is too far from the data range, so this interpretation is not sensible in this circumstance.
Please see BOX 1.1 in the textbook (p. 100) for a foolproof way of interpreting the coefficients
of a fitted linear regression equation. Also note that we have not discussed anything about
the quality of our estimators βˆ1 and βˆ2, such as how accurate (bias? variance?) they are for
estimating the unknown parameters β1 and β2. This issue is just as important as interpreting
a fitted regression equation, if not more so, and will be discussed in detail in the next lecture.
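Since the EAWE data set is distributed with the textbook rather than reproduced here, the following Python sketch fits the same kind of wage equation on simulated stand-in data using the statsmodels library (assuming it is available; in this course the regression is instead run in Stata, as in Figure 3):

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in for the EAWE variables (hypothetical values)
rng = np.random.default_rng(4)
S = rng.integers(10, 21, size=500).astype(float)    # years of schooling
EARNINGS = 0.76 + 1.27 * S + rng.normal(0, 8, 500)  # hourly earnings in dollars

X = sm.add_constant(S)               # add the intercept column
results = sm.OLS(EARNINGS, X).fit()  # OLS regression of EARNINGS on S
print(results.params)                # intercept and slope estimates
print(results.rsquared)              # goodness of fit
```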
3.1 Changes in the units of measurement
Suppose that the units of measurement of Y or X are changed. How will this affect the fitted regression? For example, suppose hourly earnings, Y, are measured not in dollars but in hundreds of dollars. Then we may consider defining a new variable Y* = 0.01Y. How do the regression coefficients change after we replace Y by Y*?
Answering such a question gives you opportunities to practice the OLS formulas we
learned above. In general, we may consider
$$Y^* = \lambda Y$$
where λ is a constant, and a new regression model
$$Y_i^* = \beta_1^* + \beta_2^* X_i + u_i.$$
The question is: what is the OLS estimator for β*2 in this new regression model? We apply the formula in (9) with Y replaced by Y* to get
$$\hat\beta_2^* := \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i^*-\bar Y^*)}{\sum_{i=1}^{n}(X_i-\bar X)^2} = \frac{\sum_{i=1}^{n}(X_i-\bar X)(\lambda Y_i-\lambda\bar Y)}{\sum_{i=1}^{n}(X_i-\bar X)^2} = \lambda\,\frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_i-\bar X)^2} = \lambda\hat\beta_2,$$
where βˆ2 is the OLS estimator for β2 from the original regression
Yi = β1 + β2Xi + ui.
More exercises of this kind are given in Exercise 1.12 and Exercise 1.13 in the textbook.
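The result β̂*2 = λβ̂2 is easy to verify numerically. A minimal Python sketch with hypothetical data (the intercept check anticipates Exercise 1.12):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 100)

lam = 0.01                          # e.g. dollars -> hundreds of dollars
b2, b1 = np.polyfit(X, Y, 1)        # original regression of Y on X
c2, c1 = np.polyfit(X, lam * Y, 1)  # regression of Y* = lambda*Y on X

print(c2, lam * b2)   # slope scales by lambda, as derived above
print(c1, lam * b1)   # the intercept scales too (cf. Exercise 1.12)
```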
3.2 Demeaning
When the intercept in a regression equation has no sensible interpretation because X = 0 is
distant from the data range, we may consider ‘demeaning’ the variable X by defining a new
variable
$$X^* := X - \bar X. \quad (20)$$
That is, for each observation i, we have
$$X_i^* = X_i - \bar X = X_i - \frac{1}{n}\sum_{i=1}^{n} X_i,$$
and the regression model becomes
$$Y_i = \beta_1^* + \beta_2^* X_i^* + u_i = \beta_1^* + \beta_2^* (X_i - \bar X) + u_i. \quad (21)$$
The question is: what are the OLS estimators for β∗1 and β∗2 in this new regression?
• First, let's see what the sample mean of the demeaned X defined in (20) is:
$$\bar X^* = \frac{1}{n}\sum_{i=1}^{n} X_i^* = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X) = \frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\, n\bar X = \bar X - \bar X = 0! \quad (22)$$
So we have
$$X_i^* - \bar X^* \overset{(22)}{=} X_i^* \overset{(20)}{=} X_i - \bar X. \quad (23)$$
• Next, to derive the formula for the OLS estimator of β*2 in regression (21), we replace X by X* in (9) to get
$$\hat\beta_2^* := \frac{\sum_{i=1}^{n}(X_i^*-\bar X^*)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_i^*-\bar X^*)^2} \overset{(23)}{=} \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_i-\bar X)^2} = \hat\beta_2,$$
which is the same as the OLS estimator for β2 from the original regression
$$Y_i = \beta_1 + \beta_2 X_i + u_i.$$
• What about the OLS estimator for β*1 in regression (21)? We replace X̄ by X̄* and β̂2 by β̂*2 in formula (8) to get
$$\hat\beta_1^* = \bar Y - \hat\beta_2^* \bar X^* \overset{(22)}{=} \bar Y. \quad (24)$$
That is, the value of β̂*1 is simply the sample mean of the dependent variable. The interpretation of β̂*1 is the expected value of Y when X* = X − X̄ = 0, that is, when X takes its sample mean value.
Figure 4 shows the output of the wage regression when the schooling variable is demeaned. We can compare the reported coefficient estimates here with those in the table in Figure 3. It is clear that the slope coefficients in the two regressions are the same (β̂2 = β̂*2 = 1.27), while the intercepts are totally different:
β̂1 = 0.76 while β̂*1 = 19.58.
From the above derivation, we know that
β̂*1 = 19.58 = Ȳ,
and the interpretation is that the hourly earnings of an individual whose years of schooling are at the mean level are expected to be 19.58 dollars. This interpretation is much more sensible than that of β̂1 in the original regression.
Figure 4: Wage regression with demeaned schooling (S) (Table 1.3 in the textbook)
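A minimal Python sketch (on simulated stand-in data, not the actual EAWE sample) confirming both results: demeaning X leaves the slope unchanged and turns the intercept into the sample mean of Y, as in (24):

```python
import numpy as np

rng = np.random.default_rng(6)
S = rng.integers(10, 21, size=500).astype(float)    # stand-in schooling data
EARNINGS = 0.76 + 1.27 * S + rng.normal(0, 8, 500)  # stand-in earnings

b2, b1 = np.polyfit(S, EARNINGS, 1)                 # original regression
c2, c1 = np.polyfit(S - S.mean(), EARNINGS, 1)      # demeaned regressor

print(b2, c2)                # identical slopes, as shown above
print(c1, EARNINGS.mean())   # intercept equals the sample mean of Y: formula (24)
```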
4 Exercises
The questions below are from Exercises 1.1, 1.4–1.8, 1.10, 1.12–1.13, and 1.19–1.23 in the textbook.
1.1 Given a sample of data on Y and X, suppose that the fitted line is Ŷ = β̂1 + β̂2X with β̂1 and β̂2 defined as in (8) and (9). Demonstrate that the fitted line must pass through the point (X̄, Ȳ) representing the mean of the observations in the sample.
1.4 Demonstrate from the first-order conditions that the least squares estimator of β1 in the primitive model where Y consists simply of a constant plus a disturbance term,
$$Y_i = \beta_1 + u_i,$$
is β̂1 = Ȳ, the sample mean of Y. (First define RSS and then differentiate.)
1.5 The table shows the average annual percentage rates of growth of employment, e, and
real GDP, g, for 31 OECD countries for the period 2002–2007. The regression output shows
the results of regressing e on g. Provide an interpretation of the coefficients.
1.6 In Exercise 1.5, ē = 1.3603, ḡ = 3.6508,
$$\sum (e_i - \bar e)(g_i - \bar g) = 21.5935, \qquad \sum (g_i - \bar g)^2 = 90.7639.$$
Calculate the regression coefficients and check that they are the same as in the regression output.
1.7 Does educational attainment depend on intellectual ability? In the United States, as in
most countries, there is a positive correlation between educational attainment and cognitive
ability. S (highest grade completed by 2011) is the number of years of schooling of the
respondent. ASVABC is a composite measure of numerical and verbal ability scaled to have
mean 0 and standard deviation 1 (both approximately; for further details of the measure, see
Appendix B). Perform a regression of S on ASVABC and interpret the regression results.
1.8 Do earnings depend on education? Using your EAWE data set, fit a wage equation
parallel to that in Table 1.2, regressing EARNINGS on S, and give an interpretation of the
coefficients.
1.10 The output shows the result of regressing the number of children in the family on the years of schooling of the mother, using EAWE Data Set 21. Provide an interpretation of the coefficients. (The data set contains data on siblings, the number of brothers and sisters of the respondent. Therefore the total number of children in the family is the number of siblings plus one.)
1.12 Demonstrate that, if the units of Y are changed so that Y*_i = λ2Y_i, the new intercept will be λ2β̂1, where β̂1 is the intercept in a regression of Y on X.
1.13* Suppose that the units of measurement of X are changed so that the new measure, X*_i, is related to the original one by X*_i = µ2X_i. Show that the new estimate of the slope coefficient is β̂2/µ2, where β̂2 is the slope coefficient in the original regression.
1.19 Using the data in Table 1.5 (Figure 2 above), calculate the correlation between Y and Ŷ and verify that its square is equal to the value of R².
1.20 What was the value of R² in the educational attainment regression fitted by you in Exercise 1.7? Comment on it.
1.21 What was the value of R² in the wage equation fitted by you in Exercise 1.8? Comment on it.
1.22 Demonstrate that, in a regression with an intercept, a regression of Y* on X must have the same R² as a regression of Y on X, where Y* = λ2Y.
1.23* Demonstrate that, in a regression with an intercept, a regression of Y on X* must have the same R² as a regression of Y on X, where X* = µ2X.