EC226 (Term 1: Handout 1)
Two variable linear regression analysis
(Readings: Stock + Watson: Ch 4 + 5; Dougherty: Ch 2; Wooldridge: Ch 2)

1 Introduction
Econometrics literally means economic measurement. Hendry¹ described econometrics as “An analysis of the relationship between economic variables ... by abstracting the main phenomena of interest and stating theories thereof in mathematical form”, and Samuelson et al.² stated that “Econometrics may be defined as the quantitative analysis of actual economic phenomena”. Other authors have been less complimentary: Leamer³ believes that econometrics “is practised at the computer terminal (and) involves fitting many, perhaps thousands, of statistical models. One or several that the researcher finds pleasing are selected for reporting purposes”.
Econometrics has three major uses:
1. Describing economic reality.
2. Testing hypotheses about economic theory.
3. Forecasting future economic activity.
Econometricians attempt to quantify economic relationships that had previously been only theoretical.
Undertaking this requires three steps:
1. Specifying/identifying theoretical economic relationship between the variables.
2. Collecting the data on those variables identified by the theoretical model.
3. Obtaining estimates of the parameters in the theoretical relationship.
¹ Hendry, D. F. (1980). Econometrics – Alchemy or Science? Economica, 47(188), 387–406.
² Samuelson, P. A., T. C. Koopmans, and J. R. N. Stone (1954). Report of the Evaluative Committee for Econometrica. Econometrica, 22(2), 141–146.
³ Leamer, E. E. (1983). Let’s Take the Con Out of Econometrics. The American Economic Review, 73(1), 31–43.
2 Correlation vs Regression analysis
2.1 Correlation
In Economics we are interested in the relation between 2 or more random variables, for example:
• Sales and advertising expenditure
• Personal consumption and disposable income
• Investment and interest rates
• Earnings and schooling
While there are many ways in which these pairs of variables might be related, a linear relationship is often a useful first approximation, and it can be detected via a scatter plot of one variable against the other.
A measure of linear association between two random variables x and y is the covariance, which
for a sample of n pairs of observations (x1, y1) . . . (xn, yn) is calculated as:
\[
\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}
\]
The covariance measures the average cross product of deviations of x, around its mean, with y,
around its mean. If high (low) values of x - relative to its mean - are associated with high (low)
values of y - relative to its mean – then we get a high positive covariance (see Figure 1). Conversely
if high (low) values of x are associated with low (high) values of y we get a negative covariance (see
Figure 2). A zero covariance occurs when there is no predominant association between the x and y
values (see Figure 3). The covariance captures only linear association between the x and y values and would be approximately zero for a quadratic association (see Figure 4).
The covariance statistic is not scale free: multiplying the x variable by 100 multiplies the covariance by 100. A scale-free measure is the correlation, defined as:
\[
\operatorname{corr}(x, y) \equiv \rho(x, y) = \frac{\operatorname{cov}(x, y)}{\sqrt{V(x)V(y)}}
\]
as
\[
\operatorname{corr}(ax, y) = \frac{\operatorname{cov}(ax, y)}{\sqrt{V(ax)V(y)}} = \frac{a\operatorname{cov}(x, y)}{\sqrt{a^2 V(x)V(y)}} = \frac{a\operatorname{cov}(x, y)}{a\sqrt{V(x)V(y)}} = \operatorname{corr}(x, y)
\]
ρ is a population parameter of association between the random variables x and y, and:
1. −1 ≤ ρ(x, y) ≤ 1
2. ρ(x, y) = −1 ⇒ perfect negative linear association
3. ρ(x, y) = 1 ⇒ perfect positive linear association
4. ρ(x, y) = 0 ⇒ no linear association
5. As |ρ(x, y)| increases ⇒ stronger linear association.
6. ρ(x, y) = ρ(y, x)
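To make these definitions concrete, here is a minimal Python/NumPy sketch (not part of the original handout) using simulated data; the particular series and seed are arbitrary illustrative choices. It checks the n − 1 covariance formula against NumPy's built-in functions and shows that rescaling x changes the covariance but not the correlation.

import numpy as np

# Hypothetical sample of n pairs (x_i, y_i); any two related series would do.
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=50)
y = 3 + 0.5 * x + rng.normal(0, 1, size=50)

# Sample covariance with the n-1 divisor, as in the formula above.
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
corr_xy = cov_xy / np.sqrt(x.var(ddof=1) * y.var(ddof=1))
print(cov_xy, corr_xy)

# Rescaling x changes the covariance but leaves the correlation unchanged.
print(np.cov(100 * x, y, ddof=1)[0, 1], np.corrcoef(100 * x, y)[0, 1])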
2.2 Regression
By contrast linear regression looks at the linear causal association between the random variables x
and y. In particular, we talk about the variable, x, taking a specific value and we are interested in
the response of y to a change in this value of x. So in our examples above we might be interested in
• Changes in sales caused by increased advertising expenditure
• Changes in personal consumption caused by increased disposable income
• Changes in investment caused by increased interest rates
• Changes in earnings caused by increased schooling
In the simplest type of linear regression analysis we model the relationship between two variables, y and x, and this relationship is assumed to be linear. In particular, we are interested in the expected value of the random variable, y, given a specific value for x. Given linearity this is:⁴
E(y|x) = α+ βx
where
E(y|x = 0) = α ⇒ the expected value of y when x = 0 (we typically do not interpret this), and
E(y|x + 1) = α + β(x + 1),
therefore,
β = E(y|x + 1) − E(y|x) ⇒ the change in the expected value of y for a unit increase in x.
y – is known as the dependent variable (endogenous variable or regressand)
x – is known as the independent variable (exogenous variable, explanatory variable or
regressor).
The actual values of the dependent variable, y, will not be the same as the expected value and we
denote the discrepancy (error or disturbance) between the actual and expected value by εi, where:
εi = yi − E(yi|xi) = yi − α− βxi
Rearranging we have
yi = α+ βxi + εi i = 1, 2, . . . , n (1)
and this is the TRUE (but unknown) relationship between y and x and is made up of two components:
1. α+ βxi - the systematic part
2. εi- the random (non-systematic) component.
⁴ Appendix A has some rules on expectations and variances.
3 CLRM assumptions
To complete the model we need to specify the statistical properties of x and ε; these are known as the Classical Linear Regression Model (CLRM) assumptions:
1. E(εi|xi) = E(εi) = 0 ∀i (so the error term is independent of xi).
2. V (εi|xi) = σ2 ∀i (error variance is constant (homoscedastic) – points are distributed around the
true regression line with a constant spread)
3. cov(εi, εj |xi) = 0 for i ̸= j (the errors are serially uncorrelated over observations)
4. εi|xi ∼ N(0, σ2) ∀i
See Figure 5 for a plot of a TRUE regression line and the distribution of points around it which is consistent with the CLRM assumptions. Figure 6 plots the actual TRUE disturbance terms from Figure 5, which are consistent with the CLRM assumptions.
4 Important questions to be answered
1. How do we estimate the population parameters (α, β, σ2) of this statistical model?
2. What are the properties of the estimators?
3. How do we test hypotheses about these parameters?
4. How strong is the relationship?
5. Is the model adequate?
6. Can the model be used for forecasting?
4.1 Estimation of the sample regression line
The statistical model is:
yi = α+ βxi + εi i = 1, 2, . . . , n
1. E(εi|xi) = 0.
2. V (εi|xi) = σ2.
3. cov(εi, εj |xi) = 0
4. εi|xi ∼ N(0, σ2).
We are interested in estimating the population parameters α, β and σ²; we will call the estimates a, b and s².
Note: for ease of notation we are not going to explicitly write conditional on xi.
For n pairs of observations (x_1, y_1), …, (x_n, y_n) we want to find the straight line that best fits these points. Denote
\[
\hat{y}_i = a + b x_i
\]
as the predicted values of y from the model, in which case we can define
\[
y_i = \hat{y}_i + e_i = a + b x_i + e_i
\]
where the e_i are the residuals (the difference between the actual value of y and its predicted value, $\hat{y}_i$), such that
\[
e_i = y_i - \hat{y}_i
\]
See Figure 7 for an illustration of the data points (black diamonds), the predicted line (black line) and the residuals (the vertical distance between the diamonds and the predicted line), with e_8 and e_30 highlighted.
For any given values of a and b we can define a regression line (in Figure 8 we plot three alternative regression lines, for (a_i, b_i), i = 1, 2, 3). But we want a and b to have some desirable properties.
The best line is the one that makes the residuals, e_i, as small as possible. However, as residuals can be both positive and negative, obtaining lines such that $\sum_{i=1}^{n} e_i = 0$ can yield a variety of equally good lines (in Figure 8 all three lines have this property). The optimal solution is to minimise the RSS (Residual Sum of Squares)
\[
\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2
\]
with respect to the two unknown parameters a and b.
To achieve this we must differentiate the RSS expression with respect to a and b and set the
resultant expressions equal to zero – this process of minimising the RSS to obtain the parameter
estimates is known as Ordinary Least Squares (OLS).
Differentiating the RSS with respect to the parameter, a, and setting the expression to zero:
\[
\frac{\partial \sum_{i=1}^{n}(y_i - a - b x_i)^2}{\partial a} = \frac{\partial\left[(y_1 - a - b x_1)^2 + \dots + (y_n - a - b x_n)^2\right]}{\partial a} = 0 \qquad (2)
\]
This entails differentiating each of the n terms in equation (2) with respect to a:
\[
\frac{\partial \sum_{i=1}^{n}(y_i - a - b x_i)^2}{\partial a} = -2(y_1 - a - b x_1) - \dots - 2(y_n - a - b x_n) = -2\sum_{i=1}^{n}(y_i - a - b x_i) = -2\sum_{i=1}^{n} e_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} e_i = 0 \qquad (3)
\]
Differentiating the RSS with respect to the parameter, b, and setting the expression to zero:
\[
\frac{\partial \sum_{i=1}^{n}(y_i - a - b x_i)^2}{\partial b} = \frac{\partial\left[(y_1 - a - b x_1)^2 + \dots + (y_n - a - b x_n)^2\right]}{\partial b} = 0 \qquad (4)
\]
Differentiating each of these n terms in equation (4) with respect to b:
\[
\frac{\partial \sum_{i=1}^{n}(y_i - a - b x_i)^2}{\partial b} = -2x_1(y_1 - a - b x_1) - \dots - 2x_n(y_n - a - b x_n) = -2\sum_{i=1}^{n} x_i(y_i - a - b x_i) = -2\sum_{i=1}^{n} x_i e_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i e_i = 0 \qquad (5)
\]
NOTE
(i) Equation (3) implies that the residuals always sum to zero (providing there is an intercept in
the model)
(ii) Equation (5) implies that the covariance (and hence correlation) between the residuals and the
x’s is zero, that is, they are orthogonal.
The two equations (3) and (5) are called the NORMAL equations. Appendix E has a small empirical example of estimating a two variable regression model by OLS, where we show that $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$.
Solving equation (3) for a we have:
\[
\sum_{i=1}^{n} e_i \equiv \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} a - \sum_{i=1}^{n} b x_i = \sum_{i=1}^{n} y_i - n a - b \sum_{i=1}^{n} x_i = 0
\]
which implies:
\[
a = \bar{y} - b\bar{x} \qquad (6)
\]
Substituting equation (6) in equation (5):
\[
\Rightarrow \sum_{i=1}^{n} x_i\bigl(y_i - (\bar{y} - b\bar{x}) - b x_i\bigr) = \sum_{i=1}^{n} x_i(y_i - \bar{y}) - b\sum_{i=1}^{n} x_i(x_i - \bar{x}) = 0
\]
\[
b\sum_{i=1}^{n} x_i(x_i - \bar{x}) = \sum_{i=1}^{n} x_i(y_i - \bar{y})
\]
\[
b = \frac{\sum_{i=1}^{n} x_i(y_i - \bar{y})}{\sum_{i=1}^{n} x_i(x_i - \bar{x})} \equiv \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (7)
\]
We need to estimate σ² (the variance of the disturbance term, ε_i); this is estimated as:
\[
s^2 = \frac{\sum_{i=1}^{n} e_i^2}{DoF} = \frac{RSS}{DoF} \qquad (8)
\]
where DoF is the degrees of freedom of the residuals. The DoF is the number of observations, n, less the number of restrictions we have on these residuals, that is, $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$ (which must equal the number of estimated parameters, a and b), i.e. n − 2.
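As a concrete illustration, the following Python sketch (not part of the handout) implements equations (6)–(8) directly with NumPy; the data are those of the worked example in Appendix E, so the normal equations (3) and (5) can be checked numerically.

import numpy as np

def ols_two_variable(x, y):
    # OLS for y = a + b*x + e using equations (6)-(8)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # equation (7)
    a = y.mean() - b * x.mean()                                                # equation (6)
    e = y - (a + b * x)                                                        # residuals
    s2 = np.sum(e ** 2) / (n - 2)                                              # equation (8), DoF = n - 2
    return a, b, s2, e

# Data from Appendix E; the normal equations (3) and (5) hold up to rounding error.
x = np.array([7, 14, 8, 10, 11, 6, 14], dtype=float)
y = np.array([27, 41, 27, 32, 34, 26, 38], dtype=float)
a, b, s2, e = ols_two_variable(x, y)
print(a, b, s2, e.sum(), (x * e).sum())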
These OLS estimates have a number of desirable properties (known as the Gauss–Markov Theorem) in that the estimators are BLUE:
1. Best – they have the minimum variance, such that V(b|x) ≤ V(b*|x), where b* is any alternative linear unbiased estimator.
2. Linear – they are linear functions of the error term.
3. Unbiased – E(a|x) = α and E(b|x) = β.
4. Estimators.
4.2 Properties of the OLS estimators
From equation (1), taking averages, we have that $\bar{y} = \alpha + \beta\bar{x} + \bar{\varepsilon}$, in which case we can write equation (1) as:
\[
y_i - \bar{y} = \beta(x_i - \bar{x}) + \varepsilon_i - \bar{\varepsilon} \qquad (9)
\]
Substituting equation (9) into equation (7) we have:
\[
b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})\{\beta(x_i - \bar{x}) + (\varepsilon_i - \bar{\varepsilon})\}}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
  = \beta\,\overbrace{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}^{1}
  + \frac{\sum_{i=1}^{n}(x_i - \bar{x})(\varepsilon_i - \bar{\varepsilon})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (10)
\]
From (10) we see that b is a linear function of the error term. Noting that $\sum_{i=1}^{n}(x_i - \bar{x})\bar{\varepsilon} = 0$, the last term in (10) simplifies, and taking equation (10) we can show that our OLS estimator is unbiased:
\[
b = \beta + \sum_{i=1}^{n} \omega_i \varepsilon_i, \qquad \text{where } \omega_i = \frac{(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (11)
\]
Digression
Some of the properties of the variable ωi.
(i) $\sum_{i=1}^{n} \omega_i = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = 0$
(ii) $\sum_{i=1}^{n} \omega_i^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\left(\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^2} = \dfrac{1}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
4.2.1 Unbiasedness
Consider the slope coefficient, b:
\[
E(b|x) = E(\beta|x) + E\Bigl(\sum_{i=1}^{n}\omega_i\varepsilon_i \,\Big|\, x\Bigr)
\]
as β is a constant, E(β|x) = β:
\[
E(b|x) = \beta + E\Bigl(\sum_{i=1}^{n}\omega_i\varepsilon_i \,\Big|\, x\Bigr) = \beta + E(\omega_1\varepsilon_1 + \omega_2\varepsilon_2 + \dots + \omega_n\varepsilon_n|x)
\]
As ω_i is a function of the x_i alone, E(ω_iε_i|x) = ω_iE(ε_i|x); consequently:
\[
E(b|x) = \beta + \omega_1E(\varepsilon_1|x) + \omega_2E(\varepsilon_2|x) + \dots + \omega_nE(\varepsilon_n|x) = \beta
\]
as E(ε_i|x) = 0 ∀ i, from CLRM assumption 1.
Therefore the slope coefficient is an unbiased estimator, that is, E(b|x) = β: in repeated regressions of y on x the average of the coefficient estimates of b will be equal to the true coefficient.
Now looking at the intercept, a,
\[
E(a|x) = E(\bar{y}|x) - E(b\bar{x}|x) = E(\alpha + \beta\bar{x} + \bar{\varepsilon}|x) - \bar{x}E(b|x) = \alpha + \beta\bar{x} + E(\bar{\varepsilon}|x) - \beta\bar{x} = \alpha
\]
as $E(\bar{\varepsilon}|x) = 0$. Therefore the intercept is an unbiased estimator, that is, E(a|x) = α.
To try and give the concept of unbiasedness some meaning, consider the (unrealistic) case in which we know the true underlying relationship:
\[
E(y_i|x_i) = 1 + 2x_i
\]
such that
\[
y_i|x_i = 1 + 2x_i + \varepsilon_i \qquad (12)
\]
that is, α = 1 and β = 2. We then observe some realisations of y ($y_i^1$) based on the known values of x_i and some random shocks ($\varepsilon_i^1 \sim N(0, 1)$) according to equation (12) – these are plotted in Figure 9a. Applying OLS to these data yields the dashed line which, compared to the true regression line (solid line), has an intercept that is too small (a = 0.307) and a slope that is too large (b = 2.322). However, these realisations of y ($y_i^1$) are only one of an infinite number that could have arisen depending upon what shocks arose.
Suppose we can start the world off again and, for the same values of x_i, we observe some new random shocks ($\varepsilon_i^2$); this yields some new realisations of y ($y_i^2$) according to equation (12). These new data are plotted in Figure 9b. Applying OLS to these new data yields the dashed line which, compared to the true regression line (solid line), has an intercept that is slightly too small (a = 0.651) and a slope that is slightly too large (b = 2.077).
We start the world off again and for the same values of x_i (again) we observe some new random shocks ($\varepsilon_i^3$); this yields some new realisations of y ($y_i^3$) according to equation (12). These new data are plotted in Figure 9c. Applying OLS to these new data yields the dashed line which, compared to the true regression line (solid line), has an intercept that is too large (a = 1.570) and a slope that is too small (b = 1.738).
We start the world off again and for the same values of x_i (again) we observe some new random shocks ($\varepsilon_i^4$); this yields some new realisations of y ($y_i^4$) according to equation (12). These new data are plotted in Figure 9d. Applying OLS to these new data yields the dashed line which, compared to the true regression line (solid line), has an intercept that is too large (a = 1.315) and a slope that is too small (b = 1.825).
Imagine we start the world off 1000 times: we would get 1000 estimates of a and 1000 estimates of b. Plotting the distribution of b as a histogram (Figure 10), we see that the mean of the distribution of b is 2.007 (≃ β, so b is unbiased).
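This repeated-sampling experiment can be mimicked with a short Monte Carlo sketch in Python (an illustration, not the handout's own code); the particular x values and seed are arbitrary assumptions, but under equation (12) the averages of the 1000 intercept and slope estimates should be close to α = 1 and β = 2.

import numpy as np

rng = np.random.default_rng(42)
n, alpha, beta = 30, 1.0, 2.0
x = rng.uniform(0, 5, size=n)          # hypothetical fixed x values, reused in every replication

def ols_slope_intercept(x, y):
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b * x.mean(), b

a_hats, b_hats = [], []
for _ in range(1000):                  # "start the world off" 1000 times
    y = alpha + beta * x + rng.normal(0, 1, size=n)   # equation (12) with N(0,1) shocks
    a_hat, b_hat = ols_slope_intercept(x, y)
    a_hats.append(a_hat)
    b_hats.append(b_hat)

print(np.mean(a_hats), np.mean(b_hats))   # both averages should be close to (1, 2)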
4.2.2 Variances
Looking first at the variance of the slope coefficient, b,
\[
V(b|x) = E\bigl[(b - E(b|x))^2 \,\big|\, x\bigr] = E\bigl[(b - \beta)^2 \,\big|\, x\bigr] = E\Bigl[\Bigl(\sum_{i=1}^{n}\omega_i\varepsilon_i\Bigr)^{2} \,\Big|\, x\Bigr] \qquad (13)
\]
\[
\begin{aligned}
V(b|x) &= E\bigl[(\omega_1\varepsilon_1 + \omega_2\varepsilon_2 + \dots + \omega_n\varepsilon_n)^2 \,\big|\, x\bigr] \\
&= E\bigl(\omega_1^2\varepsilon_1^2 + \omega_2^2\varepsilon_2^2 + \dots + \omega_n^2\varepsilon_n^2 \\
&\qquad + 2\omega_1\omega_2\varepsilon_1\varepsilon_2 + 2\omega_1\omega_3\varepsilon_1\varepsilon_3 + \dots + 2\omega_1\omega_n\varepsilon_1\varepsilon_n \\
&\qquad + 2\omega_2\omega_3\varepsilon_2\varepsilon_3 + \dots + 2\omega_2\omega_n\varepsilon_2\varepsilon_n + \dots + 2\omega_{n-1}\omega_n\varepsilon_{n-1}\varepsilon_n \,\big|\, x\bigr)
\end{aligned}
\]
As ω_i² is made up of the x_i alone, E(ω_i²ε_i²|x) = ω_i²E(ε_i²|x); consequently:
\[
\begin{aligned}
V(b|x) &= \omega_1^2E(\varepsilon_1^2|x) + \omega_2^2E(\varepsilon_2^2|x) + \dots + \omega_n^2E(\varepsilon_n^2|x) \\
&\quad + 2\omega_1\omega_2E(\varepsilon_1\varepsilon_2|x) + 2\omega_1\omega_3E(\varepsilon_1\varepsilon_3|x) + \dots + 2\omega_1\omega_nE(\varepsilon_1\varepsilon_n|x) \\
&\quad + 2\omega_2\omega_3E(\varepsilon_2\varepsilon_3|x) + \dots + 2\omega_2\omega_nE(\varepsilon_2\varepsilon_n|x) + \dots + 2\omega_{n-1}\omega_nE(\varepsilon_{n-1}\varepsilon_n|x)
\end{aligned}
\]
As V(ε_i|x) = E(ε_i²|x) = σ² and cov(ε_i, ε_j|x) = E(ε_iε_j|x) = 0 for i ≠ j, all of the cross-product terms are zero and we have
\[
V(b|x) = \omega_1^2\sigma^2 + \omega_2^2\sigma^2 + \dots + \omega_n^2\sigma^2 = \sigma^2\sum_{i=1}^{n}\omega_i^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (14)
\]
Now we want the variance of the intercept, a,
\[
V(a|x) = V(\bar{y} - b\bar{x}|x) = V(\bar{y}|x) + \bar{x}^2V(b|x) - 2\bar{x}\operatorname{cov}(\bar{y}, b|x)
\]
We know,
\[
V(\bar{y}|x) = V(\alpha + \beta\bar{x} + \bar{\varepsilon}|x) = V(\bar{\varepsilon}|x) = \sigma^2/n
\]
and $\operatorname{cov}(\bar{y}, b|x) = 0$. Therefore,
\[
V(a|x) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]
       = \sigma^2\,\frac{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2 + n\bar{x}^2}{n\sum_{i=1}^{n}(x_i - \bar{x})^2}
       = \sigma^2\,\frac{\sum_{i=1}^{n}x_i^2}{n\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (15)
\]
Returning to our example in equation (12), the theoretical variance of b using equation (14) is
\[
\frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{1}{15.083} = 0.065,
\]
compared to the estimated variance from the 1000 estimates of b (plotted in Figure 10), which is V(b) = 0.066. Appendix D also shows an empirical calculation of these variances.
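A similar sketch (again illustrative, with arbitrary x values and σ² = 1 assumed known) compares the analytic variance from equation (14) with the sampling variance of simulated slope estimates; the two should be close.

import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 30, 1.0
x = rng.uniform(0, 5, size=n)                            # hypothetical fixed regressors

analytic_var_b = sigma2 / np.sum((x - x.mean()) ** 2)    # equation (14)

b_hats = []
for _ in range(1000):
    y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(sigma2), size=n)
    b_hats.append(np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2))

print(analytic_var_b, np.var(b_hats, ddof=1))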
In addition, it is possible to work out
\[
\operatorname{cov}(a, b|x) = -\frac{\sigma^2\bar{x}}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\]
(see Appendix B for a proof of this).
Note: Information regarding the variances and covariances can be presented in a variance–covariance matrix as:
\[
\begin{bmatrix}
V(a|x) & \operatorname{cov}(a, b|x) \\
\operatorname{cov}(a, b|x) & V(b|x)
\end{bmatrix}
\]
where cov(a, b|x) = cov(b, a|x). As both b and a are linear functions of the error term, ε (see equation (10)), and as ε is normally distributed (by assumption), both a and b will follow a normal distribution (as can be seen in Figure 10, the distribution of b is Normal):
\[
b|x \sim N\!\left(\beta,\ \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right), \qquad
a|x \sim N\!\left(\alpha,\ \frac{\sigma^2\sum_{i=1}^{n}x_i^2}{n\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)
\]
Additionally, as $(n-2)s^2 = \sum_{i=1}^{n} e_i^2$ is a sum of squared (normally distributed) residuals, then
\[
\frac{(n-2)s^2}{\sigma^2} \sim \chi^2_{n-2}
\]
4.3 Hypothesis testing
All hypothesis testing follows a 5-step procedure:
1. H0: β = β0
2. H1: β ≠ β0
3. Choose some appropriate significance level, c, and find the corresponding values from the t-distribution, denoted $-t^{c/2}_{DoF}$ and $t^{c/2}_{DoF}$, where DoF is the degrees of freedom of the model, that is, the number of observations, n, minus the number of restrictions on the residuals, 2 (or the number of estimated parameters in the model, a and b).
4. $t = \frac{b - \beta_0}{s_b} \sim t_{DoF}$, where $s_b = \sqrt{\frac{s^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$ is the standard error of b (in replacing σ² by s² in the standard error of b, we are essentially scaling a N(0, 1) by a $\sqrt{\chi^2_{DoF}/DoF}$, and this yields a $t_{DoF}$ distribution).
5. If t is either less than $-t^{c/2}_{DoF}$, or greater than $t^{c/2}_{DoF}$, then we have observed an event which occurs with a probability of less than c and should therefore reject H0. The decision rule is: Reject H0 if $|t| = \left|\frac{b - \beta_0}{s_b}\right| > t^{c/2}_{DoF}$; do not reject H0 if $|t| = \left|\frac{b - \beta_0}{s_b}\right| < t^{c/2}_{DoF}$.
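As an illustration of the 5-step procedure, the sketch below (not from the handout) computes the t-statistic and the two-sided critical value with SciPy; the values of b, its standard error and the DoF are taken from the Appendix E example, while β0 = 0 and c = 0.05 are assumed.

import numpy as np
from scipy import stats

b, s_b, dof = 1.790, 0.162, 5        # slope, standard error and DoF from Appendix E
beta0, c = 0.0, 0.05                 # hypothesised value and significance level (assumed)

t_stat = (b - beta0) / s_b                    # step 4
t_crit = stats.t.ppf(1 - c / 2, dof)          # two-sided critical value, step 3

reject = abs(t_stat) > t_crit                 # step 5 decision rule
print(t_stat, t_crit, "Reject H0" if reject else "Do not reject H0")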
4.4 Measure of goodness of fit
Does the estimated OLS model fit the data well? OLS minimises the RSS; this means that the estimates a and b produce a smaller RSS than any alternative pair of estimates. But is the model “good” at explaining movements in y_i? Our OLS regression has:
\[
\underbrace{y_i}_{\text{Actual values}} = \underbrace{a + b x_i}_{\hat{y}_i} + \underbrace{e_i}_{\text{Residuals}}
\]
taking averages
\[
\bar{y} = \bar{\hat{y}} + \underbrace{\bar{e}}_{0} \;\Rightarrow\; \bar{y} = \bar{\hat{y}}
\]
and subtracting these two equations we have:
\[
y_i - \bar{y} = (\hat{y}_i - \bar{\hat{y}}) + e_i
\]
squaring both sides we get:
\[
(y_i - \bar{y})^2 = (\hat{y}_i - \bar{\hat{y}})^2 + e_i^2 + 2(\hat{y}_i - \bar{\hat{y}})e_i
\]
Now taking sums we have
\[
\underbrace{\sum_{i=1}^{n}(y_i - \bar{y})^2}_{TSS} = \underbrace{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}_{ESS} + \underbrace{\sum_{i=1}^{n}e_i^2}_{RSS} + 2\underbrace{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})e_i}_{0}
\]
The last term is zero: from equations (3) and (5) we have $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$, in which case $\sum_{i=1}^{n}\hat{y}_i e_i = \sum_{i=1}^{n}(a + b x_i)e_i = a\sum_{i=1}^{n}e_i + b\sum_{i=1}^{n}x_i e_i = 0$. Consequently, we have: TSS = ESS + RSS, where TSS = Total sum of squares (the total amount of variation in the dependent variable, y), ESS = Explained sum of squares (the amount of variation the model explained, i.e. the amount of variation in $\hat{y}$) and RSS = Residual sum of squares (the amount of variation the model did NOT explain, i.e. the amount of variation in e).
Our measure of goodness of fit is the coefficient of determination, or R², and this is calculated as:
\[
R^2 = 1 - \frac{RSS}{TSS} = \frac{ESS}{TSS}
\]
and so it can be interpreted as the proportion of the total variation in the dependent variable that
the model (xi) can explain.
Use of R2
1. Goodness of fit measure for simple regression model
2. It can be used to compare (choose) between different linear models (as long as the dependent
variable is identical in all models).
3. It is NOT correct to say that a model with a “high” R2 is a good model.
(See Appendix D for a labelled Stata output and Appendix E for an example calculation of R².)
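For completeness, here is a short illustrative sketch (using the Appendix E data) that computes TSS, ESS and RSS, verifies the decomposition TSS = ESS + RSS and evaluates the two equivalent expressions for R².

import numpy as np

x = np.array([7, 14, 8, 10, 11, 6, 14], dtype=float)   # Appendix E data
y = np.array([27, 41, 27, 32, 34, 26, 38], dtype=float)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x
e = y - y_hat

TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_hat - y_hat.mean()) ** 2)
RSS = np.sum(e ** 2)

print(TSS, ESS + RSS)             # the decomposition TSS = ESS + RSS
print(1 - RSS / TSS, ESS / TSS)   # both give R^2 (about 0.961 for these data)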
4.5 Model adequacy
A better way to assess model adequacy than relying on R² is to check that the residuals, e_i, are consistent with the assumptions made about the disturbance term, ε_i. That is, we would require that:
1. The residuals are unrelated to x_i (CLRM assumption 1)
2. The residuals have a constant error variance (CLRM assumption 2)
3. The residuals are serially uncorrelated (CLRM assumption 3)
4. The residuals are normally distributed (CLRM assumption 4)
We will look at these issues later in the module.
4.6 Forecasting/Prediction
One important use of regression is the computation of predictions, or forecasts, for the dependent
variable, conditional on an assumed value for the explanatory variable. Suppose that the explanatory
variable is equal to xn+1 and that the linear relationship continues to hold, then:
yn+1 = α+ βxn+1 + εn+1
Often we are interested in forecasting the actual value that will result for yn+1. Our estimator is
written:
yˆn+1 = a+ bxn+1
and we can define the prediction/forecast error as $e_{n+1} = y_{n+1} - \hat{y}_{n+1}$, where
\[
E(e_{n+1}|x) = E(y_{n+1} - \hat{y}_{n+1}|x) = E[(\alpha - a) + (\beta - b)x_{n+1} + \varepsilon_{n+1}|x] = E(\alpha - a|x) + x_{n+1}E(\beta - b|x) + E(\varepsilon_{n+1}|x) = 0
\]
The variance of the prediction error, $V(e_{n+1}|x) = V(y_{n+1} - \hat{y}_{n+1}|x)$, is:
\[
V(e_{n+1}|x) = V(y_{n+1} - \hat{y}_{n+1}|x) = \sigma^2\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]
\]
(see Appendix C).
Given that ε_{n+1}|x ∼ N(0, σ²), we can write that
\[
e_{n+1}|x \sim N\!\left(0,\ \sigma^2\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]\right)
\]
and as σ² is unknown and we use s² in its place, the distribution will be $\frac{e_{n+1} - 0}{\sqrt{\hat{V}(e_{n+1})}} \sim t_{n-2}$ and hence
\[
P\!\left(-t^{c/2}_{n-2} \le \frac{y_{n+1} - \hat{y}_{n+1}}{\sqrt{\hat{V}(e_{n+1})}} \le t^{c/2}_{n-2}\right) = 1 - c
\;\Rightarrow\;
P\!\left(-t^{c/2}_{n-2}\sqrt{\hat{V}(e_{n+1})} \le y_{n+1} - \hat{y}_{n+1} \le t^{c/2}_{n-2}\sqrt{\hat{V}(e_{n+1})}\right) = 1 - c
\]
Or rearranging further:
\[
P\!\left(\hat{y}_{n+1} - t^{c/2}_{n-2}\sqrt{\hat{V}(e_{n+1})} \le y_{n+1} \le \hat{y}_{n+1} + t^{c/2}_{n-2}\sqrt{\hat{V}(e_{n+1})}\right) = 1 - c
\]
i.e. the 100(1 − c)% CI for $y_{n+1}$ can be written:
\[
\hat{y}_{n+1} \pm t^{c/2}_{n-2}\sqrt{s^2\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]}
\]
and the CI is wider the further $x_{n+1}$ is from $\bar{x}$. Appendix D works through an example in which we calculate the 95% confidence interval for our forecast of $y_{n+1}$.
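A minimal sketch (not part of the handout) of this forecast interval, using the Appendix E data and x_{n+1} = 15 so that the output can be compared with the hand calculation in Appendix E:

import numpy as np
from scipy import stats

x = np.array([7, 14, 8, 10, 11, 6, 14], dtype=float)   # Appendix E data
y = np.array([27, 41, 27, 32, 34, 26, 38], dtype=float)
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
s2 = np.sum((y - a - b * x) ** 2) / (n - 2)

x_new = 15.0                                            # x_{n+1}
y_hat = a + b * x_new
v_hat = s2 * (1 + 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))

t_crit = stats.t.ppf(0.975, n - 2)                      # 95% interval, DoF = n - 2
print(y_hat, (y_hat - t_crit * np.sqrt(v_hat), y_hat + t_crit * np.sqrt(v_hat)))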
5 Interpreting coefficients
OLS is appropriate for all models that are linear in parameters, and each of the following models is linear in parameters (although the model may be non-linear in the variables y and x):
1. Given the model $y_i = \alpha + \beta x_i + \varepsilon_i$, we have:
\[
\frac{\partial y_i}{\partial x_i} = \beta = \frac{\text{Change in } y_i}{1 \uparrow \text{ in } x_i}
\]
2. Given the model $y_i = \alpha + \beta \ln(x_i) + \varepsilon_i$, we have:
\[
\frac{\partial y_i}{\partial \ln(x_i)} = \beta = \frac{\text{Change in } y_i}{1 \uparrow \text{ in } \ln(x_i)}
\]
Digression
\[
\frac{\partial \ln(x)}{\partial x} = \frac{1}{x} \;\Rightarrow\; \partial \ln(x) = \frac{\partial x}{x} = \text{Proportionate change in } x
\]
Alternatively, define the growth rate of the variable x as g,
\[
g = \frac{x^{+} - x}{x} = \frac{x^{+}}{x} - 1 \;\Rightarrow\; 1 + g = \frac{x^{+}}{x}
\]
Now taking natural logs of both sides we get: $\ln(1 + g) = \ln(x^{+}/x) = \ln(x^{+}) - \ln(x) \equiv \Delta\ln(x) \approx g$, as:
g      ln(1+g)      g      ln(1+g)      g      ln(1+g)
0.01   0.0099       0.10   0.0953      −0.03  −0.0305
0.02   0.0198       0.20   0.1823      −0.06  −0.0619
0.03   0.0296      −0.01  −0.0101      −0.10  −0.1054
0.06   0.0583      −0.02  −0.0202      −0.20  −0.2231
But what is a unit increase in ln(x)?
\[
\ln(x^{+}/x) = 1 \;\Rightarrow\; x^{+}/x = \exp(1) = 2.718 \;\Rightarrow\; g = 1.718
\]
\[
\frac{\partial y_i}{\partial \ln(x_i)} = \beta = \frac{\text{Change in } y_i}{171.8\% \uparrow \text{ in } x_i}
\]
However, if we increase ln(x) by 0.01 then
\[
\frac{\beta}{100} = \frac{\text{Change in } y_i}{1\% \uparrow \text{ in } x_i}
\]
Alternatively, consider our equation for some value x_0
\[
E(y_0|x_0) = \alpha + \beta\ln(x_0)
\]
which then increases to x_1
\[
E(y_1|x_1) = \alpha + \beta\ln(x_1)
\]
Subtracting these equations we have:
\[
E(y_1|x_1) - E(y_0|x_0) = \beta\ln(x_1) - \beta\ln(x_0) = \beta\ln(x_1/x_0)
\]
And hence $\beta\ln(x_1/x_0) = \beta g$ is the expected change in y when
\[
g = \frac{x_1 - x_0}{x_0} = \frac{x_1}{x_0} - 1 \;\Rightarrow\; 1 + g = \frac{x_1}{x_0} \;\Rightarrow\; \ln(x_1/x_0) = \ln(1 + g) \approx g
\]
(if g is small).
3. Given the model $\ln(y_i) = \alpha + \beta x_i + \varepsilon_i$, we have
\[
\frac{\partial \ln(y_i)}{\partial x_i} = \beta \approx \frac{\text{Proportionate change in } y_i}{1 \uparrow \text{ in } x_i}
\]
and
\[
100\beta \approx \frac{\% \text{ change in } y_i}{1 \uparrow \text{ in } x_i}
\qquad \text{or} \qquad
100(\exp(\beta) - 1) = \frac{\% \text{ change in } y_i}{1 \uparrow \text{ in } x_i}
\]
4. Given the model $\ln(y_i) = \alpha + \beta \ln(x_i) + \varepsilon_i$, we have
\[
\frac{\partial \ln(y_i)}{\partial \ln(x_i)} = \beta = \frac{\text{Proportionate change in } y_i}{1 \text{ proportionate} \uparrow \text{ in } x_i} = \frac{\% \text{ change in } y_i}{1\% \uparrow \text{ in } x_i}
\]
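A small numerical check (illustrative only) of the approximation used in model 3: 100β approximates the exact percentage change 100(exp(β) − 1) well only when β is small, which is why the exact expression is also reported above.

import numpy as np

# For ln(y) = alpha + beta*x + e, a one-unit increase in x raises ln(y) by beta,
# i.e. multiplies the systematic part of y by exp(beta). Compare the exact
# percentage change with the 100*beta approximation for a few beta values.
for beta in [0.01, 0.05, 0.10, 0.25, 0.50]:
    approx = 100 * beta
    exact = 100 * (np.exp(beta) - 1)
    print(f"beta={beta:.2f}  approx={approx:6.2f}%  exact={exact:6.2f}%")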
A Rules on expectations and variances
Define $E(w) = \sum_{i=1}^{k} p_i w_i$ as the expected value of the random variable w and $V(w) = E[w - E(w)]^2 = E(w^2) - E(w)^2 = \sum_{i=1}^{k} p_i (w_i - E(w))^2$. Then:
1. $E(a + x) = \sum_{i=1}^{k} p_i(a + x_i) = a + \sum_{i=1}^{k} p_i x_i = a + E(x)$.
2. $E(ax) = \sum_{i=1}^{k} p_i(a x_i) = a\sum_{i=1}^{k} p_i x_i = aE(x)$.
3. $V(a + x) = \sum_{i=1}^{k} p_i[(a + x_i) - E(a + x)]^2 = \sum_{i=1}^{k} p_i[(a + x_i) - a - E(x)]^2 = \sum_{i=1}^{k} p_i[x_i - E(x)]^2 = V(x)$.
4. $V(ax) = \sum_{i=1}^{k} p_i[a x_i - E(ax)]^2 = \sum_{i=1}^{k} p_i[a x_i - aE(x)]^2 = a^2\sum_{i=1}^{k} p_i[x_i - E(x)]^2 = a^2V(x)$.
5. cov(a+ x, y) = cov(x, y)
6. cov(ax, y) = acov(x, y).
7. V (x+ y) = V (x) + V (y) + 2cov(x, y).
8. V (x− y) = V (x) + V (y)− 2cov(x, y).
9. V (x+ y + z) = V (x) + V (y) + V (z) + 2cov(x, y) + 2cov(x, z) + 2cov(y, z).
10. V (x− y − z) = V (x) + V (y) + V (z)− 2cov(x, y)− 2cov(x, z) + 2cov(y, z).
B Calculating the covariance between a and b
\[
\operatorname{cov}(a, b) = E\bigl[(a - E(a))(b - E(b))\bigr] = E\bigl[(a - \alpha)(b - \beta)\bigr]
\]
Now from equation (6) we have $a = \bar{y} - b\bar{x}$ and from equation (11) we have $b = \beta + \sum_{i=1}^{n}\omega_i\varepsilon_i$. Therefore:
\[
b - \beta = \sum_{i=1}^{n}\omega_i\varepsilon_i
\]
and given
\[
\bar{y} = \alpha + \beta\bar{x} + \bar{\varepsilon}
\]
we can rewrite equation (6), our equation for a, as
\[
a = \alpha + \beta\bar{x} + \bar{\varepsilon} - b\bar{x} = \alpha - (b - \beta)\bar{x} + \bar{\varepsilon}
\]
This implies:
\[
a - \alpha = -(b - \beta)\bar{x} + \bar{\varepsilon}
\]
where the $\bar{\varepsilon}$ term contributes nothing to the covariance, since $E[\bar{\varepsilon}(b - \beta)] = \frac{\sigma^2}{n}\sum_{i=1}^{n}\omega_i = 0$. In which case, we can write:
\[
E\bigl[(a - \alpha)(b - \beta)\bigr] = E\bigl[-(b - \beta)\bar{x}(b - \beta)\bigr] = -\bar{x}E\bigl[(b - \beta)^2\bigr] = -\bar{x}\,E\Bigl[\Bigl(\sum_{i=1}^{n}\omega_i\varepsilon_i\Bigr)^{2}\Bigr]
\]
From equations (13) and (14) we know:
\[
E\Bigl[\Bigl(\sum_{i=1}^{n}\omega_i\varepsilon_i\Bigr)^{2}\Bigr] = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\]
in which case:
\[
\operatorname{cov}(a, b) = E\bigl[(a - \alpha)(b - \beta)\bigr] = \frac{-\bar{x}\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}.
\]
C Variance for yn+1 in 2-variable model
\[
V(y_{n+1} - \hat{y}_{n+1}) = V(y_{n+1}) + V(\hat{y}_{n+1}) - 2\operatorname{cov}(y_{n+1}, \hat{y}_{n+1})
= V(\alpha + \beta x_{n+1} + \varepsilon_{n+1}) + V(\hat{y}_{n+1}) - 2\operatorname{cov}(\alpha + \beta x_{n+1} + \varepsilon_{n+1}, \hat{y}_{n+1})
\]
The last term is zero as $(\alpha + \beta x_{n+1})$ is a constant and $\varepsilon_{n+1}$ is unrelated to everything.
Additionally, we know that
\[
V(\alpha + \beta x_{n+1} + \varepsilon_{n+1}) = V(\varepsilon_{n+1}) = \sigma^2
\]
In which case:
\[
V(y_{n+1} - \hat{y}_{n+1}) = \sigma^2 + V(\overbrace{a + b x_{n+1}}^{\hat{y}_{n+1}}) = \sigma^2 + \bigl[V(a) + x_{n+1}^2 V(b) + 2x_{n+1}\operatorname{cov}(a, b)\bigr]
\]
\[
= \sigma^2 + \frac{\sigma^2\sum_i x_i^2/n}{\sum_{i=1}^{n}(x_i - \bar{x})^2} + \frac{\sigma^2 x_{n+1}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} - \frac{2\sigma^2 x_{n+1}\bar{x}}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\]
Taking out the common factor of $\frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ from the three terms we get:
\[
V(y_{n+1} - \hat{y}_{n+1}) = \sigma^2 + \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\left[x_{n+1}^2 - 2x_{n+1}\bar{x} + \sum_i x_i^2/n\right]
\]
Adding in (and subtracting out) the term $\bar{x}^2$ inside the brackets, we get:
\[
V(y_{n+1} - \hat{y}_{n+1}) = \sigma^2 + \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\left[x_{n+1}^2 - 2x_{n+1}\bar{x} + \bar{x}^2 - \bar{x}^2 + \sum_{i=1}^{n} x_i^2/n\right]
\]
The first three terms in [·] can be written more succinctly as $(x_{n+1} - \bar{x})^2$ and the last two terms are $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, so we can write $V(y_{n+1} - \hat{y}_{n+1})$ as:
\[
V(y_{n+1} - \hat{y}_{n+1}) = \sigma^2 + \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\left[(x_{n+1} - \bar{x})^2 + \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}\right]
= \sigma^2\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]
\]
D Stata output for two-variable regression
E Two variable regression example
Given the data on the variables x and y:
x 7 14 8 10 11 6 14
y 27 41 27 32 34 26 38
(a) Plot y against x.
(b) Obtain the least squares regression line for a regression of y on x.
(c) Calculate the residuals from the estimated regression and check that $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$.
(d) Calculate R2, standard error of the regression and the standard error of the coefficient estimates.
(e) Test the hypothesis that the slope coefficient is zero.
(f) Given x8 = 15, obtain a prediction for y8. Obtain a 95% confidence interval for y8.
Answer
(a)
(b)
\[
\sum_i x_i = 70, \quad \sum_i y_i = 225, \quad \sum_i x_i^2 = 762, \quad \sum_i y_i^2 = 7439, \quad \sum_i x_i y_i = 2361
\]
\[
b = \frac{\sum_i x_i y_i - n\bar{x}\bar{y}}{\sum_i x_i^2 - n\bar{x}^2} = \frac{2361 - 7(10)(32.14)}{762 - 7(100)} = 1.79
\]
\[
a = 32.14 - 1.79(10) = 14.24
\]
(c)
\[
e_i = y_i - (a + b x_i) = y_i - (14.24 + 1.790 x_i)
\]
i         1        2         3         4        5        6         7         Σ
e      0.2281   1.6959   −1.5622   −0.1429   0.0668   1.0184   −1.3042   −0.0001
x·e    1.5968  23.742   −12.498    −1.4286   0.7350   6.1106   −18.258   −0.0002
(Sums not exactly zero due to rounding in the coefficient estimates for a and b.)
(d) RSS = 8.131; $R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{8.131}{206.857} = 0.961$; $s = \hat{\sigma} = \sqrt{\frac{8.131}{5}} = 1.275$
\[
se(b) = \sqrt{\frac{\hat{\sigma}^2}{\sum_i x_i^2 - n\bar{x}^2}} = \frac{1.275}{\sqrt{62}} = 0.162
\]
\[
se(a) = \sqrt{\frac{\hat{\sigma}^2\sum_i x_i^2}{n\left(\sum_i x_i^2 - n\bar{x}^2\right)}} = \sqrt{\frac{1.275^2(762)}{7(62)}} = 1.689
\]
(e) $H_0: \beta = 0$, $H_1: \beta \ne 0$, $t^{0.025}_{5} = \pm 2.571$
\[
t = \frac{1.790 - 0}{0.162} = 11.05 \;\Rightarrow\; \text{Reject } H_0
\]
(f) $\hat{y}_8 = 14.24 + 1.79(15) = 41.09$
\[
\hat{V}(e_8) = s^2\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_i(x_i - \bar{x})^2}\right] = 1.626\left[1 + \frac{1}{7} + \frac{(15 - 10)^2}{62}\right] = 2.514
\]
\[
y_8 \in \bigl(41.09 \pm 2.571\sqrt{2.514}\bigr) \;\Rightarrow\; y_8 \in (37.01,\ 45.17)
\]
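The hand calculations in parts (b)–(f) can be checked with a short Python sketch (not part of the original handout):

import numpy as np
from scipy import stats

x = np.array([7, 14, 8, 10, 11, 6, 14], dtype=float)
y = np.array([27, 41, 27, 32, 34, 26, 38], dtype=float)
n = len(x)

# (b) OLS estimates
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# (c) residuals and the normal equations
e = y - (a + b * x)

# (d) R^2, standard error of the regression and of the coefficients
RSS, TSS = np.sum(e ** 2), np.sum((y - y.mean()) ** 2)
s2 = RSS / (n - 2)
se_b = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
se_a = np.sqrt(s2 * np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2)))

# (e) t-test of beta = 0
t_stat = b / se_b
t_crit = stats.t.ppf(0.975, n - 2)

# (f) forecast for x_8 = 15 and its 95% confidence interval
x8 = 15.0
y8_hat = a + b * x8
v_hat = s2 * (1 + 1 / n + (x8 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
ci = (y8_hat - t_crit * np.sqrt(v_hat), y8_hat + t_crit * np.sqrt(v_hat))

print(a, b, e.sum(), (x * e).sum())
print(1 - RSS / TSS, np.sqrt(s2), se_a, se_b)
print(t_stat, t_crit, y8_hat, ci)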