Lecture 10 – Least Squares
Julián Norato
Associate Professor
Department of Mechanical Engineering
University of Connecticut
julian.norato@uconn.edu
ME3255-001 – Computational Mechanics
Spring 2023
Outline
1 Data Fitting
2 The Least Squares Problem
3 Linear Regression
4 Error Estimates
5 General Linear Least-Squares
6 Nonlinear Problems
7 NumPy / SciPy
Data Fitting
Data fitting: find a relation between F and v from data points.
For this example, if we assume $F = c_d v^2$, find the coefficient of drag $c_d$ that "best fits the data".
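As an illustration (not part of the slides), a minimal NumPy sketch of this fit, assuming hypothetical $(v, F)$ measurements; since $F = c_d v^2$ is linear in the single unknown $c_d$, the matrix has one column, $v^2$:

```python
import numpy as np

# Hypothetical measured data (velocities and drag forces); values are made up for illustration.
v = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
F = np.array([2.4, 10.1, 22.8, 40.5, 63.2])

# Model F = c_d * v^2 is linear in c_d, so the design matrix has a single column v^2.
A = (v**2).reshape(-1, 1)

# Least-squares solution of A @ [c_d] ≈ F.
sol, *_ = np.linalg.lstsq(A, F, rcond=None)
c_d = sol[0]
print(f"c_d ≈ {c_d:.4f}")
```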
Data Fitting
Suppose we have n data points $(a_i, b_i)$, and we want to fit a quadratic polynomial $b_i = x_0 + x_1 a_i + x_2 a_i^2$ to these data points.
Ideally, if the quadratic polynomial fit the data perfectly, we would have
\[
\begin{aligned}
x_0 + x_1 a_1 + x_2 a_1^2 &= b_1 \\
x_0 + x_1 a_2 + x_2 a_2^2 &= b_2 \\
&\ \ \vdots \\
x_0 + x_1 a_n + x_2 a_n^2 &= b_n
\end{aligned}
\]
We can write these equations in matrix form as
\[
\underbrace{\begin{bmatrix}
1 & a_1 & a_1^2 \\
1 & a_2 & a_2^2 \\
\vdots & \vdots & \vdots \\
1 & a_n & a_n^2
\end{bmatrix}}_{A_{n \times m}}
\underbrace{\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}}_{x_{m \times 1}}
=
\underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}}_{b_{n \times 1}}
\]
Data Fitting

\[
\underbrace{\begin{bmatrix}
1 & a_1 & a_1^2 \\
1 & a_2 & a_2^2 \\
\vdots & \vdots & \vdots \\
1 & a_n & a_n^2
\end{bmatrix}}_{A_{n \times m}}
\underbrace{\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}}_{x_{m \times 1}}
=
\underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}}_{b_{n \times 1}}
\]
Note that in this notation, the x's are the unknown coefficients of the equation, whereas the components of the matrix A and of the right-hand-side vector b are evaluated from the data points $(a_i, b_i)$.
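For concreteness, a minimal sketch (with hypothetical data arrays a and b, not taken from the slides) of how A and b could be assembled in NumPy for the quadratic fit; each row of A is $[1, a_i, a_i^2]$, mirroring the matrix above:

```python
import numpy as np

# Hypothetical data points (a_i, b_i); any two 1-D arrays of equal length would do.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.2, 3.9, 9.1, 16.3, 24.8, 36.2])

n = a.size   # number of data points
m = 3        # number of unknown coefficients x0, x1, x2

# Each row of A is [1, a_i, a_i^2], matching the matrix form on this slide.
A = np.column_stack([np.ones(n), a, a**2])
print(A.shape)   # (n, m) -> rectangular, since n > m
```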
Data Fitting
Note the important differences with systems of linear equations:
We are finding one equation that best fits multiple data points,
instead of one point that satisfies multiple equations.
The unknowns are the coefficients of the equation instead of the
values of a data point.
Typically, there are more data points than coefficients (n > m), and
so the system is overdetermined.
Consequently, the matrix A is rectangular.
The equation may be nonlinear in the variables ai, yet it is linear in
the unknown coefficients x.
In general, there is no unique solution that satisfies all equations, so
we must find the ‘best solution’.
The Least Squares Problem
In finding the ‘best’ solution to Ax = b, what is a good definition of
‘best’?
Consider the residual
\[ r := Ax - b \]
If the fit were perfect, then $\|r\| = 0$ (in any norm).
Therefore, we can define the goodness of a fit in terms of a norm of the residual.
In particular, we wish to minimize the square of the Euclidean norm of r:
\[ \min_{x} \; f(x) := \|r\|_2^2 \]
The Least Squares Problem
Recall the first-order necessary optimality condition for a scalar function of a scalar, $f : \mathbb{R} \to \mathbb{R}$: $f'(x) = 0$.
Recall also the first-order Taylor series approximation of a scalar function of a vector, $f : \mathbb{R}^n \to \mathbb{R}$:
\[ f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x \]
Therefore, the first-order necessary optimality condition for a scalar function of a vector is
\[ \nabla f(x) = 0 \]
The Least Squares Problem
Hence, if we want to minimize $f(x) = \|r\|_2^2$, we need to ensure $\nabla f(x) = 0$.
Since $f(x) = (Ax - b)^T (Ax - b)$, its gradient is $\nabla f(x) = 2A^T(Ax - b)$; setting this to zero shows that the condition is satisfied when the so-called normal equations are satisfied:
\[ A^T A x = A^T b \]
Note that the system above is square ($A^T A$ is $m \times m$).
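As an illustrative sketch (not from the slides), the normal equations can be formed and solved directly in NumPy and checked against the library least-squares routine; the quadratic-fit matrix and hypothetical data from the earlier sketch are assumed:

```python
import numpy as np

# Hypothetical data and quadratic design matrix, as in the earlier sketch.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.2, 3.9, 9.1, 16.3, 24.8, 36.2])
A = np.column_stack([np.ones(a.size), a, a**2])

# Form and solve the square (m x m) normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Same solution (up to round-off) from the library least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_normal, x_lstsq)
```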
Linear Regression
In the case where we are trying to fit a straight line to the data (i.e. we only have two coefficients $x_0$ and $x_1$, so that $b_i = x_0 + x_1 a_i$), we can show that the normal equations reduce to
\[
x_1 = \frac{n \sum a_i b_i - \sum a_i \sum b_i}{n \sum a_i^2 - \left(\sum a_i\right)^2},
\qquad
x_0 = \bar{b} - x_1 \bar{a}
\]
where all the sums run from $i = 1$ to $n$, and $\bar{b}$ and $\bar{a}$ are the mean values of $b_i$ and $a_i$, respectively.
Linear Regression
Example
Fit a straight line to the values in the following table:
i    a_i    b_i
1 10 25
2 20 70
3 30 380
4 40 550
5 50 610
6 60 1220
7 70 830
8 80 1450
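A minimal sketch of this example using the closed-form sums from the previous slide (the printed values are whatever the formulas produce for this data; no results are quoted from the slides):

```python
import numpy as np

# Data from the example table.
a = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
b = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)
n = a.size

# Slope and intercept from the reduced normal equations.
x1 = (n * np.sum(a * b) - np.sum(a) * np.sum(b)) / (n * np.sum(a**2) - np.sum(a)**2)
x0 = b.mean() - x1 * a.mean()
print(f"b ≈ {x0:.2f} + {x1:.2f} a")
```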
Descriptive Statistics
Recall the following statistics of a discrete set of n values $y_i$:
Definition
Mean:
\[ \bar{y} := \frac{\sum y_i}{n} \]
Standard deviation:
\[ s_y := \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}} \]
Variance:
\[ s_y^2 := \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n - 1} \]
All sums run from $i = 1$ to $n$. The mean has $n$ degrees of freedom, and the standard deviation and variance have $n - 1$ degrees of freedom.
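These quantities map directly onto NumPy calls (a sketch using the b-values from the earlier example; note that ddof=1 gives the $n - 1$ denominator used above):

```python
import numpy as np

y = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)

y_bar = y.mean()        # mean, denominator n
s_y = y.std(ddof=1)     # standard deviation, denominator n - 1
s_y2 = y.var(ddof=1)    # variance, denominator n - 1
print(y_bar, s_y, s_y2)
```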
Error Estimates
In the case of regression to a straight line, we have:
Definition
Assuming the components of the residual vector are of similar magnitude and that the distribution of points around the regression curve is normal, we define the standard error estimate as
\[ s_{y/x} := \sqrt{\frac{\|r\|_2^2}{n - 2}} \]
Note that this estimate has $n - 2$ degrees of freedom, since we estimated two coefficients (slope and intercept).
Also note that $s_{y/x}$ has the same units as y.
Error Estimates
Note that the standard error estimate $s_{y/x}$ quantifies the spread of the data around the regression line, whereas the standard deviation $s_y$ quantifies the spread around the mean.
If $s_{y/x} < s_y$, there is an improvement in reducing the spread and the linear regression model 'has merit'.
Error Estimates
In the case of regression to a straight line, we have:
Definition
A normalized measure of error is the coefficient of determination
\[ r^2 := \frac{\sum (y_i - \bar{y})^2 - \|r\|_2^2}{\sum (y_i - \bar{y})^2} \]
with $r = \sqrt{r^2}$ called the correlation coefficient.
For a perfect fit, $\|r\|_2^2 = 0$ and consequently $r = 1$.
Error Estimates
Example
Find the standard deviation $s_y$, standard error estimate $s_{y/x}$, coefficient of determination $r^2$, and correlation coefficient r for the previous example:
i    a_i    b_i
1 10 25
2 20 70
3 30 380
4 40 550
5 50 610
6 60 1220
7 70 830
8 80 1450
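A sketch of these computations in NumPy, reusing the straight-line fit of this data from the earlier example (the numerical results are whatever the formulas yield; none are quoted from the slides):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
b = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)
n = a.size

# Straight-line least-squares fit b ≈ x0 + x1 * a.
A = np.column_stack([np.ones(n), a])
(x0, x1), *_ = np.linalg.lstsq(A, b, rcond=None)

resid = A @ np.array([x0, x1]) - b      # residual vector r = Ax - b
Sr = np.sum(resid**2)                   # ||r||_2^2
St = np.sum((b - b.mean())**2)          # total spread about the mean

s_y = np.sqrt(St / (n - 1))             # standard deviation
s_yx = np.sqrt(Sr / (n - 2))            # standard error estimate
r2 = (St - Sr) / St                     # coefficient of determination
print(s_y, s_yx, r2, np.sqrt(r2))
```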
General Linear Least-Squares
Let us consider the more general case where we want to fit a polynomial of order $m - 1$ to n data points $(a_i, b_i)$, i.e. $b_i = x_0 + x_1 a_i + x_2 a_i^2 + \ldots + x_{m-1} a_i^{m-1}$. The ensuing system $Ax = b$ looks like
\[
\underbrace{\begin{bmatrix}
1 & a_1 & a_1^2 & \cdots & a_1^{m-1} \\
1 & a_2 & a_2^2 & \cdots & a_2^{m-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a_n & a_n^2 & \cdots & a_n^{m-1}
\end{bmatrix}}_{A_{n \times m}}
\underbrace{\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{m-1} \end{bmatrix}}_{x_{m \times 1}}
=
\underbrace{\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}}_{b_{n \times 1}}
\]
The matrix A above is called the Vandermonde matrix.
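A sketch of assembling this Vandermonde matrix and fitting a cubic ($m = 4$) in NumPy; numpy.vander with increasing=True produces the columns $1, a, a^2, \ldots$ directly (the data values here are hypothetical):

```python
import numpy as np

# Hypothetical data points.
a = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
b = np.array([1.1, 1.9, 3.2, 5.4, 8.9, 13.8, 20.7])

m = 4  # number of coefficients -> polynomial of order m - 1 = 3

# Vandermonde matrix with columns [1, a, a^2, a^3].
A = np.vander(a, N=m, increasing=True)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)   # coefficients x0, ..., x_{m-1}
```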
General Linear Least-Squares
Possible strategy: use Cholesky factorization to solve the square normal equations $A^T A x = A^T b$.
$A^T A$, which is called the cross-product matrix, is symmetric.
Note that the solution strategy in this case is: rectangular → square → triangular.
Note, however, that $A^T A$ can be very ill-conditioned, even when the residual is small and there is a good fit. Therefore, other more robust (but less efficient) methods, such as QR factorization, are used.
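A sketch contrasting the two strategies, assuming SciPy is available and reusing the hypothetical cubic Vandermonde system from the previous sketch; the Cholesky route factors $A^T A$, while the QR route works on A directly and avoids squaring the condition number:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

# Hypothetical overdetermined system (the cubic Vandermonde fit above).
a = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
b = np.array([1.1, 1.9, 3.2, 5.4, 8.9, 13.8, 20.7])
A = np.vander(a, N=4, increasing=True)

# Strategy 1: Cholesky factorization of the cross-product matrix A^T A.
c, low = cho_factor(A.T @ A)
x_chol = cho_solve((c, low), A.T @ b)

# Strategy 2: QR factorization of A itself (more robust when A^T A is ill-conditioned).
Q, R = np.linalg.qr(A)                   # reduced QR: A = Q R
x_qr = solve_triangular(R, Q.T @ b)      # back-substitution on the triangular system
print(x_chol, x_qr)
```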
Nonlinear Problems
Some nonlinear models can be solved using linear least-squares by using transformations:
Exponential:
\[ y = \alpha_1 e^{\beta_1 x} \;\rightarrow\; \ln y = \ln \alpha_1 + \beta_1 x \]
Power:
\[ y = \alpha_2 x^{\beta_2} \;\rightarrow\; \log y = \log \alpha_2 + \beta_2 \log x \]
Saturation-growth-rate:
\[ y = \alpha_3 \frac{x}{\beta_3 + x} \;\rightarrow\; \frac{1}{y} = \frac{1}{\alpha_3} + \frac{\beta_3}{\alpha_3} \frac{1}{x} \]
When it is not possible to linearize a problem (with respect to the unknown coefficients) and write it as $Ax = b$, we must use multi-dimensional optimization to minimize $\|r\|_2^2$.
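A sketch of the exponential case with hypothetical data: take logs, fit a straight line with the earlier machinery, then map the line's coefficients back to $\alpha_1$ and $\beta_1$:

```python
import numpy as np

# Hypothetical data assumed to follow y ≈ alpha1 * exp(beta1 * x).
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([2.1, 3.4, 5.6, 9.3, 15.0, 24.8])

# Linearize: ln y = ln(alpha1) + beta1 * x, a straight line in x.
A = np.column_stack([np.ones(x.size), x])
(ln_alpha1, beta1), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
alpha1 = np.exp(ln_alpha1)
print(alpha1, beta1)
```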
NumPy / SciPy
In Python, least squares problems can be solved with the following
functions.
Linear LSQ:
numpy.linalg.lstsq
scipy.linalg.lstsq
scipy.optimize.lsq_linear (can impose bounds $\underline{x}_i \le x_i \le \bar{x}_i$)
Generalized LSQ:
scipy.optimize.least_squares
scipy.optimize.nnls to impose non-negativity constraints on the variables ($x_i \ge 0$)
Finally, you can always define the least-squares function $f(x) := \|r\|_2^2$ and minimize it using an optimization routine such as scipy.optimize.minimize.
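A closing sketch (with hypothetical straight-line data) showing the linear route with numpy.linalg.lstsq and the general route with scipy.optimize.least_squares applied to the same residual:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data for a straight-line model b ≈ x0 + x1 * a.
a = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
b = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)

# Linear LSQ: build A and call numpy.linalg.lstsq.
A = np.column_stack([np.ones(a.size), a])
x_lin, *_ = np.linalg.lstsq(A, b, rcond=None)

# Generalized LSQ: hand scipy.optimize.least_squares the residual r(x) = Ax - b.
res = least_squares(lambda x: A @ x - b, x0=np.zeros(2))
print(x_lin, res.x)
```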