|
Linear least squares is a mathematical optimization technique to find an approximate solution for a system of linear equations that has no exact solution. This usually happens if the number of equations is bigger than the number of variables. (See also linear regression.)
Definition
In mathematical terms, we want to find a solution for the "equation"
- <math>A\mathbf{x} \approx \; \mathbf{b}<math>,
where A is an m-by-n matrix (m > n) and x and b are n- resp. m-dimensional column vectors. More precisely, we want to minimize the Euclidean norm squared of the residual Ax − b, that is, the quantity
- <math>\left([A\mathbf{x}]_1-\mathbf{b}_1\right)^2
+\left([A\mathbf{x}]_2-\mathbf{b}_2\right)^2
+\dots+
\left([A\mathbf{x}]_n-\mathbf{b}_n\right)^2
<math>
where <math>[A\mathbf{x}]_i<math> denotes the i-th component of the vector Ax. Hence the name "least squares".
It turns out that an x that minimizes this expression also solves the normal equation
- <math>A^TA\mathbf{x} = A^T\mathbf{b}<math>,
where AT means A transposed. Note that this just corresponds to a usual system of linear equations. The solution is unambigious if the rank of A is n.
Computation
The normal equation can be solved like any other equation system, yet an efficient and numerically stable method can be obtained by first computing the QR decomposition of the matrix A.
Then, with A = QR, where Q is an orthogonal matrix and R is an upper triangular matrix, the normal equation simplifies to
- Rx = QTb.
Applications
The linear least squares method can be used to find an affine function Rn → R that best fits a given set of data (see general least squares method). (It is widely and erroneously thought that the word linear in the term linear regression refers to the linear or affine nature of the fitted function, but see that article.)
We write the linear function we try to find as a 1-by-n matrix xT (so x is actually a column vector, see also linear transformation).
The set of data consists of m (n+1)-tuples (x1, ..., xn, y). Those can be written into an m-by-n matrix A and a vector b, where every tuple corresponds to a row of A, the y becoming the corresponding entry in b.
Then,
- Ax ≈ b
yields the function x we seek.
Example
Consider the points (0, 3), (2, 3), (4, 4), (−1, 2). We seek a solution of the form ax + b = y, that is,
- <math>\begin{pmatrix}x & 1 \end{pmatrix}\begin{pmatrix} a \\ b\end{pmatrix} = y<math>
We can form then the matrix A:
- <math>A=\begin{pmatrix}
0 & 1 \\
2 & 1 \\
4 & 1 \\
-1 & 1 \\ \end{pmatrix}<math>
- <math>A^T=\begin{pmatrix}
0 & 2 & 4 & -1 \\
1 & 1 & 1 & 1 \end{pmatrix}<math>
- <math>A^TA=\begin{pmatrix}
21 & 5 \\
5 & 4 \end{pmatrix}<math>
and the vector b
- <math>\mathbf{b} = \begin{pmatrix}
3 \\
3 \\
4 \\
2 \end{pmatrix}<math>
and then
- <math>A^T\mathbf{b}=\begin{pmatrix}
20 \\
12 \end{pmatrix}<math>
So, the normal equation is
- <math>A^TA\begin{pmatrix} a \\ b\end{pmatrix} = A^T\mathbf{b}<math>
- <math>\begin{pmatrix}
21 & 5 \\
5 & 4 \end{pmatrix}.\begin{pmatrix} a \\ b\end{pmatrix}=\begin{pmatrix}
20 \\
12 \end{pmatrix}<math>
Then,
- <math>(A^TA)^{-1}={1\over 59}\begin{pmatrix}
4 & -5 \\
-5 & 21 \end{pmatrix}<math>
and
- <math>\begin{pmatrix} a \\ b\end{pmatrix}={1\over 59}\begin{pmatrix}
4 & -5 \\
-5 & 21 \end{pmatrix}\begin{pmatrix}
20 \\
12 \end{pmatrix}=\begin{pmatrix}
20/59 \\
152/59 \end{pmatrix}<math>
and the line of best fit is (20/59)x + 152/59 = y.
See also
|