Egwald's popular web pages are provided without cost to users. Please show your support by joining Egwald Web Services as a Facebook Fan:
Follow Elmer Wiens on Twitter:
The same problem expressed in matrix and vector form is:
A *x = b
where A = [a_{i, j}] is the matrix of coefficients, x^{T} = (x_{1}, x_{2}, . . . . x_{n}) is the vector of unknowns, and b^{T} = (b_{1}, b_{1}, . . . . b_{m}) is the vector of RHS (right hand side) coefficients.
If all the coefficients of the vector b are zero, the system A * x = b is called a homogeneous system.
Otherwise, the system is called nonhomogeneous.
Example: Nonhomogeneous System.
A =
6
2
2
4
3
5
1
6
4
x =
x_{1}
x_{2}
x_{3}
b =
6
4
3
The columns (and rows) of the matrix A are linearly independent, as confirmed by the following diagram, where I use the standard (x, y, z) coordinate system.
Please note that I distinguish between the axis x, and the vector x^{T} = (x_{1}, x_{2}, x_{3}). Also, note that while the vector x^{T} is a row vector, and the vector x is a column vector, the diagrams below do not distinguish between column and row vectors.
Graphically, the matrix A, its column vectors, and the column vector b can be represented
Since the columns of A spanR^{3}, one can express any three dimensional vector, such as b in the example, as a linear combination of the columns of A. The solution to the linear equation problem, the vector
x^{T} = (x_{1}, x_{2}, x_{3}) = (
1, 1, 1)
provides the coefficients of the linear combination of the columns of A that yield b.
A
*
x
=
b
6
2
2
4
3
5
1
6
4
*
1
1
1
=
6
4
3
Example: Homogeneous System.
Since the columns of A span R^{3}, with the RHS vector b^{T} = (0, 0, 0) the only solution is x^{T} = (0, 0, 0), called the zero solution.
Any homogeneous system, (ie a homogeneous system with arbitrary matrix A and b^{T} = o^{T} the zero vector), always has at least one solution, the zero or trivial solution where x^{T} = o^{T}.
However, if the columns of A are not linearly independent, its homogeneous system has nontrivial solutions.
General Solution.
If x_{p} is a particular solution of the nonhomogeneous system, and x_{h} is a solution of the homogeneous system, then:
A * (x_{p} + x_{h}) = b + o = b,
proving that (x_{p} + x_{h}) is also a solution of the nonhomogeneous system.
The Image of a Matrix.
Let A be an m by n matrix. Write A = {a_{1}, a_{2}, ... a_{n}} as the set of m by 1 column vectors of A, ie A = [a_{1}, a_{2}, ... a_{n}].
The image of A, im(A), is the span of the columns of A. Formally:
im(A) = span(A) = {y in R^{m}  y = A * x, for some x in R^{n}.
Sometimes this is referred to as the column space of the matrix A, denoted by R(A).
The Kernel of a Matrix A.
The kernel of A, ker(A), is the set of all vectors x in R^{n} for which A * x = o, the zero vector in R^{m}. Formally:
ker(A) = {x in R^{n}  A * x = 0 in R^{m}}
Sometimes this is referred to as the null space of the matrix A, denoted by N(A).
The matrix A, and its transpose matrix A^{T} (represented by A' in the diagrams below) generate four important subspaces: im(A), ker(A), im(A^{T}), and ker(A^{T}).
Example: General Solution with Linearly Dependent Columns (and Rows).
The matrix of the next linear equation problem has linearly dependent columns (and rows).
A
*
x
=
b
8
2
3
2
4
6
1
2
3
*
x_{1 }
x_{2 }
x_{3 }
=
4
6
3
The following diagram displays this linear equation problem. The image of the matrix A, Im(A), is the two dimensional yellow subspace of R^{3} generated by the columns of A.
Because the vector b^{T} = (4, 6, 3) lies within the Im(A), it can be expressed as a linear combination of the columns of A. The particular solution to the linear equation problems is x_{p}^{T} = (1, 2, 0), since:
A
*
x_{p}
=
b
8
2
3
2
4
6
1
2
3
*
1
2
0
=
4
6
3
The ker(A) is the line (also a subspace) through the vector x_{h}^{T} = (0, 6, 4), since x_{h} is a solution of the homogeneous system for the matrix A:
A
*
x_{h}
=
o
8
2
3
2
4
6
1
2
3
*
0
6
4
=
0
0
0
Now let us look at A^{T}, the transpose matrix of the matrix A.
The image of the matrix A^{T}, Im(A^{T}), is the two dimensional yellow subspace of R^{3} generated by the rows of A (the columns of A^{T}).
As suggested by the diagram, ker(A) — the line through the vector x_{h} — is perpendicular to the Im(A^{T}) subspace.
Similarly, the kernel of the matrix A^{T}, ker(A^{T}) is the line through the vector y_{h}^{T} = (0, 3, 6), since y_{h} is a solution of the homogeneous system for the matrix A^{T}:
A^{T}
*
y_{h}
=
o
8
2
1
2
4
2
3
6
3
*
0
3
6
=
0
0
0
The subspace ker(A^{T}) is perpendicular to the subspace Im(A), as one can see two diagrams above.
Example: General Solution.
The general solution to the linear equation problem A * x = b is x = x_{p} + s * x_{h} for any scalar s. If x is defined by:
x
=
x_{p}
+ s *
x_{h}
x_{1}
x_{2}
x_{3}
=
1
2
0
+ s *
0
6
4
then
A * x = A * x_{p} + s * A * x_{h} = b + s * o = b.
Example: Inconsistent System.
Consider the system with the same matrix A as in the example above, but with a RHS vector b that lies outside of the Im(A).
A
*
x
=
b
8
2
3
2
4
6
1
2
3
*
x_{1 }
x_{2 }
x_{3 }
=
4
6
3
The following diagram displays this linear equation problem. The image of the matrix A, Im(A), is the same two dimensional yellow subspace of R^{3} generated by the columns of A.
However, the vector b^{T} = (4, 6, 3) lies outside the Im(A) subspace. Therefore, it cannot be expressed as a linear combination of the columns of A. Thus, this linear equation problem has no particular solution, although its homogeneous system has solutions consisting of each vector on the line through the vector x_{h}^{T} = (0, 6, 4).
Solutions: Inconsistent System.
In the diagram above, I projected the vector b^{T} = (4, 6, 3) onto the Im(A) subspace, yielding the vector p^{T} = (4, 3.6, 1.8). Recall that the point of the vector p is the closest point on Im(A) to the point of the vector b. Consequently, the vector (b  p) is perpendicular to the vector p. To verify this, note that the dot productp^{T} * (b  p) = (4, 3.6, 1.8)^{T} * (0, 2.4, 4.8) = (0, 0, 0)^{T}.
A case can be made that the vector p^{T} = (4, 3.6, 1.8) provides a "solution" to the inconsistent system A * x = b, if one solves the system A * x = p, instead.
In leastsquares problems, with more equations than unknowns, the solution provided by the algorithms is the vector x such that A * x = p. That is A * x equals the projection of the vector b onto Im(A).
Writing the matrix A = [a_{1}, a_{2}, a_{3}], one can see that a_{3} = 1.5 * a_{2}, ie a_{2} and a_{3} are linearly dependent. Dropping the vector a_{2} yields the leastsquares problem B * z = bgiven by:
B
*
z
=
b
8
3
2
6
1
3
*
z_{1 }
z_{2 }
=
4
6
3
The leastsquares solution is z^{T} = (0.8286, 0.8762) since
B
*
z
=
p
8
3
2
6
1
3
*
0.82857
0.87619
=
4
3.6
1.8
A leastsquares problem always has a solution for B * z = p, since p is in the Im(B). Furthermore, if the columns of B are linearly independent, the solution vector z is unique.
To get back to the inconsistent system, write w^{T} = (0.82857, 0, 0.87619). Then, w is a solution of A * x = p since
A
*
w
=
p
8
2
3
2
4
6
1
2
3
*
0.82857
0
0.87619
=
4
3.6
1.8
However, the vector x = w + s * x_{h}, where s is a scalar and x_{h} is a solution of A * x = o, given by:
x
=
w
+ s *
x_{h}
x_{1}
x_{2}
x_{3}
=
0.8286
0
0.8762
+ s *
0
6
4
is also a solution to A * x = p, because A * (w + s * x_{h}) = A * w + s * A * x_{h} = p + s * o = p.
If small changes in the coefficients of the matrix A of a leastsquares problem produce a matrix Â with linearly dependent columns, the columns of A are "almost" linearly dependent. The estimates of the coefficients of the solution vector are tentative, since small changes in the data matrix A result in a system with a matrix Â having many solutions.
Solution: Inconsistent System  Minimum Length.
One might choose as "optimal" the solution vector x^{T} = (x_{1}, x_{2}, x_{3})^{T} = w + s * x_{h} with minimum length. The length (norm) of x = norm(w + s * x_{h}) is:
The singular value decomposition algorithm provides x^{+} = A^{+} * b = (0.8286, 0.4044, 0.6066), where A^{+} is the pseudoinverse of A.
Echelon Algorithm.
The Echelon Algorithm is similar to the GaussJordan method and the Gaussian method. The Echelon Algorithm permits one to calculate the general solution of a system of linear equations having more columns than rows. Begin by writing the matrix A and the vector b side by side as shown in Tableau^{0} below. This matrix arrangement is called the augmented matrix.
Using the following row operations on the tableaux of the augmented matrix, convert the left hand matrix to echelon form.
1. exchange two rows to get the largest entry (in absolute value) onto the diagonal.
2. divide a row by the diagonal entry.
3. add a multiple of one row to another row.
Begin in the upper left hand corner of the left hand matrix. Proceed along the diagonal converting each column into the appropriate column of the identity matrix as shown in the sequence of tableaux. Exchange rows with the lower row that has the largest entry (in absolute value) for the column. If all entries on and below the diagonal of the column are zeros, proceed to the next column.
Tableau^{0}
A
b
R^{0}_{1}
R^{0}_{2}
R^{0}_{3}
6
12
1
5
4
8
10
6
1
2
3
2
7
14
4
Tableau^{1}
Ã
ß
R^{1}_{1} = R^{0}_{1} /
6
R^{1}_{2} = R^{0}_{2}  (4) * R^{1}_{1}
R^{1}_{3} = R^{0}_{3}  (1) * R^{1}_{1}
1
2
0.1667
0.8333
0
0
9.3333
9.3333
0
0
2.8333
2.8333
1.1667
9.3333
2.8333
Tableau^{2}
Ã
ß
R^{2}_{1} = R^{1}_{1}  (0.1667) * R^{2}_{2}
R^{2}_{2} = R^{1}_{2} /
9.3333
R^{2}_{3} = R^{1}_{3}  (2.8333) * R^{2}_{2}
1
2
0
1
0
0
1
1
0
0
0
0
1
1
0
The last tableau contains the information one needs to obtain the general solution to the original linear equation problem A * x = b.
Echelon Tableau
Ã
*
x
=
ß
1
2
0
1
0
0
1
1
0
0
0
0
*
x_{1 }
x_{2 }
x_{3 }
x_{4 }
=
1
1
0
The solution to the Echelon Tableau yields the particular solution, x_{p}, and the entire set of solutions of the associated homogeneous system, ie all vectors in the Ker(A).
Vectors in the Ker(A) are scalar multiples of the basis
vectors, x_{h1}, x_{h2}, of the Ker(A) subspace.
x
=
x_{p}
+ s1 *
x_{h1}
+ s2 *
x_{h2}
x_{1}
x_{2}
x_{3}
x_{4}
=
1
0
1
0
+ s1 *
2
1
0
0
+ s2 *
1
0
1
1
Use the form below to solve some other linear equation problem.
Specify the dimensions of A (m <= n, 2 <= m,n) and fill in / change the numbers in the upper left hand corner, and then click "Solve".
Linear Equations Echelon Algorithm.
If you add a constant to an entry of the b vector, the system might be inconsistent. If you add a constant to an entry of a basis vector of Ker(A), you can create a matrix with more linearly independent columns.
Singular Value Decomposition Algorithm.
Consider the same linear equation problem A*x = b solved with the Echelon Algorithm.
A
*
x
=
b
6
12
1
5
4
8
10
6
1
2
3
2
*
x_{1 }
x_{2 }
x_{3 }
x_{4 }
=
7
14
4
Using singular value decomposition yields the orthogonal matrices U and V, and the diagonal matrix S of singular values of A:
A * x = U * S * V^{T} * x = b,
U
S
V^{T}
x
=
b
0.658
0.7523
0.0342
0.7268
0.6225
0.2903
0.1971
0.2158
0.9563
17.9819
0
0
0
0
10.8005
0
0
0
0
0
0
0.3922
0.7844
0.4737
0.0815
0.1674
0.3348
0.5667
0.734
0.9045
0.402
0.1005
0.1005
0
0.3333
0.6667
0.6667
x_{1}
x_{2}
x_{3}
x_{4}
=
=
=
=
7
14
4
Since the matrix S has 2 positive singular values, the rank of the matrix is 2. Furthermore, the columns of U and V provide orthonormal bases for the following subspaces:
Subspace
Orthonormal Bases
Im(A)
first 2 columns of U
Ker(A^{T})
last 1 columns of U
Im(A^{T})
first 2 columns of V
Ker(A)
last 2 columns of V
The Pseudoinverse is A^{+} = V * S^{+} * U^{T}
A^{+}
=
V
*
S^{+}
*
U^{T}
0.026
0.0062
0.001
0.052
0.0124
0.0019
0.0221
0.0518
0.0165
0.0481
0.0456
0.0156
=
0.3922
0.1674
0.9045
0
0.7844
0.3348
0.402
0.3333
0.4737
0.5667
0.1005
0.6667
0.0815
0.734
0.1005
0.6667
*
0.0556
0
0
0
0.0926
0
0
0
0
0
0
0
*
0.658
0.7268
0.1971
0.7523
0.6225
0.2158
0.0342
0.2903
0.9563
Both the columns and rows of A are linearly dependent. Setting x^{+} = A^{+} * b yields p = A * x^{+}, the projection of b onto the Im(A) subspace.
x^{+} = A^{+} * b
x^{+}
=
A^{+}
* b
=
x_{1}
x_{2}
x_{3}
x_{4}
=
=
=
=
0.026
0.0062
0.001
0.052
0.0124
0.0019
0.0221
0.0518
0.0165
0.0481
0.0456
0.0156
7
14
4
=
=
=
=
0.2727
0.5455
0.6364
0.3636
p = A * x^{+}
p
=
A
* x^{+}
=
p_{1}
p_{2}
p_{3}
=
=
=
6
12
1
5
4
8
10
6
1
2
3
2
0.2727
0.5455
0.6364
0.3636
=
=
=
7
14
4
The RHS vector, b, lies inside the Im(A). Consequently, the system A * x = b is consistent. Therefore, p = b.
Overdetermined Systems With # Rows m >= # Columns n
Suppose one wants to derive a linear equation model of the form A * x = b from a data set with a design (data) matrix A having more rows than columns. The columns of A and the column vector b represent measurable factors of the phenomena of interest. Each row of A represents one observation of the measured factors of the phenomena.
Since the matrix A has more rows than columns, one expects that A * x = b will be inconsistent, ie that b will lie outside the Im(A). Consequently, b will not be represented exactly as a linear combination of the columns of A.
I discuss least squares methods, also called multiple regression methods, on the Statistical web pages. Here I derive methods that statisticians and econometricians use to evaluate the least square (multiple regression) estimates for overdetermined systems of linear equations.
Least Squares (Multiple Regression) Solution.
With an inconsistent system, the least squares solution obtains a vector x^{+} from the data such that the vector p given by:
p = A * x^{+}
is the projection of the vector b onto Im(A).
To be the projection of the vector b onto Im(A), p must be the vector in Im(A) that minimizes the distance between p and b, ie:
As the number of observations increases, one hopes that ones understanding of the phenomena improves. This could be the case if the least squares estimated values of each parameter of the vector x^{+} converge as the number of observations increases.
Furthermore, one might deduce that a model of the form:
To reduce the dimensions of the system, multiply the equation A * x = b by the matrix A^{T}, the transpose of A. The resulting equations are called the normal equations:
A^{T} * A * x = A^{T} * b.
Example.
As an experimenter, I want to summarize the data set with an equation in R^{3} space with standard coordinates (x, y, z):
z = x^{+}_{1} * x + x^{+}_{2} * y
described by the linear equations system A * x = b. The matrix A has two columns representing the independent variables (factors) of my model, x and y. The RHS vector b represents the dependent variable, z. I want to estimate the coefficients x^{+}_{1} and x^{+}_{2} using least squares. Perhaps, I want to use the equation to predict values of the dependent variable z for as yet unknown values of the independent variables x and y.
A
*
x
=
b
5.3
7
0.5
6
2.1
5.1
4.6
5.5
4.2
2.8
2.2
6.1
1.9
4.7
3.7
4
*
x_{1 }
x_{2 }
=
6.75
3.55
1.8
0.85
4.2
2.25
2.7
1.95
The normal equations system A^{T} * A * x = A^{T} * b is:
A^{T}
*
A
*
x
=
A^{T}
*
b
5.3
0.5
2.1
4.6
4.2
2.2
1.9
3.7
7
6
5.1
5.5
2.8
6.1
4.7
4
*
5.3
7
0.5
6
2.1
5.1
4.6
5.5
4.2
2.8
2.2
6.1
1.9
4.7
3.7
4
*
x_{1 }
x_{2 }
=
5.3
0.5
2.1
4.6
4.2
2.2
1.9
3.7
7
6
5.1
5.5
2.8
6.1
4.7
4
*
6.75
3.55
1.8
0.85
4.2
2.25
2.7
1.95
Multiplying by the matrix A^{T} yields the reduced system with 2 equations and 2 unknowns:
(A^{T}*A)
*
x
=
(A^{T}*b)
93.69
50.56
50.56
224.4
*
x_{1 }
x_{2 }
=
66.365
119.03
The matrix (A^{T}*A) is symmetric. If the data matrix A has linearly independent columns, (A^{T}*A) has an inverse. The normal equations provide a system with a square coefficient matrix, solvable by standard methods.
Proceeding with the inverse of (A^{T}*A):
x^{+}
=
(A^{T}*A)^{1}
*
(A^{T}*b)
=
x^{+}_{1 }
x^{+}_{2 }
=
0.0122
0.0027
0.0027
0.0051
*
66.365
119.03
=
0.4805
0.4222
The least squares solution using the normal equations is x^{+} = (
0.4805, 0.4222 ).
The projection of b onto Im(A) is p = A * x^{+}. Comparing p and b and computing b  p:
a_{1}
a_{2}
b

p
=
(bp)
1
2
3
4
5
6
7
8
5.3
0.5
2.1
4.6
4.2
2.2
1.9
3.7
7
6
5.1
5.5
2.8
6.1
4.7
4
6.75
3.55
1.8
0.85
4.2
2.25
2.7
1.95

5.502
2.2928
3.1622
0.1115
3.2003
3.6324
1.0712
3.4666
=
1.248
1.2572
1.3622
0.9615
0.9997
1.3824
1.6288
1.5166
The distance from Im(A) to b — the length of the vector (bp) — is 3.712, found by computing the square root of the sums of the squares of the (bp) column. (Im(A) is a subspace of R^{8}, while b and p are vectors in R^{8})
For each row i of the data matrix A = [a_{1}, a_{2}], the triples (a_{i,1}, a_{i,2}, b_{i}) and (a_{i,1}, a_{i,2}, p_{i}) are points in R^{3} with standard coordinates (x, y, z). Furthermore, the least squares solution can be represented by the plane:
z = 0.4805 * x + 0.4222 * y
The plane is displayed below. If the point (a_{i,1}, a_{i,2}, b_{i}) lies above the plane, a blue line joins it to the point (a_{i,1}, a_{i,2}, p_{i}) on the plane. If (a_{i,1}, a_{i,2}, b_{i}) lies below the plane, a red line joins it to (a_{i,1}, a_{i,2}, p_{i}).
By clicking the refresh button on your browser, you can generate a new system of linear equations, with the new solution with least squares parameters x^{+} = (x^{+}_{1}, x^{+}_{2}) displayed above.
The functional form,
z = x^{+}_{1} * x + x^{+}_{2} * y,
ensures that the solution plane passes through the origin.
In the example of the next section, the estimated solution plane is not restricted to pass through the origin.
Singular Value Decomposition and Least Squares.
The "normal equations" least squares method of solving systems of linear equations computes the matrix (A^{T} * A). For systems with many observations, the data matrix A has many rows. Multiplying two matrices using a computer incurs the risk of roundoff errors distorting the components of the matrix (A^{T} * A) from their true values. Furthermore, the matrix (A^{T} * A) may be illconditioned if the columns of A are nearly linearly dependent. Experimenters often include as many factors as possible in their design matrix A, hoping to improve the "explanatory power" of their model. The probability that subtle inter dependencies among the columns of A will render A nearly linearly dependent increases with the number of included factors.
The singular value decomposition method controls for such errors. Firstly, it controls roundoff errors by decomposing the data matrix A using Householder and Jacobi transformations, known for their stability. This decomposition is obtained without computing (A^{T} * A). Secondly, singular value decomposition controls for linear dependence (and roundoff error) by setting very small singular values to zero.
Example.
The matrix A has two columns representing the independent variables (factors) of my model, x and y. The RHS vector b represents the dependent variable, z. As an experimenter, I want to summarize the data set with an equation in R^{3} space with standard coordinates (x, y, z):
z = x^{+}_{1} + x^{+}_{2} * x + x^{+}_{3} * y
described by the linear equations system A * x = b. I need to estimate three coefficients x^{+}_{1}, x^{+}_{2}, and x^{+}_{3} since I do not require the solution plane to pass through the origin. To obtain a zintercept coefficient given by x^{+}_{1}, I append a column of ones to the front of the matrix A
A
*
x
=
b
1
6.1
2.4
1
2.4
8
1
7.7
5.7
1
1.2
4.5
1
2.9
0.2
1
1
4.4
1
7.9
7.3
1
5.9
1.6
*
x_{1 }
x_{2 }
x_{3 }
=
0.55
2.7
7.2
2.35
1.15
2.1
10.9
2.25
Singular value decomposition yields the orthogonal matrices U and V, and the diagonal matrix S of singular values of A:
A * x = U * S * V^{T} * x = b,
U
S
V^{T}
x
=
b
0.15
0.6785
0.2656
0.4007
0.4723
0.1853
0.5245
0.1287
0.2789
0.2166
0.2665
0.4706
0.1249
0.2029
0.3823
0.2085
0.2879
0.2649
0.5895
0.0277
0.4939
0.2974
0.3227
0.3682
18.1793
0
0
0
8.8438
0
0
0
2.7164
0.033
0.7237
0.6893
0.0617
0.6898
0.7213
0.9975
0.0187
0.0674
x_{1}
x_{2}
x_{3}
=
=
=
0.55
2.7
7.2
2.35
1.15
2.1
10.9
2.25
Since the matrix S has 3 positive singular values, the rank of the matrix A is 3. Computing A^{+} = V * S^{+} * U^{T}, the least squares solution to A * x = b is:
x^{+} = A^{+} * b.
x^{+}
=
A^{+}
* b
=
x_{1}
x_{2}
x_{3}
=
=
=
0.102
0.0721
0.1025
0.1706
0.1392
0.0997
0.1805
0.1335
0.0571
0.0196
0.0328
0.0154
0.0234
0.0123
0.0222
0.0395
0.0431
0.0491
0.0025
0.0416
0.0213
0.0248
0.0323
0.0242
0.55
2.7
7.2
2.35
1.15
2.1
10.9
2.25
=
=
=
0.8224
0.511
0.5503
The least squares solution using singular value decomposition is x^{+} = (
0.8224, 0.511, 0.5503 ).
The projection of b onto Im(A) is p = A * x^{+}. Comparing p and b and computing b  p:
a_{1}
a_{2}
a_{3}
b

p
=
(bp)
1
2
3
4
5
6
7
8
1
1
1
1
1
1
1
1
6.1
2.4
7.7
1.2
2.9
1
7.9
5.9
2.4
8
5.7
4.5
0.2
4.4
7.3
1.6
0.55
2.7
7.2
2.35
1.15
2.1
10.9
2.25

2.619
4.8061
6.2488
3.9118
0.7695
2.1097
8.8763
3.0729
=
2.069
2.1061
0.9512
1.5618
0.3805
0.0097
2.0237
0.8229
The distance from Im(A) to b — the length of the vector (bp) — is 4.12, found by computing the square root of the sums of the squares of the (bp) column. (Im(A) is a subspace of R^{8}, while b and p are vectors in R^{8})
For each row i of the data matrix A = [a_{1}, a_{2}, a_{3}], the triples (a_{i,2}, a_{i,3}, b_{i}) and (a_{i,2}, a_{i,3}, p_{i}) are points in R^{3} with standard coordinates (x, y, z). Furthermore, the least squares solution can be represented by the plane:
z = 0.8224 + 0.511 * x + 0.5503 * y
The plane is displayed below. If the point (a_{i,2}, a_{i,3}, b_{i}) lies above the plane, a blue line joins it to the point (a_{i,2}, a_{i,3}, p_{i}) on the plane. If (a_{i,2}, a_{i,3}, b_{i}) lies below the plane, a red line joins it to (a_{i,2}, a_{i,3}, p_{i}).
By clicking the refresh button on your browser, you can generate a new system of linear equations, with the new solution with least squares parameters x^{+} = (x^{+}_{1}, x^{+}_{2}, x^{+}_{3}) displayed above.
A Square Matrix.
If A is a square matrix of dimension n, the system of linear equations problem has of n equations in n unknowns. If the rank of A, rank(A), equals n, its columns and rows are linearly independent and the problem has a unique solution.
The variables x = [x_{i}] that solve the system of linear equations A * x = b can be obtained by various methods.
You can adjust the matrix A and the RHS vector b in the form below to see how the different methods solve the problem. The available methods are based on the techniques for decomposing a matrix that I discussed on the matrices web page. These methods are the A inverse solution using the GaussJordan method, P*A = L*U solution using the Gaussian method, A = Q*R solution using Householder transformations, and the svd solution using singular value decomposition.
The GaussJordan Inverse Method
The GausssJordan inverse method uses row operations on the initial tableau [ A  I ] to convert the RHS matrix to the identity matrix I and the LHS matrix to the inverse matrix A^{1}.
Solve A * x = b with A^{1}.
x
=
A^{1}
* b
=
x_{1}
x_{2}
x_{3}
x_{4}
x_{5}
=
=
=
=
=
2
3
1
0
0
3
6
4
1
0
1
4
6
4
1
0
1
4
6
2.8
0
0
1
2.8
1.64
310
325
356
405
475
=
=
=
=
=
1
1
1
1
1
Solution vector x^{T}
1
1
1
1
1
The Gaussian Decomposition Method
The Gaussian decomposition method factors the matrix A into the product of a lower triangular matrix L and an upper triangular matrix U, keeping track of row switches to construct a permutation matrix P so that P * A = L * U.
Solve A * x = b using P * A * x = L * U * x = P * b.
L
U
x
=
P
b
1
0
0
0
0
0.7333
1
0
0
0
0.7467
0.65
1
0
0
0.7867
0.35
0.5385
1
0
0.8667
0.125
0.1923
0.3571
1
75
80
90
105
125
0
2.6667
7
12
16.6667
0
0
0.65
1.6
2.5
0
0
0
0.5385
1.1538
0
0
0
0
0.3571
x_{1}
x_{2}
x_{3}
x_{4}
x_{5}
=
=
=
=
=
0
0
0
0
1
1
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
1
0
310
325
356
405
475
The solution vector x is obtained in two steps. First, solve for y in L * y = P * b by forward substitution:
L
y
=
P * b
1
0
0
0
0
0.7333
1
0
0
0
0.7467
0.65
1
0
0
0.7867
0.35
0.5385
1
0
0.8667
0.125
0.1923
0.3571
1
y_{1}
y_{2}
y_{3}
y_{4}
y_{5}
=
=
=
=
=
475
310
325
356
405
Solution vector y^{T}
475
38.33333
4.75
1.69231
0.35714
Next, solve for x in U * x =y by backward substitution:
U
x
=
y
75
80
90
105
125
0
2.6667
7
12
16.6667
0
0
0.65
1.6
2.5
0
0
0
0.5385
1.1538
0
0
0
0
0.3571
x_{1}
x_{2}
x_{3}
x_{4}
x_{5}
=
=
=
=
=
475
38.3333
4.75
1.6923
0.3571
Solution vector x^{T}
1
1
1
1
1
The Householder QR Method
The Householder decomposition method factors the matrix A into the product of an orthonormal matrix Q and an upper triangular matrix R, such that A = Q * R.
Solve A * x = b using A * x = Q * R * x = b.
Q
R
x
=
b
0.3939
0.73
0.5585
0
0
0.4011
0.3366
0.7228
0.4509
0
0.4226
0.0152
0.2782
0.8106
0.2945
0.4656
0.3046
0.0697
0.0751
0.8246
0.5372
0.5107
0.2886
0.3661
0.483
139.6138
146.626
161.0443
183.6639
215.7022
0
2.4143
6.513
11.2295
15.5124
0
0
0.5585
1.5112
2.4155
0
0
0
0.4509
0.993
0
0
0
0
0.2945
x_{1}
x_{2}
x_{3}
x_{4}
x_{5}
=
=
=
=
=
310
325
356
405
475
Multiplying by Q^{T} gives R * x = Q^{T} * b.
R
x
=
Q^{T} * b
139.6138
146.626
161.0443
183.6639
215.7022
0
2.4143
6.513
11.2295
15.5124
0
0
0.5585
1.5112
2.4155
0
0
0
0.4509
0.993
0
0
0
0
0.2945
x_{1}
x_{2}
x_{3}
x_{4}
x_{5}
=
=
=
=
=
846.6501
35.6692
4.4853
1.444
0.2945
These equations are solved for x by back substitution yielding:
Solution vector x^{T}
1
1
1
1
1
The Singular Value Decomposition Method
The singular value decomposition method factors the matrix A into the product A = U * S* V^{T}, where U consists of the eigenvectors of A * A^{T}, V consists of the eigenvectors of A^{T} * A, and S consists of the square roots of the eigenvalues of A * A^{T} (which equal the eigenvalues of A^{T} * A).
Solve A * x = b using A * x = U * S * V^{T} * x = b.
U
S
V^{T}
x
=
b
0.3631
0.642
0.5234
0.3778
0.1982
0.3816
0.4404
0.1746
0.6017
0.5176
0.4196
0.1006
0.6398
0.0213
0.6356
0.4791
0.2708
0.2518
0.6143
0.5063
0.5629
0.5572
0.472
0.3427
0.1801
384.0674
0
0
0
0
0
10.1448
0
0
0
0
0
0.5623
0
0
0
0
0
0.1488
0
0
0
0
0
0.0767
0.3631
0.3816
0.4196
0.4791
0.5629
0.642
0.4404
0.1006
0.2708
0.5572
0.5234
0.1746
0.6398
0.2518
0.472
0.3778
0.6017
0.0213
0.6143
0.3427
0.1982
0.5176
0.6356
0.5063
0.1801
x_{1}
x_{2}
x_{3}
x_{4}
x_{5}
=
=
=
=
=
310
325
356
405
475
Since the matrix S has 5 positive singular values, the rank of the matrix A is 5. Computing A^{+} = V * S^{+} * U^{T}, the solution to A * x = b is:
x
=
A^{+}
* b
=
x_{1}
x_{2}
x_{3}
x_{4}
x_{5}
=
=
=
=
=
2
3
1
0
0
3
6
4
1
0
1
4
6
4
1
0
1
4
6
2.8
0
0
1
2.8
1.64
310
325
356
405
475
=
=
=
=
=
1
1
1
1
1
Solution vector x^{T}
1
1
1
1
1
References.
Ayres, Frank Jr. Matrices. New York: Schaum McGrawHill, 1962.
Ayres, Frank Jr. Modern Algebra. New York: Schaum McGrawHill 1965.
Bretscher, Otto. Linear Algebra with Applications. Upper Saddle River: Prentice Hall, 1997.
Burden, Richard L. and J. Douglas Faires. Numerical Analysis. 6th ed. Pacific Grove: Brooks/Cole, 1997.
Cohn, P. M. Linear Equations. London: Routledge, 1964.
Demmel, James W. Applied Numerical Linear Algebra. Philadelphia: Siam, 1997.
Dowling, Edward T. Mathematics for Economists. New York: Schaum McGrawHill, 1980.
Lipschutz, Seymour. Linear Algebra. New York: Schaum McGrawHill, 1968.
Mathews, John H. and Kurtis D. Fink. Numerical Methods Using MATLAB. 3rd ed. Upper Saddle River: Prentice Hall, 1999.
Press, William H., Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes: The Art of Scientific Computing. Cambridge: Cambridge UP, 1989.
Strang, Gilbert. Linear Algebra and Its Applications. 3d ed. San Diego: Harcourt, 1976.
Varah, James. Numerical Linear Algebra: Computer Science 402 Lecture Notes. Vancouver: University of B.C., 2000.
Watkins, David S. Fundamentals of Matrix Computations. New York: John Wiley, 1991.