Egwald Statistics - Multiple Regression Formula


Egwald Statistics: Multiple Regression

Linear and Restricted Multiple Regression

by

Elmer G. Wiens


The Regression Model

Let's use some of the notation from the popular statistics textbook:

Weisberg, Sanford. Applied Linear Regression, Second Edition. New York: Wiley, 1985.

Let X be the mxn matrix of predictors, Y the mx1 vector of responses, and ß the nx1 vector of unknown parameters (coefficients). I assume that, apart from random errors, there is a linear relation of the form:

Y = X * ß

If I want an intercept term, every entry in the first column of X must be 1. I also require m > n and X to have full column rank (its columns are linearly independent).
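The requirements on X can be checked before any decomposition is attempted. The following is a minimal sketch in Python with NumPy; the function name design_matrix and the sample numbers are illustrative assumptions, not part of the original page.

```python
import numpy as np

def design_matrix(predictors):
    """Prepend a column of 1's (the intercept) to an m x k array of predictors.

    Hypothetical helper: raises ValueError unless m > n and X has full column rank.
    """
    predictors = np.asarray(predictors, dtype=float)
    if predictors.ndim == 1:
        predictors = predictors.reshape(-1, 1)
    m = predictors.shape[0]
    X = np.column_stack([np.ones(m), predictors])
    n = X.shape[1]
    if m <= n:
        raise ValueError("need more observations than parameters (m > n)")
    if np.linalg.matrix_rank(X) < n:
        raise ValueError("the columns of X are linearly dependent")
    return X

# Made-up data: five observations of two predictors.
X = design_matrix([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0]])
print(X.shape)
```

Here X has m = 5 rows and n = 3 columns, the first of which is the intercept column.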

Householder's Algorithm:

Decompose the matrix X into:

X = Q*U

where Q is an mxm orthogonal matrix and U is an mxn matrix with:

the first n rows of U = V, an nxn upper triangular matrix (non-singular);
the last m-n rows of U = W, an (m-n)xn matrix whose entries are all 0's.
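One way to obtain Q and U is to apply a sequence of Householder reflections, one per column of X. The sketch below is a textbook version written with NumPy for illustration; the name householder_qr is an assumption, and production code would normally call a library routine instead.

```python
import numpy as np

def householder_qr(X):
    """Return Q (m x m, orthogonal) and U (m x n) with X = Q @ U."""
    U = np.array(X, dtype=float)
    m, n = U.shape
    Q = np.eye(m)
    for k in range(n):
        x = U[k:, k]
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue  # nothing to eliminate in this column
        v = x.copy()
        # Add sign(x[0]) * ||x|| to the first entry to avoid cancellation.
        v[0] += alpha if x[0] >= 0 else -alpha
        v /= np.linalg.norm(v)
        # Apply the reflection H = I - 2*v*v' to rows k..m-1 of U ...
        U[k:, :] -= 2.0 * np.outer(v, v @ U[k:, :])
        # ... and accumulate Q = H_1 * H_2 * ... * H_n.
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    # Clear rounding noise below the diagonal.
    return Q, np.triu(U)

# Made-up example: intercept column plus one predictor.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
Q, U = householder_qr(X)
V = U[:2, :]   # the nxn upper triangular block
W = U[2:, :]   # the (m-n)xn block of zeros
```

The diagonal entries of V may come out negative; that does not affect the solution of the regression problem.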

Rewrite the regression problem as:

Y = Q*U*ß

U*ß = Q'*Y = b, an mx1 vector,

where the superscript ' means matrix transposition.

Note that Q orthogonal means that Q'*Q = Q*Q' = I, the mxm identity matrix, so multiplying both sides of Y = Q*U*ß by Q' gives the second equation.

Let b1 = the first n components of b; b2 = the last m-n components of b.

Taking advantage of the structure of U, write:

V*ß = b1 (*)

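The linear problem (*) is equivalent to the original least squares problem: because Q' preserves norms, ||Y - X*ß||^2 = ||b1 - V*ß||^2 + ||b2||^2, which is smallest when V*ß = b1. Moreover:

I can solve for ß by back substitution, since V is upper triangular.
The regression problem's residual = ||b2||, the norm of b2.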
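The steps from the decomposition to the solution of (*) can be sketched as follows. This sketch uses NumPy's numpy.linalg.qr with mode="complete" as a stand-in for the decomposition step; the function names and the data are illustrative assumptions.

```python
import numpy as np

def back_substitute(V, b1):
    """Solve V @ beta = b1 for an upper triangular, non-singular V."""
    n = V.shape[0]
    beta = np.zeros(n)
    for i in range(n - 1, -1, -1):
        beta[i] = (b1[i] - V[i, i + 1:] @ beta[i + 1:]) / V[i, i]
    return beta

def qr_least_squares(X, Y):
    """Return the least squares estimate of beta and the residual norm ||b2||."""
    m, n = X.shape
    Q, U = np.linalg.qr(X, mode="complete")  # Q is m x m, U is m x n
    b = Q.T @ Y
    V, b1, b2 = U[:n, :], b[:n], b[n:]
    return back_substitute(V, b1), np.linalg.norm(b2)

# Made-up data: intercept plus one predictor.
X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 5.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
beta, resid = qr_least_squares(X, Y)
print(beta, resid)
```

The residual norm comes directly from b2, so Y - X*ß never has to be formed.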

That takes care of the main part of the problem. The statistical terms can be calculated as:

Residual sum of squares: RSS = ||b2||^2
Residual mean square (MSRSS): s2 = RSS/(m-n)
Estimated variance of ß: var(ß) = s2 * (V'*V)^(-1), where ^(-1) denotes the inverse of the matrix. Note: var(ß) = VC, an nxn symmetric matrix, the variance-covariance matrix of ß. Since X'*X = V'*V, this is the same as s2 * (X'*X)^(-1).
Standard error of parameter ßi: stderror(i) = sqrt(VC(i,i))
The t-value of parameter ßi: t-val(i) = ßi / stderror(i)
Sum of squares of Y: YY = Y'*Y
Sum of Y: sumY = sum(Y) = sum(Y1,Y2,...,Ym)
Total corrected sum of squares: SYY = YY - sumY*sumY/m
Regression sum of squares: SSreg = SYY - RSS
Coefficient of determination: R2 = SSreg/SYY
Adjusted coefficient of determination: R2b = 1 - (1 - R2)*(m-1)/(m-n)
Regression mean square: MSSreg = SSreg/(n-1)
F-test for regression: compare F = MSSreg / MSRSS with the F(n-1,m-n) distribution

Analysis of Variance

Source        Sum of Squares   Degrees of Freedom   Mean Square   F(n-1,m-n)
Regression    SSreg            n-1                  MSSreg        F
Residual      RSS              m-n                  MSRSS
Total         SYY              m-1
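The quantities in the list and the analysis of variance table can be computed together. The sketch below assumes X includes an intercept column, so the regression has n-1 degrees of freedom as in the table. The function name, the dictionary keys, and the data are illustrative assumptions, and numpy.linalg.solve stands in for back substitution.

```python
import numpy as np

def regression_statistics(X, Y):
    """Compute the regression summary terms from the QR factors of X."""
    m, n = X.shape
    Q, U = np.linalg.qr(X, mode="complete")
    b = Q.T @ Y
    V, b1, b2 = U[:n, :], b[:n], b[n:]
    beta = np.linalg.solve(V, b1)         # stand-in for back substitution
    RSS = b2 @ b2                         # residual sum of squares
    s2 = RSS / (m - n)                    # residual mean square (MSRSS)
    iV = np.linalg.solve(V, np.eye(n))    # inverse of V
    VC = s2 * (iV @ iV.T)                 # variance-covariance matrix of beta
    stderror = np.sqrt(np.diag(VC))
    SYY = Y @ Y - Y.sum() ** 2 / m        # total corrected sum of squares
    SSreg = SYY - RSS
    R2 = SSreg / SYY
    MSSreg = SSreg / (n - 1)              # regression mean square
    return {
        "beta": beta, "RSS": RSS, "s2": s2, "VC": VC,
        "stderror": stderror, "t": beta / stderror,
        "SYY": SYY, "SSreg": SSreg, "R2": R2,
        "R2b": 1.0 - (1.0 - R2) * (m - 1) / (m - n),
        "MSSreg": MSSreg, "F": MSSreg / s2,
    }

# Made-up data: intercept plus two predictors, six observations.
X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
Y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
stats = regression_statistics(X, Y)
print(stats["R2"], stats["F"])
```

The F value still has to be compared with a table or a library routine for the F(n-1,m-n) distribution; that step is left out of the sketch.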

Notes:
1. Do not form the matrix X'*X and then compute its inverse. Forming X'*X squares the condition number of the problem, so the result can be numerically unstable, and it takes extra computer time.
2. Do not compute (V'*V)^(-1) directly. Solve for each column of the matrix iV, the inverse matrix of V, by back substitution:

V * iV = I
where I is the nxn identity matrix. Since (V'*V)^(-1) = iV*iV', compute iV*iV' if necessary (it is symmetric), or only its diagonal if you just need the standard errors of the parameters.
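Note 2 can be sketched as below: each column of iV is found by back substitution, and only the diagonal of iV*iV' is formed for the standard errors. The function names and the sample matrix are illustrative assumptions.

```python
import numpy as np

def invert_upper_triangular(V):
    """Solve V @ iV = I column by column using back substitution."""
    n = V.shape[0]
    iV = np.zeros((n, n))
    for j in range(n):
        # Column j of iV solves V @ x = e_j; x is zero below row j.
        for i in range(j, -1, -1):
            rhs = (1.0 if i == j else 0.0) - V[i, i + 1:j + 1] @ iV[i + 1:j + 1, j]
            iV[i, j] = rhs / V[i, i]
    return iV

def coefficient_variances(V, s2):
    """Diagonal of s2 * iV @ iV.T, without forming the full product."""
    iV = invert_upper_triangular(V)
    # Entry (i,i) of iV @ iV.T is the sum of squares of row i of iV.
    return s2 * np.sum(iV ** 2, axis=1)

# Made-up upper triangular matrix and residual mean square.
V = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0, 0.5],
              [0.0, 0.0, 4.0]])
iV = invert_upper_triangular(V)
stderror = np.sqrt(coefficient_variances(V, 2.0))
```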

Restricted least squares:

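Suppose the parameters must satisfy linear restrictions given by:

r = R * ß,

where R is a known matrix with one row per restriction (and full row rank) and r is a known vector. Having obtained the unrestricted estimate of ß, compute the restricted estimate rß as: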

Let S = (V'*V)^(-1) * R' * [R * (V'*V)^(-1) * R']^(-1)

then rß = ß + S * [r - R*ß]

The residual = ||Y - X * rß||
Residual sum of squares: RSS = ||Y - X * rß||^2
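Residual mean square (MSRSS): s2 = RSS/(m-n+q), where q is the number of independent restrictions (rows of R); each restriction returns one degree of freedom to the residual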
Estimated variance of rß: var(rß) = s2 * (I - S * R) * (V'*V)^(-1), where I is the nxn identity matrix
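The restricted estimate can be sketched as follows, reusing the triangular factor V so that X'*X is never formed. The function name, the example restriction (the two slope coefficients sum to 1), and the data are illustrative assumptions. The function returns the unscaled matrix (I - S*R)*(V'*V)^(-1), leaving the multiplication by s2 to the caller.

```python
import numpy as np

def restricted_least_squares(X, Y, R, r):
    """Return the restricted estimate, its RSS, and (I - S R)(V'V)^(-1)."""
    m, n = X.shape
    Q, U = np.linalg.qr(X, mode="complete")
    V = U[:n, :]
    b1 = (Q.T @ Y)[:n]
    beta = np.linalg.solve(V, b1)            # unrestricted estimate
    iV = np.linalg.solve(V, np.eye(n))       # inverse of V
    C = iV @ iV.T                            # (V'V)^(-1)
    # S = C R' [R C R']^(-1); R C R' is symmetric, so solve for S' instead.
    S = np.linalg.solve(R @ C @ R.T, R @ C).T
    r_beta = beta + S @ (r - R @ beta)
    RSS = np.sum((Y - X @ r_beta) ** 2)
    cov_factor = (np.eye(n) - S @ R) @ C     # multiply by s2 for var(r_beta)
    return r_beta, RSS, cov_factor

# Made-up data: intercept plus two predictors, six observations.
X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
Y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
R = np.array([[0.0, 1.0, 1.0]])   # restriction: beta_1 + beta_2 = 1
r = np.array([1.0])
r_beta, RSS, cov_factor = restricted_least_squares(X, Y, R, r)
print(r_beta, RSS)
```

The restricted RSS can never be smaller than the unrestricted one, which gives a quick sanity check on the result.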
