next up previous
Next: References Up: Numerical Analysis for Statisticians Previous: Real numbers: Floating Point

Subsections

Matrices, and their decompositions

Matrices Notations, operations, partitionning, careful with the defaults from one system to another.

The Botany of Matrices Diagonal, Symmetric, Banded, Hessenberg, Circulant, Hankel, Triangular, Sparse, Orthogonal, semi-positive definite,

My notations for: transpose, inner product,norm, matrix norms,

The linear equation solving problem

Suppose we have a linear system of equtions, typified by the matrix of coefficients $A$, unknowns, $x$ and the right hand sides, the vector $b$.

Square systems

Singularity, Degenerate Systems
If the rows are linear combinations of one another, or the columns linear combinations or one another then even if there seems to be the same number of unknowns as equations, the system is in fact equivalent to a singular set of equations.

Close singularity is just as bad because the roundoff could take us there, we will have to measure this.

Single precision is usually enough for 30-50 equations.

Double precision makes it possible to solve for a few hundred.

This does NOT hold for pathological systems we will look into these and how to recognize them when we talk about conditionning.

A full square system
:

Suppose the matrix of coefficients is a square matrix, (of full rank also, as we saw, non-degenerate) then the equation looks like this:

\begin{displaymath}
Ax=b
\end{displaymath}

Formally you would say : invert it, in fact a computational mistake, except if you have multiple right hand sides.

Doing what one would first think of doing if you had to solve a system by hand.

The easiest system: triangular.

Bringing to triangular shape.

Pivotting. This is NOT to solve ill-conditionning, but unlucky diagonal elements that are very small make the algorithm unstable.

LU decomposition, with a permutation matrix (prepivotting) is what is best to do.

\begin{displaymath}A=LU, L\; lower\;triang.\;U\; upper\; triang.\end{displaymath}

Flops: Of an order $\frac{n^3}{2}$ (the same as the cost of the multiplication of two matrices, not so bad).

Condition number for the inversion problem

Try a small perturbation of $b$ to $b^*=b+\delta b$, call the solution of the perturbed system: $x^*=x+\delta x$,

\begin{eqnarray*}
A(x+\delta x)&=&b+\delta b\\
x+\delta x&=&A^{-1}(b+\delta b)\...
...vert \frac{\vert\vert\delta b\vert\vert}{\vert\vert b\vert\vert}
\end{eqnarray*}

The condition number for the data $A$ and the linear solving problem is thus:

\begin{displaymath}
\kappa=\vert\vert A \vert\vert \vert\vert A^{-1} \vert\vert
\end{displaymath}

(depends on the norm, not on the algorithm). For the two norm we have a better explanation of how to compute this, we need the following to do it:

Singular Value Decomposition
This is the most important decomposition in statistics.

We will study it several times, here is my first introduction: a little made-up example with matlab to start with:

>> u=[3 1 -1 2]'
u =
     3
     1
    -1
     2
>> v=(1:4)'
v = 
     1
     2
     3
     4
>> X=u*v'
X =
     3     6     9    12
     1     2     3     4
    -1    -2    -3    -4
     2     4     6     8
>>svd(X)
ans =
   21.2132
    0.0000
         0
         0

>> E=10^(-3)*randn(4);
>> XE=X+E 
XE =
 
    3.0012    5.9993    9.0003   12.0012
    1.0006    2.0017    3.0009    3.9994
   -0.9999   -1.9999   -3.0014   -3.9994
    2.0004    4.0018    5.9993    7.9996

>> svd(XE) 
ans =
   21.2143
    0.0028
    0.0019
    0.0005
>> cond(X)
Condition is infinite
ans =
   Inf 
>> cond(XE)
ans =
 
   4.2567e+04
%An example you can't see with your bare eyes:
>> X2=u'*v 
X2 =
    0.2468    0.2499    0.0166    0.2487
    0.3225    0.3266    0.0218    0.3250
    0.1495    0.1514    0.0101    0.1506
    0.4367    0.4423    0.0295    0.4401
 
>> svd(X2) 
ans =
 
    1.0730
    0.0000
         0
         0
>> [U,S,V]=svd(X2);
>> U*S*V' 
ans =
    0.2468    0.2499    0.0166    0.2487
    0.3225    0.3266    0.0218    0.3250
    0.1495    0.1514    0.0101    0.1506
    0.4367    0.4423    0.0295    0.4401
>> 10000*(X2-U*S*V') 
ans =
    1.0e-11 *
   -0.0278    0.0833   -0.0035    0.0555
   -0.0555    0.0555         0    0.0555
         0    0.0555    0.0017    0.0278
   -0.0555    0.1665         0    0.1110

>> A=rand(4)
A = 
    0.5045    0.4940    0.0737    0.9138
    0.5163    0.2661    0.5007    0.5297
    0.3190    0.0907    0.3841    0.4644
    0.9866    0.9478    0.2771    0.9410
>> flops(0)
>> [L,U,P]=lu(A) 
L =
     1.0000         0         0         0
    0.5233    1.0000         0         0
    0.5114   -0.0406    1.0000         0
    0.3234    0.9388    0.7363    1.0000
U =
     0.9866    0.9478    0.2771    0.9410
         0   -0.2298    0.3557    0.0373
         0         0   -0.0535    0.4342
         0         0         0   -0.1945
>> flops 
ans =
    34 
>> P*A
ans =
    0.9866    0.9478    0.2771    0.9410
    0.5163    0.2661    0.5007    0.5297
    0.5045    0.4940    0.0737    0.9138
    0.3190    0.0907    0.3841    0.4644
>> L*U
ans =
     0.9866    0.9478    0.2771    0.9410
    0.5163    0.2661    0.5007    0.5297
    0.5045    0.4940    0.0737    0.9138
    0.3190    0.0907    0.3841    0.4644
>> P*A-L*U
ans =    1.0e-15 *
         0         0         0         0
   -0.1110         0         0         0
         0         0         0         0
   -0.0555   -0.0139         0         0

Here is what we need to remember:

\begin{displaymath}
X=USV', V'V=I, U'U=I, S\; diagonal\; s_i
\end{displaymath}

Actually the singular values are the square roots of the eigenvalues of $X'X$.

The condition number in $\ell^2$ norm has the simple interpretation as a function of the singular values, ration of the largest to the smallest:

\begin{displaymath}\kappa=\frac{s_1}{s_n}\end{displaymath}

Read 2.6 of NR (pages 61-71).

Tridiagonal

Of an order of $N$ steps are sufficient, see NR 51.

Iterative Improvement

This enables, a the price of storing A, to improve the solution by using the residual. Do the following in double precision:

\begin{eqnarray*}
r=b-Ax^*,solve \;\; Ly&=&Pr\;\; for\;y\; \\
and\; \; Uz&=&y\; for\; z \\
x_{new} &= &x^*+z\\
Ax_{new}&=&A(x^*+z)=(b-r)+Az=b
\end{eqnarray*}

Cholesky Decomposition
For a symmetric positive definite matrix A the LU decomposition can be rewritten:

\begin{displaymath}A=GG'=(LD^{1/2})(D^{1/2}L')\end{displaymath}

with D a diagonal matrix with only positive entries.


Rectangular system: QR decomposition and least squares

When there are more equations than variables, one can't hope for an exact solution, the system is overdetermined.

We try to minimize the error which is usually measured as the norm

\begin{displaymath}\vert\vert Ax-b\vert\vert,\; for\; the\; equation\; Ax=b\end{displaymath}

Important remark: For any orthogonal Q, this norm is unchanged by trnasformation by $Q$, so

\begin{displaymath}\vert\vert Ax-b\vert\vert=\vert\vert Q'Ax-Q'b\vert\vert\end{displaymath}

we try and find a $Q$ that makes the minimization particularly simple, that is so that $Q'A$ has a particularly simple form.

Examples

Least Squares in Two different ways:
>> t = (1900:10:1990)';
>> y = [75.995 91.972 105.711 123.203 131.669 ...
     150.697 179.323 203.212 226.505 249.633]';
>> X=[t ones(10,1)] 
X =     1900           1
        1910           1
        1920           1
        1930           1
        1940           1
        1950           1
        1960           1
        1970           1
        1980           1
        1990           1
>> bhat=X\p
bhat =    1.93
      -3594.01
>>yhat=polyval(bhat,t);
>>[polyval(bhat,t) y]
ans =    67.08         76.00
         86.35         91.97
        105.62        105.71
        124.89        123.20
        144.16        131.67
        163.43        150.70
        182.70        179.32
        201.96        203.21
        221.23        226.50
        240.50        249.63

>> norm(yhat-y)
ans =
   23.5794
>> [Q,RC]=qr([X y]);
RC =
      -6151.30         -3.16       -488.86
             0         -0.05        167.82
             0             0         23.58
             0             0             0
             0             0             0
             0             0             0
             0             0             0
             0             0             0
             0             0             0
             0             0             0
>>  R1=RC(1:2,1:2)
R1 =    -6151.30         -3.16
              0          -0.05
>> C1=RC(1:2,3)
C1 = -488.8643
      167.8181
>> R1\C1
ans =     1.93
      -3594.01
>> C2=RC(3,3)
C2 = 23.5794
>>[Q*[C1;zeros(8,1)] yhat]
ans      67.08         67.08
         86.35         86.35
        105.62        105.62
        124.89        124.89
        144.16        144.16
        163.43        163.43
        182.70        182.70
        201.96        201.96
        221.23        221.23
        240.50        240.50


Givens Rotations

What are Given's rotations good for?

You can use them to zero out individual isolated elements in any matrix, without changing any of the norms of the vectors, these transformations are orthogonal.

They are based on simple $2\times 2$ ones of the form:

\begin{displaymath}
\left [\begin {array}{cc} c&s \noalign{\medskip }-s&c\end {array}
\right ]
\end{displaymath}

You can choose $c$ and $s$ so that a certain vector is orthogonally transformed (actually, rotated) into a vector $y=(y_1,y_2)$ whose second coordinate is zero.

This is of course easy by solving the equations

$\displaystyle c^2+s^2=1$     (2.1)
$\displaystyle y_2=-sx_1+cx_2=0$     (2.2)

The examples that follow show two ways of doing this:

%%%%Van Loan's Function, Chapter 7%%%%%%%%
  function [c,s] = Rotate(x1,x2);
% Pre:
%   x1,x2   scalars
% Post:
%   c,s     c^2+s^2=1 so -s*x1 + c*x2 = 0.
%
   if x2==0
      c = 1;
          s = 0;
   else
      if abs(x2)>=abs(x1)
             cotangent = x1/x2;
                 s = 1/sqrt(1+cotangent^2);
                 c = s*cotangent;
          else
             tangent = x2/x1;
                 c = 1/sqrt(1+tangent^2);
                 s = c*tangent;
          end
   end
%%%%%%%%%%%%%%%The resident function:%%%%%%%
function [G,x] = planerot(x)
%PLANEROT Generate a Givens plane rotation.
%       [G,Y] = PLANEROT(X), where X is a 2-component column vector,
%       returns a 2-by-2 orthogonal matrix G so that Y = G*X has Y(2) = 0.
%       See also QRINSERT, QRDELETE.
if x(2) ~= 0
   r = norm(x);
   G = [x'; -x(2) x(1)]/r;
   x = [r; 0];
else
   G = eye(2);
end


>> x=[2 3]';
>> [G,y]=planerot(x)

G =
          0.55          0.83
         -0.83          0.55
y =
          3.61
             0

Now in order to extend this to a general context you just notice that if you fill out the rest of the matrix with $1'$s on the diagonal, it will not change the other coordinates thus:

\begin{displaymath}
\left [\begin {array}{ccccc} c&0&s&0&0 \noalign{\medskip }...
...}0&0&0&1&0
 \noalign{\medskip }0&0&0&0&1\end {array}\right ]
\end{displaymath}

is a rotation in the (1,3) plane in 5 dimensions. In order to reduce to upper triangular form, one cycles through all the elements that you want to zero out one by one.

\begin{displaymath}
\left [\begin {array}{cccc} \times &\times &\times &\times \...
...align{\medskip }0 &\times &\times &\times \end {array}\right ]
\end{displaymath}

R examples handout

Backsubstitution

backsub=function(X,y){
#Backsubstitution to find solution in a triangular case
l=dim(X)
p=l[2]
b=matrix(0,p,1)
b[p,1]=y[p,1]/X[pp]
for (j in seq(p-1,1,-1){
b[j,1]=(y[j,1]-X[j,(j+1):n]*b[(j+1):n,1])/X[j,j]
}
return(b)}

Householder Functions

Algorithm  (Householder Vector) in R

house= function(x){
#Householder vector calculation
p = length(x);
mu = sqrt(x%*%x)
v = x
if (mu!= 0)
{
beta = x[1] + sign(x[1])*mu

v[2 : p,1] = v[2 : p,1]/beta
}
v[1] = 1
return(v)
}

Algorithm  (Householder Vector) in matlab

function v = house(x);
%Compute a Householder vector
m = length(x);
mu = norm(x,2);
v = x;
if mu ~= 0
       beta = x(1) + sign(x(1))*mu;
      v(2 : m,1) = v(2 : m,1)/beta;
end
v(1) = 1

Householder Multiplication-Row

rowhouse=function(X,v){
beta=-2/(t(v)%*%v)
w=beta*t(X)%*%v
X=X+v%*%t(w)
return(X)}

QR decomposition in R

qr                   package:base                   R Documentation
The QR Decomposition of a Matrix
Description:
     'qr' computes the QR decomposition of a matrix.  It provides an
     interface to the techniques used in the LINPACK routine DQRDC or
     the LAPACK routines DGEQP3 and (for complex matrices) ZGEQP3.
Usage:
     qr(x, tol = 1e-07 , LAPACK = FALSE)
     qr.coef(qr, y)
     qr.qy(qr, y)
     qr.qty(qr, y)
     qr.resid(qr, y)
     qr.fitted(qr, y, k = qr$rank)
     qr.solve(a, b, tol = 1e-7)
     ## S3 method for class 'qr':
     solve(a, b, ...)
     is.qr(x)
     as.qr(x)

Arguments:
       x: a matrix whose QR decomposition is to be computed.
     tol: the tolerance for detecting linear dependencies in the
          columns of 'x'. Only used if 'LAPACK' is false and 'x' is
          real.
      qr: a QR decomposition of the type computed by 'qr'.
    y, b: a vector or matrix of right-hand sides of equations.
       a: A QR decomposition or ('qr.solve' only) a rectangular matrix.
       k: effective rank.
  LAPACK: logical. For real 'x', if true use LAPACK otherwise use
          LINPACK.
     ...: further arguments passed to or from other methods
Details:
     The QR decomposition plays an important role in many statistical
     techniques.  In particular it can be used to solve the equation
     *Ax* = *b* for given matrix *A*, and vector *b*.  It is useful for
     computing regression coefficients and in applying the
     Newton-Raphson algorithm.

     The functions 'qr.coef', 'qr.resid', and 'qr.fitted' return the
     coefficients, residuals and fitted values obtained when fitting
     'y' to the matrix with QR decomposition 'qr'. 'qr.qy' and 'qr.qty'
     return 'Q %*% y' and 't(Q) %*% y', where 'Q' is the (complete) *Q*
     matrix.
     All the above functions keep 'dimnames' (and 'names') of 'x' and
     'y' if there are.
     'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve'
     solves systems of equations via the QR decomposition: if 'a' is a
     QR decomposition it is the same as 'solve.qr', but if 'a' is a
     rectangular matrix the QR decomposition is computed first.  Either
     will handle over- and under-determined systems, providing a
     minimal-length solution or a least-squares fit if appropriate.
     'is.qr' returns 'TRUE' if 'x' is a 'list' with components named
     'qr', 'rank' and 'qraux' and 'FALSE' otherwise.

     It is not possible to coerce objects to mode '"qr"'.  Objects
     either are QR decompositions or they are not.
Value:
     The QR decomposition of the matrix as computed by LINPACK or
     LAPACK. The components in the returned value correspond directly
     to the values returned by DQRDC/DGEQP3/ZGEQP3. 

      qr: a matrix with the same dimensions as 'x'. The upper triangle
          contains the *R* of the decomposition and the lower triangle
          contains information on the *Q* of the decomposition (stored
          in compact form).  Note that the storage used by DQRDC and
          DGEQP3 differs.
   qraux: a vector of length 'ncol(x)' which contains additional
          information on *Q*.

    rank: the rank of 'x' as computed by the decomposition: always full
          rank in the LAPACK case.

   pivot: information on the pivoting strategy used during the
          decomposition.
     Non-complex QR objects computed by LAPACK have the attribute
     '"useLAPACK"' with value 'TRUE'.
Note:
     To compute the determinant of a matrix (do you _really_ need it?),
     the QR decomposition is much more efficient than using Eigen
     values ('eigen').  See 'det'.
     Using LAPACK (including in the complex case) uses column pivoting
     and does not attempt to detect rank-deficient matrices.
References:
     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.
     Dongarra, J. J., Bunch, J. R., Moler, C. B. and Stewart, G. W.
     (1978) _LINPACK Users Guide._  Philadelphia: SIAM Publications.
     Anderson. E. and ten others (1999) _LAPACK Users' Guide_. Third
     Edition. SIAM.
      Available on-line at <URL:
     http://www.netlib.org/lapack/lug/lapack_lug.html>.
See Also:
     'qr.Q',  'qr.R',  'qr.X' for reconstruction of the matrices.
     'lm.fit',  'lsfit', 'eigen', 'svd'.
     'det' (using 'qr') to compute the determinant of a matrix.
Examples:
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
 h9 <- hilbert(9); h9
     qr(h9)$rank           #--> only 7
     qrh9 <- qr(h9, tol = 1e-10)
     qrh9$rank             #--> 9
     ##-- Solve linear equation system  H %*% x = y :
    y <- 1:9/10
     x <- qr.solve(h9, y, tol = 1e-10) # or equivalently :
     x <- qr.coef(qrh9, y) #-- is == but much better than
                           #-- solve(h9) %*% y
     h9 %*% x              # = y

h9 %*% (solve(h9) %*% y)
            [,1]
 [1,] 0.09999996
 [2,] 0.20000011
 [3,] 0.30000012
 [4,] 0.40000010
 [5,] 0.50000008
 [6,] 0.60000007
 [7,] 0.70000005
 [8,] 0.80000004
 [9,] 0.90000003
> h9 %*% qr.coef(qrh9, y) 
      [,1]
 [1,]  0.1
 [2,]  0.2
 [3,]  0.3
 [4,]  0.4
 [5,]  0.5
 [6,]  0.6
 [7,]  0.7
 [8,]  0.8
 [9,]  0.9

 > round(h9,4)
        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]
 [1,] 1.0000 0.5000 0.3333 0.2500 0.2000 0.1667 0.1429 0.1250 0.1111
 [2,] 0.5000 0.3333 0.2500 0.2000 0.1667 0.1429 0.1250 0.1111 0.1000
 [3,] 0.3333 0.2500 0.2000 0.1667 0.1429 0.1250 0.1111 0.1000 0.0909
 [4,] 0.2500 0.2000 0.1667 0.1429 0.1250 0.1111 0.1000 0.0909 0.0833
 [5,] 0.2000 0.1667 0.1429 0.1250 0.1111 0.1000 0.0909 0.0833 0.0769
 [6,] 0.1667 0.1429 0.1250 0.1111 0.1000 0.0909 0.0833 0.0769 0.0714
 [7,] 0.1429 0.1250 0.1111 0.1000 0.0909 0.0833 0.0769 0.0714 0.0667
 [8,] 0.1250 0.1111 0.1000 0.0909 0.0833 0.0769 0.0714 0.0667 0.0625
 [9,] 0.1111 0.1000 0.0909 0.0833 0.0769 0.0714 0.0667 0.0625 0.0588


next up previous
Next: References Up: Numerical Analysis for Statisticians Previous: Real numbers: Floating Point
Susan Holmes 2005-04-06