Essays/Linear Regression

From J Wiki

Linear regression is a statistical method of modeling the relationship between a dependent variable $Y$ and independent variables $X$ by estimating the coefficients $\beta_j$ of the linear form:

$$Y = \beta_0 + \beta_1 f_1(X) + \beta_2 f_2(X) + \cdots + \beta_m f_m(X)$$

where each term $f_j(X)$ is an expression in the original independent variables $X = (x_1, \ldots, x_n)$. For example, it could be that $f_j(X) = x_1 x_2$.
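
The model is linear in the coefficients $\beta_j$ even when the terms $f_j$ are nonlinear in $X$. Evaluating it therefore reduces to an inner product of the coefficient vector with a table of term values. The following minimal J sketch uses made-up points and coefficients; the names x1, x2, F, b and Y are hypothetical and not part of the essay:

```j
NB. Hypothetical sample points (illustration only)
x1 =: 0 1 2 3
x2 =: 1 1 2 2
NB. Rows of F are the terms evaluated at each point: 1, x1, x2, x1*x2
F  =: 1 , x1 , x2 ,: x1 * x2
NB. Hypothetical coefficients b_0 .. b_3
b  =: 2 0.5 _1 0.25
NB. Linear combination of the term rows: Y_i = sum over j of b_j * F_ji
Y  =: b +/ . * F
Y
```

The same inner product `+/ . *` is defined as `mp` in the surface fit example.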

Least Squares Method

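In the least squares method, the coefficients of linear regression are chosen to minimize the sum of squared deviations between the $N$ observations $y_i$ and their estimates $\hat y_i$:

$$\min_{\beta_0, \ldots, \beta_m} \sum_{i=1}^{N} \left(y_i - \hat y_i\right)^2, \qquad \hat y_i = \beta_0 + \sum_{j=1}^{m} \beta_j f_j(X_i)$$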
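
The minimum has a closed form. Write the term values as an $N \times (m+1)$ design matrix $\mathbf F$, with $F_{i0} = 1$ and $F_{ij} = f_j(X_i)$; this matrix notation is introduced here for illustration. Setting the gradient of the sum of squares to zero then gives the normal equations:

```latex
\begin{aligned}
S(\boldsymbol\beta) &= \sum_{i=1}^{N} \Bigl(y_i - \sum_{j=0}^{m} \beta_j F_{ij}\Bigr)^2
  = (\mathbf y - \mathbf F\boldsymbol\beta)^\top (\mathbf y - \mathbf F\boldsymbol\beta) \\
\nabla_{\boldsymbol\beta} S &= -2\,\mathbf F^\top (\mathbf y - \mathbf F\boldsymbol\beta) = \mathbf 0
  \;\Longrightarrow\; \mathbf F^\top \mathbf F\,\hat{\boldsymbol\beta} = \mathbf F^\top \mathbf y \\
\hat{\boldsymbol\beta} &= (\mathbf F^\top \mathbf F)^{-1} \mathbf F^\top \mathbf y
  \qquad \text{if } \mathbf F \text{ has full column rank}
\end{aligned}
```

In the surface fit example, $\mathbf F$ corresponds to `|:,"2 XMAT` (289 rows, 6 columns).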

Surface Fit Example

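As an example, we take a certain bi-quadratic form

$$y = 1 + 0.2\,x_1^2 + 0.3\,x_2 - 0.4\,x_2^2$$

then add a small amount of noise to simulate observed data. We then try to reconstruct the coefficients of the six terms $1, x_1, x_1^2, x_2, x_1 x_2, x_2^2$ using the least squares method.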

[Figure: surface plots of the original form, the noisy data and the estimated surface, drawn with 'surface'plot X1;X2;FORM, 'surface'plot X1;X2;DATA and 'surface'plot X1;X2;COEF mp XMAT]
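   load 'plot'
   mp =: +/ . *                  NB. matrix product

   'X1 X2' =: |: ,"0/~ i:8       NB. 17 by 17 coordinate grids over _8..8
   $XMAT   =: 1 , X1 , (X1^2) , X2 , (X1*X2) ,: (X2^2)   NB. the six terms
6 17 17

   FORM    =: 1   0     0.2     0.3   0    _0.4 mp XMAT
   FORM    -: 1 + (0.2*X1^2) + (0.3*X2) + (_0.4*X2^2)   NB. same as the formula
1

   NOISE   =: 4 * _0.5 + ($X1) ?.@$ 0   NB. uniform noise in [_2,2), fixed seed
   $DATA   =: FORM + NOISE
17 17
   COEF    =: (,DATA) %. |:,"2 XMAT     NB. least squares fit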

Now we can compare the estimated coefficients (first row) with those of the original formula, which are recovered by fitting the noise-free FORM (second row).

   0j4": COEF  ,: (,FORM) %. |:,"2 XMAT
1.0011 _0.0144 0.2005 0.3104 0.0024 _0.4013
1.0000  0.0000 0.2000 0.3000 0.0000 _0.4000
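
The coefficients above come from J's matrix divide `%.`, which returns the least squares solution when its right argument is a matrix. Mathematically this matches solving the normal equations $(F^\top F)\,b = F^\top y$. The sketch below checks that the two agree on a small made-up data set; the names x, y, M, b1 and b2 are hypothetical. Forming $F^\top F$ explicitly squares the condition number of the problem, so `%.` on the design matrix is generally the safer choice.

```j
mp =: +/ . *                           NB. matrix product, as in the essay
NB. Hypothetical data: 5 observations of y against x
x  =: 0 1 2 3 4
y  =: 1.1 2.9 5.2 7.1 8.8
M  =: |: 1 ,: x                        NB. 5x2 design matrix, columns 1 and x
b1 =: y %. M                           NB. least squares by matrix divide
b2 =: (%. (|: M) mp M) mp (|: M) mp y  NB. normal equations: (M'M)^-1 M'y
b1 ,: b2
```

Both rows give intercept 1.1 and slope 1.96, up to rounding.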

The 'stats' package provides a more detailed regression analysis. Its output includes standard errors, t-values and an analysis of variance table.

   load 'stats'
   (|:}.,"2 XMAT) regression ,DATA   NB. drop the constant row; regression adds its own intercept

             Var.       Coeff.         S.E.           t
              0        1.00105        0.12654        7.91
              1       _0.01444        0.01375       _1.05
              2        0.20052        0.00316       63.55
              3        0.31036        0.01375       22.56
              4        0.00241        0.00281        0.86
              5       _0.40131        0.00316     _127.17

  Source     D.F.        S.S.          M.S.           F
Regression    5    27192.76720     5438.55344     4144.49
Error       283      371.36300        1.31224
Total       288    27564.13020

S.E. of estimate         1.14553
Corr. coeff. squared     0.98653

The squared correlation coefficient of 0.98653 shows a high degree of agreement between the observations and their estimates.
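
The quantities in the table can also be computed directly from the fit. The sketch below rebuilds the essay's data and applies the standard textbook formulas. It assumes, without having checked the 'stats' source, that regression uses these same definitions. The names M, y, EST, N, P, YBAR, SSE, SSR, SST, MSE, FSTAT, R2, SEEST and SE are hypothetical helpers introduced here.

```j
mp    =: +/ . *
NB. Rebuild the essay's data (same expressions as in the example)
'X1 X2' =: |: ,"0/~ i:8
XMAT  =: 1 , X1 , (X1^2) , X2 , (X1*X2) ,: (X2^2)
FORM  =: 1 0 0.2 0.3 0 _0.4 mp XMAT
DATA  =: FORM + 4 * _0.5 + ($X1) ?.@$ 0
NB. Hypothetical helper names from here on
M     =: |: ,"2 XMAT                 NB. 289x6 design matrix
y     =: ,DATA                       NB. 289 observations
COEF  =: y %. M                      NB. least squares coefficients
EST   =: M mp COEF                   NB. fitted values
N     =: # y                         NB. number of observations
P     =: {: $ M                      NB. number of coefficients incl. intercept
YBAR  =: (+/ y) % N                  NB. mean observation
SSE   =: +/ *: y - EST               NB. error sum of squares
SSR   =: +/ *: EST - YBAR            NB. regression sum of squares
SST   =: +/ *: y - YBAR              NB. total sum of squares
MSE   =: SSE % N - P                 NB. error mean square, D.F. = N - P
FSTAT =: (SSR % P - 1) % MSE         NB. F statistic, D.F. = P - 1 and N - P
R2    =: SSR % SST                   NB. corr. coeff. squared
SEEST =: %: MSE                      NB. S.E. of estimate
SE    =: %: MSE * (<0 1) |: %. (|: M) mp M  NB. S.E. of each coefficient
(N - P) , SSE , SSR , FSTAT , R2 , SEEST
```

The results should match the table above if `?.` produces the same numbers in your J version as when the essay was written. In that case N - P is 283 and R2 is close to 0.98653. The noise `4 * _0.5 + ... ?.@$ 0` is uniform on [_2,2), so its variance is 16 % 12, about 1.333. The error mean square of 1.31224 is an estimate of that value, and the S.E. of estimate, 1.14553, is its square root.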

See Also