Essays/Linear Regression

Linear regression is a statistical method for modeling the relationship between a dependent variable Y and independent variables X by estimating the coefficients b_0, ..., b_p of the linear form:

 \hat y = b_0 + b_1 x_1 + b_2 x_2 + ... + b_p x_p

where each term x_i is an expression in the original independent variables X^{(1)}, ..., X^{(k)}. For example, with a single variable X we could take x_1 = X, x_2 = X^2. The model is linear in the coefficients, not necessarily in the original variables.
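
As a minimal sketch of how such terms can be built in J, the expression below forms the columns 1, X, X^2 for a few sample values. The name X here is a hypothetical sample vector introduced only for this illustration.

   X =: 0 1 2 3
   X ^/ 0 1 2          NB. columns: X^0, X^1, X^2
1 0 0
1 1 1
1 2 4
1 3 9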

Least Squares Method

In the least squares method, the coefficients of the linear regression are chosen to minimize the sum of squared deviations between the n observations and their estimates:

 \sum_{i=1}^{n} \left( Y_i - \hat y(X_i) \right)^2 \rightarrow \min
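
The same criterion can be written in matrix notation. The symbols A, b and Y used here are introduced for illustration: A is the n-by-(p+1) matrix whose i-th row holds (1, x_1, ..., x_p) evaluated at observation i, b is the vector of coefficients, and Y is the vector of observations. The criterion then reads

 \| Y - A b \|^2 \rightarrow \min

and setting its gradient with respect to b to zero gives the normal equations

 A^T A \, b = A^T Y, \qquad b = (A^T A)^{-1} A^T Y

where the explicit solution holds provided the columns of A are linearly independent.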

Surface Fit Example

As an example, we take a quadratic polynomial in two variables

 y(x_1,x_2) = 1 + 0.2 x_1^2 + 0.3 x_2 - 0.4 x_2^2

then add uniform noise in the range [-2, 2) to simulate observed data, and try to reconstruct the coefficients using the least squares method.

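Figures (left to right): the original form (lsq_form.png), the noisy data (lsq_data.png) and the estimated surface (lsq_estm.png), plotted with 'surface'plot X1;X2;FORM, 'surface'plot X1;X2;DATA and 'surface'plot X1;X2;COEF mp XMAT respectively.
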
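   load 'plot'
   mp =: +/ . *                              NB. matrix product

   'X1 X2' =: |: ,"0/~ i:8                   NB. two 17x17 coordinate grids over _8..8
   NB. one 17x17 plane per term: 1, X1, X1^2, X2, X1*X2, X2^2
   $XMAT   =: 1 , X1 , (X1^2) , X2 , (X1*X2) ,: (X2^2)
6 17 17

   NB. exact surface from the true coefficients, checked against the formula
   FORM    =: 1   0     0.2     0.3   0    _0.4 mp XMAT
   FORM    -: 1 + (0.2*X1^2) + (0.3*X2) + (_0.4*X2^2)
1

   NOISE   =: 4 * _0.5 + ($X1) ?.@$ 0        NB. uniform noise in [_2,2), fixed seed
   $DATA   =: FORM + NOISE
17 17

   NB. least squares fit: 289 observations against 6 terms
   COEF    =: (,DATA) %. |:,"2 XMAT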

Now we can compare the estimated coefficients (first row) with those fitted to the noise-free form (second row), which reproduce the original formula.

   0j4": COEF  ,: (,FORM) %. |:,"2 XMAT
1.0011 _0.0144 0.2005 0.3104 0.0024 _0.4013
1.0000  0.0000 0.2000 0.3000 0.0000 _0.4000
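
As a cross-check, the same coefficients can in principle be obtained from the normal equations A^T A b = A^T Y, where A is the matrix of terms and Y the vector of observations. The sketch below assumes the definitions above are still in the session; the names A and NORMAL are hypothetical and introduced only for this illustration.

   A      =: |: ,"2 XMAT                         NB. 289 x 6 matrix, one column per term
   NORMAL =: ((|: A) mp , DATA) %. (|: A) mp A   NB. solve (A^T A) b = A^T Y
   0j4 ": NORMAL ,: COEF                         NB. show both estimates, one per row

The two rows should agree to the displayed precision. Applying %. directly to the rectangular matrix, as above, is generally preferable, because forming A^T A squares the condition number of the problem.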

Additional regression analysis is provided in the 'stats' package.

   load 'stats'
   (|:}.,"2 XMAT) regression ,DATA

             Var.       Coeff.         S.E.           t
              0        1.00105        0.12654        7.91
              1       _0.01444        0.01375       _1.05
              2        0.20052        0.00316       63.55
              3        0.31036        0.01375       22.56
              4        0.00241        0.00281        0.86
              5       _0.40131        0.00316     _127.17

  Source     D.F.        S.S.          M.S.           F
Regression    5    27192.76720     5438.55344     4144.49
Error       283      371.36300        1.31224
Total       288    27564.13020

S.E. of estimate         1.14553
Corr. coeff. squared     0.98653

The R^2 value of 0.98653 shows a high degree of agreement between the observations and their estimates.
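
The reported R^2 can be related to the sums of squares in the table: it is the regression sum of squares divided by the total, 27192.76720 / 27564.13020, which is approximately 0.98653. A minimal sketch of the same computation from the fitted coefficients follows, assuming COEF, DATA, XMAT and mp as defined above; the names mean, FIT, SSE and SST are hypothetical and introduced here.

   mean =: +/ % #
   FIT  =: (|: ,"2 XMAT) mp COEF            NB. 289 fitted values
   SSE  =: +/ *: (, DATA) - FIT             NB. error sum of squares
   SST  =: +/ *: (, DATA) - mean , DATA     NB. total sum of squares about the mean
   1 - SSE % SST                            NB. coefficient of determination

The last expression should reproduce the value in the table up to rounding.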

See Also