# Essays/Linear Regression

Linear regression is a statistical method for modeling the relationship between a dependent variable Y and independent variables X by estimating the coefficients ${\displaystyle b_{0},\ldots ,b_{p}}$ of the linear form:

${\displaystyle {\hat {y}}=b_{0}+b_{1}x_{1}+b_{2}x_{2}+...+b_{p}x_{p}}$

where each term ${\displaystyle x_{i}}$ is some expression in the original independent variables (${\displaystyle X^{(1)},\ldots ,X^{(k)}}$). For example, with a single independent variable X, it could be that ${\displaystyle x_{1}=X,x_{2}=X^{2}}$.
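
To make this concrete, the polynomial example just mentioned can be written out in full (the choice of two terms is only an illustration):

${\displaystyle {\hat {y}}=b_{0}+b_{1}X+b_{2}X^{2}}$

The model is nonlinear in X but still linear in the coefficients ${\displaystyle b_{0},b_{1},b_{2}}$, which is the sense in which the regression is "linear".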

## Least Squares Method

In the least squares method, the coefficients of the linear regression are chosen to minimize the sum of squared deviations between the observations and their estimates:

${\displaystyle \sum _{i=1}^{n}\left(Y_{i}-{\hat {y}}(X_{i})\right)^{2}\rightarrow \min }$
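
As a sketch of how this minimization is usually solved, the model can be written in matrix form. The symbols here are notation introduced for illustration and do not appear in the original text: A is the n × (p+1) matrix whose i-th row holds 1 followed by the terms ${\displaystyle x_{1},\ldots ,x_{p}}$ evaluated at observation ${\displaystyle X_{i}}$, b is the vector of coefficients, and Y is the vector of observations. The sum of squares is then

${\displaystyle S(b)=\|Y-Ab\|^{2}}$

Setting the gradient of S with respect to b to zero gives the normal equations

${\displaystyle A^{T}Ab=A^{T}Y}$

so, assuming the columns of A are linearly independent,

${\displaystyle b=(A^{T}A)^{-1}A^{T}Y}$

In J, dyadic `%.` (matrix divide) returns this least squares solution when its right argument has more rows than columns.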

## Surface Fit Example

As an example, we take a quadratic function of two variables,

${\displaystyle y(x_{1},x_{2})=1+0.2x_{1}^{2}+0.3x_{2}-0.4x_{2}^{2}}$

then add a small amount of noise to simulate observed data and try to recover the coefficients using the least squares method.
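
In terms of the linear form above, this surface can be read as a six-term model over the full quadratic basis in ${\displaystyle x_{1}}$ and ${\displaystyle x_{2}}$. The ordering of the terms below is chosen to match the rows of `XMAT` in the listing that follows; the linear ${\displaystyle x_{1}}$ term and the cross term have zero coefficients:

${\displaystyle y=1\cdot 1+0\cdot x_{1}+0.2\cdot x_{1}^{2}+0.3\cdot x_{2}+0\cdot x_{1}x_{2}-0.4\cdot x_{2}^{2}}$

A successful fit should therefore return coefficients close to (1, 0, 0.2, 0.3, 0, -0.4).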

   load 'plot'
   mp =: +/ . *                      NB. matrix product

   'X1 X2' =: |: ,"0/~ i:8           NB. 17x17 grid over _8..8
   $XMAT =: 1 , X1 , (X1^2) , X2 , (X1*X2) ,: (X2^2)
6 17 17
   FORM =: 1 0 0.2 0.3 0 _0.4 mp XMAT
   FORM -: 1 + (0.2*X1^2) + (0.3*X2) + (_0.4*X2^2)
1
   NOISE =: 4 * _0.5 + ($X1) ?.@$ 0  NB. uniform noise in [_2,2), fixed seed
   $DATA   =: FORM + NOISE
17 17
   COEF  =: (,DATA) %. |:,"2 XMAT    NB. least squares solve

The exact form, the noisy data, and the fitted estimate can be plotted as surfaces (figures lsq_form.png, lsq_data.png, lsq_estm.png):

   'surface' plot X1;X2;FORM
   'surface' plot X1;X2;DATA
   'surface' plot X1;X2;COEF mp XMAT


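Now we can compare the coefficients estimated from the noisy data (first row) with those recovered from the noise-free form (second row), which reproduce the original formula.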

   0j4": COEF  ,: (,FORM) %. |:,"2 XMAT
1.0011 _0.0144 0.2005 0.3104 0.0024 _0.4013
1.0000  0.0000 0.2000 0.3000 0.0000 _0.4000


Additional regression analysis is provided in the 'stats' package.

   load 'stats'
   (|:}.,"2 XMAT) regression ,DATA

Var.       Coeff.         S.E.           t
0        1.00105        0.12654        7.91
1       _0.01444        0.01375       _1.05
2        0.20052        0.00316       63.55
3        0.31036        0.01375       22.56
4        0.00241        0.00281        0.86
5       _0.40131        0.00316     _127.17

Source     D.F.        S.S.          M.S.           F
Regression    5    27192.76720     5438.55344     4144.49
Error       283      371.36300        1.31224
Total       288    27564.13020

S.E. of estimate         1.14553
Corr. coeff. squared     0.98653


The ${\displaystyle R^{2}}$ value (reported above as "Corr. coeff. squared", 0.98653) shows a high degree of agreement between the observations and their estimates.
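
The figures in the regression output can be cross-checked by hand. Taking the sums of squares from the analysis of variance table, and assuming the package reports the standard definition of ${\displaystyle R^{2}}$:

${\displaystyle R^{2}=1-{\frac {SS_{error}}{SS_{total}}}=1-{\frac {371.36300}{27564.13020}}\approx 0.98653}$

Similarly, the S.E. of estimate is the square root of the error mean square:

${\displaystyle {\sqrt {1.31224}}\approx 1.14553}$

This is close to what the simulated noise predicts. The noise is 4(u - 0.5) with u uniform on [0, 1), that is, uniform on [-2, 2), so its variance is

${\displaystyle {\frac {4^{2}}{12}}\approx 1.333}$

and its standard deviation is about 1.155.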