NYCJUG/2010-11-09/Levenberg-MarquardtAlgorithm

From J Wiki

Levenberg, Marquardt, non-linear least-squares, optimization, solver


This is a J implementation, originally provided by Bob O'Boyle, of the Levenberg-Marquardt algorithm for non-linear least-squares optimization.

However, this page intends to provide a basis for improving the implementation by making it more J-like. To do that, we first need to ensure that it runs to completion and produces the expected results; as we'll see in the next section, this is not necessarily simple. The two parts shown below are available as separate files File:LevenbergMarquardt.ijs and File:LMAUsageEG.ijs.
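At its core, each Levenberg-Marquardt iteration solves the damped normal equations (J'J + λ·diag(J'J)) δ = -J'r, accepting the step and relaxing λ when the residual sum of squares drops, and increasing λ otherwise. The following is a minimal Python sketch of that loop on a hypothetical two-parameter model (the model, data, and starting values are invented for illustration; this is not a transcription of the J code below):

```python
import math

# Fit the hypothetical model y = a*(1 - exp(-b*x)) to exact
# synthetic data generated with a = 2.0, b = 0.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * (1 - math.exp(-0.5 * x)) for x in xs]

def residuals(a, b):
    return [y - a * (1 - math.exp(-b * x)) for x, y in zip(xs, ys)]

def jacobian(a, b):
    # rows of partial derivatives of each residual w.r.t. (a, b)
    return [(-(1 - math.exp(-b * x)), -a * x * math.exp(-b * x))
            for x in xs]

def lm_fit(a, b, lam=0.01, iters=50):
    rss = sum(r * r for r in residuals(a, b))
    for _ in range(iters):
        J, r = jacobian(a, b), residuals(a, b)
        # damped normal equations: (J'J + lam*diag(J'J)) delta = -J'r
        g00 = sum(ja * ja for ja, _ in J)
        g01 = sum(ja * jb for ja, jb in J)
        g11 = sum(jb * jb for _, jb in J)
        q0 = -sum(ja * ri for (ja, _), ri in zip(J, r))
        q1 = -sum(jb * ri for (_, jb), ri in zip(J, r))
        a00, a11 = g00 * (1 + lam), g11 * (1 + lam)
        det = a00 * a11 - g01 * g01
        da = (q0 * a11 - g01 * q1) / det
        db = (a00 * q1 - g01 * q0) / det
        new_rss = sum(ri * ri for ri in residuals(a + da, b + db))
        if new_rss < rss:              # accept step, relax damping
            a, b, rss, lam = a + da, b + db, new_rss, lam * 0.1
        else:                          # reject step, increase damping
            lam *= 10.0
    return a, b
```

With exact data the fit recovers the generating parameters; the 2-by-2 system is solved by Cramer's rule only to keep the sketch dependency-free.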

Current Status and Likely Directions

We are now able to read the code and run it to completion, though the results seem to differ slightly but significantly from those expected in both examples. We need to concentrate on NLLS by winnowing it down to its core routine, properly separating out the statistics-generation code now embedded in it.
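One direction for that separation: the statistical quantities computed at the end of NLLS (the mean squared residual, the AIC analog ln(RSS) + 2p/n, the covariance matrix 2·msr·H⁻¹, and the standard errors from its diagonal) depend only on the fit's outputs, so they could live in a routine of their own. A hypothetical Python sketch of such a routine (the name and interface are invented; the formulas are those in the code below):

```python
import math

def fit_stats(rss, n, p, hess_inv):
    # post-fit statistics factored out of the solver:
    # mean squared residual over n - p degrees of freedom
    msr = rss / (n - p)
    # AIC analogue for least-squares: ln(RSS) + 2p/n
    aic = math.log(rss) + 2.0 * p / n
    # covariance matrix: 2 * msr * inverse Hessian
    cov = [[2.0 * msr * h for h in row] for row in hess_inv]
    # standard errors: square roots of the covariance diagonal
    se = [math.sqrt(cov[i][i]) for i in range(p)]
    return msr, aic, cov, se
```

Keeping these out of the solver would let NLLS return only the parameter vector, with statistics computed on demand.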

Notes on Changes and Problems

(20100120 20:49) A utility function "qprintf" is not defined. It appears to be used to output information, so I assigned it to the standard J output function:

   qprintf=: smoutput

This appears to allow the example to run to completion. Since the output produced by invoking this function appears to be fairly useless - simply the constant string 'npar ' - I eliminated it entirely. I also made the code more readable by using indentation within function bodies and by putting control constructs like "if. ... do." on a single line instead of leaving lines empty except for a "do." statement.

(20100120 22:37) Re-loading the updated version brings to light a bug: the variable "initial", which supplies the initial parameters, has three values in the main routine but only two in the example use, and the example code expects two. This is why we don't pass data around in global variables with common names. In fact, it's not clear why we want to assign a name at all when we use it only once at the invocation.

This also points up the unconventional use of all-caps for function names but not for global variables; usually, we want to emphasize the danger of globals by spelling them all-caps and emphasize the meaningfulness of functions by using mixed-case as appropriate.

Running the example gives results which differ from the expected result of 813.4583 960.9063 as mentioned in the comments to the example code:

   'LS_PROB' NLLS 750 1200
813.90901 961.01103

(20100120 23:07) The discrepant result can be resolved if we apply NLLS to its result until it converges. Looking at doing this points up further problems with the idiosyncratic spacing of the assignments in this code, which appears to aim at aligning the assignment symbols on consecutive lines. It's hard to see how this is at all helpful and easy to see how unhelpful it is when we attempt to search for assignments to a name as opposed to uses of the same name. If we always put the assignment symbol immediately after the name being assigned, we have an unambiguous target for this kind of search.

(20100120 23:15) Applying the optimization repeatedly until it converges

   'LS_PROB'&NLLS^:_] 750 1200
813.85949 960.999

This disagrees with the expected result mentioned but appears to be a convergent solution. It also points up that NLLS should really be an adverb: the argument 'LS_PROB' is the name of the objective function, which is instantiated within NLLS.
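J's power limit f^:_ y applies a verb to its own result until the result stops changing, i.e. until a fixed point is reached. A rough Python analogue for scalar functions (the name, tolerance, and iteration cap are invented for illustration):

```python
import math

def converge(f, x, tol=1e-12, max_iter=1000):
    # analogue of J's f^:_ y: apply f repeatedly until the
    # result stops changing (a fixed point of f)
    for _ in range(max_iter):
        nx = f(x)
        if abs(nx - x) <= tol * max(1.0, abs(x)):
            return nx
        x = nx
    return x
```

For example, iterating math.cos from any starting point converges to the fixed point of cosine (about 0.739085), just as iterating NLLS from the first solution settles on a self-consistent parameter vector.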

(20100120 23:28) NLLS as an adverb works fine; also simplified the example objective function LS_PROB by removing unnecessary parens and a name which is used only once immediately following its assignment.

(20100120 23:34) Attempting the VON_B example in the main routine

   VON_B NLLS^:_ ] 100 0.05 1
57.501351 0.31545876 0.0002625186

gives results differing from those indicated in the comment: NB. Solution with Excel's SOLVER is linf= 55.9778, k= 0.385595, t0= 0.171371. We need to better understand how to evaluate the results - perhaps the globals like stats will help us here?

Current J Implementation of the Levenberg-Marquardt Algorithm (LMA)

Here is the currently modified code in-line; it is followed by the in-line code of an example usage. One problem we face here is that the code does not quite seem to work. However, as we get it to work and improve it, we can keep the following instance updated.

Further explorations

There are a couple of inter-related issues behind the program not seeming to work. It actually does work when cnstrnt =: 'OFF' . When cnstrnt =: 'ON' , the solution is sensitive to the values of ucnstrnt and lcnstrnt. It is important to ensure that the upper and lower constraints bound the initial parameter estimates. When these are set to 110 0.75 1.1 and 40 0 0, which do bound the initial values, the solution is closer to the unconstrained one.
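The sensitivity makes sense given the form of the penalty: PNLTY_FN sums, for each parameter, alpha/(p - lower) + alpha/(upper - p), an inverse-barrier term that grows without bound as a parameter approaches either constraint (and changes sign if a parameter escapes the bounds). A Python sketch of the same barrier form (function and argument names are invented):

```python
def penalty(params, lo, hi, alpha):
    # inverse-barrier penalty of the same form as PNLTY_FN:
    # alpha/(p - lower) + alpha/(upper - p), summed over parameters;
    # it grows without bound as p approaches either constraint
    return sum(a / (p - l) + a / (h - p)
               for p, l, h, a in zip(params, lo, hi, alpha))
```

For a single parameter bounded by [0, 1] with alpha = 1, the penalty is 4 at the midpoint and climbs steeply near either bound, which is why constraints that fail to bracket the initial estimates derail the fit.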

Also, as originally configured, the name of the objective function (OBJ_FN) was the left argument of NLLS. The original NLLS implementation, a verb taking that name as a noun argument, is attached to this page. Converting it to an adverb allows iteration using the power conjunction ^:, but this may be unnecessary, as iteration is already done within NLLS. One could use VON_B NLLS ^: n ] 100 0.05 1 , where n is 1, 2, 3, etc., to see this effect. It is not clear what advantage the adverb use offers other than confirmation of convergence. It might be better to conduct a constrained run and use that solution in an unconstrained run to ensure stable convergence. It might also be better to try different initial values to see whether the convergence is stable.

So one should use either version of NLLS with these points in mind.

NB.* LevenbergMarquardt.ijs: Von Bertalanffy fit using Marquardt nonlinear
NB. least-squares.
NB. Need to type 'objective function' NLLS initial parameter list to execute
NB. outputs stats matrix & parm vector
NB.
NB. Original written by R. O'Boyle Dec 2008

NB.    Inputs
NB. Data from Haddon (2001) page 92
years=:    1.0 2.0 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 11.3
length=:    15.4 26.93 42.23 44.59 47.63 49.67 50.87 52.3 54.77 56.43 55.88
initial=:    100 0.05 1.0

NB.    Constraints
ucnstrnt=:    60 1 2
lcnstrnt=:    30 0 0
cnstrnt=:  'ON'
alpha=:    0.001 * initial          NB. influence of penalty function
NB. solution with Excel's SOLVER is linf= 55.9778, k= 0.385595, t0= 0.171371

VON_B=: 3 : 0
   'a b c'=. y
   l_pred=. a * (1 -^ - (b * years - c))  NB. Linf * (1 - e^(-k*(t - t0)))
   length - l_pred
)

NLLS=: 1 : 0
NB. Marquardt nonlinear least-squares
NB. Need to type 'objective function' NLLS initial parameter list to execute
NB. Need to supply upper & lower parameter constraints
NB. outputs stats matrix & parm vector
   identity=.=/~i.$y
   limit=.     200
   p=.     $par=. npar=. ,y
   rss=.     e +/ .* e=. u par
   n=.     $e
   pnlty=.    alpha PNLTY_FN par        NB. PENALTY FOR CONSTRAINTS
   nphi=.    phi=. rss + pnlty
   lambda=.    0.01
   con=.     10
   j=. 0
   whilst.                        NB. Start of Main Loop
    1=<./(10>:i),(limit>:j),(0.0001<con=.(((n-p)*q +/ . * -v)%p*rss)^0.5), (0.00001<|(nphi-phi)%phi), (0.00001 +./ . < |(npar-par)%0.00000000000000000001 + |par)
   do.
       j=.    j + 1
       par=.    npar
       phi=.    nphi
       de=. ((0.00001*|npar) + (0.00000001*npar= 0)) u D: 1 npar
       q=.    2 * de +/ . * e        NB.    Gradient
       hess=.    2 *  (] +/ . *"2 1 ]) de    NB.    Hessian
       if. 'ON' -: cnstrnt do.            NB. Start of Difference for penalty
           dpnlty=.    pnlty DIFF_PNLTY par
           q=.    q + {. dpnlty
           hess=.    hess + identity * {: dpnlty    NB. Adds penalty to diag of Hessian
       end.                                NB.    Difference for penalty

       lambda=.    0.000001 >. lambda * 0.01
       hess=.    hess + hess * identity * lambda=. lambda * 10
       norm=.    %:(+/"2 hess^2)          NB. Column norms
       hess=.    hess %" 1 1 norm         NB.    Scale Hessian to norm
       npar=.    par + v=.    (-q %. hess) % norm    NB. Step direction; step size= 1
       rss=.     e +/ . * e=. u npar
       pnlty=.     alpha PNLTY_FN par
       i=.     1
       if. (phi < nphi=. rss + pnlty) do.
           lambda=. lambda * 100
           while. (phi < nphi=. rss + pnlty) *. (i <: 10) do.
               i=.     i + 1
               npar=.     par + v=. v * 0.1^i
               rss=.     e +/ . * e=. u npar
               pnlty=.    alpha PNLTY_FN par
           end.
       end.
   end.                                 NB. End of Main Loop

   e=.    u npar
   msr=: {.(rss=. e +/ . * e) % n - p
   aic=: (^.rss) + 2*p%n              NB. AIC analog for least-squares:
        NB. (Hongzhi,A. 1989. Acta Mathematica Applicatae Sinica. 5: 60-67)
        NB. Invert hessian & re-normalize
   hess_inv=: (%.hess)%"1 0 norm
        NB. Covariance matrix (msr divided by hessian)
   cov=: 2 * msr * hess_inv
        NB. Standard error (diagonal of covariance matrix)
   par_se=: %:(+/"2 identity * cov)
        NB. Correlation matrix
   corr=:  cov%(*/~par_se)
   cv=:    par_se % par
   stats=: 7 2 $ 'Loops'; j; 'Lambda'; lambda; 'No. Observations'; n; 'No. Parameters'; p; 'RSS'; rss; 'MSR'; msr; 'AIC'; aic
   parm=:     npar
)

PNLTY_FN=: 4 : 0
NB.    Penalty function for specified constraints
NB. r= resultant penalty
NB. y= parameters
NB. x= vector of constants for the constraints
   if. 'ON' -: cnstrnt do.
       r=. +/(x, x)%(y-lcnstrnt),ucnstrnt-y
NB. GENERIC LOWER AND UPPER CONSTRAINTS
   else.  r=. 0                         NB. Default is no penalty
   end.
)

DIFF_PNLTY=: 4 : 0
   r=.    2 0 $ 0
   identity=.=/~i.$y
   delta=.    (0.01*y) + (0.001*y= 0)
   tpar_f=.    y +" 1 1 (identity * delta)
   tpar_b=.    y -" 1 1 (identity * delta)
   for_a. i. $y do.
       r1=.    (x-fpnlty=.alpha PNLTY_FN a { tpar_f)% a { delta
       bpnlty=.    alpha PNLTY_FN a { tpar_b
       r=.    r,.r1,(fpnlty+bpnlty-2*x)% a { delta
   end.
)
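DIFF_PNLTY approximates the penalty's derivatives by finite differences, using a step delta scaled to each parameter's magnitude, with a small fallback step for zero-valued parameters. The same idea as a generic central-difference gradient in Python (a sketch of the technique, not a transcription of DIFF_PNLTY; names and step sizes are illustrative):

```python
def num_grad(f, p, rel=1e-5, small=1e-8):
    # central-difference gradient of f at parameter vector p;
    # the step for each parameter is scaled to its magnitude,
    # with a fallback step for zero parameters (cf. the delta
    # computed in DIFF_PNLTY)
    g = []
    for i in range(len(p)):
        d = rel * abs(p[i]) + small * (p[i] == 0)
        up = list(p); up[i] += d
        dn = list(p); dn[i] -= d
        g.append((f(up) - f(dn)) / (2 * d))
    return g
```

The central difference is accurate to second order in the step, which is why scaling the step to the parameter matters for parameters of very different magnitudes (such as linf versus t0 in the Von Bertalanffy fit).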

Example Use of LMA Code

NB.* LMAUsageEG.ijs: example of using Levenberg-Marquardt algorithm to solve a
NB. known problem: least-squares problem from Bard (1974) page 124.
NB. Constraints and boundaries per Bard

NB.    Inputs
time=: 0.1 0.2 0.3 0.4 0.5 0.05 0.1 0.15 0.2 0.25 0.02 0.04 0.06 0.08 0.1
temp=: 100 100 100 100 100 200 200 200 200 200 300 300 300 300 300
fract=: 0.98 0.983 0.955 0.979 0.993 0.626 0.544 0.455 0.225 0.167 0.566 0.317 0.034 0.016 0.066
initial=: 750 1200

NB.    Constraints
ucnstrnt=: 1000 2000
lcnstrnt=: 500 750
cnstrnt=: 'OFF'

alpha=: 0.001 * initial            NB. influence of penalty function
NB. Unconstrained solution by Bard (1974) is a= 813.4583 and b= 960.9063

LS_PROB=: 3 : 0
   'a b'=. y
   fract - ^-a*time * ^-b%temp
)

   LS_PROB NLLS 750 1200
813.909 961.011
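One concrete way to compare this result with Bard's published solution is to evaluate the residual sum of squares at each parameter pair. A Python transcription of the LS_PROB objective, with the data copied from the listing above (only the rss wrapper name is invented):

```python
import math

# data from the Bard (1974) least-squares problem, as listed above
time = [0.1, 0.2, 0.3, 0.4, 0.5, 0.05, 0.1, 0.15, 0.2, 0.25,
        0.02, 0.04, 0.06, 0.08, 0.1]
temp = [100] * 5 + [200] * 5 + [300] * 5
fract = [0.98, 0.983, 0.955, 0.979, 0.993, 0.626, 0.544, 0.455,
         0.225, 0.167, 0.566, 0.317, 0.034, 0.016, 0.066]

def rss(a, b):
    # residual sum of squares for the model
    # fract - exp(-a * time * exp(-b / temp)), as in LS_PROB
    return sum((f - math.exp(-a * t * math.exp(-b / T))) ** 2
               for t, T, f in zip(time, temp, fract))
```

Evaluating rss at both Bard's 813.4583 960.9063 and the converged 813.85949 960.999 would show which is the better minimum of this objective, which is a sharper test than comparing parameter values alone.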