From J Wiki

The Levenberg–Marquardt Algorithm (LMA)

The Levenberg–Marquardt Algorithm (LMA) is a commonly used routine for least-squares optimization problems. The current NLLS verb is based upon an APL application developed by Gavaris (1988), which has been used extensively in the ocean science community. Indeed, numerous subsequent applications in other languages (e.g. FORTRAN, C, ACON) have been developed. To this list is now added J.

There are two attachments to this wiki page. These can be obtained by clicking on the attachment tab at the bottom of the NLLS page. One is the NLLS script (nlls_j602_5 april 2010), along with an example application from Haddon (2001); the other (ls_prob_2 jan 2010) is a least-squares problem described in Bard (1974), starting in section 5.21 and referred to in the subsequent chapters. These provide good examples of how NLLS works and are a useful guide for other applications.

Some Theory

In all gradient methods,

\theta_{i+1} = \theta_i + r_i v_i

where r_i and v_i are the step size and direction, respectively, used to update the parameter vector \theta in order to minimize some objective function.

A direction v_i is acceptable if and only if there exists a positive definite matrix R_i such that

v_i = -R_i q_i

where q_i is the gradient vector (first partial derivative).

Thus the basic equation of the ith iteration in any gradient method is

\theta_{i+1} = \theta_i - r_i R_i q_i

The various gradient methods differ in the manner of choosing r_i and R_i. In the Newton-Raphson method (to guarantee quadratic convergence), r_i is 1 and R_i equals H_i^{-1}, the inverse of the Hessian matrix of second partial derivatives. In general, there is a minimum if H_i^{-1}, and thus R_i, is positive definite.
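As a concrete illustration of the Newton-Raphson case (a Python sketch, not part of NLLS; the function and its derivatives are hypothetical):

```python
def newton_step(theta, grad, hess):
    # One Newton-Raphson iteration: r_i = 1 and R_i is the inverse
    # Hessian, so theta_{i+1} = theta_i - H^{-1} q.
    return theta - grad(theta) / hess(theta)

# Minimize f(theta) = (theta - 3)^2: gradient 2(theta - 3), Hessian 2.
theta = newton_step(0.0, lambda t: 2 * (t - 3), lambda t: 2.0)
print(theta)  # 3.0: for a quadratic, the exact minimum in one step
```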

In the Levenberg–Marquardt Algorithm (LMA),

R_i = (A_i + l_i B_i^2)^{-1}

B_i^2 is a diagonal matrix whose elements coincide with the absolute values of the diagonal elements of A_i. Regarding l_i, it can be shown that if B_i^2 is any positive definite matrix, then A_i + l_i B_i^2 is positive definite for sufficiently large l_i, no matter what A_i is. As l_i approaches infinity, the term l_i B_i^2 dominates A_i, resulting in an extremely short step in the downhill direction, B_i^2 being positive definite. On the other hand, as l_i decreases, v_i approaches -R_i q_i = -A_i^{-1} q_i, the Newton-Raphson direction. Thus r_i, the step size, is indirectly determined through the choice of l_i and, in the LMA, can be assumed to be one.

In each iteration, progress towards minimization of the objective function is evaluated. If the residual sum of squares is not decreasing, l_i is increased according to some rule (e.g. multiplied by an order of magnitude) and the objective function re-evaluated. This has the effect of exploring smaller steps until the residual sum of squares decreases. Once this occurs, l_i is decreased, thus increasing the step size. In essence, use of l_i ensures that A_i + l_i B_i^2, and thus R_i, remains positive definite, so the algorithm always moves towards a minimum and stays away from saddle points.
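The damping loop described above can be sketched in Python (an illustration only, not the NLLS verb; the exponential model, the starting value of l, and the factor of 10 are all assumptions for the example):

```python
import numpy as np

def lma(residuals, jacobian, theta, lam=1e-3, iters=50):
    # Minimal Levenberg-Marquardt sketch. A = J^T J approximates the
    # Hessian; B2 holds the absolute diagonal of A; q is the gradient.
    theta = np.asarray(theta, dtype=float)
    rss = np.sum(residuals(theta) ** 2)
    for _ in range(iters):
        J = jacobian(theta)
        A = J.T @ J
        B2 = np.diag(np.abs(np.diag(A)))
        q = J.T @ residuals(theta)
        step = np.linalg.solve(A + lam * B2, q)
        trial = theta - step
        trial_rss = np.sum(residuals(trial) ** 2)
        if trial_rss < rss:   # progress: accept, move towards Newton
            theta, rss, lam = trial, trial_rss, lam / 10.0
        else:                 # no progress: increase l, shorten the step
            lam *= 10.0
    return theta

# Hypothetical fit of y = a * exp(b * x) to noiseless data:
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * np.exp(-1.5 * x)
res = lambda th: th[0] * np.exp(th[1] * x) - y
jac = lambda th: np.column_stack([np.exp(th[1] * x),
                                  th[0] * x * np.exp(th[1] * x)])
theta = lma(res, jac, [1.0, -1.0])
print(theta)  # approaches (2, -1.5)
```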

The routine NLLS allows for both unconstrained and constrained optimization by setting a constraint switch to either ‘OFF’ or ‘ON’. Constraints are achieved through a penalty function, p, added to the residual sum of squares produced by the objective function:

p = \Sigma a / h(\theta)

For each \theta, two h terms are defined

h_u = ucnstrnt - \theta
h_l = \theta - lcnstrnt

The constant, a, is a measure of the influence of the constraint: the larger a is, the greater the impact of the constraint. Here, a is scaled by the initial value of each parameter:

a = constant * \theta_i

The constant is user-defined. In NLLS, it is set at 0.001 but can be changed if the user desires the penalty function to have more influence on the optimization. Use of a implies that each lower and upper constraint is standardized by the size of \theta:

p = \Sigma a / h(\theta)

  = constant * (\theta_i / (\theta - lcnstrnt)) + constant * (\theta_i / (ucnstrnt - \theta))

Thus, as the \theta terms approach the constraint, p increases, causing a change in the direction of the optimization.
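For illustration, the per-parameter penalty can be written out in Python (a sketch; the function name, the bounds and the starting value are hypothetical, with the constant at its NLLS default of 0.001):

```python
def penalty(theta, theta0, lower, upper, constant=0.001):
    # a = constant * theta0 (the initial parameter value); one term
    # for the lower constraint and one for the upper constraint.
    a = constant * theta0
    return a / (theta - lower) + a / (upper - theta)

# A parameter starting at 5, constrained to (0, 10):
print(penalty(5.0, 5.0, 0.0, 10.0))   # 0.002 -- negligible mid-range
print(penalty(9.99, 5.0, 0.0, 10.0))  # grows steeply near the upper bound
```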

Inputs to NLLS

The required inputs of the NLLS verb are the y (dependent) variable, the x (independent) variables, a set of starting estimates of the parameters (initial) and the constraints. As well, the least-squares objective function needs to be defined; from the y and x variables and the parameter set, it produces a list of residuals.
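In the same spirit, a residual-style objective function looks like this in Python (a sketch, not a NLLS objective; the saturating model and all values are made up for illustration):

```python
def objective(par, x, y):
    # Residuals for the hypothetical model y = par[0] * x / (par[1] + x):
    # one residual (predicted minus observed) per observation.
    return [par[0] * xi / (par[1] + xi) - yi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 4.0]
y = [0.5, 0.8, 1.0]
print(objective([1.5, 2.0], x, y))  # a list of residuals, near [0, -0.05, 0]
```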

Within the NLLS verb, a number of variables relevant to the minimization are used. These can be changed by the user, although in experience this has rarely been found necessary. They include con = 10, limit = 100 (the maximum number of steps in the main loop) and the convergence and tolerance criteria in the whilst statement relating to the relative change in the parameter set and residual sum of squares between steps.


The NLLS verb is executed as

'objective function name' NLLS initial

Note that the noun initial can be a scalar (in the case of a one-parameter model) since, within NLLS, the right argument y is formed into a list by the assignment par =. ,y .
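The effect of that ravel can be mimicked in Python (an analogy only; J's , simply flattens its argument):

```python
def as_list(initial):
    # Analogue of the J assignment par =. ,y : a scalar starting
    # value becomes a one-element parameter list, a list stays a list.
    try:
        return list(initial)
    except TypeError:
        return [initial]

print(as_list(0.5))         # [0.5]
print(as_list([1.0, 2.0]))  # [1.0, 2.0]
```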

Experience with this routine is thus far limited to models with fewer than 40 parameters and 400 observations, although in principle much larger models should be able to be run. One quite complicated model (with a lot of processing within the objective function) took just over a minute (Windows XP 32-bit on a laptop with an Intel T7700 (2.4 GHz) processor and 2 GB of RAM).


The NLLS verb produces a list of the parameter estimates for each loop executed. A table (stats) is produced upon completion which indicates

• Number of main loops
• Final value of lambda
• Number of observations
• Number of parameters
• Residual sum of squares
• Mean square residual
• AIC (analog for least-squares)
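A common least-squares analog of AIC is n ln(RSS / n) + 2k, for n observations and k parameters; a sketch under that assumption (the exact formula used inside NLLS may differ):

```python
import math

def aic_ls(rss, n, k):
    # Least-squares analog of AIC: n * ln(RSS / n) + 2 * k.
    # Lower is better; the 2k term penalizes extra parameters.
    return n * math.log(rss / n) + 2 * k

print(round(aic_ls(0.5, 100, 3), 2))  # -523.83 for RSS 0.5, n 100, k 3
```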

It also produces a number of global variables, including the inverse of the Hessian (hess_inv), the covariance matrix (cov), the standard deviation of each parameter (par_se), the correlation matrix (corr) and the coefficient of variation of each parameter (cv). The final parameter set is in the noun parm.
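The relations among those globals can be sketched with standard least-squares statistics (an assumption about how they interrelate, not code from NLLS; the numbers are hypothetical):

```python
import numpy as np

hess_inv = np.array([[0.50, 0.10],
                     [0.10, 0.40]])   # hypothetical inverse Hessian
parm = np.array([2.0, -1.5])          # hypothetical final parameters
msr = 0.01                            # hypothetical mean square residual

cov = msr * hess_inv                  # covariance matrix
par_se = np.sqrt(np.diag(cov))        # standard deviation per parameter
corr = cov / np.outer(par_se, par_se) # correlation matrix (unit diagonal)
cv = par_se / np.abs(parm)            # coefficient of variation
print(par_se)                         # square roots of the cov diagonal
```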

Further Developments

This code is offered to the J community as a tool to assist in modeling, and its further development is encouraged. It was initially written when the author was just learning J and is thus a bit ‘loopy’. It is hoped that through use it can be further optimized. It has received a fair bit of testing, but errors are always a possibility; if and when these are encountered, they should be reported to the author, who will make the changes on the wiki.

There's also a page on which to work on the code as a collaborative project.


Bard, Y. 1974. Nonlinear Parameter Estimation. Academic Press. New York. 341 pp.

Gavaris, S. 1988. An adaptive framework for the estimation of population size. CAFSAC Res. Doc. 88/29.

Haddon, M. 2001. Modelling and quantitative methods in fisheries. Chapman and Hall Press, New York. 406 pp.

Contributed by Bob O’Boyle (