Optimization Method

The default method for optimizing the likelihood function in Program MARK is the Newton-Raphson method with numerical derivatives.  The default routine is the VA09AD algorithm from the Harwell library.  Program MARK also provides a second numerical optimization algorithm for maximizing the likelihood function and obtaining parameter estimates.  Sometimes the default optimization routine does not converge properly, and the second algorithm works.  Select the second algorithm by clicking on the “Use Alt. Opt. Method” check box in the Run Window dialog.

The second method of optimization is simulated annealing.  Simulated annealing is a global optimization method that distinguishes between different local optima.  Starting from an initial point, the algorithm takes a step and evaluates the function.  When minimizing a function, any downhill step is accepted and the process repeats from this new point.  An uphill step may also be accepted; this decision is made by the Metropolis criterion.  Thus, simulated annealing can escape from local optima.  As the optimization process proceeds, the length of the steps declines and the algorithm closes in on the global optimum.  Because the algorithm makes very few assumptions regarding the function to be optimized, it is quite robust with respect to non-quadratic surfaces.  Simulated annealing can also be used as a local optimizer for difficult functions.  The implementation of simulated annealing in Program MARK was used in “Global Optimization of Statistical Functions with Simulated Annealing,” Goffe, Ferrier and Rogers, Journal of Econometrics, vol. 60, no. 1/2, Jan./Feb. 1994, pp. 65-100.
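The steps described above can be sketched in a few lines of Python.  This is only an illustration of the general idea, not the code used in Program MARK: the function name simulated_annealing, the toy objective two_well, and all tuning values (temp, step, cooling, sweeps, moves_per_sweep) are assumptions made for the example, and the simple geometric shrinking of the step length is a simplification of the step adjustment in the published algorithm.

```python
import math
import random


def simulated_annealing(f, x0, lower, upper, temp=10.0, step=1.0,
                        cooling=0.95, sweeps=200, moves_per_sweep=50, seed=1):
    """Minimize f over the box lower <= x[i] <= upper, starting from x0."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    for _ in range(sweeps):
        for _ in range(moves_per_sweep):
            # Propose a random step in one randomly chosen coordinate.
            i = rng.randrange(len(x))
            cand = list(x)
            cand[i] += rng.uniform(-step, step)
            if not lower <= cand[i] <= upper:
                continue  # reject proposals outside the search bounds
            fc = f(cand)
            # Metropolis criterion: always accept downhill moves; accept an
            # uphill move with probability exp(-(increase) / temperature).
            if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
                x, fx = cand, fc
                if fx < best_f:
                    best_x, best_f = list(x), fx
        temp *= cooling  # fewer uphill moves are accepted as temp falls
        step *= cooling  # shorter steps as the search closes in
    return best_x, best_f


def two_well(x):
    """Toy objective: local minimum near 1.9, global minimum near -2.1."""
    return x[0] ** 4 - 8 * x[0] ** 2 + 3 * x[0]


# Start in the basin of the local minimum and let the search escape it.
best_x, best_f = simulated_annealing(two_well, [2.0], lower=-4.0, upper=4.0)
print(best_x, best_f)
```

Early on, when the temperature is high, uphill moves are accepted often enough for the search to cross the ridge between the two minima; as the temperature and step length fall, the search settles into one basin.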

The difficulty with simulated annealing is that the algorithm is very inefficient compared to the Newton-Raphson method that is the default optimization method in MARK.  However, the likelihood for some data types, specifically the multi-strata data types, may have multiple maxima, i.e., local maxima may exist.  As a result, the default optimization method may end up at a local maximum of the likelihood instead of the global maximum.  Simulated annealing is more likely to find the global maximum.

Simulated annealing searches within a bounded range of the parameters.  In MARK, none of the link functions put constraints on the beta parameters, i.e., the parameters that are actually optimized.  The main problem occurs with the logit link function because extreme values of beta parameters can become fixed in the tails (extremes) of this function.  That is, the derivative of the inverse logit function appears to be zero because the parameter is apparently constant.  Because the simulated annealing optimization method requires a range over which parameters may vary, the beta parameters have been constrained to the range -20 to 20, so that the logit link (and the MLogit and CLogit link functions) will function reasonably well.  This range is not a problem for the sin link, but may occasionally be a problem with population estimation parameters (N) for closed capture models that use the log link.
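The flat tails of the logit link can be seen numerically.  The following Python sketch is only an illustration (the function names inv_logit and slope and the step size h are assumptions made for the example, not part of MARK): it evaluates the inverse logit and a central-difference approximation of its derivative at several beta values.

```python
import math


def inv_logit(beta):
    """Back-transform a beta parameter to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-beta))


def slope(beta, h=1e-4):
    """Central-difference derivative of inv_logit at beta."""
    return (inv_logit(beta + h) - inv_logit(beta - h)) / (2.0 * h)


for beta in (0.0, 5.0, 20.0, 40.0):
    print(beta, inv_logit(beta), slope(beta))
```

At beta = 0 the slope is about 0.25.  At beta = 20 it is still positive but tiny, and at beta = 40 the inverse logit is exactly 1.0 in double precision, so the numerical derivative is exactly zero and an optimizer gets no information about which direction to move.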

As a consequence of the constraint that beta parameters must be in the [-20, 20] interval, some models may not fit well with the simulated annealing optimization method.  I suggest that you use this optimization method only to get near the global maximum, and then use the parameter values from the resulting model as initial values for the default optimization method.
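The reason initial values matter can be illustrated with a one-parameter Python sketch.  Everything here is hypothetical: neg_log_lik is a toy function with two minima rather than a MARK likelihood, newton_numeric is a bare-bones Newton-Raphson with finite-difference derivatives rather than the VA09AD routine, and the starting value -1.8 simply stands in for a rough result from a global search.

```python
def neg_log_lik(b):
    """Toy stand-in for a negative log-likelihood with two minima."""
    return b ** 4 - 8 * b ** 2 + 3 * b


def newton_numeric(f, b, h=1e-4, tol=1e-8, max_iter=50):
    """Newton-Raphson in one dimension using finite-difference derivatives."""
    for _ in range(max_iter):
        d1 = (f(b + h) - f(b - h)) / (2 * h)            # first derivative
        d2 = (f(b + h) - 2 * f(b) + f(b - h)) / h ** 2  # second derivative
        if d2 <= 0:
            break  # not locally convex; a plain Newton step is unreliable
        delta = d1 / d2
        b -= delta
        if abs(delta) < tol:
            break
    return b


# Same optimizer, two different sets of initial values.
poor_start = newton_numeric(neg_log_lik, 2.0)    # starts in the wrong basin
good_start = newton_numeric(neg_log_lik, -1.8)   # rough value from a global search
print(poor_start, neg_log_lik(poor_start))
print(good_start, neg_log_lik(good_start))
```

Started at 2.0, the local method converges to the local minimum near 1.9; started at -1.8, it converges quickly to the global minimum near -2.09.  The global search only has to land in the right basin, and the faster local method does the rest.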

Warning: the simulated annealing optimization method can take as much as 10 to 100 times the computer time of the default optimization method.  Only use this alternative method if you suspect a problem with local maxima.