source: sasview/docs/sphinx-docs/source/user/sasgui/perspectives/fitting/optimizer.rst @ 5878a9ea

ESS_GUIESS_GUI_DocsESS_GUI_batch_fittingESS_GUI_bumps_abstractionESS_GUI_iss1116ESS_GUI_iss879ESS_GUI_iss959ESS_GUI_openclESS_GUI_orderingESS_GUI_sync_sascalccostrafo411magnetic_scattrelease-4.1.1release-4.1.2release-4.2.2ticket-1009ticket-1094-headlessticket-1242-2d-resolutionticket-1243ticket-1249ticket885unittest-saveload
Last change on this file since 5878a9ea was e81f6dd, checked in by smk78, 9 years ago

Label optimizer-burn added (by someone, not me!)

  • Property mode set to 100644
File size: 31.8 KB
RevLine 
[49148bb]1.. _optimizer-guide:
2
3*******************
4Optimizer Selection
5*******************
6
7Bumps has a number of different optimizers available, each with its own
8control parameters:
9
10* :ref:`fit-lm`
11* :ref:`fit-amoeba`
12* :ref:`fit-dream`
13* :ref:`fit-de`
14* :ref:`fit-newton`
15* :ref:`fit-rl` [experimental]
16* :ref:`fit-ps` [experimental]
17* :ref:`fit-pt` [experimental]
18
19In general there is a trade-off between convergence
20rate and robustness, with the fastest algorithms most likely to find a
21local minimum rather than a global minimum.   The gradient descent algorithms
22(:ref:`fit-lm`, :ref:`fit-newton`) tend to be fast but they will find local
23minima only, while the population algorithms (:ref:`fit-dream`, :ref:`fit-de`)
24are more robust and likely slower.   :ref:`fit-amoeba` is somewhere between,
25with a small population keeping the search local but more robust than the
26gradient descent algorithms.
27
28Each algorithm has its own set of control parameters for adjusting the
29search process and the stopping conditions.  The same option may mean
30slightly different things to different optimizers.  The bumps package
31provides a dialog box for selecting the optimizer and its options
32when running the fit wx application.  This only includes the common options
33for the most useful optimizers.  For full control, the fit will need to
34be run from the command line interface or through a python script.
35
36For parameter uncertainty, most algorithms use the covariance matrix at
37the optimum to estimate an uncertainty ellipse.  This is okay for a
38preliminary analysis, but only works reliably for weakly correlated parameters.
39For full uncertainty analysis, :ref:`fit-dream` uses a random walk to explore
40the parameter space near the minimum, showing pair-wise correlations
41amongst the parameter values.  In order for :ref:`fit-dream` to return the
42correct uncertainy, the function to be optimized should be a conditional
43probability density, with *nllf* as the negative log likelihood function
44of seeing point $x$ in the parameter space.  Other functions
45can be fitted, but uncertainty estimates will be meaningless.
46
47Most algorithms have been adapted to run in parallel at least to some degree.
48The  implementation is not heavily tuned, either in terms of minimizing the
49overhead per function evaluation or for distributing the problem across
50multiple processors.   If the theory function is implemented in parallel,
51then the optimizer should be run in serial.  Mixed mode is also possible
52when running on a cluster with a multi-threaded theory function.  In this
53case, only one theory function will be evaluated on each cluster node, but
54the optimizer will distribute the parameters values to the cluster nodes
55in parallel.  Do not run serial algorithms (:ref:`fit-lm`, :ref:`fit-newton`) on
56a cluster.
57
58We have included a number of optimizers in Bumps that did not perform
59particularly well on our problem sets.  However, they may be perfect
60for your problem, so we have left them in the package for you to explore.
61They are not available in the GUI selection.
62
63.. _fit-lm:
64
65Levenberg-Marquardt
66===================
67
68.. image:: fit-lm.png
69    :alt: Levenberg-Marquardt option screen.
70    :align: left
71
72The Levenberg-Marquardt algorithm has been
73the standard method for non-linear data fitting.  As a gradient descent
74trust region method, it starts at the initial value of the function and
75steps in the direction of the derivative until it reaches the minimum.
76Set up as an explicit minimization of the sum of square differences between
77theory and model, it uses a numerical approximation of the Jacobian matrix
78to set the step direction and an adaptive algorithm to set the size of
79the trust region.
80
81When to use
82-----------
83
84Use this method when you have a reasonable fit near the minimum, and
85you want to get the best possible value.  This can then be used as the starting
86point for uncertainty analysis using :ref:`fit-dream`.  This method requires
87that the problem definition includes a *residuals* method, but this should
88always be true when fitting data.
89
90When modeling the results of an experiment, the best fit value is an
91accident of the measurement.  Redo the same measurement, and the slightly
92different values you measure will lead to a different best fit.  The
93important quantity to report is the credible interval covering
9468%  (1-\ $\sigma$) or 95% (2-\ $\sigma$\ ) of the range of
95parameter values that are somewhat consistent with the data.
96
97This method uses *lmfit* from *scipy*, and does not run in parallel.
98
99Options
100-------
101
102*Steps* is the number of gradient steps to take.  Each step requires
103a calculation of the Jacobian matrix to determine the direction.  This
104needs $2 m n$ function evaluations, where $n$ is the number of parameters and
105each function is evaluated and $m$ data points (assuming center point
106formula for finite difference estimate of the derivative).  The resulting
107linear equation is then solved, but for small $n$ and expensive function
108evaluation this overhead can be ignored.  Use ``--steps=n`` from
109the command line.
110
111*f(x) tolerance* and *x tolerance* are used to determine when
112the fit has reached the point where no significant improvement is expected.
113If the function value does not improve significantly within the step, or
114the step is too short, then the fit will terminate.  Use ``--ftol=v`` and
115``--xtol=v`` from the command line.
116
117From the command line, ``--starts=n`` will automatically restart the algorithm
118after it has converged so that a slightly better value can be found. If
119``--keep_best`` is included then restart will use a value near the minimum,
120otherwise it will restart the fit from a random point in the parameter space.
121
122Use ``--fit=lm`` to select the Levenberg-Marquardt fitter from the command line.
123
124References
125----------
126
127.. [Levenberg1944]
128    Levenberg, K.
129    *Quarterly Journal of Applied Mathmatics*
130    1944, II (2), 164–168.
131
132.. [Marquardt1963]
133    Marquardt, D. W.
134    *Journal of the Society for Industrial and Applied Mathematics*
135    1963, 11 (2), 431–441.
136    DOI: `10.1137/0111030 <http://dx.doi.org/10.1137/0111030>`_
137
138.. _fit-amoeba:
139
140Nelder-Mead Simplex
141===================
142
143.. image:: fit-amoeba.png
144    :alt: Nelder-Mead Simplex option screen.
145    :align: left
146
147The Nelder-Mead downhill simplex algorithm is a robust optimizer which
148does not require the function to be continuous or differentiable.
149It uses the relative values of the function at the corners of a
150simplex (an n-dimensional triangle) to decide which points of the simplex
151to update.  It will take the worst value and try moving it inward or
152outward, or reflect it through the centroid of the remaining values
153stopping if it finds a better value.  If none of these values are
154better, then it will shrink the simplex and start again.  The name
155amoeba comes from the book *Numerical Recipes* [Press1992]_ wherein they
156describe the search as acting like an amoeba, squeezing through narrow valleys
157as it makes its way down to the minimum.
158
159When to use
160-----------
161
162Use this method as a first fit to your model.  If your fitting function
163is well behaved with few local minima this will give a quick estimate of
164the model, and help you decide if the model needs to be refined.  If your
165function is poorly behaved, you will need to select a good initial value
166before fitting, or use a more robust method such
167as :ref:`fit-de` or :ref:`fit-dream`.
168
169The uncertainty reported comes from a numerical derivative estimate at the
170minimum.
171
172This method requires a series of function updates, and does not benefit
173much from running in parallel.
174
175Options
176-------
177
178*Steps* is the simplex update iterations to perform.  Most updates
179require one or two function evaluations, but shrinking the simplex evaluates
180every value in the simplex. Use ``--steps=n`` from the command line.
181
182*Starts* tells the optimizer to restart a given number of times.
183Each time it restarts it uses a random starting point.   Use
184``--starts=n`` from the command line.
185
186*Simplex radius* is the initial size of the simplex, as a portion of
187the bounds defining the parameter space.  If a parameter is unbounded, then
188the radius will be treated as a portion of the parameter value. Use
189``--radius=n`` from the command line.
190
191*x tolerance* and *f(x) tolerance* are used to determine when the
192fit has reached the point where no significant improvement is expected.
193If the simplex is tiny (that is, the corners are close to each other) and
194flat (that is, the values at the corners are close to each other),
195then the fit will terminate.  Use ``--xtol=v`` and ``--ftol=v`` from
196the command line.
197
198From the command line, use ``--keep_best`` so that restarts are centered on a
199value near the minimum rather than restarting from a random point within the
200parameter bounds.
201
202Use ``--fit=amoeba`` to select the Nelder-Mead simplex fitter from the
203command line.
204
205References
206----------
207
208.. [Nelder1965]
209    Nelder, J. A.; Mead, R.
210    *The Computer Journal*
211    1965, 7 (4), 308–313.
212    DOI: `10.1093/comjnl/7.4.308 <http://dx.doi.org/10.1093/comjnl/7.4.308>`_
213
214.. [Press1992]
215   Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T.
216   In *Numerical Recipes in C: The Art of Scientific Computing, Second Edition*;
217   Cambridge University Press: Cambridge; New York, 1992; pp 408–412.
218
219
220.. _fit-newton:
221
222Quasi-Newton BFGS
223=================
224
225.. image:: fit-newton.png
226    :alt: Quasi-Newton BFGS option screen.
227    :align: left
228
229Broyden-Fletcher-Goldfarb-Shanno is a gradient descent method which uses the
230gradient to determine the step direction and an approximation of the Hessian
231matrix to estimate the curvature and guess a step size.  The step is further
232refined with a one-dimensional search in the direction of the gradient.
233
234When to use
235-----------
236
237Like :ref:`fit-lm`, this method converges quickly to the minimum.  It does
238not assume that the problem is in the form of a sum of squares and does not
239require a *residuals* method.
240
241The $n$ partial derivatives are computed in parallel.
242
243Options
244-------
245
246*Steps* is the number of gradient steps to take.  Each step requires
247a calculation of the Jacobian matrix to determine the direction.  This
248needs $2 m n$ function evaluations, where $n$ is the number of parameters and
249each function is evaluated and $m$ data points (assuming center point
250formula for finite difference estimate of the derivative).  The resulting
251linear equation is then solved, but for small $n$ and expensive function
252evaluation this overhead can be ignored.
253Use ``--steps=n`` from the command line.
254
255*Starts* tells the optimizer to restart a given number of times.
256Each time it restarts it uses a random starting point.
257Use ``--starts=n`` from the command line.
258
259*f(x) tolerance* and *x tolerance* are used to determine when
260the fit has reached the point where no significant improvement is expected.
261If the function is small or the step is too short then the fit
262will terminate.  Use ``--ftol=v`` and ``--xtol=v`` from the command line.
263
264From the command line, ``--keep_best`` uses a value near the previous minimum
265when restarting instead of using a random value within the parameter bounds.
266
267Use ``--fit=newton`` to select BFGS from the commandline.
268
269References
270----------
271
272.. [Dennis1987]
273    Dennis, J. E.; Schnabel, R. B.
274    *Numerical Methods for Unconstrained Optimization and Nonlinear Equations*;
275    Society for Industrial and Applied Mathematics: Philadelphia, 1987.
276
277
278.. _fit-de:
279
280Differential Evolution
281======================
282
283.. image:: fit-de.png
284    :alt: Differential Evolution option screen.
285    :align: left
286
287Differential evolution is a population based algorithm which uses differences
288between points as a guide to selecting new points.  For each member of the
289population a pair of points is chosen at random, and a difference vector is
290computed.  This vector is scaled, and a random subset of its components are
291added to the current point based on crossover ratio. This new point is
292evaluated, and if its value is lower than the current point, it replaces
293it in the population.   There are many variations available within DE that
294have not been exposed in Bumps.  Interested users can modify
295:class:`bumps.fitters.DEFit` and experiment with different crossover and
296mutation algorithms, and perhaps add them as command line options.
297
298Differential evolution is a robust directed search strategy.  Early in the
299search, when the population is disperse, the difference vectors are large
300and the search remains broad.  As the search progresses, more of the
301population goes into the valleys and eventually all the points end up in
302local minima.  Now the differences between random pairs will often be small
303and the search will become more localized.
304
305The population is initialized according to the prior probability distribution
306for each each parameter.  That is, if the parameter is bounded, it will use
307a uniform random number generate within the bounds.  If it is unbounded, it
308will use a uniform value in [0,1].  If the parameter corresponds to the result
309of a previous measurement with mean $\mu$ and standard deviation $\sigma$,
310then the initial values will be pulled from a gaussian random number generator.
311
312When to use
313-----------
314
315Convergence with differential evolution will be slower, but more robust.
316
317Each update will evaluate $k$ points in parallel, where $k$ is the size
318of the population.
319
320Options
321-------
322
323*Steps* is the number of iterations.  Each step updates each member
324of the population.  The population size scales with the number of fitted
325parameters. Use ``--steps=n`` from the command line.
326
327*Population* determines the size of the population.  The number of
328individuals, $k$, is equal to the number of fitted parameters times the
329population scale factor.  Use ``--pop=k`` from the command line.
330
331*Crossover ratio* determines what proportion of the dimensions to update
332at each step.  Smaller values will likely lead to slower convergence, but
333more robust results.  Values must be between 0 and 1.  Use ``--CR=v`` from
334the command line.
335
336*Scale* determines how much to scale each difference vector before adding
337it to the candidate point.  The selected mutation algorithm chooses a scale
338factor uniformly in $[0,F]$.  Use ``--F=v`` from the command line.
339
340*f(x) tolerance* and *x tolerance* are used to determine when the
341fit has reached the point where no significant improvement is expected.
342If the population is flat (that is, the minimum and maximum values are
343within tolerance) and tiny (that is, all the points are close to each
344other) then the fit will terminate.  Use ``ftol=v`` and ``xtol=v`` from the
345command line.
346
347Use ``--fit=de`` to select diffrential evolution from the commandline.
348
349References
350----------
351
352.. [Storn1997]
353    Storn, R.; Price, K.
354    *Journal of Global Optimization*
355    1997, 11 (4), 341–359.
356    DOI: `10.1023/A:1008202821328 <http://dx.doi.org/10.1023/A:1008202821328>`_
357
358
359
360.. _fit-dream:
361
362DREAM
363=====
364
365.. image:: fit-dream.png
366    :alt: DREAM option screen.
367    :align: left
368
369DREAM is a population based algorithm like differential evolution, but
370instead of only keeping individuals which improve each generation, it
371will sometimes keep individuals which get worse.  Although it is not
372fast and does not give the very best value for the function, we have
373found it to be a robust fitting engine which will give a good value given
374enough time.
375
376The progress of each individual in the population from generation to
377generation can considered a Markov chain, whose transition probability
378is equal to the probability of taking the step times the probability
379that it keeps the step based on the difference in value between the points.
380By including a purely random stepper with some probability, the detailed
381balance condition is preserved, and the Markov chain converges onto
382the underlying equilibrium distribution.  If the theory function represents
383the conditional probability of selecting each point in the parameter
384space, then the resulting chain is a random draw from the posterior
385distribution.
386
387This means that the DREAM algorithm can be used to determine the parameter
388uncertainties.  Unlike the hessian estimate at the minimum that is
389used to report uncertainties from the other fitters, the resulting
390uncertainty need not gaussian.  Indeed, the resulting distribution can
391even be multi-modal.  Fits to measured data using theory functions that
392have symmetric solutions have shown all equivalent solutions with approximately
393equal probability.
394
395When to use
396-----------
397
398Use DREAM when you need a robust fitting algorithm.  It takes longer but
399it does an excellent job of exploring different minima and getting close
400to the global optimum.
401
402Use DREAM when you want a detailed analysis of the parameter uncertainty.
403
404Like differential evolution, DREAM will evaluate $k$ points in parallel,
405where $k$ is the size of the population.
406
[e81f6dd]407.. _option-burn:
408
[49148bb]409Options
410-------
411
412*Samples* is the number of points to be drawn from the Markov chain.
413To estimate the 68% interval to two digits of precision, at least
4141e5 (or 100,000) samples are needed.  For the 95% interval, 1e6
415(or 1,000,000) samples are needed.  The default 1e4 samples
416gives a rough approximation of the uncertainty relatively quickly.
417Use ``--samples=n`` from the command line.
418
419*Burn-in Steps* is the number of iterations to required for the Markov
420chain to converge to the equilibrium distribution.  If the fit ends
421early, the tail of the burn will be saved to the start of the steps.
422Use ``--burn=n`` from the command line.
423
424*Population* determines the size of the population.  The number of
425individuals, $k$, is equal to the number of fitted parameters times the
426population scale factor.  Use ``--pop=k`` from the command line.
427
428*Initializer* determines how the population will be initialized.
429The options are as follows:
430
431     *eps* (epsilon ball), in which the entire initial population is chosen
432     at random from within a tiny hypersphere centered about the initial point
433
434     *lhs* (latin hypersquare), which chops the bounds within each dimension
435     in $k$ equal sized chunks where $k$ is the size of the population and
436     makes sure that each parameter has at least one value within each chunk
437     across the population.
438
439     *cov* (covariance matrix), in which the uncertainty is estimated using
440     the covariance matrix at the initial point, and points are selected
441     at random from the corresponding gaussian ellipsoid
442
443     *random* (uniform random), in which the points are selected at random
444     within the bounds of the parameters
445
446Use ``--init=type`` from the command line.
447
448
449*Thinning* is the amount of thinning to use when collecting the
450population.  If the fit is somewhat stuck, with most steps not improving
451the fit, then you will need to thin the population to get proper
452statistics.  Use ``--thin=k`` from the command line.
453
454*Calculate entropy*, if true, computes the entropy for the fit.  This is
455an estimate of the amount of information in the data.  Use ``--entropy``
456from the command line.
457
458*Steps*, if not zero, determines the number of iterations to use for
459drawing samples after burn in. Each iteration updates the full population,
460which is (population x number of fitted parameters) points. This option
461is available for compatibility; it is more useful to set the number of
462samples directly.  Use ``--steps=n`` from the command line.
463
464Use ``--fit=dream`` to select DREAM from the commandline.
465
466Output
467------
468
469DREAM produces a number of different outputs, and there are a number of
470things to check before using its reported uncertainty values.  The main
471goal of selecting ``--burn=n`` is to wait long enough to reach the
472equilibrium distribution.
473
474.. figure:: dream-incomplete.png
475    :alt: example of incomplete fit
476
477    This DREAM fit is incomplete, as can be seen on all four plots.  The
478    *Convergence* plot is still decreasing, *Parameter Trace* plot does not
479    show random mixing of Markov chain values, the *Correlations* plots are
480    fuzzy and mostly empty, the *Uncertainty* plot shows black histograms
481    (indicating that there are a few stray values far away from the best) and
482    green maximum likelihood spikes not matching the histogram (indicating
483    that the region around the best value has not been adequately explored).
484
485.. figure:: dream-complete.png
486    :alt: example of a completed fit
487
488    This DREAM fit completed successfully.  The *Convergence* plot is flat,
489    the *Parameter Trace* plot is flat and messy, the *Correlateions* plots
490    show nice blobs (and a bit of correlation between the *M1.radius* parameter
491    and the *M1.radius.width* parameter), and the uncertainty plots show
492    a narrow range of -log(P) values in the mostly brown histograms and
493    a good match to the green constrained maximum likelihood line.
494
495For each parameter in the fit, DREAM finds the mean, median and best value,
496as well as the 68% and 95% credible intervals.  The mean value is
497defined as $\int x P(x) dx$, which is just the expected value of the
498probability distribution for the parameter.  The median value is the 50%
499point in the probability distribution, and the best value is the maximum
500likelihood value seen in the random walk.  The credible intervals are the
501central intervals which capture 68% and 95% of the parameter values
502respectively.  You need approximately 100,000 samples to get two digits of
503precision on the 68% interval, and 1,000,000 samples for the 95% interval.
504
505.. table:: Example fit output
506
507    = =============== ============ ======== ======== ================= =================
508    #  Parameter         mean       median    best   [   68% interval] [   95% interval]
509    = =============== ============ ======== ======== ================= =================
510    1   M1.background 0.059925(41) 0.059924 0.059922 [0.05988 0.05997] [0.05985 0.06000]
511    2       M1.radius   2345.3(15) 2345.234 2345.174 [2343.83 2346.74] [2342.36 2348.29]
512    3 M1.radius.width  0.00775(41)  0.00774  0.00777 [ 0.0074  0.0081] [ 0.0070  0.0086]
513    4        M1.scale  0.21722(20) 0.217218 0.217244 [0.21702 0.21743] [0.21681 0.21761]
514    = =============== ============ ======== ======== ================= =================
515
516The *Convergence* plot shows the range of $\chi^2$ values in the population
517for each iteration.  The band shows the 68% of values around the median, and
518the solid line shows the minimum value.  If the distribution has reached
519equilibrium, then convergence graph should be roughly flat, with little
520change in the minimum value throughout the graph.  If there is no convergence,
521then the remaining plots don't mean much.
522
523The *Correlations* plot shows cross correlation between each pair of
524parameters.  If the parameters are completely uncorrelated then the boxes
525should contain circles.  Diagonals indicate strong correlation.  Square
526blocks indicate that the fit is not sensitive to one of the parameters.
527The range plotted on the correlation plot is determined by the 95% interval
528of the data.  The individual correlation plots are too small to show the
529range of values for the parameters.  These can instead be read from the
530*Uncertainty* plot for each parameter, which covers the same range of values
531and indicates 68% and 95% intervals.  If there are some chains that are
532wandering around away from the minimum, then the plot will look fuzzy, and
533not have a nice blob in the center.  If a correlation plot has multiple blobs,
534then there are multiple minima in your problem space, usually because there
535are symmetries in the problem definition.  For example, a model fitting
536$x + a^2$ will have identical solutions for $\pm\,a$.
537
538The *Uncertainty* plot shows histograms for each fitted parameter generated
539from the values for that parameter across all chains.  Within each histogram
540bar the values are sorted and displayed as a gradient from black to copper,
541with black values having the lowest $\chi^2$ and copper values having the
542highest.  The resulting histogram should be dark brown, with a black hump
543in the center and light brown tips.  If there are large lumps of light brown,
544or excessive black then its likely that the optimizer did not converge.  The
545green line over the histogram shows the best value seen within each
546histogram bin (the maximum likelihood given $p_k == x$).
547With enough samples and proper convergence, it should roughly follow the
548outline of the histogram.  The yellow band in the center of the plot
549represents the 68% interval for the data.  The histogram cuts off at 95%.
550These values along with the median are shown as labels along the x axis.
551The green asterisk represents the best value, the green *E* the mean value
552and the vertical green line the median value.  If the fit is not sensitive
553to a parameter, or if two parameters are strongly correlated, the parameter
554histogram will show a box rather than a hump.  Spiky shapes (either in the
555histogram or the maximum likelihood line) indicate lack of convergence or
556maybe not enough steps.  A chopped histograms indicates that the range for
557that parameter is too small.
558
559The *Parameter Trace* plot is diagnostic for models which have poor mixing.
560In this cases no matter how the parameter values are changing, they are
561landing on much worse values for the $\chi^2$.  This can happen if the
562problem is highly constrained with many tight and twisty values.
563
564The *Data and Theory* plot should show theory and data lining up pretty well,
565with the theory overlaying about 2/3 of the error bars on the data
566(1-\ $\sigma$ = 68%).  The *Residuals* plot shows the difference between
567theory and data divided by uncertainty.  The residuals should be 2/3 within
568[-1, 1], They should not show any structure, such as humps where the theory
569misses the data for long stretches.  This indicates some feature missing
570from the model, or a lack of convergence to the best model.
571
572If entropy is requested, then bumps will show the total number of bits of
573information in the fit.  This derives from the entropy term:
574
575.. math:
576
577    S = \int_\Theta p(\Theta) \log p(\Theta) d\Theta
578
579Using entropy and simulation we hope to be able to make experiment
580planning decisions in a way that maximizes information, by estimating
581whether it is better to measure more precisely or to measure different
582but related values and fit them with shared parameters.
583
584
585References
586----------
587
588.. [Vrugt2009]
589    Vrugt, J. A.; Ter Braak, C. J. F.; Diks, C. G. H.; Robinson, B. A.;
590    Hyman, J. M.; Higdon, D.
591    *International Journal of Nonlinear Sciences and Numerical Simulation*
592    2009, 10 (3), 273–290.
593    DOI: `10.1515/IJNSNS.2009.10.3.273 <http://dx.doi.org/10.1515/IJNSNS.2009.10.3.273>`_
594
595.. [Kramer2010]
596    Kramer, A.; Hasenauer, J.; Allgower, F.; Radde, N.
597    *In 2010 IEEE International Conference on Control Applications (CCA)*
598    2010; pp 493–498.
599    DOI: `10.1109/CCA.2010.5611198 <http://dx.doi.org/10.1109/CCA.2010.5611198>`_
600
601.. [JCGM2008]
602    JCGM.
603    *Evaluation of measurement data — Supplement 1 to the “Guide to the
604    expression of uncertainty in measurement” — Propagation of distributions
605    using a Monte Carlo method*; Joint Committee for Guides in Metrology,
606    JCGM 101:2008; Geneva, Switzerland, 2008; p 90.
607    `<http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf>`_
608
609
610
611.. _fit-ps:
612
613Particle Swarm
614==============
615
616Inspired by bird flocking behaviour, the particle swarm algorithm is a
617population-based method which updates an individual according to its
618momentum and a force toward the current best fit parameter values.  We
619did not explore variations of this algorithm in any detail.
620
621When to use
622-----------
623
624Particle swarm performed well enough in our low dimensional test problems,
625but made little progress when more fit parameters were added.
626
627The population updates can run in parallel, but the tiny population size
628limits the amount of parallelism.
629
630Options
631-------
632
633``--steps=n`` is the number of iterations.  Each step updates each member
634of the population.  The population size scales with the number of fitted
635parameters.
636
637``--pop=k`` determines the size of the population.  The number of
638individuals, $k$, is equal to the number of fitted parameters times the
639population scale factor.  The default scale factor is 1.
640
641Use ``--fit=ps`` to select particle swarm from the commandline.
642
643Add a few more lines
644
645References
646----------
647
648.. [Kennedy1995]
649    Kennedy, J.; Eberhart, R.
650    Particle Swarm Optimization
651    *Proceedings of IEEE International Conference on Neural Networks. IV.*
652    1995; pp 1942–1948.
653    DOI: `10.1109/ICNN.1995.48896 <http://dx.doi.org/810.1109/ICNN.1995.488968>`_
654
655
656.. _fit-rl:
657
658Random Lines
659============
660
661Most of the population based algorithms ignore the value of the function
662when choosing the points in the next iteration.  Random lines is a new
663style of algorithm which fits a quadratic model to a selection from the
664population, and uses that model to propose a new point in the next
665generation of the population.  The hope is that the method will inherit
666the robustness of the population based algorithms as well as the rapid
667convergence of the newton descent algorithms.
668
669When to use
670-----------
671
672Random lines works very well for some of our test problems, showing
673rapid convergence to the optimum, but on other problems it makes
674very little progress.
675
676The population updates can run in parallel.
677
678Options
679-------
680
681``--steps=n`` is the number of iterations.  Each step updates each member
682of the population.  The population size scales with the number of fitted
683parameters.
684
685``--pop=k`` determines the size of the population.  The number of
686individuals, $k$, is equal to the number of fitted parameters times the
687population scale factor.  The default scale factor is 0.5.
688
689``--CR=v`` is the crossover ratio, determining what proportion of the
690dimensions to update at each step.  Values must be between 0 and 1.
691
692``--starts=n`` tells the optimizer to restart a given number of times.
693Each time it restarts it uses a random starting point.
694
695``--keep_best`` uses a value near the previous minimum when restarting
696instead of using a random value within the parameter bounds.  This option is
697not available in the options dialog.
698
699Use ``--fit=rl`` to select random lines from the commandline.
700
701References
702----------
703
704.. [Sahin2013]
705
706    Sahin, I.
707    *An International Journal of Optimization and Control:  Theories & Applications (IJOCTA)*
708    2013, 3 (2), 111–119.
709
710
711
712.. _fit-pt:
713
714Parallel Tempering
715==================
716
717Parallel tempering is an MCMC algorithm for uncertainty analysis.  This
718version runs at multiple temperatures simultaneously, with chains at high
719temperature able to more easily jump between minima and chains at low
720temperature to fully explore the minima.  Like :ref:`fit-dream` it has a
721differential evolution stepper, but this version uses the chain history
722as the population rather than maintaining a population at each temperature.
723
724This is an experimental algorithm which does not yet perform well.
725
726When to use
727-----------
728
729When complete, parallel tempering should be used for problems with widely
730spaced local minima which dream cannot fit.
731
732Options
733-------
734
735``--steps=n`` is the number of iterations to include in the Markov
736chain.  Each iteration updates the full population.  The population size
737scales with the number of fitted parameters.
738
739``--burn=n`` is the number of iterations to required for the Markov
740chain to converge to the equilibrium distribution.  If the fit ends
741early, the tail of the burn will be saved to the start of the steps.
742
743``--CR=v`` is the differential evolution crossover ratio to use when
744computing step size and direction.  Use a small value to step through the
745dimensions one at a time, or a large value to step through all at once.
746
747``-nT=k``, ``-Tmin=v`` and ``--Tmax=v`` specify a log-spaced initial
748distribution of temperatures.  The default is 25 points between
7490.1 and 10.  :ref:`fit-dream` runs at a fixed temperature of 1.0.
750
751Use ``--fit=pt`` to select parallel tempering from the commandline.
752
753References
754----------
755
756.. [Swendsen1986]
757    Swendsen, R. H.; Wang J. S.
758    Replica Monte Carlo simulation of spin glasses
759    *Physical Review Letters*
760    1986, 57, 2607-2609
761
762
763..
764    SNOBFIT (fit=snobfit) attempts to construct a locally quadratic model of
765    the entire search space.  While promising because it can begin to offer
766    some guarantees that the search is complete given reasonable assumptions
767    about the fitting surface, initial trials did not perform well and the
768    algorithm has not yet been tuned to our problems.
769
Note: See TracBrowser for help on using the repository browser.