source: sasview/docs/sphinx-docs/source/user/sasgui/perspectives/fitting/optimizer.rst @ 197260f

ESS_GUIESS_GUI_DocsESS_GUI_batch_fittingESS_GUI_bumps_abstractionESS_GUI_iss1116ESS_GUI_iss879ESS_GUI_iss959ESS_GUI_openclESS_GUI_orderingESS_GUI_sync_sascalccostrafo411magnetic_scattrelease-4.1.1release-4.1.2release-4.2.2release_4.0.1ticket-1009ticket-1094-headlessticket-1242-2d-resolutionticket-1243ticket-1249ticket885unittest-saveload
Last change on this file since 197260f was 49148bb, checked in by smk78, 9 years ago

Add in guiframe related docs files

  • Property mode set to 100644
File size: 31.8 KB
RevLine 
[49148bb]1.. _optimizer-guide:
2
3*******************
4Optimizer Selection
5*******************
6
7Bumps has a number of different optimizers available, each with its own
8control parameters:
9
10* :ref:`fit-lm`
11* :ref:`fit-amoeba`
12* :ref:`fit-dream`
13* :ref:`fit-de`
14* :ref:`fit-newton`
15* :ref:`fit-rl` [experimental]
16* :ref:`fit-ps` [experimental]
17* :ref:`fit-pt` [experimental]
18
19In general there is a trade-off between convergence
20rate and robustness, with the fastest algorithms most likely to find a
21local minimum rather than a global minimum.   The gradient descent algorithms
22(:ref:`fit-lm`, :ref:`fit-newton`) tend to be fast but they will find local
23minima only, while the population algorithms (:ref:`fit-dream`, :ref:`fit-de`)
24are more robust and likely slower.   :ref:`fit-amoeba` is somewhere between,
25with a small population keeping the search local but more robust than the
26gradient descent algorithms.
27
28Each algorithm has its own set of control parameters for adjusting the
29search process and the stopping conditions.  The same option may mean
30slightly different things to different optimizers.  The bumps package
31provides a dialog box for selecting the optimizer and its options
32when running the fit wx application.  This only includes the common options
33for the most useful optimizers.  For full control, the fit will need to
34be run from the command line interface or through a python script.
35
36For parameter uncertainty, most algorithms use the covariance matrix at
37the optimum to estimate an uncertainty ellipse.  This is okay for a
38preliminary analysis, but only works reliably for weakly correlated parameters.
39For full uncertainty analysis, :ref:`fit-dream` uses a random walk to explore
40the parameter space near the minimum, showing pair-wise correlations
41amongst the parameter values.  In order for :ref:`fit-dream` to return the
42correct uncertainy, the function to be optimized should be a conditional
43probability density, with *nllf* as the negative log likelihood function
44of seeing point $x$ in the parameter space.  Other functions
45can be fitted, but uncertainty estimates will be meaningless.
46
47Most algorithms have been adapted to run in parallel at least to some degree.
48The  implementation is not heavily tuned, either in terms of minimizing the
49overhead per function evaluation or for distributing the problem across
50multiple processors.   If the theory function is implemented in parallel,
51then the optimizer should be run in serial.  Mixed mode is also possible
52when running on a cluster with a multi-threaded theory function.  In this
53case, only one theory function will be evaluated on each cluster node, but
54the optimizer will distribute the parameters values to the cluster nodes
55in parallel.  Do not run serial algorithms (:ref:`fit-lm`, :ref:`fit-newton`) on
56a cluster.
57
58We have included a number of optimizers in Bumps that did not perform
59particularly well on our problem sets.  However, they may be perfect
60for your problem, so we have left them in the package for you to explore.
61They are not available in the GUI selection.
62
63.. _fit-lm:
64
65Levenberg-Marquardt
66===================
67
68.. image:: fit-lm.png
69    :alt: Levenberg-Marquardt option screen.
70    :align: left
71
72The Levenberg-Marquardt algorithm has been
73the standard method for non-linear data fitting.  As a gradient descent
74trust region method, it starts at the initial value of the function and
75steps in the direction of the derivative until it reaches the minimum.
76Set up as an explicit minimization of the sum of square differences between
77theory and model, it uses a numerical approximation of the Jacobian matrix
78to set the step direction and an adaptive algorithm to set the size of
79the trust region.
80
81When to use
82-----------
83
84Use this method when you have a reasonable fit near the minimum, and
85you want to get the best possible value.  This can then be used as the starting
86point for uncertainty analysis using :ref:`fit-dream`.  This method requires
87that the problem definition includes a *residuals* method, but this should
88always be true when fitting data.
89
90When modeling the results of an experiment, the best fit value is an
91accident of the measurement.  Redo the same measurement, and the slightly
92different values you measure will lead to a different best fit.  The
93important quantity to report is the credible interval covering
9468%  (1-\ $\sigma$) or 95% (2-\ $\sigma$\ ) of the range of
95parameter values that are somewhat consistent with the data.
96
97This method uses *lmfit* from *scipy*, and does not run in parallel.
98
99Options
100-------
101
102*Steps* is the number of gradient steps to take.  Each step requires
103a calculation of the Jacobian matrix to determine the direction.  This
104needs $2 m n$ function evaluations, where $n$ is the number of parameters and
105each function is evaluated and $m$ data points (assuming center point
106formula for finite difference estimate of the derivative).  The resulting
107linear equation is then solved, but for small $n$ and expensive function
108evaluation this overhead can be ignored.  Use ``--steps=n`` from
109the command line.
110
111*f(x) tolerance* and *x tolerance* are used to determine when
112the fit has reached the point where no significant improvement is expected.
113If the function value does not improve significantly within the step, or
114the step is too short, then the fit will terminate.  Use ``--ftol=v`` and
115``--xtol=v`` from the command line.
116
117From the command line, ``--starts=n`` will automatically restart the algorithm
118after it has converged so that a slightly better value can be found. If
119``--keep_best`` is included then restart will use a value near the minimum,
120otherwise it will restart the fit from a random point in the parameter space.
121
122Use ``--fit=lm`` to select the Levenberg-Marquardt fitter from the command line.
123
124References
125----------
126
127.. [Levenberg1944]
128    Levenberg, K.
129    *Quarterly Journal of Applied Mathmatics*
130    1944, II (2), 164–168.
131
132.. [Marquardt1963]
133    Marquardt, D. W.
134    *Journal of the Society for Industrial and Applied Mathematics*
135    1963, 11 (2), 431–441.
136    DOI: `10.1137/0111030 <http://dx.doi.org/10.1137/0111030>`_
137
138.. _fit-amoeba:
139
140Nelder-Mead Simplex
141===================
142
143.. image:: fit-amoeba.png
144    :alt: Nelder-Mead Simplex option screen.
145    :align: left
146
147The Nelder-Mead downhill simplex algorithm is a robust optimizer which
148does not require the function to be continuous or differentiable.
149It uses the relative values of the function at the corners of a
150simplex (an n-dimensional triangle) to decide which points of the simplex
151to update.  It will take the worst value and try moving it inward or
152outward, or reflect it through the centroid of the remaining values
153stopping if it finds a better value.  If none of these values are
154better, then it will shrink the simplex and start again.  The name
155amoeba comes from the book *Numerical Recipes* [Press1992]_ wherein they
156describe the search as acting like an amoeba, squeezing through narrow valleys
157as it makes its way down to the minimum.
158
159When to use
160-----------
161
162Use this method as a first fit to your model.  If your fitting function
163is well behaved with few local minima this will give a quick estimate of
164the model, and help you decide if the model needs to be refined.  If your
165function is poorly behaved, you will need to select a good initial value
166before fitting, or use a more robust method such
167as :ref:`fit-de` or :ref:`fit-dream`.
168
169The uncertainty reported comes from a numerical derivative estimate at the
170minimum.
171
172This method requires a series of function updates, and does not benefit
173much from running in parallel.
174
175Options
176-------
177
178*Steps* is the simplex update iterations to perform.  Most updates
179require one or two function evaluations, but shrinking the simplex evaluates
180every value in the simplex. Use ``--steps=n`` from the command line.
181
182*Starts* tells the optimizer to restart a given number of times.
183Each time it restarts it uses a random starting point.   Use
184``--starts=n`` from the command line.
185
186*Simplex radius* is the initial size of the simplex, as a portion of
187the bounds defining the parameter space.  If a parameter is unbounded, then
188the radius will be treated as a portion of the parameter value. Use
189``--radius=n`` from the command line.
190
191*x tolerance* and *f(x) tolerance* are used to determine when the
192fit has reached the point where no significant improvement is expected.
193If the simplex is tiny (that is, the corners are close to each other) and
194flat (that is, the values at the corners are close to each other),
195then the fit will terminate.  Use ``--xtol=v`` and ``--ftol=v`` from
196the command line.
197
198From the command line, use ``--keep_best`` so that restarts are centered on a
199value near the minimum rather than restarting from a random point within the
200parameter bounds.
201
202Use ``--fit=amoeba`` to select the Nelder-Mead simplex fitter from the
203command line.
204
205References
206----------
207
208.. [Nelder1965]
209    Nelder, J. A.; Mead, R.
210    *The Computer Journal*
211    1965, 7 (4), 308–313.
212    DOI: `10.1093/comjnl/7.4.308 <http://dx.doi.org/10.1093/comjnl/7.4.308>`_
213
214.. [Press1992]
215   Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T.
216   In *Numerical Recipes in C: The Art of Scientific Computing, Second Edition*;
217   Cambridge University Press: Cambridge; New York, 1992; pp 408–412.
218
219
220.. _fit-newton:
221
222Quasi-Newton BFGS
223=================
224
225.. image:: fit-newton.png
226    :alt: Quasi-Newton BFGS option screen.
227    :align: left
228
229Broyden-Fletcher-Goldfarb-Shanno is a gradient descent method which uses the
230gradient to determine the step direction and an approximation of the Hessian
231matrix to estimate the curvature and guess a step size.  The step is further
232refined with a one-dimensional search in the direction of the gradient.
233
234When to use
235-----------
236
237Like :ref:`fit-lm`, this method converges quickly to the minimum.  It does
238not assume that the problem is in the form of a sum of squares and does not
239require a *residuals* method.
240
241The $n$ partial derivatives are computed in parallel.
242
243Options
244-------
245
246*Steps* is the number of gradient steps to take.  Each step requires
247a calculation of the Jacobian matrix to determine the direction.  This
248needs $2 m n$ function evaluations, where $n$ is the number of parameters and
249each function is evaluated and $m$ data points (assuming center point
250formula for finite difference estimate of the derivative).  The resulting
251linear equation is then solved, but for small $n$ and expensive function
252evaluation this overhead can be ignored.
253Use ``--steps=n`` from the command line.
254
255*Starts* tells the optimizer to restart a given number of times.
256Each time it restarts it uses a random starting point.
257Use ``--starts=n`` from the command line.
258
259*f(x) tolerance* and *x tolerance* are used to determine when
260the fit has reached the point where no significant improvement is expected.
261If the function is small or the step is too short then the fit
262will terminate.  Use ``--ftol=v`` and ``--xtol=v`` from the command line.
263
264From the command line, ``--keep_best`` uses a value near the previous minimum
265when restarting instead of using a random value within the parameter bounds.
266
267Use ``--fit=newton`` to select BFGS from the commandline.
268
269References
270----------
271
272.. [Dennis1987]
273    Dennis, J. E.; Schnabel, R. B.
274    *Numerical Methods for Unconstrained Optimization and Nonlinear Equations*;
275    Society for Industrial and Applied Mathematics: Philadelphia, 1987.
276
277
278.. _fit-de:
279
280Differential Evolution
281======================
282
283.. image:: fit-de.png
284    :alt: Differential Evolution option screen.
285    :align: left
286
287Differential evolution is a population based algorithm which uses differences
288between points as a guide to selecting new points.  For each member of the
289population a pair of points is chosen at random, and a difference vector is
290computed.  This vector is scaled, and a random subset of its components are
291added to the current point based on crossover ratio. This new point is
292evaluated, and if its value is lower than the current point, it replaces
293it in the population.   There are many variations available within DE that
294have not been exposed in Bumps.  Interested users can modify
295:class:`bumps.fitters.DEFit` and experiment with different crossover and
296mutation algorithms, and perhaps add them as command line options.
297
298Differential evolution is a robust directed search strategy.  Early in the
299search, when the population is disperse, the difference vectors are large
300and the search remains broad.  As the search progresses, more of the
301population goes into the valleys and eventually all the points end up in
302local minima.  Now the differences between random pairs will often be small
303and the search will become more localized.
304
305The population is initialized according to the prior probability distribution
306for each each parameter.  That is, if the parameter is bounded, it will use
307a uniform random number generate within the bounds.  If it is unbounded, it
308will use a uniform value in [0,1].  If the parameter corresponds to the result
309of a previous measurement with mean $\mu$ and standard deviation $\sigma$,
310then the initial values will be pulled from a gaussian random number generator.
311
312When to use
313-----------
314
315Convergence with differential evolution will be slower, but more robust.
316
317Each update will evaluate $k$ points in parallel, where $k$ is the size
318of the population.
319
320Options
321-------
322
323*Steps* is the number of iterations.  Each step updates each member
324of the population.  The population size scales with the number of fitted
325parameters. Use ``--steps=n`` from the command line.
326
327*Population* determines the size of the population.  The number of
328individuals, $k$, is equal to the number of fitted parameters times the
329population scale factor.  Use ``--pop=k`` from the command line.
330
331*Crossover ratio* determines what proportion of the dimensions to update
332at each step.  Smaller values will likely lead to slower convergence, but
333more robust results.  Values must be between 0 and 1.  Use ``--CR=v`` from
334the command line.
335
336*Scale* determines how much to scale each difference vector before adding
337it to the candidate point.  The selected mutation algorithm chooses a scale
338factor uniformly in $[0,F]$.  Use ``--F=v`` from the command line.
339
340*f(x) tolerance* and *x tolerance* are used to determine when the
341fit has reached the point where no significant improvement is expected.
342If the population is flat (that is, the minimum and maximum values are
343within tolerance) and tiny (that is, all the points are close to each
344other) then the fit will terminate.  Use ``ftol=v`` and ``xtol=v`` from the
345command line.
346
347Use ``--fit=de`` to select diffrential evolution from the commandline.
348
349References
350----------
351
352.. [Storn1997]
353    Storn, R.; Price, K.
354    *Journal of Global Optimization*
355    1997, 11 (4), 341–359.
356    DOI: `10.1023/A:1008202821328 <http://dx.doi.org/10.1023/A:1008202821328>`_
357
358
359
360.. _fit-dream:
361
362DREAM
363=====
364
365.. image:: fit-dream.png
366    :alt: DREAM option screen.
367    :align: left
368
369DREAM is a population based algorithm like differential evolution, but
370instead of only keeping individuals which improve each generation, it
371will sometimes keep individuals which get worse.  Although it is not
372fast and does not give the very best value for the function, we have
373found it to be a robust fitting engine which will give a good value given
374enough time.
375
376The progress of each individual in the population from generation to
377generation can considered a Markov chain, whose transition probability
378is equal to the probability of taking the step times the probability
379that it keeps the step based on the difference in value between the points.
380By including a purely random stepper with some probability, the detailed
381balance condition is preserved, and the Markov chain converges onto
382the underlying equilibrium distribution.  If the theory function represents
383the conditional probability of selecting each point in the parameter
384space, then the resulting chain is a random draw from the posterior
385distribution.
386
387This means that the DREAM algorithm can be used to determine the parameter
388uncertainties.  Unlike the hessian estimate at the minimum that is
389used to report uncertainties from the other fitters, the resulting
390uncertainty need not gaussian.  Indeed, the resulting distribution can
391even be multi-modal.  Fits to measured data using theory functions that
392have symmetric solutions have shown all equivalent solutions with approximately
393equal probability.
394
395When to use
396-----------
397
398Use DREAM when you need a robust fitting algorithm.  It takes longer but
399it does an excellent job of exploring different minima and getting close
400to the global optimum.
401
402Use DREAM when you want a detailed analysis of the parameter uncertainty.
403
404Like differential evolution, DREAM will evaluate $k$ points in parallel,
405where $k$ is the size of the population.
406
407Options
408-------
409
410*Samples* is the number of points to be drawn from the Markov chain.
411To estimate the 68% interval to two digits of precision, at least
4121e5 (or 100,000) samples are needed.  For the 95% interval, 1e6
413(or 1,000,000) samples are needed.  The default 1e4 samples
414gives a rough approximation of the uncertainty relatively quickly.
415Use ``--samples=n`` from the command line.
416
417*Burn-in Steps* is the number of iterations to required for the Markov
418chain to converge to the equilibrium distribution.  If the fit ends
419early, the tail of the burn will be saved to the start of the steps.
420Use ``--burn=n`` from the command line.
421
422*Population* determines the size of the population.  The number of
423individuals, $k$, is equal to the number of fitted parameters times the
424population scale factor.  Use ``--pop=k`` from the command line.
425
426*Initializer* determines how the population will be initialized.
427The options are as follows:
428
429     *eps* (epsilon ball), in which the entire initial population is chosen
430     at random from within a tiny hypersphere centered about the initial point
431
432     *lhs* (latin hypersquare), which chops the bounds within each dimension
433     in $k$ equal sized chunks where $k$ is the size of the population and
434     makes sure that each parameter has at least one value within each chunk
435     across the population.
436
437     *cov* (covariance matrix), in which the uncertainty is estimated using
438     the covariance matrix at the initial point, and points are selected
439     at random from the corresponding gaussian ellipsoid
440
441     *random* (uniform random), in which the points are selected at random
442     within the bounds of the parameters
443
444Use ``--init=type`` from the command line.
445
446
447*Thinning* is the amount of thinning to use when collecting the
448population.  If the fit is somewhat stuck, with most steps not improving
449the fit, then you will need to thin the population to get proper
450statistics.  Use ``--thin=k`` from the command line.
451
452*Calculate entropy*, if true, computes the entropy for the fit.  This is
453an estimate of the amount of information in the data.  Use ``--entropy``
454from the command line.
455
456*Steps*, if not zero, determines the number of iterations to use for
457drawing samples after burn in. Each iteration updates the full population,
458which is (population x number of fitted parameters) points. This option
459is available for compatibility; it is more useful to set the number of
460samples directly.  Use ``--steps=n`` from the command line.
461
462Use ``--fit=dream`` to select DREAM from the commandline.
463
464Output
465------
466
467DREAM produces a number of different outputs, and there are a number of
468things to check before using its reported uncertainty values.  The main
469goal of selecting ``--burn=n`` is to wait long enough to reach the
470equilibrium distribution.
471
472.. figure:: dream-incomplete.png
473    :alt: example of incomplete fit
474
475    This DREAM fit is incomplete, as can be seen on all four plots.  The
476    *Convergence* plot is still decreasing, *Parameter Trace* plot does not
477    show random mixing of Markov chain values, the *Correlations* plots are
478    fuzzy and mostly empty, the *Uncertainty* plot shows black histograms
479    (indicating that there are a few stray values far away from the best) and
480    green maximum likelihood spikes not matching the histogram (indicating
481    that the region around the best value has not been adequately explored).
482
483.. figure:: dream-complete.png
484    :alt: example of a completed fit
485
486    This DREAM fit completed successfully.  The *Convergence* plot is flat,
487    the *Parameter Trace* plot is flat and messy, the *Correlateions* plots
488    show nice blobs (and a bit of correlation between the *M1.radius* parameter
489    and the *M1.radius.width* parameter), and the uncertainty plots show
490    a narrow range of -log(P) values in the mostly brown histograms and
491    a good match to the green constrained maximum likelihood line.
492
493For each parameter in the fit, DREAM finds the mean, median and best value,
494as well as the 68% and 95% credible intervals.  The mean value is
495defined as $\int x P(x) dx$, which is just the expected value of the
496probability distribution for the parameter.  The median value is the 50%
497point in the probability distribution, and the best value is the maximum
498likelihood value seen in the random walk.  The credible intervals are the
499central intervals which capture 68% and 95% of the parameter values
500respectively.  You need approximately 100,000 samples to get two digits of
501precision on the 68% interval, and 1,000,000 samples for the 95% interval.
502
503.. table:: Example fit output
504
505    = =============== ============ ======== ======== ================= =================
506    #  Parameter         mean       median    best   [   68% interval] [   95% interval]
507    = =============== ============ ======== ======== ================= =================
508    1   M1.background 0.059925(41) 0.059924 0.059922 [0.05988 0.05997] [0.05985 0.06000]
509    2       M1.radius   2345.3(15) 2345.234 2345.174 [2343.83 2346.74] [2342.36 2348.29]
510    3 M1.radius.width  0.00775(41)  0.00774  0.00777 [ 0.0074  0.0081] [ 0.0070  0.0086]
511    4        M1.scale  0.21722(20) 0.217218 0.217244 [0.21702 0.21743] [0.21681 0.21761]
512    = =============== ============ ======== ======== ================= =================
513
514The *Convergence* plot shows the range of $\chi^2$ values in the population
515for each iteration.  The band shows the 68% of values around the median, and
516the solid line shows the minimum value.  If the distribution has reached
517equilibrium, then convergence graph should be roughly flat, with little
518change in the minimum value throughout the graph.  If there is no convergence,
519then the remaining plots don't mean much.
520
521The *Correlations* plot shows cross correlation between each pair of
522parameters.  If the parameters are completely uncorrelated then the boxes
523should contain circles.  Diagonals indicate strong correlation.  Square
524blocks indicate that the fit is not sensitive to one of the parameters.
525The range plotted on the correlation plot is determined by the 95% interval
526of the data.  The individual correlation plots are too small to show the
527range of values for the parameters.  These can instead be read from the
528*Uncertainty* plot for each parameter, which covers the same range of values
529and indicates 68% and 95% intervals.  If there are some chains that are
530wandering around away from the minimum, then the plot will look fuzzy, and
531not have a nice blob in the center.  If a correlation plot has multiple blobs,
532then there are multiple minima in your problem space, usually because there
533are symmetries in the problem definition.  For example, a model fitting
534$x + a^2$ will have identical solutions for $\pm\,a$.
535
536The *Uncertainty* plot shows histograms for each fitted parameter generated
537from the values for that parameter across all chains.  Within each histogram
538bar the values are sorted and displayed as a gradient from black to copper,
539with black values having the lowest $\chi^2$ and copper values having the
540highest.  The resulting histogram should be dark brown, with a black hump
541in the center and light brown tips.  If there are large lumps of light brown,
542or excessive black then its likely that the optimizer did not converge.  The
543green line over the histogram shows the best value seen within each
544histogram bin (the maximum likelihood given $p_k == x$).
545With enough samples and proper convergence, it should roughly follow the
546outline of the histogram.  The yellow band in the center of the plot
547represents the 68% interval for the data.  The histogram cuts off at 95%.
548These values along with the median are shown as labels along the x axis.
549The green asterisk represents the best value, the green *E* the mean value
550and the vertical green line the median value.  If the fit is not sensitive
551to a parameter, or if two parameters are strongly correlated, the parameter
552histogram will show a box rather than a hump.  Spiky shapes (either in the
553histogram or the maximum likelihood line) indicate lack of convergence or
554maybe not enough steps.  A chopped histograms indicates that the range for
555that parameter is too small.
556
557The *Parameter Trace* plot is diagnostic for models which have poor mixing.
558In this cases no matter how the parameter values are changing, they are
559landing on much worse values for the $\chi^2$.  This can happen if the
560problem is highly constrained with many tight and twisty values.
561
562The *Data and Theory* plot should show theory and data lining up pretty well,
563with the theory overlaying about 2/3 of the error bars on the data
564(1-\ $\sigma$ = 68%).  The *Residuals* plot shows the difference between
565theory and data divided by uncertainty.  The residuals should be 2/3 within
566[-1, 1], They should not show any structure, such as humps where the theory
567misses the data for long stretches.  This indicates some feature missing
568from the model, or a lack of convergence to the best model.
569
570If entropy is requested, then bumps will show the total number of bits of
571information in the fit.  This derives from the entropy term:
572
573.. math:
574
575    S = \int_\Theta p(\Theta) \log p(\Theta) d\Theta
576
577Using entropy and simulation we hope to be able to make experiment
578planning decisions in a way that maximizes information, by estimating
579whether it is better to measure more precisely or to measure different
580but related values and fit them with shared parameters.
581
582
583References
584----------
585
586.. [Vrugt2009]
587    Vrugt, J. A.; Ter Braak, C. J. F.; Diks, C. G. H.; Robinson, B. A.;
588    Hyman, J. M.; Higdon, D.
589    *International Journal of Nonlinear Sciences and Numerical Simulation*
590    2009, 10 (3), 273–290.
591    DOI: `10.1515/IJNSNS.2009.10.3.273 <http://dx.doi.org/10.1515/IJNSNS.2009.10.3.273>`_
592
593.. [Kramer2010]
594    Kramer, A.; Hasenauer, J.; Allgower, F.; Radde, N.
595    *In 2010 IEEE International Conference on Control Applications (CCA)*
596    2010; pp 493–498.
597    DOI: `10.1109/CCA.2010.5611198 <http://dx.doi.org/10.1109/CCA.2010.5611198>`_
598
599.. [JCGM2008]
600    JCGM.
601    *Evaluation of measurement data — Supplement 1 to the “Guide to the
602    expression of uncertainty in measurement” — Propagation of distributions
603    using a Monte Carlo method*; Joint Committee for Guides in Metrology,
604    JCGM 101:2008; Geneva, Switzerland, 2008; p 90.
605    `<http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf>`_
606
607
608
609.. _fit-ps:
610
611Particle Swarm
612==============
613
614Inspired by bird flocking behaviour, the particle swarm algorithm is a
615population-based method which updates an individual according to its
616momentum and a force toward the current best fit parameter values.  We
617did not explore variations of this algorithm in any detail.
618
619When to use
620-----------
621
622Particle swarm performed well enough in our low dimensional test problems,
623but made little progress when more fit parameters were added.
624
625The population updates can run in parallel, but the tiny population size
626limits the amount of parallelism.
627
628Options
629-------
630
631``--steps=n`` is the number of iterations.  Each step updates each member
632of the population.  The population size scales with the number of fitted
633parameters.
634
635``--pop=k`` determines the size of the population.  The number of
636individuals, $k$, is equal to the number of fitted parameters times the
637population scale factor.  The default scale factor is 1.
638
639Use ``--fit=ps`` to select particle swarm from the commandline.
640
641Add a few more lines
642
643References
644----------
645
646.. [Kennedy1995]
647    Kennedy, J.; Eberhart, R.
648    Particle Swarm Optimization
649    *Proceedings of IEEE International Conference on Neural Networks. IV.*
650    1995; pp 1942–1948.
651    DOI: `10.1109/ICNN.1995.48896 <http://dx.doi.org/810.1109/ICNN.1995.488968>`_
652
653
654.. _fit-rl:
655
656Random Lines
657============
658
659Most of the population based algorithms ignore the value of the function
660when choosing the points in the next iteration.  Random lines is a new
661style of algorithm which fits a quadratic model to a selection from the
662population, and uses that model to propose a new point in the next
663generation of the population.  The hope is that the method will inherit
664the robustness of the population based algorithms as well as the rapid
665convergence of the newton descent algorithms.
666
667When to use
668-----------
669
670Random lines works very well for some of our test problems, showing
671rapid convergence to the optimum, but on other problems it makes
672very little progress.
673
674The population updates can run in parallel.
675
676Options
677-------
678
679``--steps=n`` is the number of iterations.  Each step updates each member
680of the population.  The population size scales with the number of fitted
681parameters.
682
683``--pop=k`` determines the size of the population.  The number of
684individuals, $k$, is equal to the number of fitted parameters times the
685population scale factor.  The default scale factor is 0.5.
686
687``--CR=v`` is the crossover ratio, determining what proportion of the
688dimensions to update at each step.  Values must be between 0 and 1.
689
690``--starts=n`` tells the optimizer to restart a given number of times.
691Each time it restarts it uses a random starting point.
692
693``--keep_best`` uses a value near the previous minimum when restarting
694instead of using a random value within the parameter bounds.  This option is
695not available in the options dialog.
696
697Use ``--fit=rl`` to select random lines from the commandline.
698
699References
700----------
701
702.. [Sahin2013]
703
704    Sahin, I.
705    *An International Journal of Optimization and Control:  Theories & Applications (IJOCTA)*
706    2013, 3 (2), 111–119.
707
708
709
710.. _fit-pt:
711
712Parallel Tempering
713==================
714
715Parallel tempering is an MCMC algorithm for uncertainty analysis.  This
716version runs at multiple temperatures simultaneously, with chains at high
717temperature able to more easily jump between minima and chains at low
718temperature to fully explore the minima.  Like :ref:`fit-dream` it has a
719differential evolution stepper, but this version uses the chain history
720as the population rather than maintaining a population at each temperature.
721
722This is an experimental algorithm which does not yet perform well.
723
724When to use
725-----------
726
727When complete, parallel tempering should be used for problems with widely
728spaced local minima which dream cannot fit.
729
730Options
731-------
732
733``--steps=n`` is the number of iterations to include in the Markov
734chain.  Each iteration updates the full population.  The population size
735scales with the number of fitted parameters.
736
737``--burn=n`` is the number of iterations to required for the Markov
738chain to converge to the equilibrium distribution.  If the fit ends
739early, the tail of the burn will be saved to the start of the steps.
740
741``--CR=v`` is the differential evolution crossover ratio to use when
742computing step size and direction.  Use a small value to step through the
743dimensions one at a time, or a large value to step through all at once.
744
745``-nT=k``, ``-Tmin=v`` and ``--Tmax=v`` specify a log-spaced initial
746distribution of temperatures.  The default is 25 points between
7470.1 and 10.  :ref:`fit-dream` runs at a fixed temperature of 1.0.
748
749Use ``--fit=pt`` to select parallel tempering from the commandline.
750
751References
752----------
753
754.. [Swendsen1986]
755    Swendsen, R. H.; Wang J. S.
756    Replica Monte Carlo simulation of spin glasses
757    *Physical Review Letters*
758    1986, 57, 2607-2609
759
760
761..
762    SNOBFIT (fit=snobfit) attempts to construct a locally quadratic model of
763    the entire search space.  While promising because it can begin to offer
764    some guarantees that the search is complete given reasonable assumptions
765    about the fitting surface, initial trials did not perform well and the
766    algorithm has not yet been tuned to our problems.
767
Note: See TracBrowser for help on using the repository browser.