source: sasview/docs/sphinx-docs/source/user/sasgui/perspectives/fitting/optimizer.rst @ e81f6dd

Last change on this file since e81f6dd was e81f6dd, checked in by smk78, 8 years ago

Label optimizer-burn added (by someone, not me!)

.. _optimizer-guide:

*******************
Optimizer Selection
*******************

Bumps has a number of different optimizers available, each with its own
control parameters:

* :ref:`fit-lm`
* :ref:`fit-amoeba`
* :ref:`fit-dream`
* :ref:`fit-de`
* :ref:`fit-newton`
* :ref:`fit-rl` [experimental]
* :ref:`fit-ps` [experimental]
* :ref:`fit-pt` [experimental]

In general there is a trade-off between convergence
rate and robustness, with the fastest algorithms most likely to find a
local minimum rather than a global minimum.  The gradient descent algorithms
(:ref:`fit-lm`, :ref:`fit-newton`) tend to be fast but will find local
minima only, while the population algorithms (:ref:`fit-dream`, :ref:`fit-de`)
are more robust but likely slower.  :ref:`fit-amoeba` is somewhere in between,
with a small population keeping the search local, but more robust than the
gradient descent algorithms.

Each algorithm has its own set of control parameters for adjusting the
search process and the stopping conditions.  The same option may mean
slightly different things to different optimizers.  The bumps package
provides a dialog box for selecting the optimizer and its options
when running fits in the wx application.  The dialog only includes the
common options for the most useful optimizers.  For full control, the fit
will need to be run from the command line interface or through a python script.

For parameter uncertainty, most algorithms use the covariance matrix at
the optimum to estimate an uncertainty ellipse.  This is okay for a
preliminary analysis, but only works reliably for weakly correlated parameters.
For full uncertainty analysis, :ref:`fit-dream` uses a random walk to explore
the parameter space near the minimum, showing pair-wise correlations
amongst the parameter values.  In order for :ref:`fit-dream` to return the
correct uncertainty, the function to be optimized should be a conditional
probability density, with *nllf* as the negative log likelihood function
of seeing point $x$ in the parameter space.  Other functions
can be fitted, but the uncertainty estimates will be meaningless.
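
For example, with independent gaussian measurement uncertainties the negative
log likelihood is, up to a constant, half the sum of squared normalized
residuals.  The sketch below is purely illustrative of what an *nllf* looks
like; it is not the bumps API:

```python
def nllf(theory, data, sigma):
    """Negative log likelihood for independent gaussian errors, dropping
    the constant normalization term: half the sum of squared, normalized
    residuals.  Minimizing this is the same as minimizing chi-squared,
    which is why least squares fits double as maximum likelihood fits."""
    return 0.5 * sum(((y - f) / s) ** 2
                     for y, f, s in zip(data, theory, sigma))

# Perfect agreement gives 0; a single 1-sigma miss contributes 0.5.
assert nllf([1.0, 2.0], [1.0, 2.0], [0.1, 0.1]) == 0.0
assert nllf([1.0], [2.0], [1.0]) == 0.5
```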

Most algorithms have been adapted to run in parallel at least to some degree.
The implementation is not heavily tuned, either in terms of minimizing the
overhead per function evaluation or for distributing the problem across
multiple processors.  If the theory function is implemented in parallel,
then the optimizer should be run in serial.  Mixed mode is also possible
when running on a cluster with a multi-threaded theory function.  In this
case, only one theory function will be evaluated on each cluster node, but
the optimizer will distribute the parameter values to the cluster nodes
in parallel.  Do not run serial algorithms (:ref:`fit-lm`, :ref:`fit-newton`) on
a cluster.

We have included a number of optimizers in Bumps that did not perform
particularly well on our problem sets.  However, they may be perfect
for your problem, so we have left them in the package for you to explore.
They are not available in the GUI selection.

.. _fit-lm:

Levenberg-Marquardt
===================

.. image:: fit-lm.png
    :alt: Levenberg-Marquardt option screen.
    :align: left

The Levenberg-Marquardt algorithm has long been the standard method for
non-linear data fitting.  As a gradient descent
trust region method, it starts at the initial value of the function and
steps in the direction of the derivative until it reaches the minimum.
Set up as an explicit minimization of the sum of square differences between
theory and model, it uses a numerical approximation of the Jacobian matrix
to set the step direction and an adaptive algorithm to set the size of
the trust region.

When to use
-----------

Use this method when you have a reasonable fit near the minimum, and
you want to get the best possible value.  This can then be used as the starting
point for uncertainty analysis using :ref:`fit-dream`.  This method requires
that the problem definition includes a *residuals* method, but this should
always be true when fitting data.

When modeling the results of an experiment, the best fit value is an
accident of the measurement.  Redo the same measurement, and the slightly
different values you measure will lead to a different best fit.  The
important quantity to report is the credible interval covering
68% (1-\ $\sigma$) or 95% (2-\ $\sigma$\ ) of the range of
parameter values that are somewhat consistent with the data.

This method uses *lmfit* from *scipy*, and does not run in parallel.

Options
-------

*Steps* is the number of gradient steps to take.  Each step requires
a calculation of the Jacobian matrix to determine the direction.  This
needs $2 m n$ function evaluations, where $n$ is the number of parameters
and the function is evaluated at each of the $m$ data points (assuming the
center point formula for the finite difference estimate of the derivative).
The resulting linear equation is then solved, but for small $n$ and
expensive function evaluation this overhead can be ignored.
Use ``--steps=n`` from the command line.
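
To see where the $2 m n$ count comes from, here is a sketch of a center-point
finite-difference Jacobian (illustrative only; the numerical differentiation
code in bumps differs in detail): each of the $n$ columns needs two perturbed
evaluations of all $m$ residuals.

```python
def jacobian(f, x, m, h=1e-6):
    """Center-point finite-difference Jacobian of a residuals function.

    f(x) returns m residuals for the n parameters in x.  Each column
    needs two perturbed evaluations of all m residuals, giving 2*m*n
    point evaluations in total.
    """
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

# Residuals of the line a*t + b against data at t = 0, 1, 2.
resid = lambda p: [p[0] * t + p[1] - y for t, y in [(0, 1), (1, 3), (2, 5)]]
J = jacobian(resid, [2.0, 1.0], m=3)   # columns are d/da = t and d/db = 1
```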

*f(x) tolerance* and *x tolerance* are used to determine when
the fit has reached the point where no significant improvement is expected.
If the function value does not improve significantly within the step, or
the step is too short, then the fit will terminate.  Use ``--ftol=v`` and
``--xtol=v`` from the command line.

From the command line, ``--starts=n`` will automatically restart the algorithm
after it has converged so that a slightly better value can be found.  If
``--keep_best`` is included then the restart will use a value near the minimum,
otherwise it will restart the fit from a random point in the parameter space.

Use ``--fit=lm`` to select the Levenberg-Marquardt fitter from the command line.

References
----------

.. [Levenberg1944]
    Levenberg, K.
    *Quarterly Journal of Applied Mathematics*
    1944, II (2), 164–168.

.. [Marquardt1963]
    Marquardt, D. W.
    *Journal of the Society for Industrial and Applied Mathematics*
    1963, 11 (2), 431–441.
    DOI: `10.1137/0111030 <http://dx.doi.org/10.1137/0111030>`_

.. _fit-amoeba:

Nelder-Mead Simplex
===================

.. image:: fit-amoeba.png
    :alt: Nelder-Mead Simplex option screen.
    :align: left

The Nelder-Mead downhill simplex algorithm is a robust optimizer which
does not require the function to be continuous or differentiable.
It uses the relative values of the function at the corners of a
simplex (an n-dimensional triangle) to decide which points of the simplex
to update.  It will take the worst value and try moving it inward or
outward, or reflect it through the centroid of the remaining values,
stopping if it finds a better value.  If none of these values are
better, then it will shrink the simplex and start again.  The name
amoeba comes from the book *Numerical Recipes* [Press1992]_ wherein the
authors describe the search as acting like an amoeba, squeezing through
narrow valleys as it makes its way down to the minimum.
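
A single reflection move can be sketched as follows.  This is a simplified
illustration of one update; the full algorithm also has expansion,
contraction, and shrink steps with their own acceptance rules:

```python
def reflect_worst(simplex, f):
    """Reflect the worst vertex through the centroid of the others,
    keeping the reflection only if it beats the worst value."""
    order = sorted(simplex, key=f)                 # best ... worst
    worst, rest = order[-1], order[:-1]
    centroid = [sum(p[i] for p in rest) / len(rest)
                for i in range(len(worst))]
    reflected = [2 * c - w for c, w in zip(centroid, worst)]
    if f(reflected) < f(worst):
        return rest + [reflected]
    return order   # a real implementation would try contraction or shrink

f = lambda p: p[0] ** 2 + p[1] ** 2
simplex = [[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]]
new = reflect_worst(simplex, f)   # worst corner [2, 2] moves to [-1, -2]
```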

When to use
-----------

Use this method as a first fit to your model.  If your fitting function
is well behaved with few local minima this will give a quick estimate of
the model, and help you decide if the model needs to be refined.  If your
function is poorly behaved, you will need to select a good initial value
before fitting, or use a more robust method such
as :ref:`fit-de` or :ref:`fit-dream`.

The uncertainty reported comes from a numerical derivative estimate at the
minimum.

This method requires a series of function updates, and does not benefit
much from running in parallel.

Options
-------

*Steps* is the number of simplex update iterations to perform.  Most updates
require one or two function evaluations, but shrinking the simplex evaluates
every value in the simplex.  Use ``--steps=n`` from the command line.

*Starts* tells the optimizer to restart a given number of times.
Each time it restarts it uses a random starting point.  Use
``--starts=n`` from the command line.

*Simplex radius* is the initial size of the simplex, as a portion of
the bounds defining the parameter space.  If a parameter is unbounded, then
the radius will be treated as a portion of the parameter value.  Use
``--radius=n`` from the command line.

*x tolerance* and *f(x) tolerance* are used to determine when the
fit has reached the point where no significant improvement is expected.
If the simplex is tiny (that is, the corners are close to each other) and
flat (that is, the values at the corners are close to each other),
then the fit will terminate.  Use ``--xtol=v`` and ``--ftol=v`` from
the command line.

From the command line, use ``--keep_best`` so that restarts are centered on a
value near the minimum rather than restarting from a random point within the
parameter bounds.

Use ``--fit=amoeba`` to select the Nelder-Mead simplex fitter from the
command line.

References
----------

.. [Nelder1965]
    Nelder, J. A.; Mead, R.
    *The Computer Journal*
    1965, 7 (4), 308–313.
    DOI: `10.1093/comjnl/7.4.308 <http://dx.doi.org/10.1093/comjnl/7.4.308>`_

.. [Press1992]
   Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T.
   In *Numerical Recipes in C: The Art of Scientific Computing, Second Edition*;
   Cambridge University Press: Cambridge; New York, 1992; pp 408–412.


.. _fit-newton:

Quasi-Newton BFGS
=================

.. image:: fit-newton.png
    :alt: Quasi-Newton BFGS option screen.
    :align: left

The Broyden-Fletcher-Goldfarb-Shanno algorithm is a gradient descent method
which uses the gradient to determine the step direction and an approximation
of the Hessian matrix to estimate the curvature and guess a step size.  The
step is further refined with a one-dimensional search in the direction of
the gradient.
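
The flavor of the method can be sketched with plain gradient descent plus a
backtracking line search.  This illustration omits the BFGS Hessian update
entirely and is not the bumps implementation:

```python
def grad(f, x, h=1e-6):
    """Central-difference gradient.  The n partial derivatives are
    independent of each other, which is what allows them to be
    computed in parallel."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def descent_step(f, x):
    """Step along -grad, halving the step length until f improves.
    BFGS additionally rescales the direction by an approximate inverse
    Hessian, which is the bookkeeping omitted here."""
    g = grad(f, x)
    t = 1.0
    while t > 1e-10:
        trial = [xi - t * gi for xi, gi in zip(x, g)]
        if f(trial) < f(x):
            return trial
        t /= 2
    return x

f = lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2
x = [0.0, 0.0]
for _ in range(50):
    x = descent_step(f, x)
# x ends up near the minimum at [3, -1]
```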

When to use
-----------

Like :ref:`fit-lm`, this method converges quickly to the minimum.  It does
not assume that the problem is in the form of a sum of squares and does not
require a *residuals* method.

The $n$ partial derivatives are computed in parallel.

Options
-------

*Steps* is the number of gradient steps to take.  Each step requires
a calculation of the Jacobian matrix to determine the direction.  This
needs $2 m n$ function evaluations, where $n$ is the number of parameters
and the function is evaluated at each of the $m$ data points (assuming the
center point formula for the finite difference estimate of the derivative).
The resulting linear equation is then solved, but for small $n$ and
expensive function evaluation this overhead can be ignored.
Use ``--steps=n`` from the command line.

*Starts* tells the optimizer to restart a given number of times.
Each time it restarts it uses a random starting point.
Use ``--starts=n`` from the command line.

*f(x) tolerance* and *x tolerance* are used to determine when
the fit has reached the point where no significant improvement is expected.
If the change in function value is small or the step is too short then the
fit will terminate.  Use ``--ftol=v`` and ``--xtol=v`` from the command line.

From the command line, ``--keep_best`` uses a value near the previous minimum
when restarting instead of using a random value within the parameter bounds.

Use ``--fit=newton`` to select BFGS from the command line.

References
----------

.. [Dennis1987]
    Dennis, J. E.; Schnabel, R. B.
    *Numerical Methods for Unconstrained Optimization and Nonlinear Equations*;
    Society for Industrial and Applied Mathematics: Philadelphia, 1987.


.. _fit-de:

Differential Evolution
======================

.. image:: fit-de.png
    :alt: Differential Evolution option screen.
    :align: left

Differential evolution is a population based algorithm which uses differences
between points as a guide to selecting new points.  For each member of the
population a pair of points is chosen at random, and a difference vector is
computed.  This vector is scaled, and a random subset of its components are
added to the current point based on the crossover ratio.  The new point is
evaluated, and if its value is lower than that of the current point, it
replaces it in the population.  There are many variations available within
DE that have not been exposed in Bumps.  Interested users can modify
:class:`bumps.fitters.DEFit` and experiment with different crossover and
mutation algorithms, and perhaps add them as command line options.

Differential evolution is a robust directed search strategy.  Early in the
search, when the population is disperse, the difference vectors are large
and the search remains broad.  As the search progresses, more of the
population moves into the valleys and eventually all the points end up in
local minima.  Now the differences between random pairs will often be small
and the search will become more localized.
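
One population update can be sketched as below.  This illustrates the common
DE/rand/1/bin variant with hypothetical control values; the stepper in
:class:`bumps.fitters.DEFit` differs in its details:

```python
import random

def de_candidate(pop, i, F=0.9, CR=0.8, rng=random):
    """Propose a replacement for pop[i]: add a scaled difference of two
    other random members in a random subset of the dimensions."""
    a, b = rng.sample([p for j, p in enumerate(pop) if j != i], 2)
    n = len(pop[i])
    force = rng.randrange(n)     # at least one dimension always moves
    return [xi + F * (ai - bi) if j == force or rng.random() < CR else xi
            for j, (xi, ai, bi) in enumerate(zip(pop[i], a, b))]

rng = random.Random(0)
pop = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [2.0, 5.0]]
f = lambda p: sum(x * x for x in p)
trial = de_candidate(pop, 0, rng=rng)
if f(trial) < f(pop[0]):         # greedy selection: keep only improvements
    pop[0] = trial
```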

The population is initialized according to the prior probability distribution
for each parameter.  That is, if the parameter is bounded, it will use
a uniform random number generated within the bounds.  If it is unbounded, it
will use a uniform value in [0,1].  If the parameter corresponds to the result
of a previous measurement with mean $\mu$ and standard deviation $\sigma$,
then the initial values will be pulled from a gaussian random number generator.

When to use
-----------

Convergence with differential evolution will be slower, but more robust.

Each update will evaluate $k$ points in parallel, where $k$ is the size
of the population.

Options
-------

*Steps* is the number of iterations.  Each step updates each member
of the population.  The population size scales with the number of fitted
parameters.  Use ``--steps=n`` from the command line.

*Population* determines the size of the population.  The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor.  Use ``--pop=k`` from the command line.

*Crossover ratio* determines what proportion of the dimensions to update
at each step.  Smaller values will likely lead to slower convergence, but
more robust results.  Values must be between 0 and 1.  Use ``--CR=v`` from
the command line.

*Scale* determines how much to scale each difference vector before adding
it to the candidate point.  The selected mutation algorithm chooses a scale
factor uniformly in $[0,F]$.  Use ``--F=v`` from the command line.

*f(x) tolerance* and *x tolerance* are used to determine when the
fit has reached the point where no significant improvement is expected.
If the population is flat (that is, the minimum and maximum values are
within tolerance) and tiny (that is, all the points are close to each
other) then the fit will terminate.  Use ``--ftol=v`` and ``--xtol=v`` from
the command line.

Use ``--fit=de`` to select differential evolution from the command line.

References
----------

.. [Storn1997]
    Storn, R.; Price, K.
    *Journal of Global Optimization*
    1997, 11 (4), 341–359.
    DOI: `10.1023/A:1008202821328 <http://dx.doi.org/10.1023/A:1008202821328>`_


.. _fit-dream:

DREAM
=====

.. image:: fit-dream.png
    :alt: DREAM option screen.
    :align: left

DREAM is a population based algorithm like differential evolution, but
instead of only keeping individuals which improve each generation, it
will sometimes keep individuals which get worse.  Although it is not
fast and does not give the very best value for the function, we have
found it to be a robust fitting engine which will give a good value given
enough time.

The progress of each individual in the population from generation to
generation can be considered a Markov chain, whose transition probability
is equal to the probability of taking the step times the probability
that it keeps the step based on the difference in value between the points.
By including a purely random stepper with some probability, the detailed
balance condition is preserved, and the Markov chain converges onto
the underlying equilibrium distribution.  If the theory function represents
the conditional probability of selecting each point in the parameter
space, then the resulting chain is a random draw from the posterior
distribution.
This means that the DREAM algorithm can be used to determine the parameter
uncertainties.  Unlike the Hessian estimate at the minimum that is
used to report uncertainties from the other fitters, the resulting
uncertainty need not be gaussian.  Indeed, the resulting distribution can
even be multi-modal.  Fits to measured data using theory functions that
have symmetric solutions have shown all equivalent solutions with
approximately equal probability.

When to use
-----------

Use DREAM when you need a robust fitting algorithm.  It takes longer but
it does an excellent job of exploring different minima and getting close
to the global optimum.

Use DREAM when you want a detailed analysis of the parameter uncertainty.

Like differential evolution, DREAM will evaluate $k$ points in parallel,
where $k$ is the size of the population.

.. _option-burn:

Options
-------

*Samples* is the number of points to be drawn from the Markov chain.
To estimate the 68% interval to two digits of precision, at least
1e5 (or 100,000) samples are needed.  For the 95% interval, 1e6
(or 1,000,000) samples are needed.  The default of 1e4 samples
gives a rough approximation of the uncertainty relatively quickly.
Use ``--samples=n`` from the command line.

*Burn-in Steps* is the number of iterations required for the Markov
chain to converge to the equilibrium distribution.  If the fit ends
early, the tail of the burn will be saved to the start of the steps.
Use ``--burn=n`` from the command line.

*Population* determines the size of the population.  The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor.  Use ``--pop=k`` from the command line.

*Initializer* determines how the population will be initialized.
The options are as follows:

     *eps* (epsilon ball), in which the entire initial population is chosen
     at random from within a tiny hypersphere centered about the initial point

     *lhs* (latin hypercube), which chops the bounds within each dimension
     into $k$ equal sized chunks, where $k$ is the size of the population,
     and makes sure that each parameter has at least one value within each
     chunk across the population

     *cov* (covariance matrix), in which the uncertainty is estimated using
     the covariance matrix at the initial point, and points are selected
     at random from the corresponding gaussian ellipsoid

     *random* (uniform random), in which the points are selected at random
     within the bounds of the parameters

Use ``--init=type`` from the command line.
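
The *lhs* initializer can be sketched as follows (illustrative only; the
bumps initializers differ in detail):

```python
import random

def latin_hypercube(k, bounds, rng=random):
    """Draw k points where each dimension's range is cut into k equal
    chunks and every chunk receives exactly one value, in shuffled order."""
    dims = []
    for lo, hi in bounds:
        width = (hi - lo) / k
        cells = [lo + (c + rng.random()) * width for c in range(k)]
        rng.shuffle(cells)
        dims.append(cells)
    return [list(point) for point in zip(*dims)]

rng = random.Random(0)
pop = latin_hypercube(5, [(0.0, 1.0), (-10.0, 10.0)], rng)
# each fifth of [0, 1] and each 4-wide chunk of [-10, 10] holds one value
```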

*Thinning* is the amount of thinning to use when collecting the
population.  If the fit is somewhat stuck, with most steps not improving
the fit, then you will need to thin the population to get proper
statistics.  Use ``--thin=k`` from the command line.

*Calculate entropy*, if true, computes the entropy for the fit.  This is
an estimate of the amount of information in the data.  Use ``--entropy``
from the command line.

*Steps*, if not zero, determines the number of iterations to use for
drawing samples after burn in.  Each iteration updates the full population,
which is (population x number of fitted parameters) points.  This option
is available for compatibility; it is more useful to set the number of
samples directly.  Use ``--steps=n`` from the command line.

Use ``--fit=dream`` to select DREAM from the command line.

Output
------

DREAM produces a number of different outputs, and there are a number of
things to check before using its reported uncertainty values.  The main
goal of selecting ``--burn=n`` is to wait long enough to reach the
equilibrium distribution.

.. figure:: dream-incomplete.png
    :alt: example of incomplete fit

    This DREAM fit is incomplete, as can be seen on all four plots.  The
    *Convergence* plot is still decreasing, the *Parameter Trace* plot does
    not show random mixing of Markov chain values, the *Correlations* plots
    are fuzzy and mostly empty, and the *Uncertainty* plot shows black
    histograms (indicating that there are a few stray values far away from
    the best) and green maximum likelihood spikes not matching the histogram
    (indicating that the region around the best value has not been adequately
    explored).

.. figure:: dream-complete.png
    :alt: example of a completed fit

    This DREAM fit completed successfully.  The *Convergence* plot is flat,
    the *Parameter Trace* plot is flat and messy, the *Correlations* plots
    show nice blobs (and a bit of correlation between the *M1.radius* parameter
    and the *M1.radius.width* parameter), and the uncertainty plots show
    a narrow range of -log(P) values in the mostly brown histograms and
    a good match to the green constrained maximum likelihood line.

For each parameter in the fit, DREAM finds the mean, median and best value,
as well as the 68% and 95% credible intervals.  The mean value is
defined as $\int x P(x) dx$, which is just the expected value of the
probability distribution for the parameter.  The median value is the 50%
point in the probability distribution, and the best value is the maximum
likelihood value seen in the random walk.  The credible intervals are the
central intervals which capture 68% and 95% of the parameter values
respectively.  You need approximately 100,000 samples to get two digits of
precision on the 68% interval, and 1,000,000 samples for the 95% interval.
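
Given a set of draws for one parameter, the central interval is simple to
compute; a minimal sketch:

```python
def credible_interval(samples, level=0.68):
    """Central credible interval: drop (1 - level)/2 of the sorted draws
    from each tail and return the surviving range."""
    s = sorted(samples)
    tail = round(len(s) * (1 - level) / 2)
    return s[tail], s[-1 - tail]

# For draws 1..100 the central 68% interval keeps the middle 68 values,
# roughly the 16th to 84th percentiles.
assert credible_interval(range(1, 101), 0.68) == (17, 84)
```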

.. table:: Example fit output

    = =============== ============ ======== ======== ================= =================
    #  Parameter         mean       median    best   [   68% interval] [   95% interval]
    = =============== ============ ======== ======== ================= =================
    1   M1.background 0.059925(41) 0.059924 0.059922 [0.05988 0.05997] [0.05985 0.06000]
    2       M1.radius   2345.3(15) 2345.234 2345.174 [2343.83 2346.74] [2342.36 2348.29]
    3 M1.radius.width  0.00775(41)  0.00774  0.00777 [ 0.0074  0.0081] [ 0.0070  0.0086]
    4        M1.scale  0.21722(20) 0.217218 0.217244 [0.21702 0.21743] [0.21681 0.21761]
    = =============== ============ ======== ======== ================= =================

The *Convergence* plot shows the range of $\chi^2$ values in the population
for each iteration.  The band shows the 68% of values around the median, and
the solid line shows the minimum value.  If the distribution has reached
equilibrium, then the convergence graph should be roughly flat, with little
change in the minimum value throughout the graph.  If there is no convergence,
then the remaining plots don't mean much.

The *Correlations* plot shows the cross correlation between each pair of
parameters.  If the parameters are completely uncorrelated then the boxes
should contain circles.  Diagonals indicate strong correlation.  Square
blocks indicate that the fit is not sensitive to one of the parameters.
The range plotted on the correlation plot is determined by the 95% interval
of the data.  The individual correlation plots are too small to show the
range of values for the parameters.  These can instead be read from the
*Uncertainty* plot for each parameter, which covers the same range of values
and indicates the 68% and 95% intervals.  If there are some chains that are
wandering around away from the minimum, then the plot will look fuzzy, and
not have a nice blob in the center.  If a correlation plot has multiple blobs,
then there are multiple minima in your problem space, usually because there
are symmetries in the problem definition.  For example, a model fitting
$x + a^2$ will have identical solutions for $\pm\,a$.

The *Uncertainty* plot shows histograms for each fitted parameter generated
from the values for that parameter across all chains.  Within each histogram
bar the values are sorted and displayed as a gradient from black to copper,
with black values having the lowest $\chi^2$ and copper values having the
highest.  The resulting histogram should be dark brown, with a black hump
in the center and light brown tips.  If there are large lumps of light brown,
or excessive black, then it is likely that the optimizer did not converge.
The green line over the histogram shows the best value seen within each
histogram bin (the maximum likelihood given $p_k == x$).
With enough samples and proper convergence, it should roughly follow the
outline of the histogram.  The yellow band in the center of the plot
represents the 68% interval for the data.  The histogram cuts off at 95%.
These values, along with the median, are shown as labels along the x axis.
The green asterisk represents the best value, the green *E* the mean value
and the vertical green line the median value.  If the fit is not sensitive
to a parameter, or if two parameters are strongly correlated, the parameter
histogram will show a box rather than a hump.  Spiky shapes (either in the
histogram or the maximum likelihood line) indicate lack of convergence or
maybe not enough steps.  A chopped histogram indicates that the range for
that parameter is too small.

The *Parameter Trace* plot is diagnostic for models which have poor mixing.
In this case, no matter how the parameter values are changing, they are
landing on much worse values for the $\chi^2$.  This can happen if the
problem is highly constrained, with many tight and twisty valleys.

The *Data and Theory* plot should show theory and data lining up pretty well,
with the theory overlaying about 2/3 of the error bars on the data
(1-\ $\sigma$ = 68%).  The *Residuals* plot shows the difference between
theory and data divided by uncertainty.  The residuals should be 2/3 within
[-1, 1], and they should not show any structure, such as humps where the
theory misses the data for long stretches.  This indicates some feature
missing from the model, or a lack of convergence to the best model.
571
If entropy is requested, then bumps will show the total number of bits of
information in the fit.  This derives from the entropy term:

.. math::

    S = -\int_\Theta p(\Theta) \log p(\Theta) d\Theta

Using entropy and simulation we hope to be able to make experiment
planning decisions in a way that maximizes information, by estimating
whether it is better to measure more precisely or to measure different
but related values and fit them with shared parameters.
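
As a discrete illustration of the idea (using base-2 logs so the result is
in bits, and assuming the parameter range has been binned):

```python
import math

def entropy_bits(probs):
    """Shannon entropy, -sum(p * log2(p)), of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Data that narrows a parameter to 1 of 2 equally likely bins carries
# 1 bit of information; 1 of 8 bins carries 3 bits.
assert entropy_bits([0.5, 0.5]) == 1.0
assert entropy_bits([0.125] * 8) == 3.0
```

The information gained from a fit is the drop in entropy from the prior
distribution to the posterior.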


References
----------

.. [Vrugt2009]
    Vrugt, J. A.; Ter Braak, C. J. F.; Diks, C. G. H.; Robinson, B. A.;
    Hyman, J. M.; Higdon, D.
    *International Journal of Nonlinear Sciences and Numerical Simulation*
    2009, 10 (3), 273–290.
    DOI: `10.1515/IJNSNS.2009.10.3.273 <http://dx.doi.org/10.1515/IJNSNS.2009.10.3.273>`_

.. [Kramer2010]
    Kramer, A.; Hasenauer, J.; Allgower, F.; Radde, N.
    In *2010 IEEE International Conference on Control Applications (CCA)*
    2010; pp 493–498.
    DOI: `10.1109/CCA.2010.5611198 <http://dx.doi.org/10.1109/CCA.2010.5611198>`_

.. [JCGM2008]
    JCGM.
    *Evaluation of measurement data — Supplement 1 to the “Guide to the
    expression of uncertainty in measurement” — Propagation of distributions
    using a Monte Carlo method*; Joint Committee for Guides in Metrology,
    JCGM 101:2008; Geneva, Switzerland, 2008; p 90.
    `<http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf>`_


.. _fit-ps:

Particle Swarm
==============

Inspired by bird flocking behaviour, the particle swarm algorithm is a
population-based method which updates an individual according to its
momentum and a force toward the current best fit parameter values.  We
did not explore variations of this algorithm in any detail.

When to use
-----------

Particle swarm performed well enough in our low dimensional test problems,
but made little progress when more fit parameters were added.

The population updates can run in parallel, but the tiny population size
limits the amount of parallelism.

Options
-------

``--steps=n`` is the number of iterations.  Each step updates each member
of the population.  The population size scales with the number of fitted
parameters.

``--pop=k`` determines the size of the population.  The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor.  The default scale factor is 1.
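
The per-particle update behind this description can be sketched as follows.
This is a generic PSO velocity update with hypothetical inertia and
acceleration constants, not the exact bumps stepper:

```python
import random

def pso_step(x, v, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """One particle update: keep a fraction w of the momentum and add
    random accelerations toward the particle's own best point and the
    swarm's best point."""
    v = [w * vi
         + c1 * rng.random() * (pb - xi)
         + c2 * rng.random() * (gb - xi)
         for xi, vi, pb, gb in zip(x, v, personal_best, global_best)]
    return [xi + vi for xi, vi in zip(x, v)], v

rng = random.Random(0)
x, v = pso_step([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, -1.0], rng=rng)
# the particle is pulled toward the region between the two best points
```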

Use ``--fit=ps`` to select particle swarm from the command line.

References
----------

.. [Kennedy1995]
    Kennedy, J.; Eberhart, R.
    Particle Swarm Optimization
    *Proceedings of IEEE International Conference on Neural Networks. IV.*
    1995; pp 1942–1948.
    DOI: `10.1109/ICNN.1995.488968 <http://dx.doi.org/10.1109/ICNN.1995.488968>`_


.. _fit-rl:

Random Lines
============

Most of the population based algorithms ignore the value of the function
when choosing the points in the next iteration.  Random lines is a new
style of algorithm which fits a quadratic model to a selection from the
population, and uses that model to propose a new point in the next
generation of the population.  The hope is that the method will inherit
the robustness of the population based algorithms as well as the rapid
convergence of the Newton descent algorithms.
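
The quadratic-model idea can be illustrated in one dimension: sample the
function at three points along a line through the population, fit a
parabola, and propose its vertex.  This is illustrative only; the published
algorithm works on selections from the full population:

```python
def quadratic_vertex(t, y):
    """Fit y = a*t**2 + b*t + c through three samples and return the t
    of the vertex, -b/(2a); a minimum when the curvature a is positive."""
    (t0, t1, t2), (y0, y1, y2) = t, y
    # Divided differences give the quadratic coefficients directly.
    a = ((y2 - y1) / (t2 - t1) - (y1 - y0) / (t1 - t0)) / (t2 - t0)
    b = (y1 - y0) / (t1 - t0) - a * (t0 + t1)
    return -b / (2 * a)

# Three samples of (t - 2)**2 + 1 recover the minimum at t = 2.
t_min = quadratic_vertex([0.0, 1.0, 3.0], [5.0, 2.0, 2.0])
assert t_min == 2.0
```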

When to use
-----------

Random lines works very well for some of our test problems, showing
rapid convergence to the optimum, but on other problems it makes
very little progress.

The population updates can run in parallel.

Options
-------

``--steps=n`` is the number of iterations.  Each step updates each member
of the population.  The population size scales with the number of fitted
parameters.

``--pop=k`` determines the size of the population.  The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor.  The default scale factor is 0.5.

``--CR=v`` is the crossover ratio, determining what proportion of the
dimensions to update at each step.  Values must be between 0 and 1.

``--starts=n`` tells the optimizer to restart a given number of times.
Each time it restarts it uses a random starting point.

``--keep_best`` uses a value near the previous minimum when restarting
instead of using a random value within the parameter bounds.  This option is
not available in the options dialog.

Use ``--fit=rl`` to select random lines from the command line.

References
----------

.. [Sahin2013]
    Sahin, I.
    *An International Journal of Optimization and Control: Theories & Applications (IJOCTA)*
    2013, 3 (2), 111–119.


.. _fit-pt:

Parallel Tempering
==================

Parallel tempering is an MCMC algorithm for uncertainty analysis.  This
version runs at multiple temperatures simultaneously, with chains at high
temperature able to more easily jump between minima and chains at low
temperature able to fully explore the minima.  Like :ref:`fit-dream` it has a
differential evolution stepper, but this version uses the chain history
as the population rather than maintaining a population at each temperature.

This is an experimental algorithm which does not yet perform well.

When to use
-----------

When complete, parallel tempering should be used for problems with widely
spaced local minima which DREAM cannot fit.

Options
-------

``--steps=n`` is the number of iterations to include in the Markov
chain.  Each iteration updates the full population.  The population size
scales with the number of fitted parameters.

``--burn=n`` is the number of iterations required for the Markov
chain to converge to the equilibrium distribution.  If the fit ends
early, the tail of the burn will be saved to the start of the steps.

``--CR=v`` is the differential evolution crossover ratio to use when
computing step size and direction.  Use a small value to step through the
dimensions one at a time, or a large value to step through all at once.

``--nT=k``, ``--Tmin=v`` and ``--Tmax=v`` specify a log-spaced initial
distribution of temperatures.  The default is 25 points between
0.1 and 10.  :ref:`fit-dream` runs at a fixed temperature of 1.0.
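
The default ladder described above is a geometric progression, which can be
sketched as:

```python
import math

def temperature_ladder(n=25, t_min=0.1, t_max=10.0):
    """n log-spaced temperatures from t_min to t_max: a constant ratio
    between neighbours, so the cool and hot ends are equally resolved."""
    step = math.log(t_max / t_min) / (n - 1)
    return [t_min * math.exp(i * step) for i in range(n)]

T = temperature_ladder()   # 25 temperatures spanning 0.1 to 10
```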

Use ``--fit=pt`` to select parallel tempering from the command line.

References
----------

.. [Swendsen1986]
    Swendsen, R. H.; Wang, J. S.
    Replica Monte Carlo simulation of spin glasses
    *Physical Review Letters*
    1986, 57, 2607–2609.


..
    SNOBFIT (fit=snobfit) attempts to construct a locally quadratic model of
    the entire search space.  While promising because it can begin to offer
    some guarantees that the search is complete given reasonable assumptions
    about the fitting surface, initial trials did not perform well and the
    algorithm has not yet been tuned to our problems.