optimizer.rst @ 197260f

ESS_GUIESS_GUI_DocsESS_GUI_batch_fittingESS_GUI_bumps_abstractionESS_GUI_iss1116ESS_GUI_iss879ESS_GUI_iss959ESS_GUI_openclESS_GUI_orderingESS_GUI_sync_sascalccostrafo411magnetic_scattrelease-4.1.1release-4.1.2release-4.2.2release_4.0.1ticket-1009ticket-1094-headlessticket-1242-2d-resolutionticket-1243ticket-1249ticket885unittest-saveload

Last change on this file since 197260f was 49148bb, checked in by smk78, 9 years ago
Add in guiframe related docs files
Property mode set to `100644`
File size: 31.8 KB

Rev	Line
[49148bb]	1	.. _optimizer-guide:
	2
	3	*******************
	4	Optimizer Selection
	5	*******************
	6
	7	Bumps has a number of different optimizers available, each with its own
	8	control parameters:
	9
	10	* :ref:`fit-lm`
	11	* :ref:`fit-amoeba`
	12	* :ref:`fit-dream`
	13	* :ref:`fit-de`
	14	* :ref:`fit-newton`
	15	* :ref:`fit-rl` [experimental]
	16	* :ref:`fit-ps` [experimental]
	17	* :ref:`fit-pt` [experimental]
	18
	19	In general there is a trade-off between convergence
	20	rate and robustness, with the fastest algorithms most likely to find a
	21	local minimum rather than a global minimum. The gradient descent algorithms
	22	(:ref:`fit-lm`, :ref:`fit-newton`) tend to be fast but they will find local
	23	minima only, while the population algorithms (:ref:`fit-dream`, :ref:`fit-de`)
	24	are more robust and likely slower. :ref:`fit-amoeba` is somewhere between,
	25	with a small population keeping the search local but more robust than the
	26	gradient descent algorithms.
	27
	28	Each algorithm has its own set of control parameters for adjusting the
	29	search process and the stopping conditions. The same option may mean
	30	slightly different things to different optimizers. The bumps package
	31	provides a dialog box for selecting the optimizer and its options
	32	when running the fit wx application. This only includes the common options
	33	for the most useful optimizers. For full control, the fit will need to
	34	be run from the command line interface or through a python script.
	35
	36	For parameter uncertainty, most algorithms use the covariance matrix at
	37	the optimum to estimate an uncertainty ellipse. This is okay for a
	38	preliminary analysis, but only works reliably for weakly correlated parameters.
	39	For full uncertainty analysis, :ref:`fit-dream` uses a random walk to explore
	40	the parameter space near the minimum, showing pair-wise correlations
	41	amongst the parameter values. In order for :ref:`fit-dream` to return the
	42	correct uncertainy, the function to be optimized should be a conditional
	43	probability density, with nllf as the negative log likelihood function
	44	of seeing point $x$ in the parameter space. Other functions
	45	can be fitted, but uncertainty estimates will be meaningless.
	46
	47	Most algorithms have been adapted to run in parallel at least to some degree.
	48	The implementation is not heavily tuned, either in terms of minimizing the
	49	overhead per function evaluation or for distributing the problem across
	50	multiple processors. If the theory function is implemented in parallel,
	51	then the optimizer should be run in serial. Mixed mode is also possible
	52	when running on a cluster with a multi-threaded theory function. In this
	53	case, only one theory function will be evaluated on each cluster node, but
	54	the optimizer will distribute the parameters values to the cluster nodes
	55	in parallel. Do not run serial algorithms (:ref:`fit-lm`, :ref:`fit-newton`) on
	56	a cluster.
	57
	58	We have included a number of optimizers in Bumps that did not perform
	59	particularly well on our problem sets. However, they may be perfect
	60	for your problem, so we have left them in the package for you to explore.
	61	They are not available in the GUI selection.
	62
	63	.. _fit-lm:
	64
	65	Levenberg-Marquardt
	66	===================
	67
	68	.. image:: fit-lm.png
	69	:alt: Levenberg-Marquardt option screen.
	70	:align: left
	71
	72	The Levenberg-Marquardt algorithm has been
	73	the standard method for non-linear data fitting. As a gradient descent
	74	trust region method, it starts at the initial value of the function and
	75	steps in the direction of the derivative until it reaches the minimum.
	76	Set up as an explicit minimization of the sum of square differences between
	77	theory and model, it uses a numerical approximation of the Jacobian matrix
	78	to set the step direction and an adaptive algorithm to set the size of
	79	the trust region.
	80
	81	When to use
	82	-----------
	83
	84	Use this method when you have a reasonable fit near the minimum, and
	85	you want to get the best possible value. This can then be used as the starting
	86	point for uncertainty analysis using :ref:`fit-dream`. This method requires
	87	that the problem definition includes a residuals method, but this should
	88	always be true when fitting data.
	89
	90	When modeling the results of an experiment, the best fit value is an
	91	accident of the measurement. Redo the same measurement, and the slightly
	92	different values you measure will lead to a different best fit. The
	93	important quantity to report is the credible interval covering
	94	68% (1-\ $\sigma$) or 95% (2-\ $\sigma$\ ) of the range of
	95	parameter values that are somewhat consistent with the data.
	96
	97	This method uses lmfit from scipy, and does not run in parallel.
	98
	99	Options
	100	-------
	101
	102	Steps is the number of gradient steps to take. Each step requires
	103	a calculation of the Jacobian matrix to determine the direction. This
	104	needs $2 m n$ function evaluations, where $n$ is the number of parameters and
	105	each function is evaluated and $m$ data points (assuming center point
	106	formula for finite difference estimate of the derivative). The resulting
	107	linear equation is then solved, but for small $n$ and expensive function
	108	evaluation this overhead can be ignored. Use ``--steps=n`` from
	109	the command line.
	110
	111	f(x) tolerance and x tolerance are used to determine when
	112	the fit has reached the point where no significant improvement is expected.
	113	If the function value does not improve significantly within the step, or
	114	the step is too short, then the fit will terminate. Use ``--ftol=v`` and
	115	``--xtol=v`` from the command line.
	116
	117	From the command line, ``--starts=n`` will automatically restart the algorithm
	118	after it has converged so that a slightly better value can be found. If
	119	``--keep_best`` is included then restart will use a value near the minimum,
	120	otherwise it will restart the fit from a random point in the parameter space.
	121
	122	Use ``--fit=lm`` to select the Levenberg-Marquardt fitter from the command line.
	123
	124	References
	125	----------
	126
	127	.. [Levenberg1944]
	128	Levenberg, K.
	129	Quarterly Journal of Applied Mathmatics
	130	1944, II (2), 164â168.
	131
	132	.. [Marquardt1963]
	133	Marquardt, D. W.
	134	Journal of the Society for Industrial and Applied Mathematics
	135	1963, 11 (2), 431â441.
	136	DOI: `10.1137/0111030 <http://dx.doi.org/10.1137/0111030>`_
	137
	138	.. _fit-amoeba:
	139
	140	Nelder-Mead Simplex
	141	===================
	142
	143	.. image:: fit-amoeba.png
	144	:alt: Nelder-Mead Simplex option screen.
	145	:align: left
	146
	147	The Nelder-Mead downhill simplex algorithm is a robust optimizer which
	148	does not require the function to be continuous or differentiable.
	149	It uses the relative values of the function at the corners of a
	150	simplex (an n-dimensional triangle) to decide which points of the simplex
	151	to update. It will take the worst value and try moving it inward or
	152	outward, or reflect it through the centroid of the remaining values
	153	stopping if it finds a better value. If none of these values are
	154	better, then it will shrink the simplex and start again. The name
	155	amoeba comes from the book Numerical Recipes [Press1992]_ wherein they
	156	describe the search as acting like an amoeba, squeezing through narrow valleys
	157	as it makes its way down to the minimum.
	158
	159	When to use
	160	-----------
	161
	162	Use this method as a first fit to your model. If your fitting function
	163	is well behaved with few local minima this will give a quick estimate of
	164	the model, and help you decide if the model needs to be refined. If your
	165	function is poorly behaved, you will need to select a good initial value
	166	before fitting, or use a more robust method such
	167	as :ref:`fit-de` or :ref:`fit-dream`.
	168
	169	The uncertainty reported comes from a numerical derivative estimate at the
	170	minimum.
	171
	172	This method requires a series of function updates, and does not benefit
	173	much from running in parallel.
	174
	175	Options
	176	-------
	177
	178	Steps is the simplex update iterations to perform. Most updates
	179	require one or two function evaluations, but shrinking the simplex evaluates
	180	every value in the simplex. Use ``--steps=n`` from the command line.
	181
	182	Starts tells the optimizer to restart a given number of times.
	183	Each time it restarts it uses a random starting point. Use
	184	``--starts=n`` from the command line.
	185
	186	Simplex radius is the initial size of the simplex, as a portion of
	187	the bounds defining the parameter space. If a parameter is unbounded, then
	188	the radius will be treated as a portion of the parameter value. Use
	189	``--radius=n`` from the command line.
	190
	191	x tolerance and f(x) tolerance are used to determine when the
	192	fit has reached the point where no significant improvement is expected.
	193	If the simplex is tiny (that is, the corners are close to each other) and
	194	flat (that is, the values at the corners are close to each other),
	195	then the fit will terminate. Use ``--xtol=v`` and ``--ftol=v`` from
	196	the command line.
	197
	198	From the command line, use ``--keep_best`` so that restarts are centered on a
	199	value near the minimum rather than restarting from a random point within the
	200	parameter bounds.
	201
	202	Use ``--fit=amoeba`` to select the Nelder-Mead simplex fitter from the
	203	command line.
	204
	205	References
	206	----------
	207
	208	.. [Nelder1965]
	209	Nelder, J. A.; Mead, R.
	210	The Computer Journal
	211	1965, 7 (4), 308â313.
	212	DOI: `10.1093/comjnl/7.4.308 <http://dx.doi.org/10.1093/comjnl/7.4.308>`_
	213
	214	.. [Press1992]
	215	Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T.
	216	In Numerical Recipes in C: The Art of Scientific Computing, Second Edition;
	217	Cambridge University Press: Cambridge; New York, 1992; pp 408â412.
	218
	219
	220	.. _fit-newton:
	221
	222	Quasi-Newton BFGS
	223	=================
	224
	225	.. image:: fit-newton.png
	226	:alt: Quasi-Newton BFGS option screen.
	227	:align: left
	228
	229	Broyden-Fletcher-Goldfarb-Shanno is a gradient descent method which uses the
	230	gradient to determine the step direction and an approximation of the Hessian
	231	matrix to estimate the curvature and guess a step size. The step is further
	232	refined with a one-dimensional search in the direction of the gradient.
	233
	234	When to use
	235	-----------
	236
	237	Like :ref:`fit-lm`, this method converges quickly to the minimum. It does
	238	not assume that the problem is in the form of a sum of squares and does not
	239	require a residuals method.
	240
	241	The $n$ partial derivatives are computed in parallel.
	242
	243	Options
	244	-------
	245
	246	Steps is the number of gradient steps to take. Each step requires
	247	a calculation of the Jacobian matrix to determine the direction. This
	248	needs $2 m n$ function evaluations, where $n$ is the number of parameters and
	249	each function is evaluated and $m$ data points (assuming center point
	250	formula for finite difference estimate of the derivative). The resulting
	251	linear equation is then solved, but for small $n$ and expensive function
	252	evaluation this overhead can be ignored.
	253	Use ``--steps=n`` from the command line.
	254
	255	Starts tells the optimizer to restart a given number of times.
	256	Each time it restarts it uses a random starting point.
	257	Use ``--starts=n`` from the command line.
	258
	259	f(x) tolerance and x tolerance are used to determine when
	260	the fit has reached the point where no significant improvement is expected.
	261	If the function is small or the step is too short then the fit
	262	will terminate. Use ``--ftol=v`` and ``--xtol=v`` from the command line.
	263
	264	From the command line, ``--keep_best`` uses a value near the previous minimum
	265	when restarting instead of using a random value within the parameter bounds.
	266
	267	Use ``--fit=newton`` to select BFGS from the commandline.
	268
	269	References
	270	----------
	271
	272	.. [Dennis1987]
	273	Dennis, J. E.; Schnabel, R. B.
	274	Numerical Methods for Unconstrained Optimization and Nonlinear Equations;
	275	Society for Industrial and Applied Mathematics: Philadelphia, 1987.
	276
	277
	278	.. _fit-de:
	279
	280	Differential Evolution
	281	======================
	282
	283	.. image:: fit-de.png
	284	:alt: Differential Evolution option screen.
	285	:align: left
	286
	287	Differential evolution is a population based algorithm which uses differences
	288	between points as a guide to selecting new points. For each member of the
	289	population a pair of points is chosen at random, and a difference vector is
	290	computed. This vector is scaled, and a random subset of its components are
	291	added to the current point based on crossover ratio. This new point is
	292	evaluated, and if its value is lower than the current point, it replaces
	293	it in the population. There are many variations available within DE that
	294	have not been exposed in Bumps. Interested users can modify
	295	:class:`bumps.fitters.DEFit` and experiment with different crossover and
	296	mutation algorithms, and perhaps add them as command line options.
	297
	298	Differential evolution is a robust directed search strategy. Early in the
	299	search, when the population is disperse, the difference vectors are large
	300	and the search remains broad. As the search progresses, more of the
	301	population goes into the valleys and eventually all the points end up in
	302	local minima. Now the differences between random pairs will often be small
	303	and the search will become more localized.
	304
	305	The population is initialized according to the prior probability distribution
	306	for each each parameter. That is, if the parameter is bounded, it will use
	307	a uniform random number generate within the bounds. If it is unbounded, it
	308	will use a uniform value in [0,1]. If the parameter corresponds to the result
	309	of a previous measurement with mean $\mu$ and standard deviation $\sigma$,
	310	then the initial values will be pulled from a gaussian random number generator.
	311
	312	When to use
	313	-----------
	314
	315	Convergence with differential evolution will be slower, but more robust.
	316
	317	Each update will evaluate $k$ points in parallel, where $k$ is the size
	318	of the population.
	319
	320	Options
	321	-------
	322
	323	Steps is the number of iterations. Each step updates each member
	324	of the population. The population size scales with the number of fitted
	325	parameters. Use ``--steps=n`` from the command line.
	326
	327	Population determines the size of the population. The number of
	328	individuals, $k$, is equal to the number of fitted parameters times the
	329	population scale factor. Use ``--pop=k`` from the command line.
	330
	331	Crossover ratio determines what proportion of the dimensions to update
	332	at each step. Smaller values will likely lead to slower convergence, but
	333	more robust results. Values must be between 0 and 1. Use ``--CR=v`` from
	334	the command line.
	335
	336	Scale determines how much to scale each difference vector before adding
	337	it to the candidate point. The selected mutation algorithm chooses a scale
	338	factor uniformly in $[0,F]$. Use ``--F=v`` from the command line.
	339
	340	f(x) tolerance and x tolerance are used to determine when the
	341	fit has reached the point where no significant improvement is expected.
	342	If the population is flat (that is, the minimum and maximum values are
	343	within tolerance) and tiny (that is, all the points are close to each
	344	other) then the fit will terminate. Use ``ftol=v`` and ``xtol=v`` from the
	345	command line.
	346
	347	Use ``--fit=de`` to select diffrential evolution from the commandline.
	348
	349	References
	350	----------
	351
	352	.. [Storn1997]
	353	Storn, R.; Price, K.
	354	Journal of Global Optimization
	355	1997, 11 (4), 341â359.
	356	DOI: `10.1023/A:1008202821328 <http://dx.doi.org/10.1023/A:1008202821328>`_
	357
	358
	359
	360	.. _fit-dream:
	361
	362	DREAM
	363	=====
	364
	365	.. image:: fit-dream.png
	366	:alt: DREAM option screen.
	367	:align: left
	368
	369	DREAM is a population based algorithm like differential evolution, but
	370	instead of only keeping individuals which improve each generation, it
	371	will sometimes keep individuals which get worse. Although it is not
	372	fast and does not give the very best value for the function, we have
	373	found it to be a robust fitting engine which will give a good value given
	374	enough time.
	375
	376	The progress of each individual in the population from generation to
	377	generation can considered a Markov chain, whose transition probability
	378	is equal to the probability of taking the step times the probability
	379	that it keeps the step based on the difference in value between the points.
	380	By including a purely random stepper with some probability, the detailed
	381	balance condition is preserved, and the Markov chain converges onto
	382	the underlying equilibrium distribution. If the theory function represents
	383	the conditional probability of selecting each point in the parameter
	384	space, then the resulting chain is a random draw from the posterior
	385	distribution.
	386
	387	This means that the DREAM algorithm can be used to determine the parameter
	388	uncertainties. Unlike the hessian estimate at the minimum that is
	389	used to report uncertainties from the other fitters, the resulting
	390	uncertainty need not gaussian. Indeed, the resulting distribution can
	391	even be multi-modal. Fits to measured data using theory functions that
	392	have symmetric solutions have shown all equivalent solutions with approximately
	393	equal probability.
	394
	395	When to use
	396	-----------
	397
	398	Use DREAM when you need a robust fitting algorithm. It takes longer but
	399	it does an excellent job of exploring different minima and getting close
	400	to the global optimum.
	401
	402	Use DREAM when you want a detailed analysis of the parameter uncertainty.
	403
	404	Like differential evolution, DREAM will evaluate $k$ points in parallel,
	405	where $k$ is the size of the population.
	406
	407	Options
	408	-------
	409
	410	Samples is the number of points to be drawn from the Markov chain.
	411	To estimate the 68% interval to two digits of precision, at least
	412	1e5 (or 100,000) samples are needed. For the 95% interval, 1e6
	413	(or 1,000,000) samples are needed. The default 1e4 samples
	414	gives a rough approximation of the uncertainty relatively quickly.
	415	Use ``--samples=n`` from the command line.
	416
	417	Burn-in Steps is the number of iterations to required for the Markov
	418	chain to converge to the equilibrium distribution. If the fit ends
	419	early, the tail of the burn will be saved to the start of the steps.
	420	Use ``--burn=n`` from the command line.
	421
	422	Population determines the size of the population. The number of
	423	individuals, $k$, is equal to the number of fitted parameters times the
	424	population scale factor. Use ``--pop=k`` from the command line.
	425
	426	Initializer determines how the population will be initialized.
	427	The options are as follows:
	428
	429	eps (epsilon ball), in which the entire initial population is chosen
	430	at random from within a tiny hypersphere centered about the initial point
	431
	432	lhs (latin hypersquare), which chops the bounds within each dimension
	433	in $k$ equal sized chunks where $k$ is the size of the population and
	434	makes sure that each parameter has at least one value within each chunk
	435	across the population.
	436
	437	cov (covariance matrix), in which the uncertainty is estimated using
	438	the covariance matrix at the initial point, and points are selected
	439	at random from the corresponding gaussian ellipsoid
	440
	441	random (uniform random), in which the points are selected at random
	442	within the bounds of the parameters
	443
	444	Use ``--init=type`` from the command line.
	445
	446
	447	Thinning is the amount of thinning to use when collecting the
	448	population. If the fit is somewhat stuck, with most steps not improving
	449	the fit, then you will need to thin the population to get proper
	450	statistics. Use ``--thin=k`` from the command line.
	451
	452	Calculate entropy, if true, computes the entropy for the fit. This is
	453	an estimate of the amount of information in the data. Use ``--entropy``
	454	from the command line.
	455
	456	Steps, if not zero, determines the number of iterations to use for
	457	drawing samples after burn in. Each iteration updates the full population,
	458	which is (population x number of fitted parameters) points. This option
	459	is available for compatibility; it is more useful to set the number of
	460	samples directly. Use ``--steps=n`` from the command line.
	461
	462	Use ``--fit=dream`` to select DREAM from the commandline.
	463
	464	Output
	465	------
	466
	467	DREAM produces a number of different outputs, and there are a number of
	468	things to check before using its reported uncertainty values. The main
	469	goal of selecting ``--burn=n`` is to wait long enough to reach the
	470	equilibrium distribution.
	471
	472	.. figure:: dream-incomplete.png
	473	:alt: example of incomplete fit
	474
	475	This DREAM fit is incomplete, as can be seen on all four plots. The
	476	Convergence plot is still decreasing, Parameter Trace plot does not
	477	show random mixing of Markov chain values, the Correlations plots are
	478	fuzzy and mostly empty, the Uncertainty plot shows black histograms
	479	(indicating that there are a few stray values far away from the best) and
	480	green maximum likelihood spikes not matching the histogram (indicating
	481	that the region around the best value has not been adequately explored).
	482
	483	.. figure:: dream-complete.png
	484	:alt: example of a completed fit
	485
	486	This DREAM fit completed successfully. The Convergence plot is flat,
	487	the Parameter Trace plot is flat and messy, the Correlateions plots
	488	show nice blobs (and a bit of correlation between the M1.radius parameter
	489	and the M1.radius.width parameter), and the uncertainty plots show
	490	a narrow range of -log(P) values in the mostly brown histograms and
	491	a good match to the green constrained maximum likelihood line.
	492
	493	For each parameter in the fit, DREAM finds the mean, median and best value,
	494	as well as the 68% and 95% credible intervals. The mean value is
	495	defined as $\int x P(x) dx$, which is just the expected value of the
	496	probability distribution for the parameter. The median value is the 50%
	497	point in the probability distribution, and the best value is the maximum
	498	likelihood value seen in the random walk. The credible intervals are the
	499	central intervals which capture 68% and 95% of the parameter values
	500	respectively. You need approximately 100,000 samples to get two digits of
	501	precision on the 68% interval, and 1,000,000 samples for the 95% interval.
	502
	503	.. table:: Example fit output
	504
	505	= =============== ============ ======== ======== ================= =================
	506	# Parameter mean median best [ 68% interval] [ 95% interval]
	507	= =============== ============ ======== ======== ================= =================
	508	1 M1.background 0.059925(41) 0.059924 0.059922 [0.05988 0.05997] [0.05985 0.06000]
	509	2 M1.radius 2345.3(15) 2345.234 2345.174 [2343.83 2346.74] [2342.36 2348.29]
	510	3 M1.radius.width 0.00775(41) 0.00774 0.00777 [ 0.0074 0.0081] [ 0.0070 0.0086]
	511	4 M1.scale 0.21722(20) 0.217218 0.217244 [0.21702 0.21743] [0.21681 0.21761]
	512	= =============== ============ ======== ======== ================= =================
	513
	514	The Convergence plot shows the range of $\chi^2$ values in the population
	515	for each iteration. The band shows the 68% of values around the median, and
	516	the solid line shows the minimum value. If the distribution has reached
	517	equilibrium, then convergence graph should be roughly flat, with little
	518	change in the minimum value throughout the graph. If there is no convergence,
	519	then the remaining plots don't mean much.
	520
	521	The Correlations plot shows cross correlation between each pair of
	522	parameters. If the parameters are completely uncorrelated then the boxes
	523	should contain circles. Diagonals indicate strong correlation. Square
	524	blocks indicate that the fit is not sensitive to one of the parameters.
	525	The range plotted on the correlation plot is determined by the 95% interval
	526	of the data. The individual correlation plots are too small to show the
	527	range of values for the parameters. These can instead be read from the
	528	Uncertainty plot for each parameter, which covers the same range of values
	529	and indicates 68% and 95% intervals. If there are some chains that are
	530	wandering around away from the minimum, then the plot will look fuzzy, and
	531	not have a nice blob in the center. If a correlation plot has multiple blobs,
	532	then there are multiple minima in your problem space, usually because there
	533	are symmetries in the problem definition. For example, a model fitting
	534	$x + a^2$ will have identical solutions for $\pm\,a$.
	535
	536	The Uncertainty plot shows histograms for each fitted parameter generated
	537	from the values for that parameter across all chains. Within each histogram
	538	bar the values are sorted and displayed as a gradient from black to copper,
	539	with black values having the lowest $\chi^2$ and copper values having the
	540	highest. The resulting histogram should be dark brown, with a black hump
	541	in the center and light brown tips. If there are large lumps of light brown,
	542	or excessive black then its likely that the optimizer did not converge. The
	543	green line over the histogram shows the best value seen within each
	544	histogram bin (the maximum likelihood given $p_k == x$).
	545	With enough samples and proper convergence, it should roughly follow the
	546	outline of the histogram. The yellow band in the center of the plot
	547	represents the 68% interval for the data. The histogram cuts off at 95%.
	548	These values along with the median are shown as labels along the x axis.
	549	The green asterisk represents the best value, the green E the mean value
	550	and the vertical green line the median value. If the fit is not sensitive
	551	to a parameter, or if two parameters are strongly correlated, the parameter
	552	histogram will show a box rather than a hump. Spiky shapes (either in the
	553	histogram or the maximum likelihood line) indicate lack of convergence or
	554	maybe not enough steps. A chopped histograms indicates that the range for
	555	that parameter is too small.
	556
	557	The Parameter Trace plot is diagnostic for models which have poor mixing.
	558	In this cases no matter how the parameter values are changing, they are
	559	landing on much worse values for the $\chi^2$. This can happen if the
	560	problem is highly constrained with many tight and twisty values.
	561
	562	The Data and Theory plot should show theory and data lining up pretty well,
	563	with the theory overlaying about 2/3 of the error bars on the data
	564	(1-\ $\sigma$ = 68%). The Residuals plot shows the difference between
	565	theory and data divided by uncertainty. The residuals should be 2/3 within
	566	[-1, 1], They should not show any structure, such as humps where the theory
	567	misses the data for long stretches. This indicates some feature missing
	568	from the model, or a lack of convergence to the best model.
	569
	570	If entropy is requested, then bumps will show the total number of bits of
	571	information in the fit. This derives from the entropy term:
	572
	573	.. math:
	574
	575	S = \int_\Theta p(\Theta) \log p(\Theta) d\Theta
	576
	577	Using entropy and simulation we hope to be able to make experiment
	578	planning decisions in a way that maximizes information, by estimating
	579	whether it is better to measure more precisely or to measure different
	580	but related values and fit them with shared parameters.
	581
	582
	583	References
	584	----------
	585
	586	.. [Vrugt2009]
	587	Vrugt, J. A.; Ter Braak, C. J. F.; Diks, C. G. H.; Robinson, B. A.;
	588	Hyman, J. M.; Higdon, D.
	589	International Journal of Nonlinear Sciences and Numerical Simulation
	590	2009, 10 (3), 273â290.
	591	DOI: `10.1515/IJNSNS.2009.10.3.273 <http://dx.doi.org/10.1515/IJNSNS.2009.10.3.273>`_
	592
	593	.. [Kramer2010]
	594	Kramer, A.; Hasenauer, J.; Allgower, F.; Radde, N.
	595	In 2010 IEEE International Conference on Control Applications (CCA)
	596	2010; pp 493â498.
	597	DOI: `10.1109/CCA.2010.5611198 <http://dx.doi.org/10.1109/CCA.2010.5611198>`_
	598
	599	.. [JCGM2008]
	600	JCGM.
	601	*Evaluation of measurement data â Supplement 1 to the âGuide to the
	602	expression of uncertainty in measurementâ â Propagation of distributions
	603	using a Monte Carlo method*; Joint Committee for Guides in Metrology,
	604	JCGM 101:2008; Geneva, Switzerland, 2008; p 90.
	605	`<http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf>`_
	606
	607
	608
	609	.. _fit-ps:
	610
	611	Particle Swarm
	612	==============
	613
	614	Inspired by bird flocking behaviour, the particle swarm algorithm is a
	615	population-based method which updates an individual according to its
	616	momentum and a force toward the current best fit parameter values. We
	617	did not explore variations of this algorithm in any detail.
	618
	619	When to use
	620	-----------
	621
	622	Particle swarm performed well enough in our low dimensional test problems,
	623	but made little progress when more fit parameters were added.
	624
	625	The population updates can run in parallel, but the tiny population size
	626	limits the amount of parallelism.
	627
	628	Options
	629	-------
	630
	631	``--steps=n`` is the number of iterations. Each step updates each member
	632	of the population. The population size scales with the number of fitted
	633	parameters.
	634
	635	``--pop=k`` determines the size of the population. The number of
	636	individuals, $k$, is equal to the number of fitted parameters times the
	637	population scale factor. The default scale factor is 1.
	638
	639	Use ``--fit=ps`` to select particle swarm from the commandline.
	640
	641	Add a few more lines
	642
	643	References
	644	----------
	645
	646	.. [Kennedy1995]
	647	Kennedy, J.; Eberhart, R.
	648	Particle Swarm Optimization
	649	Proceedings of IEEE International Conference on Neural Networks. IV.
	650	1995; pp 1942â1948.
	651	DOI: `10.1109/ICNN.1995.48896 <http://dx.doi.org/810.1109/ICNN.1995.488968>`_
	652
	653
	654	.. _fit-rl:
	655
	656	Random Lines
	657	============
	658
	659	Most of the population based algorithms ignore the value of the function
	660	when choosing the points in the next iteration. Random lines is a new
	661	style of algorithm which fits a quadratic model to a selection from the
	662	population, and uses that model to propose a new point in the next
	663	generation of the population. The hope is that the method will inherit
	664	the robustness of the population based algorithms as well as the rapid
	665	convergence of the newton descent algorithms.
	666
	667	When to use
	668	-----------
	669
	670	Random lines works very well for some of our test problems, showing
	671	rapid convergence to the optimum, but on other problems it makes
	672	very little progress.
	673
	674	The population updates can run in parallel.
	675
	676	Options
	677	-------
	678
	679	``--steps=n`` is the number of iterations. Each step updates each member
	680	of the population. The population size scales with the number of fitted
	681	parameters.
	682
	683	``--pop=k`` determines the size of the population. The number of
	684	individuals, $k$, is equal to the number of fitted parameters times the
	685	population scale factor. The default scale factor is 0.5.
	686
	687	``--CR=v`` is the crossover ratio, determining what proportion of the
	688	dimensions to update at each step. Values must be between 0 and 1.
	689
	690	``--starts=n`` tells the optimizer to restart a given number of times.
	691	Each time it restarts it uses a random starting point.
	692
	693	``--keep_best`` uses a value near the previous minimum when restarting
	694	instead of using a random value within the parameter bounds. This option is
	695	not available in the options dialog.
	696
	697	Use ``--fit=rl`` to select random lines from the commandline.
	698
	699	References
	700	----------
	701
	702	.. [Sahin2013]
	703
	704	Sahin, I.
	705	An International Journal of Optimization and Control: Theories & Applications (IJOCTA)
	706	2013, 3 (2), 111â119.
	707
	708
	709
	710	.. _fit-pt:
	711
	712	Parallel Tempering
	713	==================
	714
	715	Parallel tempering is an MCMC algorithm for uncertainty analysis. This
	716	version runs at multiple temperatures simultaneously, with chains at high
	717	temperature able to more easily jump between minima and chains at low
	718	temperature to fully explore the minima. Like :ref:`fit-dream` it has a
	719	differential evolution stepper, but this version uses the chain history
	720	as the population rather than maintaining a population at each temperature.
	721
	722	This is an experimental algorithm which does not yet perform well.
	723
	724	When to use
	725	-----------
	726
	727	When complete, parallel tempering should be used for problems with widely
	728	spaced local minima which dream cannot fit.
	729
	730	Options
	731	-------
	732
	733	``--steps=n`` is the number of iterations to include in the Markov
	734	chain. Each iteration updates the full population. The population size
	735	scales with the number of fitted parameters.
	736
	737	``--burn=n`` is the number of iterations to required for the Markov
	738	chain to converge to the equilibrium distribution. If the fit ends
	739	early, the tail of the burn will be saved to the start of the steps.
	740
	741	``--CR=v`` is the differential evolution crossover ratio to use when
	742	computing step size and direction. Use a small value to step through the
	743	dimensions one at a time, or a large value to step through all at once.
	744
	745	``-nT=k``, ``-Tmin=v`` and ``--Tmax=v`` specify a log-spaced initial
	746	distribution of temperatures. The default is 25 points between
	747	0.1 and 10. :ref:`fit-dream` runs at a fixed temperature of 1.0.
	748
	749	Use ``--fit=pt`` to select parallel tempering from the commandline.
	750
	751	References
	752	----------
	753
	754	.. [Swendsen1986]
	755	Swendsen, R. H.; Wang J. S.
	756	Replica Monte Carlo simulation of spin glasses
	757	Physical Review Letters
	758	1986, 57, 2607-2609
	759
	760
	761	..
	762	SNOBFIT (fit=snobfit) attempts to construct a locally quadratic model of
	763	the entire search space. While promising because it can begin to offer
	764	some guarantees that the search is complete given reasonable assumptions
	765	about the fitting surface, initial trials did not perform well and the
	766	algorithm has not yet been tuned to our problems.
	767

Note: See TracBrowser for help on using the repository browser.

SasView

source: sasview/docs/sphinx-docs/source/user/sasgui/perspectives/fitting/optimizer.rst @ 197260f

Download in other formats: