.. _optimizer-guide:

*******************
Optimizer Selection
*******************

Bumps has a number of different optimizers available, each with its own
control parameters:

* :ref:`fit-lm`
* :ref:`fit-amoeba`
* :ref:`fit-dream`
* :ref:`fit-de`
* :ref:`fit-newton`
* :ref:`fit-rl` [experimental]
* :ref:`fit-ps` [experimental]
* :ref:`fit-pt` [experimental]

In general there is a trade-off between convergence rate and robustness,
with the fastest algorithms most likely to find a local minimum rather
than a global minimum. The gradient descent algorithms (:ref:`fit-lm`,
:ref:`fit-newton`) tend to be fast but will find local minima only, while
the population algorithms (:ref:`fit-dream`, :ref:`fit-de`) are more
robust but likely slower. :ref:`fit-amoeba` is somewhere in between: its
small population keeps the search local, but it is more robust than the
gradient descent algorithms.

Each algorithm has its own set of control parameters for adjusting the
search process and the stopping conditions. The same option may mean
slightly different things to different optimizers. The bumps package
provides a dialog box for selecting the optimizer and its options
when running the fit wx application. This includes only the common options
for the most useful optimizers. For full control, the fit will need to
be run from the command line interface or through a python script.

For parameter uncertainty, most algorithms use the covariance matrix at
the optimum to estimate an uncertainty ellipse. This is okay for a
preliminary analysis, but it only works reliably for weakly correlated
parameters. For full uncertainty analysis, :ref:`fit-dream` uses a random
walk to explore the parameter space near the minimum, showing pair-wise
correlations amongst the parameter values. In order for :ref:`fit-dream`
to return the correct uncertainty, the function to be optimized should be
a conditional probability density, with *nllf* as the negative log
likelihood of seeing point $x$ in the parameter space. Other functions
can be fitted, but the uncertainty estimates will be meaningless.

Most algorithms have been adapted to run in parallel, at least to some
degree. The implementation is not heavily tuned, either in terms of
minimizing the overhead per function evaluation or in distributing the
problem across multiple processors. If the theory function is implemented
in parallel, then the optimizer should be run in serial. Mixed mode is
also possible when running on a cluster with a multi-threaded theory
function. In this case, only one theory function will be evaluated on
each cluster node, but the optimizer will distribute the parameter values
to the cluster nodes in parallel. Do not run serial algorithms
(:ref:`fit-lm`, :ref:`fit-newton`) on a cluster.

We have included a number of optimizers in Bumps that did not perform
particularly well on our problem sets. However, they may be perfect
for your problem, so we have left them in the package for you to explore.
They are not available in the GUI selection.

.. _fit-lm:

Levenberg-Marquardt
===================

.. image:: fit-lm.png
    :alt: Levenberg-Marquardt option screen.
    :align: left

The Levenberg-Marquardt algorithm has been the standard method for
non-linear data fitting. As a gradient descent trust region method, it
starts at the initial value of the function and steps in the direction
of the derivative until it reaches the minimum. Set up as an explicit
minimization of the sum of square differences between theory and model,
it uses a numerical approximation of the Jacobian matrix to set the step
direction and an adaptive algorithm to set the size of the trust region.

When to use
-----------

Use this method when you have a reasonable fit near the minimum, and
you want to get the best possible value. This can then be used as the
starting point for uncertainty analysis using :ref:`fit-dream`. This
method requires that the problem definition includes a *residuals*
method, but this should always be true when fitting data.

When modeling the results of an experiment, the best fit value is an
accident of the measurement. Redo the same measurement, and the slightly
different values you measure will lead to a different best fit. The
important quantity to report is the credible interval covering
68% (1-\ $\sigma$\ ) or 95% (2-\ $\sigma$\ ) of the range of
parameter values that are somewhat consistent with the data.

This method uses *lmfit* from *scipy*, and does not run in parallel.

Options
-------

*Steps* is the number of gradient steps to take. Each step requires
a calculation of the Jacobian matrix to determine the step direction.
This needs $2 m n$ function evaluations, where $n$ is the number of
parameters and the function is evaluated at $m$ data points (assuming a
center point formula for the finite difference estimate of the
derivative). The resulting linear equation is then solved, but for small
$n$ and expensive function evaluation this overhead can be ignored.
Use ``--steps=n`` from the command line.
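
The evaluation count above can be sketched in a few lines of Python. This is a
toy illustration of a center-point finite difference Jacobian, not the bumps
implementation; the ``residuals`` function and its parameter values are made
up for the example.

```python
# Sketch of the center-point finite difference Jacobian described above.
# Two function calls per parameter, each returning m residuals.

def jacobian(f, x, h=1e-6):
    """Estimate J[i][j] = d f_i / d x_j using center differences."""
    n = len(x)
    cols = []
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = f(xp), f(xm)
        cols.append([(p - q) / (2 * h) for p, q in zip(fp, fm)])
    # transpose the n columns into an m x n matrix
    return [list(row) for row in zip(*cols)]

calls = 0
def residuals(x):
    """Toy residuals for m=3 data points and n=2 parameters."""
    global calls
    calls += 1
    a, b = x
    return [a + b - 1.0, a - b, 2.0 * a]

J = jacobian(residuals, [0.5, 0.5])
print(calls)  # 2 evaluations per parameter: 4 calls, each returning m=3 points
```

The derivative cost is thus $2n$ calls of $m$ points each; for an expensive
theory function this dominates the cost of solving the resulting linear system.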

*f(x) tolerance* and *x tolerance* are used to determine when
the fit has reached the point where no significant improvement is expected.
If the function value does not improve significantly within the step, or
the step is too short, then the fit will terminate. Use ``--ftol=v`` and
``--xtol=v`` from the command line.

From the command line, ``--starts=n`` will automatically restart the algorithm
after it has converged so that a slightly better value can be found. If
``--keep_best`` is included, then the restart will use a value near the
minimum; otherwise it will restart the fit from a random point in the
parameter space.

Use ``--fit=lm`` to select the Levenberg-Marquardt fitter from the command line.

References
----------

.. [Levenberg1944]
    Levenberg, K.
    *Quarterly Journal of Applied Mathematics*
    1944, II (2), 164–168.

.. [Marquardt1963]
    Marquardt, D. W.
    *Journal of the Society for Industrial and Applied Mathematics*
    1963, 11 (2), 431–441.
    DOI: `10.1137/0111030 <http://dx.doi.org/10.1137/0111030>`_

.. _fit-amoeba:

Nelder-Mead Simplex
===================

.. image:: fit-amoeba.png
    :alt: Nelder-Mead Simplex option screen.
    :align: left

The Nelder-Mead downhill simplex algorithm is a robust optimizer which
does not require the function to be continuous or differentiable.
It uses the relative values of the function at the corners of a
simplex (an n-dimensional triangle) to decide which points of the simplex
to update. It will take the worst value and try moving it inward or
outward, or reflect it through the centroid of the remaining values,
stopping if it finds a better value. If none of these values is
better, then it will shrink the simplex and start again. The name
amoeba comes from the book *Numerical Recipes* [Press1992]_, wherein the
authors describe the search as acting like an amoeba, squeezing through
narrow valleys as it makes its way down to the minimum.
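
The reflect/expand/contract/shrink moves just described can be condensed into
a small pure-Python sketch. This is a simplified toy, not the bumps amoeba
fitter: it runs for a fixed number of steps with no tolerance tests, and the
quadratic test function is invented for the example.

```python
# Minimal Nelder-Mead sketch: update the worst corner of the simplex by
# reflection, expansion, or contraction, shrinking everything when stuck.

def amoeba(f, simplex, steps=200):
    for _ in range(steps):
        simplex.sort(key=f)                  # best (lowest) corner first
        best, worst = simplex[0], simplex[-1]
        n = len(best)
        # centroid of every corner except the worst
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        # reflect the worst corner through the centroid
        reflect = [2 * centroid[i] - worst[i] for i in range(n)]
        if f(reflect) < f(best):
            # the reflected point is the new best: try moving further out
            expand = [3 * centroid[i] - 2 * worst[i] for i in range(n)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            # move the worst corner inward instead
            contract = [(centroid[i] + worst[i]) / 2 for i in range(n)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:
                # nothing helped: shrink every corner toward the best
                simplex[:] = [[(p[i] + best[i]) / 2 for i in range(n)]
                              for p in simplex]
    return min(simplex, key=f)

def quad(p):
    """Toy objective with minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

x = amoeba(quad, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

A production version would also stop when the simplex becomes flat and tiny,
which is what the *x tolerance* and *f(x) tolerance* options control.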

When to use
-----------

Use this method as a first fit to your model. If your fitting function
is well behaved with few local minima, this will give a quick estimate of
the model, and help you decide if the model needs to be refined. If your
function is poorly behaved, you will need to select a good initial value
before fitting, or use a more robust method such
as :ref:`fit-de` or :ref:`fit-dream`.

The uncertainty reported comes from a numerical derivative estimate at the
minimum.

This method requires a series of function updates, and does not benefit
much from running in parallel.

Options
-------

*Steps* is the number of simplex update iterations to perform. Most updates
require one or two function evaluations, but shrinking the simplex evaluates
every point in the simplex. Use ``--steps=n`` from the command line.

*Starts* tells the optimizer to restart a given number of times.
Each time it restarts it uses a random starting point. Use
``--starts=n`` from the command line.

*Simplex radius* is the initial size of the simplex, as a portion of
the bounds defining the parameter space. If a parameter is unbounded, then
the radius will be treated as a portion of the parameter value. Use
``--radius=n`` from the command line.

*x tolerance* and *f(x) tolerance* are used to determine when the
fit has reached the point where no significant improvement is expected.
If the simplex is tiny (that is, the corners are close to each other) and
flat (that is, the values at the corners are close to each other),
then the fit will terminate. Use ``--xtol=v`` and ``--ftol=v`` from
the command line.

From the command line, use ``--keep_best`` so that restarts are centered on a
value near the minimum rather than restarting from a random point within the
parameter bounds.

Use ``--fit=amoeba`` to select the Nelder-Mead simplex fitter from the
command line.

References
----------

.. [Nelder1965]
    Nelder, J. A.; Mead, R.
    *The Computer Journal*
    1965, 7 (4), 308–313.
    DOI: `10.1093/comjnl/7.4.308 <http://dx.doi.org/10.1093/comjnl/7.4.308>`_

.. [Press1992]
    Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T.
    In *Numerical Recipes in C: The Art of Scientific Computing, Second Edition*;
    Cambridge University Press: Cambridge; New York, 1992; pp 408–412.

.. _fit-newton:

Quasi-Newton BFGS
=================

.. image:: fit-newton.png
    :alt: Quasi-Newton BFGS option screen.
    :align: left

Broyden-Fletcher-Goldfarb-Shanno is a gradient descent method which uses the
gradient to determine the step direction and an approximation of the Hessian
matrix to estimate the curvature and guess a step size. The step is further
refined with a one-dimensional search in the direction of the gradient.

When to use
-----------

Like :ref:`fit-lm`, this method converges quickly to the minimum. It does
not assume that the problem is in the form of a sum of squares and does not
require a *residuals* method.

The $n$ partial derivatives are computed in parallel.

Options
-------

*Steps* is the number of gradient steps to take. Each step requires
a calculation of the Jacobian matrix to determine the step direction.
This needs $2 m n$ function evaluations, where $n$ is the number of
parameters and the function is evaluated at $m$ data points (assuming a
center point formula for the finite difference estimate of the
derivative). The resulting linear equation is then solved, but for small
$n$ and expensive function evaluation this overhead can be ignored.
Use ``--steps=n`` from the command line.

*Starts* tells the optimizer to restart a given number of times.
Each time it restarts it uses a random starting point.
Use ``--starts=n`` from the command line.

*f(x) tolerance* and *x tolerance* are used to determine when
the fit has reached the point where no significant improvement is expected.
If the function value is small or the step is too short, then the fit
will terminate. Use ``--ftol=v`` and ``--xtol=v`` from the command line.

From the command line, ``--keep_best`` uses a value near the previous minimum
when restarting instead of using a random value within the parameter bounds.

Use ``--fit=newton`` to select BFGS from the command line.

References
----------

.. [Dennis1987]
    Dennis, J. E.; Schnabel, R. B.
    *Numerical Methods for Unconstrained Optimization and Nonlinear Equations*;
    Society for Industrial and Applied Mathematics: Philadelphia, 1987.

.. _fit-de:

Differential Evolution
======================

.. image:: fit-de.png
    :alt: Differential Evolution option screen.
    :align: left

Differential evolution is a population based algorithm which uses differences
between points as a guide to selecting new points. For each member of the
population, a pair of points is chosen at random and a difference vector is
computed. This vector is scaled, and a random subset of its components is
added to the current point based on the crossover ratio. The new point is
evaluated, and if its value is lower than that of the current point, it
replaces the current point in the population. There are many variations
available within DE that have not been exposed in Bumps. Interested users
can modify :class:`bumps.fitters.DEFit` and experiment with different
crossover and mutation algorithms, and perhaps add them as command line
options.

Differential evolution is a robust directed search strategy. Early in the
search, when the population is disperse, the difference vectors are large
and the search remains broad. As the search progresses, more of the
population moves into the valleys and eventually all the points end up in
local minima. Now the differences between random pairs will often be small
and the search will become more localized.

The population is initialized according to the prior probability distribution
for each parameter. That is, if the parameter is bounded, it will use
a uniform random number generator within the bounds. If it is unbounded, it
will use a uniform value in [0,1]. If the parameter corresponds to the result
of a previous measurement with mean $\mu$ and standard deviation $\sigma$,
then the initial values will be drawn from a gaussian random number generator.
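
A single generation of the scheme just described can be sketched in pure
Python. This toy (the names ``de_step`` and ``sphere`` and all constants are
invented for the example) uses a fixed scale factor ``F`` rather than a
randomized one, and is not :class:`bumps.fitters.DEFit`.

```python
# One differential evolution generation: perturb each member with a scaled
# difference of two random others, mutate a random subset of components,
# and keep the trial point only if it is at least as good.
import random

def de_step(f, pop, F=0.8, CR=0.9):
    n = len(pop[0])
    for i, x in enumerate(pop):
        others = [p for j, p in enumerate(pop) if j != i]
        a, b = random.sample(others, 2)      # random pair -> difference vector
        force = random.randrange(n)          # at least one component mutates
        trial = [x[k] + F * (a[k] - b[k])
                 if k == force or random.random() < CR else x[k]
                 for k in range(n)]
        if f(trial) <= f(x):                 # greedy selection
            pop[i] = trial

def sphere(p):
    return sum(v * v for v in p)

random.seed(1)
pop = [[random.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(20)]
start = min(sphere(p) for p in pop)
for _ in range(300):
    de_step(sphere, pop)
best = min(sphere(p) for p in pop)
```

Because a member is only ever replaced by an equal or better trial, the best
value in the population can never get worse, which is the "directed search"
character described above.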

When to use
-----------

Convergence with differential evolution will be slower than with the
gradient descent algorithms, but more robust.

Each update will evaluate $k$ points in parallel, where $k$ is the size
of the population.

Options
-------

*Steps* is the number of iterations. Each step updates each member
of the population. The population size scales with the number of fitted
parameters. Use ``--steps=n`` from the command line.

*Population* determines the size of the population. The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor. Use ``--pop=k`` from the command line.

*Crossover ratio* determines what proportion of the dimensions to update
at each step. Smaller values will likely lead to slower convergence, but
more robust results. Values must be between 0 and 1. Use ``--CR=v`` from
the command line.

*Scale* determines how much to scale each difference vector before adding
it to the candidate point. The selected mutation algorithm chooses a scale
factor uniformly in $[0,F]$. Use ``--F=v`` from the command line.

*f(x) tolerance* and *x tolerance* are used to determine when the
fit has reached the point where no significant improvement is expected.
If the population is flat (that is, the minimum and maximum values are
within tolerance) and tiny (that is, all the points are close to each
other), then the fit will terminate. Use ``--ftol=v`` and ``--xtol=v``
from the command line.

Use ``--fit=de`` to select differential evolution from the command line.

References
----------

.. [Storn1997]
    Storn, R.; Price, K.
    *Journal of Global Optimization*
    1997, 11 (4), 341–359.
    DOI: `10.1023/A:1008202821328 <http://dx.doi.org/10.1023/A:1008202821328>`_

.. _fit-dream:

DREAM
=====

.. image:: fit-dream.png
    :alt: DREAM option screen.
    :align: left

DREAM is a population based algorithm like differential evolution, but
instead of only keeping individuals which improve each generation, it
will sometimes keep individuals which get worse. Although it is not
fast and does not give the very best value for the function, we have
found it to be a robust fitting engine which will give a good value given
enough time.

The progress of each individual in the population from generation to
generation can be considered a Markov chain, whose transition probability
is equal to the probability of taking the step times the probability
that it keeps the step based on the difference in value between the points.
By including a purely random stepper with some probability, the detailed
balance condition is preserved, and the Markov chain converges onto
the underlying equilibrium distribution. If the theory function represents
the conditional probability of selecting each point in the parameter
space, then the resulting chain is a random draw from the posterior
distribution.

This means that the DREAM algorithm can be used to determine the parameter
uncertainties. Unlike the Hessian estimate at the minimum that is
used to report uncertainties from the other fitters, the resulting
uncertainty need not be gaussian. Indeed, the resulting distribution can
even be multi-modal. Fits to measured data using theory functions that
have symmetric solutions have shown all equivalent solutions with
approximately equal probability.
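
The keep-or-reject rule can be illustrated with a plain Metropolis random
walk on a one-dimensional gaussian. This is a sketch of the general idea
only, not the DREAM stepper; the target *nllf* and all constants here are
invented for the example.

```python
# Metropolis random walk: steps that lower the negative log likelihood
# are always kept; worse steps are kept with probability exp(-delta).
import math
import random

def nllf(x):
    """Negative log likelihood for a unit gaussian centered at 2."""
    return 0.5 * (x - 2.0) ** 2

random.seed(0)
x, chain = 0.0, []
for _ in range(20000):
    candidate = x + random.gauss(0.0, 1.0)       # propose a random step
    delta = nllf(candidate) - nllf(x)
    if delta <= 0 or random.random() < math.exp(-delta):
        x = candidate                            # keep the step
    chain.append(x)
burned = chain[5000:]                            # discard burn-in
mean = sum(burned) / len(burned)
```

After burn-in the chain wanders around the mode, and the histogram of kept
values approximates the target distribution, which is how the uncertainty
histograms described under *Output* are produced.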

When to use
-----------

Use DREAM when you need a robust fitting algorithm. It takes longer, but
it does an excellent job of exploring different minima and getting close
to the global optimum.

Use DREAM when you want a detailed analysis of the parameter uncertainty.

Like differential evolution, DREAM will evaluate $k$ points in parallel,
where $k$ is the size of the population.

Options
-------

*Samples* is the number of points to be drawn from the Markov chain.
To estimate the 68% interval to two digits of precision, at least
1e5 (or 100,000) samples are needed. For the 95% interval, 1e6
(or 1,000,000) samples are needed. The default of 1e4 samples
gives a rough approximation of the uncertainty relatively quickly.
Use ``--samples=n`` from the command line.

*Burn-in Steps* is the number of iterations required for the Markov
chain to converge to the equilibrium distribution. If the fit ends
early, the tail of the burn will be saved to the start of the steps.
Use ``--burn=n`` from the command line.

*Population* determines the size of the population. The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor. Use ``--pop=k`` from the command line.

*Initializer* determines how the population will be initialized.
The options are as follows:

*eps* (epsilon ball), in which the entire initial population is chosen
at random from within a tiny hypersphere centered about the initial point.

*lhs* (latin hypercube), which chops the bounds within each dimension
into $k$ equal-sized chunks, where $k$ is the size of the population, and
makes sure that each parameter has at least one value within each chunk
across the population.

*cov* (covariance matrix), in which the uncertainty is estimated using
the covariance matrix at the initial point, and points are selected
at random from the corresponding gaussian ellipsoid.

*random* (uniform random), in which the points are selected at random
within the bounds of the parameters.

Use ``--init=type`` from the command line.
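
The *lhs* scheme can be sketched as follows; ``lhs_init`` is a hypothetical
helper written for this example, not the bumps initializer.

```python
# Latin hypercube sketch: cut each dimension into k chunks and arrange the
# population so that every chunk of every dimension holds exactly one point.
import random

def lhs_init(bounds, k):
    """Return k points covering each dimension's range evenly at random."""
    points = [[0.0] * len(bounds) for _ in range(k)]
    for dim, (lo, hi) in enumerate(bounds):
        width = (hi - lo) / k
        # one random value per chunk, assigned to members in random order
        slots = list(range(k))
        random.shuffle(slots)
        for chunk, member in enumerate(slots):
            points[member][dim] = lo + (chunk + random.random()) * width
    return points

random.seed(0)
pop = lhs_init([(0.0, 1.0), (-5.0, 5.0)], k=10)
```

Unlike uniform random initialization, this guarantees that no region of any
single parameter's range is left unsampled by the starting population.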

*Thinning* is the amount of thinning to use when collecting the
population. If the fit is somewhat stuck, with most steps not improving
the fit, then you will need to thin the population to get proper
statistics. Use ``--thin=k`` from the command line.

*Calculate entropy*, if true, computes the entropy for the fit. This is
an estimate of the amount of information in the data. Use ``--entropy``
from the command line.

*Steps*, if not zero, determines the number of iterations to use for
drawing samples after burn in. Each iteration updates the full population,
which is (population x number of fitted parameters) points. This option
is available for compatibility; it is more useful to set the number of
samples directly. Use ``--steps=n`` from the command line.

Use ``--fit=dream`` to select DREAM from the command line.

Output
------

DREAM produces a number of different outputs, and there are a number of
things to check before using its reported uncertainty values. The main
goal of selecting ``--burn=n`` is to wait long enough to reach the
equilibrium distribution.

.. figure:: dream-incomplete.png
    :alt: example of incomplete fit

    This DREAM fit is incomplete, as can be seen on all four plots. The
    *Convergence* plot is still decreasing, the *Parameter Trace* plot does
    not show random mixing of Markov chain values, the *Correlations* plots
    are fuzzy and mostly empty, and the *Uncertainty* plot shows black
    histograms (indicating that there are a few stray values far away from
    the best) and green maximum likelihood spikes not matching the histogram
    (indicating that the region around the best value has not been
    adequately explored).

.. figure:: dream-complete.png
    :alt: example of a completed fit

    This DREAM fit completed successfully. The *Convergence* plot is flat,
    the *Parameter Trace* plot is flat and messy, the *Correlations* plots
    show nice blobs (and a bit of correlation between the *M1.radius*
    parameter and the *M1.radius.width* parameter), and the uncertainty
    plots show a narrow range of -log(P) values in the mostly brown
    histograms and a good match to the green constrained maximum likelihood
    line.

For each parameter in the fit, DREAM finds the mean, median and best value,
as well as the 68% and 95% credible intervals. The mean value is
defined as $\int x P(x) dx$, which is just the expected value of the
probability distribution for the parameter. The median value is the 50%
point in the probability distribution, and the best value is the maximum
likelihood value seen in the random walk. The credible intervals are the
central intervals which capture 68% and 95% of the parameter values
respectively. You need approximately 100,000 samples to get two digits of
precision on the 68% interval, and 1,000,000 samples for the 95% interval.

.. table:: Example fit output

    = =============== ============ ======== ======== ================= =================
    # Parameter       mean         median   best     [ 68% interval]   [ 95% interval]
    = =============== ============ ======== ======== ================= =================
    1 M1.background   0.059925(41) 0.059924 0.059922 [0.05988 0.05997] [0.05985 0.06000]
    2 M1.radius       2345.3(15)   2345.234 2345.174 [2343.83 2346.74] [2342.36 2348.29]
    3 M1.radius.width 0.00775(41)  0.00774  0.00777  [ 0.0074 0.0081]  [ 0.0070 0.0086]
    4 M1.scale        0.21722(20)  0.217218 0.217244 [0.21702 0.21743] [0.21681 0.21761]
    = =============== ============ ======== ======== ================= =================
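
A central credible interval of the kind reported above can be estimated
directly from sorted samples. This is a sketch of the idea only;
``credible_interval`` is a hypothetical helper, not the bumps reporting code.

```python
# Central credible interval: sort the samples and take the interval that
# leaves equal probability in both tails.

def credible_interval(samples, level=0.68):
    """Return the central interval capturing `level` of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi

# uniform grid of "samples" on [0, 1] for a predictable result
samples = [i / 9999 for i in range(10000)]
lo68, hi68 = credible_interval(samples, 0.68)
```

For uniform samples on [0, 1] the 68% interval is approximately [0.16, 0.84].
The number of samples controls how many digits of the interval endpoints are
trustworthy, which is why the 95% interval needs roughly ten times as many
samples as the 68% interval.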

The *Convergence* plot shows the range of $\chi^2$ values in the population
for each iteration. The band shows the 68% of values around the median, and
the solid line shows the minimum value. If the distribution has reached
equilibrium, then the convergence graph should be roughly flat, with little
change in the minimum value throughout the graph. If there is no convergence,
then the remaining plots don't mean much.

The *Correlations* plot shows the cross correlation between each pair of
parameters. If the parameters are completely uncorrelated, then the boxes
should contain circles. Diagonals indicate strong correlation. Square
blocks indicate that the fit is not sensitive to one of the parameters.
The range plotted on the correlation plot is determined by the 95% interval
of the data. The individual correlation plots are too small to show the
range of values for the parameters. These can instead be read from the
*Uncertainty* plot for each parameter, which covers the same range of values
and indicates the 68% and 95% intervals. If there are some chains that are
wandering around away from the minimum, then the plot will look fuzzy and
will not have a nice blob in the center. If a correlation plot has multiple
blobs, then there are multiple minima in your problem space, usually because
there are symmetries in the problem definition. For example, a model fitting
$x + a^2$ will have identical solutions for $\pm\,a$.
| 535 | |
---|
| 536 | The *Uncertainty* plot shows histograms for each fitted parameter generated |
---|
| 537 | from the values for that parameter across all chains. Within each histogram |
---|
| 538 | bar the values are sorted and displayed as a gradient from black to copper, |
---|
| 539 | with black values having the lowest $\chi^2$ and copper values having the |
---|
| 540 | highest. The resulting histogram should be dark brown, with a black hump |
---|
| 541 | in the center and light brown tips. If there are large lumps of light brown, |
---|
| 542 | or excessive black then its likely that the optimizer did not converge. The |
---|
| 543 | green line over the histogram shows the best value seen within each |
---|
| 544 | histogram bin (the maximum likelihood given $p_k == x$). |
---|
| 545 | With enough samples and proper convergence, it should roughly follow the |
---|
| 546 | outline of the histogram. The yellow band in the center of the plot |
---|
| 547 | represents the 68% interval for the data. The histogram cuts off at 95%. |
---|
| 548 | These values along with the median are shown as labels along the x axis. |
---|
| 549 | The green asterisk represents the best value, the green *E* the mean value |
---|
| 550 | and the vertical green line the median value. If the fit is not sensitive |
---|
| 551 | to a parameter, or if two parameters are strongly correlated, the parameter |
---|
| 552 | histogram will show a box rather than a hump. Spiky shapes (either in the |
---|
| 553 | histogram or the maximum likelihood line) indicate lack of convergence or |
---|
| 554 | maybe not enough steps. A chopped histograms indicates that the range for |
---|
| 555 | that parameter is too small. |
---|
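
The interval labels on the uncertainty plot can be reproduced directly from
a parameter's sample values. A minimal sketch using synthetic samples (the
normal distribution here is only a stand-in for a real posterior):

```python
import numpy as np

# Hypothetical 1-D posterior sample for a single fitted parameter,
# standing in for the values collected across all chains.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=10000)

# The 68% and 95% credible intervals and the median, as shown on the
# x-axis labels of the uncertainty plot.
lo95, lo68, median, hi68, hi95 = np.percentile(
    samples, [2.5, 16.0, 50.0, 84.0, 97.5])
print(f"median={median:.3f}  68% [{lo68:.3f}, {hi68:.3f}]  "
      f"95% [{lo95:.3f}, {hi95:.3f}]")
```
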

The *Parameter Trace* plot is diagnostic for models which have poor mixing.
In these cases, no matter how the parameter values are changed, they
land on much worse values for the $\chi^2$. This can happen if the
problem is highly constrained, with narrow, twisting valleys in the
search space.

The *Data and Theory* plot should show theory and data lining up pretty well,
with the theory overlaying about 2/3 of the error bars on the data
(1-\ $\sigma$ = 68%). The *Residuals* plot shows the difference between
theory and data divided by uncertainty. About 2/3 of the residuals should
lie within [-1, 1], and they should not show any structure, such as humps
where the theory misses the data for long stretches. Such structure
indicates either a feature missing from the model or a lack of convergence
to the best model.
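
The normalized residuals check described above is easy to perform by hand.
A small sketch with simulated data (the linear model and noise level are
arbitrary choices for illustration):

```python
import numpy as np

# Simulated measurement of a linear model with known uncertainty.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
theory = 2.0 * x + 1.0
dy = np.full_like(x, 0.1)                      # measurement uncertainty
data = theory + rng.normal(scale=0.1, size=x.size)

# Residuals are (data - theory)/uncertainty; about 68% should fall
# within [-1, 1] when the model describes the data.
resid = (data - theory) / dy
frac = np.mean(np.abs(resid) <= 1.0)
print(f"{100*frac:.0f}% of residuals within [-1, 1]")
```
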

If entropy is requested, then bumps will show the total number of bits of
information in the fit. This derives from the entropy term:

.. math::

    S = -\int_\Theta p(\Theta) \log_2 p(\Theta)\, d\Theta

Using entropy and simulation we hope to be able to make experiment
planning decisions in a way that maximizes information, by estimating
whether it is better to measure more precisely or to measure different
but related values and fit them with shared parameters.
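
As a rough illustration of what the entropy number means, here is a
histogram plug-in estimate of $S$ in bits for a 1-D sample. Bumps computes
its entropy estimate differently; this only sketches the idea of "bits of
information" carried by the fitted distribution.

```python
import numpy as np

# Plug-in estimate of differential entropy, S = -sum p log2(p) dp,
# from a histogram of posterior samples.
rng = np.random.default_rng(2)
samples = rng.normal(loc=0.0, scale=1.0, size=100000)

counts, edges = np.histogram(samples, bins=100, density=True)
width = edges[1] - edges[0]
p = counts[counts > 0]                      # skip empty bins (0 log 0 = 0)
entropy_bits = -np.sum(p * np.log2(p)) * width
print(f"differential entropy ~ {entropy_bits:.2f} bits")
```

For a standard normal the exact value is $\tfrac12 \log_2(2\pi e) \approx 2.05$
bits; a tighter posterior has lower entropy, i.e., more information gained.
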


References
----------

.. [Vrugt2009]
    Vrugt, J. A.; Ter Braak, C. J. F.; Diks, C. G. H.; Robinson, B. A.;
    Hyman, J. M.; Higdon, D.
    *International Journal of Nonlinear Sciences and Numerical Simulation*
    2009, 10 (3), 273–290.
    DOI: `10.1515/IJNSNS.2009.10.3.273 <http://dx.doi.org/10.1515/IJNSNS.2009.10.3.273>`_

.. [Kramer2010]
    Kramer, A.; Hasenauer, J.; Allgower, F.; Radde, N.
    *In 2010 IEEE International Conference on Control Applications (CCA)*
    2010; pp 493–498.
    DOI: `10.1109/CCA.2010.5611198 <http://dx.doi.org/10.1109/CCA.2010.5611198>`_

.. [JCGM2008]
    JCGM.
    *Evaluation of measurement data – Supplement 1 to the "Guide to the
    expression of uncertainty in measurement" – Propagation of distributions
    using a Monte Carlo method*; Joint Committee for Guides in Metrology,
    JCGM 101:2008; Geneva, Switzerland, 2008; p 90.
    `<http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf>`_



.. _fit-ps:

Particle Swarm
==============

Inspired by bird flocking behaviour, the particle swarm algorithm is a
population-based method which updates each individual according to its
momentum and a force toward the current best fit parameter values. We
did not explore variations of this algorithm in any detail.

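The update rule can be sketched in a few lines. This is an illustrative
implementation of the general particle swarm scheme, not the code used by
bumps; the inertia and attraction coefficients are conventional textbook
values.

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=8, n_steps=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm: momentum plus attraction to the personal
    and global best points seen so far."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                               # velocities
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmin(pval)]                         # global best
    for _ in range(n_steps):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w*v + c1*r1*(pbest - x) + c2*r2*(g - x)    # momentum + pull
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pval
        pbest[better], pval[better] = x[better], fx[better]
        g = pbest[np.argmin(pval)]
    return g, pval.min()

# Toy quadratic objective with minimum at (0.3, 0.3).
best, fbest = pso_minimize(lambda p: np.sum((p - 0.3)**2),
                           bounds=[(-1, 1), (-1, 1)])
```
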
When to use
-----------

Particle swarm performed well enough in our low dimensional test problems,
but made little progress when more fit parameters were added.

The population updates can run in parallel, but the tiny population size
limits the amount of parallelism.

Options
-------

``--steps=n`` is the number of iterations. Each step updates each member
of the population. The population size scales with the number of fitted
parameters.

``--pop=k`` determines the size of the population. The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor. The default scale factor is 1.

Use ``--fit=ps`` to select particle swarm from the command line.

References
----------

.. [Kennedy1995]
    Kennedy, J.; Eberhart, R.
    Particle Swarm Optimization
    *Proceedings of IEEE International Conference on Neural Networks. IV.*
    1995; pp 1942–1948.
    DOI: `10.1109/ICNN.1995.488968 <http://dx.doi.org/10.1109/ICNN.1995.488968>`_



.. _fit-rl:

Random Lines
============

Most of the population based algorithms ignore the value of the function
when choosing the points in the next iteration. Random lines is a new
style of algorithm which fits a quadratic model to a selection from the
population, and uses that model to propose a new point in the next
generation of the population. The hope is that the method will inherit
the robustness of the population based algorithms as well as the rapid
convergence of the Newton descent algorithms.

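The core move can be illustrated as follows. This sketch (not the bumps
implementation) fits a 1-D quadratic to three cost evaluations along a line
and proposes the parabola's vertex as the next point.

```python
import numpy as np

def quadratic_line_step(f, x0, direction, h=0.5):
    """Sample f at three points along a line through x0, fit a quadratic
    in the line parameter t, and step to its minimum."""
    t = np.array([-h, 0.0, h])
    y = np.array([f(x0 + ti * direction) for ti in t])
    a, b, c = np.polyfit(t, y, 2)        # y ~ a t^2 + b t + c
    if a <= 0:                           # no interior minimum on this line
        t_new = t[np.argmin(y)]
    else:
        t_new = -b / (2 * a)             # vertex of the fitted parabola
    return x0 + t_new * direction

# Toy quadratic cost with minimum at (1, 1); one step along the x-axis
# lands exactly at the 1-D minimum of that line.
f = lambda p: np.sum((p - 1.0) ** 2)
x = quadratic_line_step(f, np.array([0.0, 0.0]),
                        direction=np.array([1.0, 0.0]))
```

On a cost function that really is quadratic the proposed point is exact,
which is why the method converges rapidly when the model fits and stalls
when it does not.
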
When to use
-----------

Random lines works very well for some of our test problems, showing
rapid convergence to the optimum, but on other problems it makes
very little progress.

The population updates can run in parallel.

Options
-------

``--steps=n`` is the number of iterations. Each step updates each member
of the population. The population size scales with the number of fitted
parameters.

``--pop=k`` determines the size of the population. The number of
individuals, $k$, is equal to the number of fitted parameters times the
population scale factor. The default scale factor is 0.5.

``--CR=v`` is the crossover ratio, determining what proportion of the
dimensions to update at each step. Values must be between 0 and 1.

``--starts=n`` tells the optimizer to restart a given number of times.
Each time it restarts it uses a random starting point.

``--keep_best`` uses a value near the previous minimum when restarting
instead of using a random value within the parameter bounds. This option is
not available in the options dialog.

Use ``--fit=rl`` to select random lines from the command line.

References
----------

.. [Sahin2013]
    Sahin, I.
    *An International Journal of Optimization and Control: Theories & Applications (IJOCTA)*
    2013, 3 (2), 111–119.



.. _fit-pt:

Parallel Tempering
==================

Parallel tempering is an MCMC algorithm for uncertainty analysis. This
version runs at multiple temperatures simultaneously, with chains at high
temperature able to more easily jump between minima and chains at low
temperature able to fully explore each minimum. Like :ref:`fit-dream` it
has a differential evolution stepper, but this version uses the chain
history as the population rather than maintaining a population at each
temperature.

This is an experimental algorithm which does not yet perform well.

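The temperature ladder and the replica-exchange rule at the heart of the
method can be sketched as follows. This is illustrative only, not the bumps
implementation; ``nllf`` stands for the negative log likelihood of a chain's
current state.

```python
import numpy as np

# Log-spaced temperature ladder matching the defaults described below:
# 25 points between 0.1 and 10.
temperatures = np.logspace(-1, 1, 25)

def swap_accepted(nllf_i, nllf_j, T_i, T_j, rng):
    """Metropolis criterion for exchanging states between two
    temperatures.  When the hotter chain holds a better (lower nllf)
    state, the swap is always accepted, pulling good states down
    toward the cold chain."""
    log_alpha = (1.0 / T_i - 1.0 / T_j) * (nllf_i - nllf_j)
    return np.log(rng.random()) < log_alpha

rng = np.random.default_rng(3)
# Cold chain stuck at nllf=5 while the adjacent hotter chain found
# nllf=2: the exchange is accepted.
accept = swap_accepted(5.0, 2.0, temperatures[0], temperatures[1], rng)
```
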
When to use
-----------

When complete, parallel tempering should be used for problems with widely
spaced local minima which dream cannot fit.

Options
-------

``--steps=n`` is the number of iterations to include in the Markov
chain. Each iteration updates the full population. The population size
scales with the number of fitted parameters.

``--burn=n`` is the number of iterations required for the Markov
chain to converge to the equilibrium distribution. If the fit ends
early, the tail of the burn will be saved to the start of the steps.

``--CR=v`` is the differential evolution crossover ratio to use when
computing step size and direction. Use a small value to step through the
dimensions one at a time, or a large value to step through all at once.

``--nT=k``, ``--Tmin=v`` and ``--Tmax=v`` specify a log-spaced initial
distribution of temperatures. The default is 25 points between
0.1 and 10. :ref:`fit-dream` runs at a fixed temperature of 1.0.

Use ``--fit=pt`` to select parallel tempering from the command line.

References
----------

.. [Swendsen1986]
    Swendsen, R. H.; Wang, J. S.
    Replica Monte Carlo simulation of spin glasses
    *Physical Review Letters*
    1986, 57, 2607–2609.


..
    SNOBFIT (fit=snobfit) attempts to construct a locally quadratic model of
    the entire search space. While promising because it can begin to offer
    some guarantees that the search is complete given reasonable assumptions
    about the fitting surface, initial trials did not perform well and the
    algorithm has not yet been tuned to our problems.