source: sasview/park-1.2.1/park/assembly.py @ 6c9f25d

ESS_GUIESS_GUI_DocsESS_GUI_batch_fittingESS_GUI_bumps_abstractionESS_GUI_iss1116ESS_GUI_iss879ESS_GUI_iss959ESS_GUI_openclESS_GUI_orderingESS_GUI_sync_sascalccostrafo411magnetic_scattrelease-4.1.1release-4.1.2release-4.2.2release_4.0.1ticket-1009ticket-1094-headlessticket-1242-2d-resolutionticket-1243ticket-1249ticket885unittest-saveload
Last change on this file since 6c9f25d was 5ab5cae, checked in by Peter Parker, 10 years ago

Refs #202 - Try and fix some Sphinx warnings.

  • Property mode set to 100644
File size: 22.8 KB
Line 
1# This program is public domain
2"""
3An assembly is a collection of fitting functions.  This provides
4the model representation that is the basis of the park fitting engine.
5
6Models can range from very simple one dimensional theory functions
7to complex assemblies of multidimensional datasets from different
8experimental techniques, each with their own theory function and
9a common underlying physical model.
10
11Usage
12=====
13
14First define the models you want to work with.  In the example
15below we will use an example of a simple multilayer system measured by
16specular reflection of xrays and neutrons.  The gold depth is the only
17fitting parameter, ranging from 10-30 A.  The interface depths are
18tied together using expressions.  In this case the expression is
19a simple copy, but any standard math functions can be used.  Some
20model developers may provide additional functions for use with the
21expression.
22
23Example models::
24
25    import reflectometry.model1d as refl
26    xray = refl.model('xray')
27    xray.incident('Air',rho=0)
28    xray.interface('iAu',sigma=5)
29    xray.layer('Au',rho=124.68,depth=[10,30])
30    xray.interface('iSi',sigma=5)
31    xray.substrate('Si',rho=20.07)
32    datax = refl.data('xray.dat')
33
34    neutron = refl.model('neutron')
35    neutron.incident('Air',rho=0)
36    neutron.interface('iAu',sigma='xray.iAu')
37    neutron.layer('Au',rho=4.66,depth='xray.Au.depth')
38    neutron.interface('iSi',sigma='xray.iSi')
39    neutron.substrate('Si',rho=2.07)
40    datan = refl.data('neutron.dat')
41
42As you can see from the above, parameters can be set to a value if
43the parameter is fixed, to a range if the parametemr is fitted, or
44to a string expression if the parameter is calculated from other
45parameters.  See park.Parameter.set for further details.
46
47Having constructed the models, we can now create an assembly::
48
49    import park
50    assembly = park.Assembly([(xray,datax), (neutron,datan)])
51
52Note: this would normally be done in the context of a fit
53using fit = park.Fit([(xray,datax), (neutron,datan)]), and later referenced
54using fit.assembly.
55
56Individual parts of the assembly are accessable using the
57model number 0, 1, 2... or by the model name.  In the above,
58assembly[0] and assembly['xray'] refer to the same model.
59Assemblies have insert and append functions for adding new
60models, and "del model[idx]" for removing them.
61
62Once the assembly is created computing the values for the system
63is a matter of calling::
64
65    assembly.eval()
66    print "Chi**2",assembly.chisq
67    print "Reduced chi**2",assembly.chisq/assembly.degrees_of_freedom
68    plot(arange(len(assembly.residuals)), assembly.residuals)
69
70This defines the attributes residuals, degrees_of_freedom and chisq,
71which is what the optimizer uses as the cost function to minimize.
72
73assembly.eval uses the current values for the parameters in the
74individual models.  These parameters can be changed directly
75in the model.  In the reflectometry example above, you could
76set the gold thickness using xray.layer.Au.depth=156, or
77something similar (the details are model specific).  Parameters
78can also be changed through the assembly parameter set.  In the same
79example, this would be assembly.parameterset['xray']['Au']['depth'].
80See parameter set for details.
81
82In the process of modeling data, particularly with multiple
83datasets, you will sometimes want to temporarily ignore
84how well one of the datasets matches so that you
85can more quickly refine the model for the other datasets,
86or see how particular models are influencing the fit.  To
87temporarily ignore the xray data in the example above use::
88
89    assembly.parts[0].isfitted = False
90
91The model itself isn't ignored since its parameters may be
92needed to compute the parameters for other models.  To
93reenable checking against the xray data, you would assign
94a True value instead.  More subtle weighting of the models
95can be controlled using assembly.parts[idx].weight, but
96see below for a note on model weighting.
97
98A note on model weighting
99-------------------------
100
101Changing the weight is equivalent to scaling the error bars
102on the given model by a factor of weight/n where n is the
103number of data points.  It is better to set the correct error
104bars on your data in the first place than to adjust the weights.
105If you have the correct error bars, then you should expect
106roughly 2/3 of the data points to lie within one error bar of
107the theory curve.  If consecutive data points have largely
108overlapping errorbars, then your uncertainty is overestimated.
109
110Another case where weights are adjusted (abused?) is to
111compensate for systematic errors in the data by forcing the
112errorbars to be large enough to cover the systematic bias.
113This is a poor approach to the problem.  A better strategy
114is to capture the systematic effects in the model, and treat
115the measurement of the independent variable as an additional
116data point in the fit.  This is still not statistically sound
117as there is likely to be a large correlation between the
118uncertainty of the measurement and the values of all the
119other variables.
120
121That said, adjusting the weight on a dataset is a quick way
122of reducing its influence on the entire fit.  Please use it
123with care.
124"""
125
126__all__ = ['Assembly', 'Fitness']
127import numpy
128
129import park
130from park.parameter import Parameter,ParameterSet
131from park.fitresult import FitParameter
132import park.expression
133
134
135
136class Fitness(object):
137    """
138    Container for theory and data.
139
140    The fit object compares theory with data.
141
142    TODO: what to do with fittable metadata (e.g., footprint correction)?
143    """
144    data = None
145    model = None
146    def __init__(self, model=None,data=None):
147        self.data,self.model = data,model
148    def _parameterset(self):
149        return self.model.parameterset
150    parameterset = property(_parameterset)
151    def residuals(self):
152        return self.data.residuals(self.model.eval)
153    def residuals_deriv(self, pars=[]):
154        return self.data.residuals_deriv(self.model.eval_derivs,pars=pars)
155    def set(self, **kw):
156        """
157        Set parameters in the model.
158
159        User convenience function.  This allows a user with an assembly
160        of models in a script to for example set the fit range for
161        parameter 'a' of the model::
162
163            assembly[0].set(a=[5,6])
164
165        Raises KeyError if the parameter is not in parameterset.
166        """
167        self.model.set(**kw)
168    def abort(self):
169        if hasattr(self.model,'abort'): self.model.abort()
170
171class Part(object):
172    """
173    Part of a fitting assembly.  Part holds the model itself and
174    associated data.  The part can be initialized with a fitness
175    object or with a pair (model,data) for the default fitness function.
176
177    fitness (Fitness)
178        object implementing the `park.assembly.Fitness` interface.  In
179        particular, fitness should provide a parameterset attribute
180        containing a ParameterSet and a residuals method returning a vector
181        of residuals.
182    weight (dimensionless)
183        weight for the model.  See comments in assembly.py for details.
184    isfitted (boolean)
185        True if the model residuals should be included in the fit.
186        The model parameters may still be used in parameter
187        expressions, but there will be no comparison to the data.
188    residuals (vector)
189        Residuals for the model if they have been calculated, or None
190    degrees_of_freedom
191        Number of residuals minus number of fitted parameters.
192        Degrees of freedom for individual models does not make
193        sense in the presence of expressions combining models,
194        particularly in the case where a model has many parameters
195        but no data or many computed parameters.  The degrees of
196        freedom for the model is set to be at least one.
197    chisq
198        sum(residuals**2); use chisq/degrees_of_freedom to
199        get the reduced chisq value.
200
201        Get/set the weight on the given model.
202
203        assembly.weight(3) returns the weight on model 3 (0-origin)
204        assembly.weight(3,0.5) sets the weight on model 3 (0-origin)
205    """
206
207    def __init__(self, fitness, weight=1., isfitted=True):
208        if isinstance(fitness, tuple):
209            fitness = park.Fitness(*fitness)
210        self.fitness = fitness
211        self.weight = weight
212        self.isfitted = isfitted
213        self.residuals = None
214        self.chisq = numpy.Inf
215        self.degrees_of_freedom = 1
216
217
218class Assembly(object):
219    """
220    Collection of fit models.
221
222    Assembly implements the `park.fit.Objective` interface.
223
224    See `park.assembly` for usage.
225
226    Instance variables:
227
228    residuals : array
229        a vector of residuals spanning all models, with model
230        weights applied as appropriate.
231    degrees_of_freedom : integer
232        length of the residuals - number of fitted parameters
233    chisq : float
234        sum squared residuals; this is not the reduced chisq, which
235        you can get using chisq/degrees_of_freedom
236
237    These fields are defined for the individual models as well, with
238    degrees of freedom adjusted to the length of the individual data
239    set.  If the model is not fitted or the weight is zero, the residual
240    will not be calculated.
241
242    The residuals fields are available only after the model has been
243    evaluated.
244    """
245
246    def __init__(self, models=[]):
247        """Build an assembly from a list of models."""
248        self.parts = []
249        for m in models:
250            self.parts.append(Part(m))
251        self._reset()
252
253    def __iter__(self):
254        """Iterate through the models in order"""
255        for m in self.parts: yield m
256
257    def __getitem__(self, n):
258        """Return the nth model"""
259        return self.parts[n].fitness
260
261    def __setitem__(self, n, fitness):
262        """Replace the nth model"""
263        self.parts[n].fitness = fitness
264        self._reset()
265
266    def __delitem__(self, n):
267        """Delete the nth model"""
268        del self.parts[n]
269        self._reset()
270
271    def weight(self, idx, value=None):
272        """
273        Query the weight on a particular model.
274
275        Set weight to value if value is supplied.
276
277        :Parameters:
278         idx : integer
279           model number
280         value : float
281           model weight
282        :return: model weight
283        """
284        if value is not None:
285            self.parts[idx].weight = value
286        return self.parts[idx].weight
287
288    def isfitted(self, idx, value=None):
289        """
290        Query if a particular model is fitted.
291
292        Set isfitted to value if value is supplied.
293
294        :param idx: model number
295        :type idx: integer
296        :param value:
297        """
298        if value is not None:
299            self.parts[idx].isfitted = value
300        return self.parts[idx].isfitted
301
302    def append(self, fitness, weight=1.0, isfitted=True):
303        """
304        Add a model to the end of set.
305
306        :param fitness: the fitting model
307            The fitting model can be an instance of `park.assembly.Fitness`,
308            or a tuple of (`park.model.Model`,`park.data.Data1D`)
309        :param weight: model weighting (usually 1.0)
310        :param isfitted: whether model should be fit (equivalent to weight 0.)
311        """
312        self.parts.append(Part(fitness,weight,isfitted))
313        self._reset()
314
315    def insert(self, idx, fitness, weight=1.0, isfitted=True):
316        """Add a model to a particular position in the set."""
317        self.parts.insert(idx,Part(fitness,weight,isfitted))
318        self._reset()
319
320    def _reset(self):
321        """Adjust the parameter set after the addition of a new model."""
322        subsets = [m.fitness.parameterset for m in self]
323        self.parameterset = ParameterSet('root',subsets)
324        self.parameterset.setprefix()
325        #print [p.path for p in self.parameterset.flatten()]
326
327    def eval(self):
328        """
329        Recalculate the theory functions, and from them, the
330        residuals and chisq.
331
332        :note: Call this after the parameters have been updated.
333        """
334        # Handle abort from a separate thread.
335        self._cancel = False
336
337        # Evaluate the computed parameters
338        self._fitexpression()
339
340        # Check that the resulting parameters are in a feasible region.
341        if not self.isfeasible(): return numpy.inf
342
343        resid = []
344        k = len(self._fitparameters)
345        for m in self.parts:
346            # In order to support abort, need to be able to propagate an
347            # external abort signal from self.abort() into an abort signal
348            # for the particular model.  Can't see a way to do this which
349            # doesn't involve setting a state variable.
350            self._current_model = m
351            if self._cancel: return numpy.inf
352            if m.isfitted and m.weight != 0:
353                m.residuals = m.fitness.residuals()
354                N = len(m.residuals)
355                m.degrees_of_freedom = N-k if N>k else 1
356                m.chisq = numpy.sum(m.residuals**2)
357                resid.append(m.weight*m.residuals)
358        self.residuals = numpy.hstack(resid)
359        N = len(self.residuals)
360        self.degrees_of_freedom = N-k if N>k else 1
361        self.chisq = numpy.sum(self.residuals**2)
362        return self.chisq
363
364    def jacobian(self, pvec, step=1e-8):
365        """
366        Returns the derivative wrt the fit parameters at point p.
367
368        Numeric derivatives are calculated based on step, where step is
369        the portion of the total range for parameter j, or the portion of
370        point value p_j if the range on parameter j is infinite.
371        """
372        # Make sure the input vector is an array
373        pvec = numpy.asarray(pvec)
374        # We are being lazy here.  We can precompute the bounds, we can
375        # use the residuals_deriv from the sub-models which have analytic
376        # derivatives and we need only recompute the models which depend
377        # on the varying parameters.
378        # Meanwhile, let's compute the numeric derivative using the
379        # three point formula.
380        # We are not checking that the varied parameter in numeric
381        # differentiation is indeed feasible in the interval of interest.
382        range = zip(*[p.range for p in self._fitparameters])
383        lo,hi = [numpy.asarray(v) for v in range]
384        delta = (hi-lo)*step
385        # For infinite ranges, use p*1e-8 for the step size
386        idx = numpy.isinf(delta)
387        #print "J",idx,delta,pvec,type(idx),type(delta),type(pvec)
388        delta[idx] = pvec[idx]*step
389        delta[delta==0] = step
390
391        # Set the initial value
392        for k,v in enumerate(pvec):
393            self._fitparameters[k].value = v
394        # Gather the residuals
395        r = []
396        for k,v in enumerate(pvec):
397            # Center point formula:
398            #     df/dv = lim_{h->0} ( f(v+h)-f(v-h) ) / ( 2h )
399            h = delta[k]
400            self._fitparameters[k].value = v + h
401            self.eval()
402            rk = self.residuals
403            self._fitparameters[k].value = v - h
404            self.eval()
405            rk -= self.residuals
406            self._fitparameters[k].value = v
407            r.append(rk/(2*h))
408        # return the jacobian
409        return numpy.vstack(r).T
410
411
412    def cov(self, pvec):
413        """
414        Return the covariance matrix inv(J'J) at point p.
415        """
416
417        # Find cov of f at p
418        #     cov(f,p) = inv(J'J)
419        # Use SVD
420        #     J = U S V'
421        #     J'J = (U S V')' (U S V')
422        #         = V S' U' U S V'
423        #         = V S S V'
424        #     inv(J'J) = inv(V S S V')
425        #              = inv(V') inv(S S) inv(V)
426        #              = V inv (S S) V'
427        J = self.jacobian(pvec)
428        u,s,vh = numpy.linalg.svd(J,0)
429        JTJinv = numpy.dot(vh.T.conj()/s**2,vh)
430        return JTJinv
431
432    def stderr(self, pvec):
433        """
434        Return parameter uncertainty.
435
436        This is just the sqrt diagonal of covariance matrix inv(J'J) at point p.
437        """
438        return numpy.sqrt(numpy.diag(self.cov(pvec)))
439
440    def isfeasible(self):
441        """
442        Returns true if the parameter set is in a feasible region of the
443        modeling space.
444        """
445        return True
446
447    # Fitting service interface
448    def fit_parameters(self):
449        """
450        Return an alphabetical list of the fitting parameters.
451
452        This function is called once at the beginning of a fit,
453        and serves as a convenient place to precalculate what
454        can be precalculated such as the set of fitting parameters
455        and the parameter expressions evaluator.
456        """
457        self.parameterset.setprefix()
458        self._fitparameters = self.parameterset.fitted
459        self._restraints = self.parameterset.restrained
460        pars = self.parameterset.flatten()
461        context = self.parameterset.gather_context()
462        self._fitexpression = park.expression.build_eval(pars,context)
463        #print "constraints",self._fitexpression.__doc__
464
465        self._fitparameters.sort(lambda a,b: cmp(a.path,b.path))
466        # Convert to fitparameter a object
467        fitpars = [FitParameter(p.path,p.range,p.value)
468                   for p in self._fitparameters]
469        return fitpars
470
471    def set_result(self, result):
472        """
473        Set the parameters resulting from the fit into the parameter set,
474        and update the calculated expression.
475
476        The parameter values may be retrieved by walking the assembly.parameterset
477        tree, checking each parameter for isfitted, iscomputed, or isfixed.
478        For example::
479
480            assembly.set_result(result)
481            for p in assembly.parameterset.flatten():
482                if p.isfitted():
483                    print "%s %g in [%g,%g]"%(p.path,p.value,p.range[0],p.range[1])
484                elif p.iscomputed():
485                    print "%s computed as %g"%(p.path.p.value)
486
487        This does not calculate the function or the residuals for these parameters.
488        You can call assembly.eval() to do this.  The residuals will be set in
489        assembly[i].residuals.  The theory and data are model specific, and can
490        be found in assembly[i].fitness.data.
491        """
492        for n,p in enumerate(result.parameters):
493            self._fitparameters[n] = p.value
494        self._fitexpression()
495
496    def all_results(self, result):
497        """
498        Extend result from the fit with the calculated parameters.
499        """
500        calcpars = [FitParameter(p.path,p.range,p.value)
501                    for p in self.parameterset.computed]
502        result.parameters += calcpars
503
504    def result(self, status='step'):
505        """
506        Details to send back to the fitting client on an improved fit.
507
508        status is 'start', 'step' or 'end' depending if this is the
509        first result to return, an improved result, or the final result.
510
511        [Not implemented]
512        """
513        return None
514
515    def fresiduals(self, pvec):
516        chisq = self.__call__(pvec)
517        return self.residuals
518
519    def __call__(self, pvec):
520        """
521        Cost function.
522
523        Evaluate the system for the parameter vector pvec, returning chisq
524        as the cost function to be minimized.
525
526        Raises a runtime error if the number of fit parameters is
527        different than the length of the vector.
528        """
529        # Plug fit parameters into model
530        #print "Trying",pvec
531        pars = self._fitparameters
532        if len(pvec) != len(pars):
533            raise RuntimeError("Unexpected number of parameters")
534        for n,value in enumerate(pvec):
535            pars[n].value = value
536        # Evaluate model
537        chisq = self.eval()
538        # Evaluate additional restraints based on parameter value
539        # likelihood
540        restraints_penalty = 0
541        for p in self._restraints:
542            restraints_penalty += p.likelihood(p.value)
543        # Return total cost function
544        return self.chisq + restraints_penalty
545
546    def abort(self):
547        """
548        Interrupt the current function evaluation.
549
550        Forward this to the currently executing model if possible.
551        """
552        self._cancel = True
553        if hasattr(self._current_model,'abort'):
554            self._current_model.abort()
555
556class _Exp(park.Model):
557    """
558    Sample model for testing assembly.
559    """
560    parameters = ['a','c']
561    def eval(self,x):
562        return self.a*numpy.exp(self.c*x)
563class _Linear(park.Model):
564    parameters = ['a','c']
565    def eval(self,x):
566        #print "eval",self.a,self.c,x,self.a*x+self.c
567        return self.a*x+self.c
568def example():
569    """
570    Return an example assembly consisting of a pair of functions,
571        M1.a*exp(M1.c*x), M2.a*exp(2*M1.c*x)
572    and ideal data for
573        M1.a=1, M1.c=1.5, M2.a=2.5
574    """
575    import numpy
576    import park
577    from numpy import inf
578    # Make some fake data
579    x1 = numpy.linspace(0,1,11)
580    x2 = numpy.linspace(0,1,12)
581    # Define a shared model
582    if True: # Exp model
583        y1,y2 = numpy.exp(1.5*x1),2.5*numpy.exp(3*x2)
584        M1 = _Exp('M1',a=[1,3],c=[1,3])
585        M2 = _Exp('M2',a=[1,3],c='2*M1.c')
586        #M2 = _Exp('M2',a=[1,3],c=3)
587    else:  # Linear model
588        y1,y2 = x1+1.5, 2.5*x2+3
589        M1 = _Linear('M1',a=[1,3],c=[1,3])
590        M2 = _Linear('M2',a=[1,3],c='2*M1.c')
591    if False: # Unbounded
592        M1.a = [-inf,inf]
593        M1.c = [-inf,inf]
594        M2.a = [-inf,inf]
595    D1 = park.Data1D(x=x1, y=y1)
596    D2 = park.Data1D(x=x2, y=y2)
597    # Construct the assembly
598    assembly = park.Assembly([(M1,D1),(M2,D2)])
599    return assembly
600
601class _Sphere(park.Model):
602    parameters = ['a','b','c','d','e']
603    def eval(self,x):
604        return self.a*x**2+self.b*x+self.c + exp(self.d) - 3*sin(self.e)
605
606def example5():
607    import numpy
608    import park
609    from numpy import inf
610    # Make some fake data
611    x = numpy.linspace(0,1,11)
612    # Define a shared model
613    S = _Sphere(a=1,b=2,c=3,d=4,e=5)
614    y = S.eval(x1)
615    Sfit = _Sphere(a=[-inf,inf],b=[-inf,inf],c=[-inf,inf],d=[-inf,inf],e=[-inf,inf])
616    D = park.Data1D(x=x, y=y)
617    # Construct the assembly
618    assembly = park.Assembly([(Sfit,D)])
619    return assembly
620
621def test():
622    assembly = example()
623    assert assembly[0].parameterset.name == 'M1'
624
625    # extract the fitting parameters
626    pars = [p.name for p in assembly.fit_parameters()]
627    assert set(pars) == set(['M1.a','M1.c','M2.a'])
628    # Compute chisq and verify constraints are updated properly
629    assert assembly([1,1.5,2.5]) == 0
630    assert assembly[0].model.c == 1.5 and assembly[1].model.c == 3
631
632    # Try without constraints
633    assembly[1].set(c=3)
634    assembly.fit_parameters()  # Fit parameters have changed
635    assert assembly([1,1.5,2.5]) == 0
636
637    # Check that assembly.cov runs ... still need to check that it is correct!
638    C = assembly.cov(numpy.array([1,1.5,2.5]))
639
640if __name__ == "__main__": test()
Note: See TracBrowser for help on using the repository browser.