source: sasview/park-1.2.1/park/assembly.py @ b11e127

ESS_GUIESS_GUI_DocsESS_GUI_batch_fittingESS_GUI_bumps_abstractionESS_GUI_iss1116ESS_GUI_iss879ESS_GUI_iss959ESS_GUI_openclESS_GUI_orderingESS_GUI_sync_sascalccostrafo411magnetic_scattrelease-4.1.1release-4.1.2release-4.2.2release_4.0.1ticket-1009ticket-1094-headlessticket-1242-2d-resolutionticket-1243ticket-1249ticket885unittest-saveload
Last change on this file since b11e127 was 3570545, checked in by Mathieu Doucet <doucetm@…>, 13 years ago

Adding park Part 2

  • Property mode set to 100644
File size: 22.8 KB
Line 
1# This program is public domain
2"""
3An assembly is a collection of fitting functions.  This provides
4the model representation that is the basis of the park fitting engine.
5
6Models can range from very simple one dimensional theory functions
7to complex assemblies of multidimensional datasets from different
8experimental techniques, each with their own theory function and
9a common underlying physical model.
10
11Usage
12=====
13
14First define the models you want to work with.  In the example
15below we will use an example of a simple multilayer system measured by
16specular reflection of xrays and neutrons.  The gold depth is the only
17fitting parameter, ranging from 10-30 A.  The interface depths are
18tied together using expressions.  In this case the expression is
19a simple copy, but any standard math functions can be used.  Some
20model developers may provide additional functions for use with the
21expression.
22
23Example models::
24
25    import reflectometry.model1d as refl
26    xray = refl.model('xray')
27    xray.incident('Air',rho=0)
28    xray.interface('iAu',sigma=5)
29    xray.layer('Au',rho=124.68,depth=[10,30])
30    xray.interface('iSi',sigma=5)
31    xray.substrate('Si',rho=20.07)
32    datax = refl.data('xray.dat')
33
34    neutron = refl.model('neutron')
35    neutron.incident('Air',rho=0)
36    neutron.interface('iAu',sigma='xray.iAu')
37    neutron.layer('Au',rho=4.66,depth='xray.Au.depth')
38    neutron.interface('iSi',sigma='xray.iSi')
39    neutron.substrate('Si',rho=2.07)
40    datan = refl.data('neutron.dat')
41
42As you can see from the above, parameters can be set to a value if
43the parameter is fixed, to a range if the parametemr is fitted, or
44to a string expression if the parameter is calculated from other
45parameters.  See park.Parameter.set for further details.
46
47Having constructed the models, we can now create an assembly::
48
49    import park
50    assembly = park.Assembly([(xray,datax), (neutron,datan)])
51
52Note: this would normally be done in the context of a fit
53using fit = park.Fit([(xray,datax), (neutron,datan)]), and later referenced
54using fit.assembly.
55
56Individual parts of the assembly are accessable using the
57model number 0, 1, 2... or by the model name.  In the above,
58assembly[0] and assembly['xray'] refer to the same model.
59Assemblies have insert and append functions for adding new
60models, and "del model[idx]" for removing them.
61
62Once the assembly is created computing the values for the system
63is a matter of calling::
64
65    assembly.eval()
66    print "Chi**2",assembly.chisq
67    print "Reduced chi**2",assembly.chisq/assembly.degrees_of_freedom
68    plot(arange(len(assembly.residuals)), assembly.residuals)
69
70This defines the attributes residuals, degrees_of_freedom and chisq,
71which is what the optimizer uses as the cost function to minimize.
72
73assembly.eval uses the current values for the parameters in the
74individual models.  These parameters can be changed directly
75in the model.  In the reflectometry example above, you could
76set the gold thickness using xray.layer.Au.depth=156, or
77something similar (the details are model specific).  Parameters
78can also be changed through the assembly parameter set.  In the same
79example, this would be assembly.parameterset['xray']['Au']['depth'].
80See parameter set for details.
81
82In the process of modeling data, particularly with multiple
83datasets, you will sometimes want to temporarily ignore
84how well one of the datasets matches so that you
85can more quickly refine the model for the other datasets,
86or see how particular models are influencing the fit.  To
87temporarily ignore the xray data in the example above use::
88
89    assembly.parts[0].isfitted = False
90
91The model itself isn't ignored since its parameters may be
92needed to compute the parameters for other models.  To
93reenable checking against the xray data, you would assign
94a True value instead.  More subtle weighting of the models
95can be controlled using assembly.parts[idx].weight, but
96see below for a note on model weighting.
97
98A note on model weighting
99-------------------------
100
101Changing the weight is equivalent to scaling the error bars
102on the given model by a factor of weight/n where n is the
103number of data points.  It is better to set the correct error
104bars on your data in the first place than to adjust the weights.
105If you have the correct error bars, then you should expect
106roughly 2/3 of the data points to lie within one error bar of
107the theory curve.  If consecutive data points have largely
108overlapping errorbars, then your uncertainty is overestimated.
109
110Another case where weights are adjusted (abused?) is to
111compensate for systematic errors in the data by forcing the
112errorbars to be large enough to cover the systematic bias.
113This is a poor approach to the problem.  A better strategy
114is to capture the systematic effects in the model, and treat
115the measurement of the independent variable as an additional
116data point in the fit.  This is still not statistically sound
117as there is likely to be a large correlation between the
118uncertainty of the measurement and the values of all the
119other variables.
120
121That said, adjusting the weight on a dataset is a quick way
122of reducing its influence on the entire fit.  Please use it
123with care.
124"""
125
126__all__ = ['Assembly', 'Fitness']
127import numpy
128
129import park
130from park.parameter import Parameter,ParameterSet
131from park.fitresult import FitParameter
132import park.expression
133
134
135
136class Fitness(object):
137    """
138    Container for theory and data.
139
140    The fit object compares theory with data.
141
142    TODO: what to do with fittable metadata (e.g., footprint correction)?
143    """
144    data = None
145    model = None
146    def __init__(self, model=None,data=None):
147        self.data,self.model = data,model
148    def _parameterset(self):
149        return self.model.parameterset
150    parameterset = property(_parameterset)
151    def residuals(self):
152        return self.data.residuals(self.model.eval)
153    def residuals_deriv(self, pars=[]):
154        return self.data.residuals_deriv(self.model.eval_derivs,pars=pars)
155    def set(self, **kw):
156        """
157        Set parameters in the model.
158
159        User convenience function.  This allows a user with an assembly
160        of models in a script to for example set the fit range for
161        parameter 'a' of the model::
162            assembly[0].set(a=[5,6])
163
164        Raises KeyError if the parameter is not in parameterset.
165        """
166        self.model.set(**kw)
167    def abort(self):
168        if hasattr(self.model,'abort'): self.model.abort()
169
170class Part(object):
171    """
172    Part of a fitting assembly.  Part holds the model itself and
173    associated data.  The part can be initialized with a fitness
174    object or with a pair (model,data) for the default fitness function.
175
176    fitness (Fitness)
177        object implementing the `park.assembly.Fitness` interface.  In
178        particular, fitness should provide a parameterset attribute
179        containing a ParameterSet and a residuals method returning a vector
180        of residuals.
181    weight (dimensionless)
182        weight for the model.  See comments in assembly.py for details.
183    isfitted (boolean)
184        True if the model residuals should be included in the fit.
185        The model parameters may still be used in parameter
186        expressions, but there will be no comparison to the data.
187    residuals (vector)
188        Residuals for the model if they have been calculated, or None
189    degrees_of_freedom
190        Number of residuals minus number of fitted parameters.
191        Degrees of freedom for individual models does not make
192        sense in the presence of expressions combining models,
193        particularly in the case where a model has many parameters
194        but no data or many computed parameters.  The degrees of
195        freedom for the model is set to be at least one.
196    chisq
197        sum(residuals**2); use chisq/degrees_of_freedom to
198        get the reduced chisq value.
199
200        Get/set the weight on the given model.
201
202        assembly.weight(3) returns the weight on model 3 (0-origin)
203        assembly.weight(3,0.5) sets the weight on model 3 (0-origin)
204    """
205
206    def __init__(self, fitness, weight=1., isfitted=True):
207        if isinstance(fitness, tuple):
208            fitness = park.Fitness(*fitness)
209        self.fitness = fitness
210        self.weight = weight
211        self.isfitted = isfitted
212        self.residuals = None
213        self.chisq = numpy.Inf
214        self.degrees_of_freedom = 1
215
216
217class Assembly(object):
218    """
219    Collection of fit models.
220
221    Assembly implements the `park.fit.Objective` interface.
222
223    See `park.assembly` for usage.
224
225    Instance variables:
226
227    residuals : array
228        a vector of residuals spanning all models, with model
229        weights applied as appropriate.
230    degrees_of_freedom : integer
231        length of the residuals - number of fitted parameters
232    chisq : float
233        sum squared residuals; this is not the reduced chisq, which
234        you can get using chisq/degrees_of_freedom
235
236    These fields are defined for the individual models as well, with
237    degrees of freedom adjusted to the length of the individual data
238    set.  If the model is not fitted or the weight is zero, the residual
239    will not be calculated.
240
241    The residuals fields are available only after the model has been
242    evaluated.
243    """
244
245    def __init__(self, models=[]):
246        """Build an assembly from a list of models."""
247        self.parts = []
248        for m in models:
249            self.parts.append(Part(m))
250        self._reset()
251
252    def __iter__(self):
253        """Iterate through the models in order"""
254        for m in self.parts: yield m
255
256    def __getitem__(self, n):
257        """Return the nth model"""
258        return self.parts[n].fitness
259
260    def __setitem__(self, n, fitness):
261        """Replace the nth model"""
262        self.parts[n].fitness = fitness
263        self._reset()
264
265    def __delitem__(self, n):
266        """Delete the nth model"""
267        del self.parts[n]
268        self._reset()
269
270    def weight(self, idx, value=None):
271        """
272        Query the weight on a particular model.
273
274        Set weight to value if value is supplied.
275
276        :Parameters:
277         idx : integer
278           model number
279         value : float
280           model weight
281        :return: model weight
282        """
283        if value is not None:
284            self.parts[idx].weight = value
285        return self.parts[idx].weight
286
287    def isfitted(self, idx, value=None):
288        """
289        Query if a particular model is fitted.
290
291        Set isfitted to value if value is supplied.
292
293        :param idx: model number
294        :type idx: integer
295        :param value:
296        """
297        if value is not None:
298            self.parts[idx].isfitted = value
299        return self.parts[idx].isfitted
300
301    def append(self, fitness, weight=1.0, isfitted=True):
302        """
303        Add a model to the end of set.
304
305        :param fitness: the fitting model
306        The fitting model can be an instance of `park.assembly.Fitness`,
307        or a tuple of (`park.model.Model`,`park.data.Data1D`)
308        :param weight: model weighting (usually 1.0)
309        :param isfitted: whether model should be fit (equivalent to weight 0.)
310        """
311        self.parts.append(Part(fitness,weight,isfitted))
312        self._reset()
313
314    def insert(self, idx, fitness, weight=1.0, isfitted=True):
315        """Add a model to a particular position in the set."""
316        self.parts.insert(idx,Part(fitness,weight,isfitted))
317        self._reset()
318
319    def _reset(self):
320        """Adjust the parameter set after the addition of a new model."""
321        subsets = [m.fitness.parameterset for m in self]
322        self.parameterset = ParameterSet('root',subsets)
323        self.parameterset.setprefix()
324        #print [p.path for p in self.parameterset.flatten()]
325
326    def eval(self):
327        """
328        Recalculate the theory functions, and from them, the
329        residuals and chisq.
330
331        :note: Call this after the parameters have been updated.
332        """
333        # Handle abort from a separate thread.
334        self._cancel = False
335
336        # Evaluate the computed parameters
337        self._fitexpression()
338
339        # Check that the resulting parameters are in a feasible region.
340        if not self.isfeasible(): return numpy.inf
341
342        resid = []
343        k = len(self._fitparameters)
344        for m in self.parts:
345            # In order to support abort, need to be able to propagate an
346            # external abort signal from self.abort() into an abort signal
347            # for the particular model.  Can't see a way to do this which
348            # doesn't involve setting a state variable.
349            self._current_model = m
350            if self._cancel: return numpy.inf
351            if m.isfitted and m.weight != 0:
352                m.residuals = m.fitness.residuals()
353                N = len(m.residuals)
354                m.degrees_of_freedom = N-k if N>k else 1
355                m.chisq = numpy.sum(m.residuals**2)
356                resid.append(m.weight*m.residuals)
357        self.residuals = numpy.hstack(resid)
358        N = len(self.residuals)
359        self.degrees_of_freedom = N-k if N>k else 1
360        self.chisq = numpy.sum(self.residuals**2)
361        return self.chisq
362
363    def jacobian(self, pvec, step=1e-8):
364        """
365        Returns the derivative wrt the fit parameters at point p.
366
367        Numeric derivatives are calculated based on step, where step is
368        the portion of the total range for parameter j, or the portion of
369        point value p_j if the range on parameter j is infinite.
370        """
371        # Make sure the input vector is an array
372        pvec = numpy.asarray(pvec)
373        # We are being lazy here.  We can precompute the bounds, we can
374        # use the residuals_deriv from the sub-models which have analytic
375        # derivatives and we need only recompute the models which depend
376        # on the varying parameters.
377        # Meanwhile, let's compute the numeric derivative using the
378        # three point formula.
379        # We are not checking that the varied parameter in numeric
380        # differentiation is indeed feasible in the interval of interest.
381        range = zip(*[p.range for p in self._fitparameters])
382        lo,hi = [numpy.asarray(v) for v in range]
383        delta = (hi-lo)*step
384        # For infinite ranges, use p*1e-8 for the step size
385        idx = numpy.isinf(delta)
386        #print "J",idx,delta,pvec,type(idx),type(delta),type(pvec)
387        delta[idx] = pvec[idx]*step
388        delta[delta==0] = step
389
390        # Set the initial value
391        for k,v in enumerate(pvec):
392            self._fitparameters[k].value = v
393        # Gather the residuals
394        r = []
395        for k,v in enumerate(pvec):
396            # Center point formula:
397            #     df/dv = lim_{h->0} ( f(v+h)-f(v-h) ) / ( 2h )
398            h = delta[k]
399            self._fitparameters[k].value = v + h
400            self.eval()
401            rk = self.residuals
402            self._fitparameters[k].value = v - h
403            self.eval()
404            rk -= self.residuals
405            self._fitparameters[k].value = v
406            r.append(rk/(2*h))
407        # return the jacobian
408        return numpy.vstack(r).T
409
410
411    def cov(self, pvec):
412        """
413        Return the covariance matrix inv(J'J) at point p.
414        """
415
416        # Find cov of f at p
417        #     cov(f,p) = inv(J'J)
418        # Use SVD
419        #     J = U S V'
420        #     J'J = (U S V')' (U S V')
421        #         = V S' U' U S V'
422        #         = V S S V'
423        #     inv(J'J) = inv(V S S V')
424        #              = inv(V') inv(S S) inv(V)
425        #              = V inv (S S) V'
426        J = self.jacobian(pvec)
427        u,s,vh = numpy.linalg.svd(J,0)
428        JTJinv = numpy.dot(vh.T.conj()/s**2,vh)
429        return JTJinv
430
431    def stderr(self, pvec):
432        """
433        Return parameter uncertainty.
434
435        This is just the sqrt diagonal of covariance matrix inv(J'J) at point p.
436        """
437        return numpy.sqrt(numpy.diag(self.cov(pvec)))
438
439    def isfeasible(self):
440        """
441        Returns true if the parameter set is in a feasible region of the
442        modeling space.
443        """
444        return True
445
446    # Fitting service interface
447    def fit_parameters(self):
448        """
449        Return an alphabetical list of the fitting parameters.
450
451        This function is called once at the beginning of a fit,
452        and serves as a convenient place to precalculate what
453        can be precalculated such as the set of fitting parameters
454        and the parameter expressions evaluator.
455        """
456        self.parameterset.setprefix()
457        self._fitparameters = self.parameterset.fitted
458        self._restraints = self.parameterset.restrained
459        pars = self.parameterset.flatten()
460        context = self.parameterset.gather_context()
461        self._fitexpression = park.expression.build_eval(pars,context)
462        #print "constraints",self._fitexpression.__doc__
463
464        self._fitparameters.sort(lambda a,b: cmp(a.path,b.path))
465        # Convert to fitparameter a object
466        fitpars = [FitParameter(p.path,p.range,p.value)
467                   for p in self._fitparameters]
468        return fitpars
469
470    def set_result(self, result):
471        """
472        Set the parameters resulting from the fit into the parameter set,
473        and update the calculated expression.
474
475        The parameter values may be retrieved by walking the assembly.parameterset
476        tree, checking each parameter for isfitted, iscomputed, or isfixed.
477        For example::
478
479            assembly.set_result(result)
480            for p in assembly.parameterset.flatten():
481                if p.isfitted():
482                    print "%s %g in [%g,%g]"%(p.path,p.value,p.range[0],p.range[1])
483                elif p.iscomputed():
484                    print "%s computed as %g"%(p.path.p.value)
485
486        This does not calculate the function or the residuals for these parameters.
487        You can call assembly.eval() to do this.  The residuals will be set in
488        assembly[i].residuals.  The theory and data are model specific, and can
489        be found in assembly[i].fitness.data.
490        """
491        for n,p in enumerate(result.parameters):
492            self._fitparameters[n] = p.value
493        self._fitexpression()
494
495    def all_results(self, result):
496        """
497        Extend result from the fit with the calculated parameters.
498        """
499        calcpars = [FitParameter(p.path,p.range,p.value)
500                    for p in self.parameterset.computed]
501        result.parameters += calcpars
502
503    def result(self, status='step'):
504        """
505        Details to send back to the fitting client on an improved fit.
506
507        status is 'start', 'step' or 'end' depending if this is the
508        first result to return, an improved result, or the final result.
509
510        [Not implemented]
511        """
512        return None
513
514    def fresiduals(self, pvec):
515        chisq = self.__call__(pvec)
516        return self.residuals
517
518    def __call__(self, pvec):
519        """
520        Cost function.
521
522        Evaluate the system for the parameter vector pvec, returning chisq
523        as the cost function to be minimized.
524
525        Raises a runtime error if the number of fit parameters is
526        different than the length of the vector.
527        """
528        # Plug fit parameters into model
529        #print "Trying",pvec
530        pars = self._fitparameters
531        if len(pvec) != len(pars):
532            raise RuntimeError("Unexpected number of parameters")
533        for n,value in enumerate(pvec):
534            pars[n].value = value
535        # Evaluate model
536        chisq = self.eval()
537        # Evaluate additional restraints based on parameter value
538        # likelihood
539        restraints_penalty = 0
540        for p in self._restraints:
541            restraints_penalty += p.likelihood(p.value)
542        # Return total cost function
543        return self.chisq + restraints_penalty
544
545    def abort(self):
546        """
547        Interrupt the current function evaluation.
548
549        Forward this to the currently executing model if possible.
550        """
551        self._cancel = True
552        if hasattr(self._current_model,'abort'):
553            self._current_model.abort()
554
555class _Exp(park.Model):
556    """
557    Sample model for testing assembly.
558    """
559    parameters = ['a','c']
560    def eval(self,x):
561        return self.a*numpy.exp(self.c*x)
562class _Linear(park.Model):
563    parameters = ['a','c']
564    def eval(self,x):
565        #print "eval",self.a,self.c,x,self.a*x+self.c
566        return self.a*x+self.c
567def example():
568    """
569    Return an example assembly consisting of a pair of functions,
570        M1.a*exp(M1.c*x), M2.a*exp(2*M1.c*x)
571    and ideal data for
572        M1.a=1, M1.c=1.5, M2.a=2.5
573    """
574    import numpy
575    import park
576    from numpy import inf
577    # Make some fake data
578    x1 = numpy.linspace(0,1,11)
579    x2 = numpy.linspace(0,1,12)
580    # Define a shared model
581    if True: # Exp model
582        y1,y2 = numpy.exp(1.5*x1),2.5*numpy.exp(3*x2)
583        M1 = _Exp('M1',a=[1,3],c=[1,3])
584        M2 = _Exp('M2',a=[1,3],c='2*M1.c')
585        #M2 = _Exp('M2',a=[1,3],c=3)
586    else:  # Linear model
587        y1,y2 = x1+1.5, 2.5*x2+3
588        M1 = _Linear('M1',a=[1,3],c=[1,3])
589        M2 = _Linear('M2',a=[1,3],c='2*M1.c')
590    if False: # Unbounded
591        M1.a = [-inf,inf]
592        M1.c = [-inf,inf]
593        M2.a = [-inf,inf]
594    D1 = park.Data1D(x=x1, y=y1)
595    D2 = park.Data1D(x=x2, y=y2)
596    # Construct the assembly
597    assembly = park.Assembly([(M1,D1),(M2,D2)])
598    return assembly
599
600class _Sphere(park.Model):
601    parameters = ['a','b','c','d','e']
602    def eval(self,x):
603        return self.a*x**2+self.b*x+self.c + exp(self.d) - 3*sin(self.e)
604
605def example5():
606    import numpy
607    import park
608    from numpy import inf
609    # Make some fake data
610    x = numpy.linspace(0,1,11)
611    # Define a shared model
612    S = _Sphere(a=1,b=2,c=3,d=4,e=5)
613    y = S.eval(x1)
614    Sfit = _Sphere(a=[-inf,inf],b=[-inf,inf],c=[-inf,inf],d=[-inf,inf],e=[-inf,inf])
615    D = park.Data1D(x=x, y=y)
616    # Construct the assembly
617    assembly = park.Assembly([(Sfit,D)])
618    return assembly
619
620def test():
621    assembly = example()
622    assert assembly[0].parameterset.name == 'M1'
623
624    # extract the fitting parameters
625    pars = [p.name for p in assembly.fit_parameters()]
626    assert set(pars) == set(['M1.a','M1.c','M2.a'])
627    # Compute chisq and verify constraints are updated properly
628    assert assembly([1,1.5,2.5]) == 0
629    assert assembly[0].model.c == 1.5 and assembly[1].model.c == 3
630
631    # Try without constraints
632    assembly[1].set(c=3)
633    assembly.fit_parameters()  # Fit parameters have changed
634    assert assembly([1,1.5,2.5]) == 0
635
636    # Check that assembly.cov runs ... still need to check that it is correct!
637    C = assembly.cov(numpy.array([1,1.5,2.5]))
638
639if __name__ == "__main__": test()
Note: See TracBrowser for help on using the repository browser.