assembly.py @ a134fc6

Last change on this file since a134fc6 was 3570545, checked in by Mathieu Doucet <doucetm@…>, 13 years ago
Adding park Part 2
Property mode set to `100644`
File size: 22.8 KB

# This program is public domain
"""
An assembly is a collection of fitting functions.  This provides
the model representation that is the basis of the park fitting engine.

Models can range from very simple one dimensional theory functions
to complex assemblies of multidimensional datasets from different
experimental techniques, each with their own theory function and
a common underlying physical model.

Usage
=====

First define the models you want to work with.  The example
below uses a simple multilayer system measured by
specular reflection of xrays and neutrons.  The gold depth is the only
fitting parameter, ranging from 10-30 A.  The interface depths are
tied together using expressions.  In this case the expression is
a simple copy, but any standard math functions can be used.  Some
model developers may provide additional functions for use with the
expression.

Example models::

    import reflectometry.model1d as refl
    xray = refl.model('xray')
    xray.incident('Air',rho=0)
    xray.interface('iAu',sigma=5)
    xray.layer('Au',rho=124.68,depth=[10,30])
    xray.interface('iSi',sigma=5)
    xray.substrate('Si',rho=20.07)
    datax = refl.data('xray.dat')

    neutron = refl.model('neutron')
    neutron.incident('Air',rho=0)
    neutron.interface('iAu',sigma='xray.iAu')
    neutron.layer('Au',rho=4.66,depth='xray.Au.depth')
    neutron.interface('iSi',sigma='xray.iSi')
    neutron.substrate('Si',rho=2.07)
    datan = refl.data('neutron.dat')

As you can see from the above, parameters can be set to a value if
the parameter is fixed, to a range if the parameter is fitted, or
to a string expression if the parameter is calculated from other
parameters.  See park.Parameter.set for further details.

Having constructed the models, we can now create an assembly::

    import park
    assembly = park.Assembly([(xray,datax), (neutron,datan)])

Note: this would normally be done in the context of a fit
using fit = park.Fit([(xray,datax), (neutron,datan)]), and later referenced
using fit.assembly.

Individual parts of the assembly are accessible using the
model number 0, 1, 2... or by the model name.  In the above,
assembly[0] and assembly['xray'] refer to the same model.
Assemblies have insert and append functions for adding new
models, and "del assembly[idx]" for removing them.

Once the assembly is created, computing the values for the system
is a matter of calling::

    assembly.eval()
    print "Chi**2",assembly.chisq
    print "Reduced chi**2",assembly.chisq/assembly.degrees_of_freedom
    plot(arange(len(assembly.residuals)), assembly.residuals)

This defines the attributes residuals, degrees_of_freedom and chisq.
The chisq value is what the optimizer uses as the cost function to
minimize.

assembly.eval uses the current values for the parameters in the
individual models.  These parameters can be changed directly
in the model.  In the reflectometry example above, you could
set the gold thickness using xray.layer.Au.depth=156, or
something similar (the details are model specific).  Parameters
can also be changed through the assembly parameter set.  In the same
example, this would be assembly.parameterset['xray']['Au']['depth'].
See `park.parameter.ParameterSet` for details.

In the process of modeling data, particularly with multiple
datasets, you will sometimes want to temporarily ignore
how well one of the datasets matches so that you
can more quickly refine the model for the other datasets,
or see how particular models are influencing the fit.  To
temporarily ignore the xray data in the example above use::

    assembly.parts[0].isfitted = False

The model itself isn't ignored since its parameters may be
needed to compute the parameters for other models.  To
reenable checking against the xray data, you would assign
a True value instead.  More subtle weighting of the models
can be controlled using assembly.parts[idx].weight, but
see below for a note on model weighting.

A note on model weighting
-------------------------

In Assembly.eval, the residuals of each model are multiplied by
the model weight before they are combined, so changing the weight
is equivalent to dividing the error bars on the given model by
weight.  It is better to set the correct error
bars on your data in the first place than to adjust the weights.
If you have the correct error bars, then you should expect
roughly 2/3 of the data points to lie within one error bar of
the theory curve.  If consecutive data points have largely
overlapping error bars, then your uncertainty is overestimated.

Another case where weights are adjusted (abused?) is to
compensate for systematic errors in the data by forcing the
error bars to be large enough to cover the systematic bias.
This is a poor approach to the problem.  A better strategy
is to capture the systematic effects in the model, and treat
the measurement of the independent variable as an additional
data point in the fit.  This is still not statistically sound
as there is likely to be a large correlation between the
uncertainty of the measurement and the values of all the
other variables.

That said, adjusting the weight on a dataset is a quick way
of reducing its influence on the entire fit.  Please use it
with care.
"""

__all__ = ['Assembly', 'Fitness']
import numpy

import park
from park.parameter import Parameter,ParameterSet
from park.fitresult import FitParameter
import park.expression



class Fitness(object):
    """
    Container for theory and data.

    The fit object compares theory with data.

    TODO: what to do with fittable metadata (e.g., footprint correction)?
    """
    data = None
    model = None
    def __init__(self, model=None,data=None):
        self.data,self.model = data,model
    def _parameterset(self):
        return self.model.parameterset
    parameterset = property(_parameterset)
    def residuals(self):
        return self.data.residuals(self.model.eval)
    def residuals_deriv(self, pars=[]):
        return self.data.residuals_deriv(self.model.eval_derivs,pars=pars)
    def set(self, **kw):
        """
        Set parameters in the model.

        User convenience function.  This allows a user with an assembly
        of models in a script to for example set the fit range for
        parameter 'a' of the model::

            assembly[0].set(a=[5,6])

        Raises KeyError if the parameter is not in parameterset.
        """
        self.model.set(**kw)
    def abort(self):
        if hasattr(self.model,'abort'): self.model.abort()

class Part(object):
    """
    Part of a fitting assembly.  Part holds the model itself and
    associated data.  The part can be initialized with a fitness
    object or with a pair (model,data) for the default fitness function.

    fitness (Fitness)
        object implementing the `park.assembly.Fitness` interface.  In
        particular, fitness should provide a parameterset attribute
        containing a ParameterSet and a residuals method returning a vector
        of residuals.
    weight (dimensionless)
        weight for the model.  See comments in assembly.py for details.
    isfitted (boolean)
        True if the model residuals should be included in the fit.
        The model parameters may still be used in parameter
        expressions, but there will be no comparison to the data.
    residuals (vector)
        Residuals for the model if they have been calculated, or None
    degrees_of_freedom
        Number of residuals minus number of fitted parameters.
        Degrees of freedom for individual models does not make
        sense in the presence of expressions combining models,
        particularly in the case where a model has many parameters
        but no data or many computed parameters.  The degrees of
        freedom for the model is set to be at least one.
    chisq
        sum(residuals**2); use chisq/degrees_of_freedom to
        get the reduced chisq value.

    The weight on a given model can also be queried or set through
    the assembly:

        assembly.weight(3) returns the weight on model 3 (0-origin)
        assembly.weight(3,0.5) sets the weight on model 3 (0-origin)
    """

    def __init__(self, fitness, weight=1., isfitted=True):
        if isinstance(fitness, tuple):
            fitness = park.Fitness(*fitness)
        self.fitness = fitness
        self.weight = weight
        self.isfitted = isfitted
        self.residuals = None
        # numpy.inf rather than the numpy.Inf alias, which newer numpy drops
        self.chisq = numpy.inf
        self.degrees_of_freedom = 1


class Assembly(object):
    """
    Collection of fit models.

    Assembly implements the `park.fit.Objective` interface.

    See `park.assembly` for usage.

    Instance variables:

    residuals : array
        a vector of residuals spanning all models, with model
        weights applied as appropriate.
    degrees_of_freedom : integer
        length of the residuals - number of fitted parameters
    chisq : float
        sum squared residuals; this is not the reduced chisq, which
        you can get using chisq/degrees_of_freedom

    These fields are defined for the individual models as well, with
    degrees of freedom adjusted to the length of the individual data
    set.  If the model is not fitted or the weight is zero, the residual
    will not be calculated.

    The residuals fields are available only after the model has been
    evaluated.
    """

    def __init__(self, models=[]):
        """Build an assembly from a list of models."""
        self.parts = []
        for m in models:
            self.parts.append(Part(m))
        self._reset()

    def __iter__(self):
        """Iterate through the models in order"""
        for m in self.parts: yield m

    def __getitem__(self, n):
        """Return the nth model"""
        return self.parts[n].fitness

    def __setitem__(self, n, fitness):
        """Replace the nth model"""
        self.parts[n].fitness = fitness
        self._reset()

    def __delitem__(self, n):
        """Delete the nth model"""
        del self.parts[n]
        self._reset()

    def weight(self, idx, value=None):
        """
        Query the weight on a particular model.

        Set weight to value if value is supplied.

        :Parameters:
            idx : integer
                model number
            value : float
                model weight
        :return: model weight
        """
        if value is not None:
            self.parts[idx].weight = value
        return self.parts[idx].weight

    def isfitted(self, idx, value=None):
        """
        Query if a particular model is fitted.

        Set isfitted to value if value is supplied.

        :param idx: model number
        :type idx: integer
        :param value: True if the model residuals should be included in the fit
        :return: whether the model is fitted
        """
        if value is not None:
            self.parts[idx].isfitted = value
        return self.parts[idx].isfitted

    def append(self, fitness, weight=1.0, isfitted=True):
        """
        Add a model to the end of set.

        :param fitness: the fitting model
            The fitting model can be an instance of `park.assembly.Fitness`,
            or a tuple of (`park.model.Model`,`park.data.Data1D`)
        :param weight: model weighting (usually 1.0)
        :param isfitted: whether model should be fit (isfitted=False is
            equivalent to weight 0.)
        """
        self.parts.append(Part(fitness,weight,isfitted))
        self._reset()

    def insert(self, idx, fitness, weight=1.0, isfitted=True):
        """Add a model to a particular position in the set."""
        self.parts.insert(idx,Part(fitness,weight,isfitted))
        self._reset()

    def _reset(self):
        """Rebuild the parameter set after a model is added, replaced or removed."""
        subsets = [m.fitness.parameterset for m in self]
        self.parameterset = ParameterSet('root',subsets)
        self.parameterset.setprefix()
        #print [p.path for p in self.parameterset.flatten()]

    def eval(self):
        """
        Recalculate the theory functions, and from them, the
        residuals and chisq.

        :note: Call this after the parameters have been updated.
        """
        # Handle abort from a separate thread.
        self._cancel = False

        # Evaluate the computed parameters
        self._fitexpression()

        # Check that the resulting parameters are in a feasible region.
        if not self.isfeasible(): return numpy.inf

        resid = []
        k = len(self._fitparameters)
        for m in self.parts:
            # In order to support abort, need to be able to propagate an
            # external abort signal from self.abort() into an abort signal
            # for the particular model.  Can't see a way to do this which
            # doesn't involve setting a state variable.
            self._current_model = m
            if self._cancel: return numpy.inf
            if m.isfitted and m.weight != 0:
                m.residuals = m.fitness.residuals()
                N = len(m.residuals)
                m.degrees_of_freedom = N-k if N>k else 1
                m.chisq = numpy.sum(m.residuals**2)
                resid.append(m.weight*m.residuals)
        self.residuals = numpy.hstack(resid)
        N = len(self.residuals)
        self.degrees_of_freedom = N-k if N>k else 1
        self.chisq = numpy.sum(self.residuals**2)
        return self.chisq

    def jacobian(self, pvec, step=1e-8):
        """
        Returns the derivative wrt the fit parameters at point p.

        Numeric derivatives are calculated based on step, where step is
        the portion of the total range for parameter j, or the portion of
        point value p_j if the range on parameter j is infinite.
        """
        # Make sure the input vector is an array
        pvec = numpy.asarray(pvec)
        # We are being lazy here.  We can precompute the bounds, we can
        # use the residuals_deriv from the sub-models which have analytic
        # derivatives and we need only recompute the models which depend
        # on the varying parameters.
        # Meanwhile, let's compute the numeric derivative using the
        # three point formula.
        # We are not checking that the varied parameter in numeric
        # differentiation is indeed feasible in the interval of interest.
        # (named bounds to avoid shadowing the builtin range)
        bounds = zip(*[p.range for p in self._fitparameters])
        lo,hi = [numpy.asarray(v) for v in bounds]
        delta = (hi-lo)*step
        # For infinite ranges, use p*1e-8 for the step size
        idx = numpy.isinf(delta)
        #print "J",idx,delta,pvec,type(idx),type(delta),type(pvec)
        delta[idx] = pvec[idx]*step
        delta[delta==0] = step

        # Set the initial value
        for k,v in enumerate(pvec):
            self._fitparameters[k].value = v
        # Gather the residuals
        r = []
        for k,v in enumerate(pvec):
            # Center point formula:
            #     df/dv = lim_{h->0} ( f(v+h)-f(v-h) ) / ( 2h )
            h = delta[k]
            self._fitparameters[k].value = v + h
            self.eval()
            rk = self.residuals
            self._fitparameters[k].value = v - h
            self.eval()
            rk -= self.residuals
            self._fitparameters[k].value = v
            r.append(rk/(2*h))
        # return the jacobian
        return numpy.vstack(r).T


    def cov(self, pvec):
        """
        Return the covariance matrix inv(J'J) at point p.
        """

        # Find cov of f at p
        #     cov(f,p) = inv(J'J)
        # Use SVD
        #     J = U S V'
        #     J'J = (U S V')' (U S V')
        #         = V S' U' U S V'
        #         = V S S V'
        #     inv(J'J) = inv(V S S V')
        #              = inv(V') inv(S S) inv(V)
        #              = V inv (S S) V'
        J = self.jacobian(pvec)
        u,s,vh = numpy.linalg.svd(J,0)
        JTJinv = numpy.dot(vh.T.conj()/s**2,vh)
        return JTJinv

    def stderr(self, pvec):
        """
        Return parameter uncertainty.

        This is just the sqrt diagonal of covariance matrix inv(J'J) at point p.
        """
        return numpy.sqrt(numpy.diag(self.cov(pvec)))

    def isfeasible(self):
        """
        Returns true if the parameter set is in a feasible region of the
        modeling space.
        """
        return True

    # Fitting service interface
    def fit_parameters(self):
        """
        Return an alphabetical list of the fitting parameters.

        This function is called once at the beginning of a fit,
        and serves as a convenient place to precalculate what
        can be precalculated such as the set of fitting parameters
        and the parameter expressions evaluator.
        """
        self.parameterset.setprefix()
        self._fitparameters = self.parameterset.fitted
        self._restraints = self.parameterset.restrained
        pars = self.parameterset.flatten()
        context = self.parameterset.gather_context()
        self._fitexpression = park.expression.build_eval(pars,context)
        #print "constraints",self._fitexpression.__doc__

        # Sort by parameter path (a key function replaces the cmp callback)
        self._fitparameters.sort(key=lambda p: p.path)
        # Convert to FitParameter objects
        fitpars = [FitParameter(p.path,p.range,p.value)
                   for p in self._fitparameters]
        return fitpars

    def set_result(self, result):
        """
        Set the parameters resulting from the fit into the parameter set,
        and update the calculated expression.

        The parameter values may be retrieved by walking the assembly.parameterset
        tree, checking each parameter for isfitted, iscomputed, or isfixed.
        For example::

            assembly.set_result(result)
            for p in assembly.parameterset.flatten():
                if p.isfitted():
                    print "%s %g in [%g,%g]"%(p.path,p.value,p.range[0],p.range[1])
                elif p.iscomputed():
                    print "%s computed as %g"%(p.path,p.value)

        This does not calculate the function or the residuals for these parameters.
        You can call assembly.eval() to do this.  The residuals will be set in
        assembly.parts[i].residuals.  The theory and data are model specific, and
        can be found in assembly.parts[i].fitness.data.
        """
        for n,p in enumerate(result.parameters):
            # Update the value of the fitted parameter; assigning to the
            # list slot itself would replace the parameter with a float.
            self._fitparameters[n].value = p.value
        self._fitexpression()

    def all_results(self, result):
        """
        Extend result from the fit with the calculated parameters.
        """
        calcpars = [FitParameter(p.path,p.range,p.value)
                    for p in self.parameterset.computed]
        result.parameters += calcpars

    def result(self, status='step'):
        """
        Details to send back to the fitting client on an improved fit.

        status is 'start', 'step' or 'end' depending if this is the
        first result to return, an improved result, or the final result.

        [Not implemented]
        """
        return None

    def fresiduals(self, pvec):
        chisq = self.__call__(pvec)
        return self.residuals

    def __call__(self, pvec):
        """
        Cost function.

        Evaluate the system for the parameter vector pvec, returning chisq
        as the cost function to be minimized.

        Raises a runtime error if the number of fit parameters is
        different than the length of the vector.
        """
        # Plug fit parameters into model
        #print "Trying",pvec
        pars = self._fitparameters
        if len(pvec) != len(pars):
            raise RuntimeError("Unexpected number of parameters")
        for n,value in enumerate(pvec):
            pars[n].value = value
        # Evaluate model
        chisq = self.eval()
        # Evaluate additional restraints based on parameter value
        # likelihood
        restraints_penalty = 0
        for p in self._restraints:
            restraints_penalty += p.likelihood(p.value)
        # Return total cost function.  Use the value returned by eval(),
        # which is inf if the evaluation was infeasible or aborted.
        return chisq + restraints_penalty

    def abort(self):
        """
        Interrupt the current function evaluation.

        Forward this to the currently executing model if possible.
        """
        self._cancel = True
        if hasattr(self._current_model,'abort'):
            self._current_model.abort()

class _Exp(park.Model):
    """
    Sample model for testing assembly.
    """
    parameters = ['a','c']
    def eval(self,x):
        return self.a*numpy.exp(self.c*x)
class _Linear(park.Model):
    parameters = ['a','c']
    def eval(self,x):
        #print "eval",self.a,self.c,x,self.a*x+self.c
        return self.a*x+self.c
def example():
    """
    Return an example assembly consisting of a pair of functions,
        M1.a*exp(M1.c*x), M2.a*exp(2*M1.c*x)
    and ideal data for
        M1.a=1, M1.c=1.5, M2.a=2.5
    """
    import numpy
    import park
    from numpy import inf
    # Make some fake data
    x1 = numpy.linspace(0,1,11)
    x2 = numpy.linspace(0,1,12)
    # Define a shared model
    if True: # Exp model
        y1,y2 = numpy.exp(1.5*x1),2.5*numpy.exp(3*x2)
        M1 = _Exp('M1',a=[1,3],c=[1,3])
        M2 = _Exp('M2',a=[1,3],c='2*M1.c')
        #M2 = _Exp('M2',a=[1,3],c=3)
    else:  # Linear model
        y1,y2 = x1+1.5, 2.5*x2+3
        M1 = _Linear('M1',a=[1,3],c=[1,3])
        M2 = _Linear('M2',a=[1,3],c='2*M1.c')
    if False: # Unbounded
        M1.a = [-inf,inf]
        M1.c = [-inf,inf]
        M2.a = [-inf,inf]
    D1 = park.Data1D(x=x1, y=y1)
    D2 = park.Data1D(x=x2, y=y2)
    # Construct the assembly
    assembly = park.Assembly([(M1,D1),(M2,D2)])
    return assembly

class _Sphere(park.Model):
    parameters = ['a','b','c','d','e']
    def eval(self,x):
        # exp and sin are taken from numpy; the module imports no bare names
        return (self.a*x**2+self.b*x+self.c
                + numpy.exp(self.d) - 3*numpy.sin(self.e))

def example5():
    import numpy
    import park
    from numpy import inf
    # Make some fake data
    x = numpy.linspace(0,1,11)
    # Define a shared model
    S = _Sphere(a=1,b=2,c=3,d=4,e=5)
    y = S.eval(x)
    Sfit = _Sphere(a=[-inf,inf],b=[-inf,inf],c=[-inf,inf],d=[-inf,inf],e=[-inf,inf])
    D = park.Data1D(x=x, y=y)
    # Construct the assembly
    assembly = park.Assembly([(Sfit,D)])
    return assembly

def test():
    assembly = example()
    assert assembly[0].parameterset.name == 'M1'

    # extract the fitting parameters
    pars = [p.name for p in assembly.fit_parameters()]
    assert set(pars) == set(['M1.a','M1.c','M2.a'])
    # Compute chisq and verify constraints are updated properly
    assert assembly([1,1.5,2.5]) == 0
    assert assembly[0].model.c == 1.5 and assembly[1].model.c == 3

    # Try without constraints
    assembly[1].set(c=3)
    assembly.fit_parameters()  # Fit parameters have changed
    assert assembly([1,1.5,2.5]) == 0

    # Check that assembly.cov runs ... still need to check that it is correct!
    C = assembly.cov(numpy.array([1,1.5,2.5]))

if __name__ == "__main__": test()
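
The way `Assembly.eval` combines the per-model residuals can be illustrated without the park package. The sketch below is a standalone approximation, not park code: the function name `combine_parts` and the `(residuals, weight, isfitted)` tuple layout are hypothetical stand-ins for `Part` objects. It follows the same steps as the listing: skip parts that are not fitted or have zero weight, multiply the remaining residuals by their weight, stack them, and take chisq as the sum of squares with degrees of freedom kept at one or more.

```python
import numpy as np

def combine_parts(parts, n_fitted):
    """
    Combine per-model residuals the way the eval() method above does.

    parts    : list of (residuals, weight, isfitted) tuples; this layout is
               a hypothetical stand-in for park's Part objects.
    n_fitted : number of fitted parameters (k in the listing).
    Returns (stacked weighted residuals, chisq, degrees_of_freedom).
    """
    weighted = [w * np.asarray(r, dtype=float)
                for r, w, fitted in parts
                if fitted and w != 0]
    residuals = np.hstack(weighted) if weighted else np.array([])
    n = len(residuals)
    # Degrees of freedom are never allowed to drop below one.
    dof = n - n_fitted if n > n_fitted else 1
    chisq = float(np.sum(residuals**2))
    return residuals, chisq, dof

# Two made-up datasets: three xray residuals and two neutron residuals,
# with the neutron data given twice the weight.
xray = (np.array([1.0, -1.0, 2.0]), 1.0, True)
neutron = (np.array([0.5, 0.5]), 2.0, True)

residuals, chisq, dof = combine_parts([xray, neutron], n_fitted=1)
print("chisq", chisq, "reduced chisq", chisq / dof)
```

Passing `isfitted=False` for the xray tuple drops its residuals from the stack while leaving the neutron contribution unchanged, which mirrors the `assembly.parts[0].isfitted = False` usage described in the module docstring.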

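
The `jacobian`, `cov` and `stderr` methods in the listing can be checked on a small problem with plain numpy. The following is a minimal sketch under stated assumptions: `numeric_jacobian`, `covariance` and `line_residuals` are illustrative names that do not exist in park, and a single fixed step `h` is used instead of the range-scaled step in `Assembly.jacobian`. It applies the same center point formula and the same SVD identity inv(J'J) = V inv(S S) V'.

```python
import numpy as np

def numeric_jacobian(residuals, p, h=1e-6):
    """
    Central-difference Jacobian of a residual function at point p.
    Uses one fixed step h for every parameter (an assumption; the
    listing scales the step by each parameter's range).
    """
    p = np.asarray(p, dtype=float)
    columns = []
    for k in range(len(p)):
        step = np.zeros_like(p)
        step[k] = h
        # df/dp_k ~ ( f(p+h) - f(p-h) ) / ( 2h )
        columns.append((residuals(p + step) - residuals(p - step)) / (2 * h))
    return np.vstack(columns).T

def covariance(J):
    """Return inv(J'J) computed from the SVD J = U S V'."""
    u, s, vh = np.linalg.svd(J, full_matrices=False)
    # V / s**2 scales column j of V by 1/s_j**2, giving V inv(S S) V'
    return np.dot(vh.T.conj() / s**2, vh)

# Hypothetical test problem: residuals of a straight line a*x + c
# against ideal data generated with a=2, c=1.
x = np.linspace(0, 1, 11)

def line_residuals(p):
    return p[0] * x + p[1] - (2.0 * x + 1.0)

J = numeric_jacobian(line_residuals, [2.0, 1.0])
C = covariance(J)
stderr = np.sqrt(np.diag(C))
print(stderr)
```

For this linear problem the Jacobian columns are `x` and a vector of ones, so the SVD result can be compared directly against `numpy.linalg.inv` applied to J'J. The SVD route avoids forming J'J explicitly, which is presumably why the listing uses it.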