calculator.rst @ 9ee2756

core_shell_microgelsmagnetic_modelticket-1257-vesicle-productticket_1156ticket_1265_superballticket_822_more_unit_tests

Last change on this file since 9ee2756 was 9ee2756, checked in by Paul Kienzle <pkienzle@…>, 7 years ago
simplify kernel wrapper code and combine OpenCL with DLL in one file
Property mode set to `100644`
File size: 13.4 KB

Rev	Line
[e822c97]	1	.. currentmodule:: sasmodels
	2
[870a2f4]	3	.. _Calculator_Interface:
	4
[d5ac45f]	5	Calculator Interface
	6	====================
	7
[e822c97]	8	This document describes the layer between the form factor kernels and the
[9ee2756]	9	model calculator which implements the dispersity and magnetic SLD
[e822c97]	10	calculations. There are three separate implementations of this layer,
[2e66ef5]	11	:mod:`kernelcl` for OpenCL, which operates on a single Q value at a time,
	12	:mod:`kerneldll` for the DLL, which loops over a vector of Q values, and
	13	:mod:`kernelpy` for python models which operates on vector Q values.
[e822c97]	14
	15	Each implementation provides three different calls Iq, Iqxy and Imagnetic
[9ee2756]	16	for 1-D, 2-D and 2-D magnetic kernels respectively. The C code is defined
	17	in kernel_iq.c, with the minor differences between OpenCL and DLL handled
	18	by #ifdef statements.
[e822c97]	19
[9ee2756]	20	The kernel call looks as follows::
[e822c97]	21
	22	kernel void KERNEL_NAME(
	23	int nq, // Number of q values in the q vector
[9ee2756]	24	int pd_start, // Starting position in the dispersity loop
	25	int pd_stop, // Ending position in the dispersity loop
	26	ProblemDetails *details, // dispersity info
[e822c97]	27	double *values, // Value and weights vector
	28	double *q, // q or (qx,qy) vector
	29	double *result, // returned I(q), with result[nq] = pd_weight
[9ee2756]	30	double cutoff) // dispersity weight cutoff
[e822c97]	31
	32	The details for OpenCL and the python loop are slightly different, but these
	33	data structures are common.
	34
	35	nq indicates the number of q values that will be calculated.
	36
[9ee2756]	37	The pd_start and pd_stop parameters set the range of the dispersity
	38	loop to compute for the current kernel call. Give a dispersity
[e822c97]	39	calculation with 30 weights for length and 30 weights for radius for example,
	40	there are a total of 900 calls to the form factor required to compute the
	41	kernel. These might be done in chunks of 100, so the first call would start
	42	at zero and stop after 100 iterations. The second call would then have to set
	43	the length index to 3 and the radius index to 10 for a position of 3*30+10=100,
	44	and could then proceed to position 200. This allows us to interrupt the
[9ee2756]	45	calculation in the middle of a long dispersity loop without having to
[e822c97]	46	do special tricks with the C code. More importantly, it stops the OpenCL
	47	kernel in a reasonable time; because the GPU is used by the operating
	48	system to show its windows, if a GPU kernel runs too long then it will be
	49	automatically killed and no results will be returned to the caller.
	50
	51	The ProblemDetails structure is a direct map of the
[9ee2756]	52	:class:`details.CallDetails` buffer. This indicates which parameters have
	53	dispersity, and where in the values vector the values and weights can be
	54	found. For each parameter with dispersity there is a parameter id, the length
	55	of the dispersity loop for that parameter, the offset of the parameter
[e822c97]	56	values in the pd value and pd weight vectors and the 'stride' from one index
	57	to the next, which is used to translate between the position in the
[9ee2756]	58	dispersity loop and the particular parameter indices. The num_eval
	59	field is the total size of the dispersity loop. num_weights is the
[e822c97]	60	number of elements in the pd value and pd weight vectors. num_active is
[9ee2756]	61	the number of non-trivial pd loops (parameters with dispersity should be ordered
	62	by decreasing pd vector length, with a length of 1 meaning no dispersity).
[e822c97]	63	Oriented objects in 2-D need a cos(theta) spherical correction on the angular
	64	variation in order to preserve the 'surface area' of the weight distribution.
	65	theta_par is the id of the polar coordinate parameter if there is one.
	66
	67
	68	The values vector consists of the fixed values for the model plus pd value
	69	and pd weight vectors. There are NUM_VALUES fixed values for the model,
	70	which includes the initial two slots for scale and background, the NUM_PARS
	71	values for the kernel parameters, three slots for the applied magnetism, and
	72	three slots for each of the SLD parameters for the sample magnetiziation
	73	(Mx, My, Mz). Sample magnetization is translated from (M, theta, phi)
	74	to (Mx, My, Mz) before the kernel is called. After the fixed values comes
[9ee2756]	75	the pd value vector, with the dispersity values for each parameter
[e822c97]	76	stacked one after the other. The order isn't important since the location
	77	for each parameter is stored in the pd_offset field of the ProblemDetails
	78	structure, but the values do need to be contiguous. After num_weights
	79	values, the pd weight vector is stored, with the same configuration as the
	80	pd value vector. Note that the pd vectors can contain values that are not
[9ee2756]	81	in the dispersity loop; this is used by :class:`mixture.MixtureKernel`
[e822c97]	82	to make it easier to call the various component kernels.
	83
	84	The q vector contains one value for each q for Iq kernels, or a pair
	85	of (qx,qy) values for each q for Iqxy and Imagnetic kernels. OpenCL
	86	pads all vectors to 32 value boundaries just to be safe, including the
	87	values vector and the the results vector.
	88
	89	The results vector contains one slot for each of the nq values, plus
[9ee2756]	90	one extra slot at the end for the weight normalization accumulated across
	91	all points in the dispersity mesh. This is required when the dispersity
	92	loop is broken across several kernel calls.
[e822c97]	93
	94	cutoff is a importance cutoff so that points which contribute negligibly
	95	to the total scattering can be skipped without calculating them.
	96
	97	:func:`generate.make_source` defines the following C macros:
[d5ac45f]	98
	99	- USE_OPENCL is defined if running in opencl
[9ee2756]	100	- MAX_PD is the maximum depth of the dispersity loop [model specific]
[e822c97]	101	- NUM_PARS is the number of parameter values in the kernel. This may be
	102	more than the number of parameters if some of the parameters are vector
	103	values.
	104	- NUM_VALUES is the number of fixed values, which defines the offset in the
[9ee2756]	105	value list to the dispersity value and weight vectors.
[e822c97]	106	- NUM_MAGNETIC is the number of magnetic SLDs
	107	- MAGNETIC_PARS is a comma separated list of the magnetic SLDs, indicating
	108	their locations in the values vector.
[9ee2756]	109	- MAGNETIC_PAR1, ... are the first three magnetic parameter ids so we can
	110	hard code the setting of magnetic values if there are only a few of them.
[e822c97]	111	- KERNEL_NAME is the name of the function being declared
	112	- PARAMETER_TABLE is the declaration of the parameters to the kernel:
[d5ac45f]	113
[e822c97]	114	Cylinder::
[d5ac45f]	115
	116	#define PARAMETER_TABLE \
	117	double length; \
	118	double radius; \
	119	double sld; \
[e822c97]	120	double sld_solvent;
[d5ac45f]	121
	122	Note: scale and background are never included
	123
[e822c97]	124	Multi-shell cylinder (10 shell max)::
[d5ac45f]	125
	126	#define PARAMETER_TABLE \
	127	double num_shells; \
	128	double length; \
	129	double radius[10]; \
	130	double sld[10]; \
[e822c97]	131	double sld_solvent;
[d5ac45f]	132
[e822c97]	133	- CALL_IQ(q, i, var) is the declaration of a call to the kernel:
[d5ac45f]	134
[e822c97]	135	Cylinder::
[d5ac45f]	136
[03cac08]	137	#define CALL_IQ(q, i, var) Iq(q[i], \
[d5ac45f]	138	var.length, \
	139	var.radius, \
	140	var.sld, \
	141	var.sld_solvent)
	142
[e822c97]	143	Multi-shell cylinder::
[d5ac45f]	144
[03cac08]	145	#define CALL_IQ(q, i, var) Iq(q[i], \
[d5ac45f]	146	var.num_shells, \
	147	var.length, \
	148	var.radius, \
	149	var.sld, \
	150	var.sld_solvent)
	151
[e822c97]	152	Cylinder2D::
[03cac08]	153
[9ee2756]	154	#define CALL_IQ(q, i, var) Iqxy(qa, qc, \
[03cac08]	155	var.length, \
	156	var.radius, \
	157	var.sld, \
[9ee2756]	158	var.sld_solvent)
[03cac08]	159
[d5ac45f]	160	- CALL_VOLUME(var) is similar, but for calling the form volume::
	161
	162	#define CALL_VOLUME(var) \
	163	form_volume(var.length, var.radius)
	164
[e822c97]	165	There is an additional macro that can be defined within the model.c file:
[d5ac45f]	166
[e822c97]	167	- INVALID(var) is a test for model parameters in the correct range:
	168
	169	Cylinder::
[d5ac45f]	170
	171	#define INVALID(var) 0
	172
[e822c97]	173	BarBell::
[d5ac45f]	174
[6fd8de0]	175	#define INVALID(var) (var.bell_radius < var.radius)
[d5ac45f]	176
[e822c97]	177	Model with complicated constraints::
[d5ac45f]	178
	179	inline bool constrained(p1, p2, p3) { return expression; }
	180	#define INVALID(var) constrained(var.p1, var.p2, var.p3)
	181
[9ee2756]	182	Our design supports a limited number of dispersity loops, wherein
	183	we need to cycle through the values of the dispersity, calculate
[d5ac45f]	184	the I(q, p) for each combination of parameters, and perform a normalized
	185	weighted sum across all the weights. Parameters may be passed to the
[9ee2756]	186	underlying calculation engine as scalars or vectors, but the dispersity
[d5ac45f]	187	calculator treats the parameter set as one long vector.
	188
[9ee2756]	189	Let's assume we have 8 parameters in the model, two of which allow dispersity.
	190	Since this is a 1-D model the orientation parameters won't be used::
[d5ac45f]	191
	192	0: scale {scl = constant}
	193	1: background {bkg = constant}
[e822c97]	194	2: radius {r = vector of 10pts}
	195	3: length {l = vector of 30pts}
[9ee2756]	196	4: sld {s1 = constant/(radius*2length)}
[6fd8de0]	197	5: sld_solvent {s2 = constant}
[e822c97]	198	6: theta {not used}
	199	7: phi {not used}
	200
	201	This generates the following call to the kernel. Note that parameters 4 and
[9ee2756]	202	5 are treated as having dispersity even though they don't --- this is because
[e822c97]	203	it is harmless to do so and simplifies the looping code::
	204
	205	MAX_PD = 4
	206	NUM_PARS = 6 // kernel parameters only
	207	NUM_VALUES = 17 // all values, including scale and background
	208	NUM_MAGNETIC = 2 // two parameters might be magnetic
	209	MAGNETIC_PARS = 4, 5 // they are sld and sld_solvent
	210	MAGNETIC_PAR0 = 4 // sld index
	211	MAGNETIC_PAR1 = 5 // solvent index
	212
	213	details {
	214	pd_par = {3, 2, 4, 5} // parameters radius and length vary
	215	pd_length = {30, 10, 1, 1} // length has more, so it is first
	216	pd_offset = {10, 0, 31, 32} // length starts at index 10 in weights
[d5ac45f]	217	pd_stride = {1, 30, 300, 300} // cumulative product of pd length
[9ee2756]	218	num_eval = 300 // 300 values in the dispersity loop
[e822c97]	219	num_weights = 42 // 42 values in the pd vector
	220	num_active = 2 // only the first two pd are active
	221	theta_var = 6 // spherical correction
[d5ac45f]	222	}
	223
[e822c97]	224	values = { scl, bkg, // universal
[9ee2756]	225	r, l, s1, s2, theta, phi, // kernel pars
[e822c97]	226	in spin, out spin, spin angle, // applied magnetism
[9ee2756]	227	mx s1, my s1, mz s1, mx s2, my s2, mz s2, // magnetic slds
[e822c97]	228	r0, .., r9, l0, .., l29, s, s2, // pd values
	229	r0, .., r9, l0, .., l29, s, s2} // pd weights
[d5ac45f]	230
	231	nq = 130
	232	q = { q0, q1, ..., q130, x, x } # pad to 8 element boundary
[e822c97]	233	result = {r1, ..., r130, pd_norm, x }
[d5ac45f]	234
[9ee2756]	235	The dispersity parameters are stored in as an array of parameter
	236	indices, one for each parameter, stored in pd_par[n]. Parameters which do
	237	not support dispersity do not appear in this array. Each dispersity
[6fd8de0]	238	parameter has a weight vector whose length is stored in pd_length[n].
[d5ac45f]	239	The weights are stored in a contiguous vector of weights for all
	240	parameters, with the starting position for the each parameter stored
	241	in pd_offset[n]. The values corresponding to the weights are stored
	242	together in a separate weights[] vector, with offset stored in
[9ee2756]	243	par_offset[pd_par[n]]. Dispersity parameters should be stored in
[d5ac45f]	244	decreasing order of length for highest efficiency.
	245
[9ee2756]	246	We limit the number of dispersity dimensions to MAX_PD (currently 4),
	247	though some models may have fewer if they have fewer dispersity
[e822c97]	248	parameters. The main reason for the limit is to reduce code size.
[9ee2756]	249	Each additional dispersity parameter requires a separate dispersity
	250	loop. If more than 4 levels of dispersity are needed, then we need to
	251	switch to a monte carlo importance sampling algorithm with better
	252	performance for high-dimensional integrals.
[e822c97]	253
	254	Constraints between parameters are not supported. Instead users will
	255	have to define a new model with the constraints built in by making a
	256	copy of the existing model. Mac provides OpenCL and we are supplying
	257	the tinycc compiler for Windows so this won't be a complete limitation,
	258	but it is rather inconvenient. The process could perhaps be automated
	259	so that there is no code copying required, just an alternate CALL_IQ
	260	macro that implements the constraints. Think carefully about constraining
	261	theta since the polar coordinates normalization is tied to this parameter.
[d5ac45f]	262
[9ee2756]	263	If there is no dispersity we pretend that we have a disperisty mesh over
	264	a single parameter with a single point in the distribution, giving
	265	pd_start=0 and pd_stop=1.
[d5ac45f]	266
[e822c97]	267	The problem details structure could be allocated and sent in as an integer
	268	array using the read-only flag. This would allow us to copy it once per fit
[d5ac45f]	269	along with the weights vector, since features such as the number of
[9ee2756]	270	disperity points per pd parameter won't change between function evaluations.
	271	A new parameter vector must be sent for each I(q) evaluation. This is
	272	not currently implemented, and would require some resturcturing of
	273	the :class:`sasview_model.SasviewModel` interface.
[d5ac45f]	274
[9ee2756]	275	The results array will be initialized to zero for dispersity loop
[d5ac45f]	276	entry zero, and preserved between calls to [start, stop] so that the
	277	results accumulate by the time the loop has completed. Background and
	278	scale will be applied when the loop reaches the end. This does require
	279	that the results array be allocated read-write, which is less efficient
	280	for the GPU, but it makes the calling sequence much more manageable.
	281
[03cac08]	282	For accuracy we may want to introduce Kahan summation into the integration::
	283
	284	double accumulated_error = 0.0;
	285	...
	286	#if USE_KAHAN_SUMMATION
	287	const double y = next - accumulated_error;
	288	const double t = ret + y;
	289	accumulated_error = (t - ret) - y;
	290	ret = t;
	291	#else
	292	ret += next;
	293	#endif
[e822c97]	294
	295	This will require accumulated error for each I(q) value to be preserved
[9ee2756]	296	between kernel calls to implement this fully. The kernel_iq.c code, which
	297	loops over q for each parameter set in the dispersity loop, will also need
	298	the accumulation vector.

Note: See TracBrowser for help on using the repository browser.

SasView

source: sasmodels/doc/developer/calculator.rst @ 9ee2756

Download in other formats: