kernel_iq.c @ a10da8b

core_shell_microgelscostrafo411magnetic_modelrelease_v0.94release_v0.95ticket-1257-vesicle-productticket_1156ticket_1265_superballticket_822_more_unit_tests

Last change on this file since a10da8b was a10da8b, checked in by wojciech, 8 years ago
Bug fix in bit mask
Property mode set to `100644`
File size: 11.3 KB

Line
1
2	/*
3	##########################################################
4	# #
5	# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! #
6	# !! !! #
7	# !! KEEP THIS CODE CONSISTENT WITH KERNELPY.PY !! #
8	# !! !! #
9	# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! #
10	# #
11	##########################################################
12	*/
13
14	/*
15	The environment needs to provide the following #defines
16	USE_OPENCL is defined if running in opencl
17	KERNEL declares a function to be available externally
18	KERNEL_NAME is the name of the function being declared
19	NPARS is the number of parameters in the kernel
20	PARAMETER_TABLE is the declaration of the parameters to the kernel.
21
22	Cylinder:
23
24	#define PARAMETER_TABLE \
25	double length; \
26	double radius; \
27	double sld; \
28	double sld_solvent
29
30	Multi-shell cylinder (10 shell max):
31
32	#define PARAMETER_DECL \
33	double num_shells; \
34	double length; \
35	double radius[10]; \
36	double sld[10]; \
37	double sld_solvent
38
39	PARAMETER_CALL(var) is the declaration of a call to the kernel.
40
41	Cylinder:
42
43	#define PARAMETER_CALL(var) \
44	var.length, \
45	var.radius, \
46	var.sld, \
47	var.sld_solvent
48
49	Multi-shell cylinder:
50	#define PARAMETER_CALL(var) \
51	var.num_shells, \
52	var.length, \
53	var.radius, \
54	var.sld, \
55	var.sld_solvent
56
57	Our design supports a limited number of polydispersity loops, wherein
58	we need to cycle through the values of the polydispersity, calculate
59	the I(q, p) for each combination of parameters, and perform a normalized
60	weighted sum across all the weights. Parameters may be passed to the
61	underlying calculation engine as scalars or vectors, but the polydispersity
62	calculator treats the parameter set as one long vector.
63
64	The polydisperse parameters are stored in as an array of parameter
65	indices, one for each polydisperse parameter, stored in pd_par[n].
66	Non-polydisperse parameters do not appear in this array. Each polydisperse
67	parameter has a weight vector whose length is stored in pd_length[n].
68	The weights are stored in a contiguous vector of weights for all
69	parameters, with the starting position for the each parameter stored
70	in pd_offset[n]. The values corresponding to the weights are stored
71	together in a separate weights[] vector, with offset stored in
72	par_offset[pd_par[n]]. Polydisperse parameters should be stored in
73	decreasing order of length for highest efficiency.
74
75	We limit the number of polydisperse dimensions to MAX_PD (currently 4).
76	This cuts the size of the structure in half compared to allowing a
77	separate polydispersity for each parameter. This will help a little
78	bit for models with large numbers of parameters, such as the onion model.
79
80	Parameters may be coordinated. That is, we may have the value of one
81	parameter depend on a set of other parameters, some of which may be
82	polydisperse. For example, if sld is inversely proportional to the
83	volume of a cylinder, and the length and radius are independently
84	polydisperse, then for each combination of length and radius we need a
85	separate value for the sld. The caller must provide a coordination table
86	for each parameter containing the value for each parameter given the
87	value of the polydisperse parameters v1, v2, etc. The tables for each
88	parameter are arranged contiguously in a vector, with offset[k] giving the
89	starting location of parameter k in the vector. Each parameter defines
90	coord[k] as a bit mask indicating which polydispersity parameters the
91	parameter depends upon. Usually this is zero, indicating that the parameter
92	is independent, but for the cylinder example given, the bits for the
93	radius and length polydispersity parameters would both be set, the result
94	being a (#radius x #length) table, or maybe a (#length x #radius) table
95	if length comes first in the polydispersity table.
96
97	NB: If we can guarantee that a compiler and OpenCL driver are available,
98	we could instead create the coordination function on the fly for each
99	parameter, saving memory and transfer time, but requiring a C compiler
100	as part of the environment.
101
102	In ordering the polydisperse parameters by decreasing length we can
103	iterate over the longest dispersion weight vector first. All parameters
104	coordinated with this weight vector (the 'fast' parameters), can be
105	updated with a simple increment to the next position in the parameter
106	value table. The indices of these parameters is stored in fast_coord_index[],
107	with fast_coord_count being the number of fast parameters. A total
108	of NPARS slots is allocated to allow for the case that all parameters
109	are coordinated with the fast index, though this will likely be mostly
110	empty. When the fast increment count reaches the end of the weight
111	vector, then the index of the second polydisperse parameter must be
112	incremented, and all of its coordinated parameters updated. Because this
113	operation is not in the inner loop, a slower algorithm can be used.
114
115	The problem details structure can be allocated and sent in as an integer
116	array using the read-only flag. This allows us to copy it once per fit
117	along with the weights vector, since features such as the number of
118	polydisperity elements per pd parameter or the coordinated won't change
119	between function evaluations. A new parameter vector is sent for
120	each I(q) evaluation.
121
122	To protect against expensive evaluations taking all the GPU resource
123	on large fits, the entire polydispersity will not be computed at once.
124	Instead, a start and stop location will be sent, indicating where in the
125	polydispersity loop the calculation should start and where it should
126	stop. We can do this for arbitrary start/stop points since we have
127	unwound the nested loop. Instead, we use the same technique as array
128	index translation, using div and mod to figure out the i,j,k,...
129	indices in the virtual nested loop.
130
131	The results array will be initialized to zero for polydispersity loop
132	entry zero, and preserved between calls to [start, stop] so that the
133	results accumulate by the time the loop has completed. Background and
134	scale will be applied when the loop reaches the end. This does require
135	that the results array be allocated read-write, which is less efficient
136	for the GPU, but it makes the calling sequence much more manageable.
137	*/
138
139	#define MAX_PD 4 // MAX_PD is the max number of polydisperse parameters
140	#define PD_2N 16 // PD_2N is the size of the coordination step table
141
142	typedef struct {
143	int pd_par[MAX_PD]; // index of the nth polydispersity variable
144	int pd_length[MAX_PD]; // length of the nth polydispersity weight vector
145	int pd_offset[MAX_PD]; // offset of pd weights in the par & weight vector
146	int pd_stride[MAX_PD]; // stride to move to the next index at this level
147	int par_offset[NPARS]; // offset of par values in the par & weight vector
148	int par_coord[NPARS]; // polydispersity coordination bitvector
149	int fast_coord_count; // number of parameters coordinated with pd 1
150	int fast_coord_index[NPARS]; // index of the fast coordination parameters
151	} ProblemDetails;
152
153	typedef struct {
154	PARAMETER_DECL;
155	} ParameterBlock;
156
157
158	KERNEL
159	void KERNEL_NAME(
160	int nq, // number of q values
161	global const ProblemDetails *problem,
162	global const double *weights,
163	global const double *pars,
164	global const double *q, // nq q values, with padding to boundary
165	global double *result, // nq return values, again with padding
166	global double work, // 3(nq+padding) space for normalization
167	global int *int_work, // nq+padding space for current position
168	const double cutoff, // cutoff in the polydispersity weight product
169	const int pd_start, // where we are in the polydispersity loop
170	const int pd_stop, // where we are stopping in the polydispersity loop
171	)
172	{
173	// Storage for the current parameter values. These will be updated as we
174	// walk the polydispersity cube.
175	local ParameterBlock local_pars;
176	const double *parvec = &local_pars; // Alias named parameters with a vector
177
178	// Since we are no longer looping over the entire polydispersity hypercube
179	// for each q, we need to track the normalization values for each q in a
180	// separate work vector.
181	double *norm = work; // contains sum over weights
182	double *vol = norm + (nq + padding); // contains sum over volume
183	double *norm_vol = vol + (nq + padding);
184
185	// Initialize the results to zero
186	if (pd_start == 0) {
187	#ifdef USE_OPENMP
188	#pragma omp parallel for
189	#endif
190	for (int i=0; i < Nq; i++) {
191	norm_vol[i] = 0.0;
192	norm[i] = 0.0;
193	vol[i] = 0.0;
194	result[i] = 0.0;
195	}
196	}
197
198	// Location in the polydispersity cube, one index per dimension.
199	local int pd_index[PD_MAX];
200
201	// Set the initial index greater than its vector length in order to
202	// trigger the reset behaviour that happens at the end the fast loop.
203	pd_index[0] = pd_length[0];
204
205	// Loop over the weights then loop over q, accumulating values
206	// par
207	double partial_weight = NaN;
208	for (int loop_index=pd_start; loop_index < pd_stop; loop_index++) {
209	// check if indices need to be updated
210	if (pd_index[0] >= pd_length[0]) {
211	pd_index[0] = loop_index%pd_length[0];
212	partial_weight = 1.0;
213	for (int k=0; k < MAX_PD; k++) {
214	pd_index[k] = (loop_index%pd_length[k])/pd_stride[k];
215	partial_weight *= weights[pd_offset[k]+pd_index[k]];
216	}
217	weight = partial_weight * weights[pd_offset[0]+pd_index[0]]
218	for (int k=0; k < NPARS; k++) {
219	int coord = par_coord[k];
220	int this_offset = 0;
221	int block_size = 1;
222	for (int bit=0; bit < MAX_PD && coord != 0; bit++) {
223	if (coord&1) {
224	this_offset += block_size * pd_index[bit];
225	block_size *= pd_length[bit];
226	}
227	coord /= 2;
228	}
229	offset[k] = this_offset;
230	}
231	} else {
232	pd_index[0] += 1;
233	weight = partial_weight*weights[pd_offset[0]+pd_index[0]];
234	for (int k=0; k < problem->fast_coord_count; k++) {
235	parvec[ fast_coord_index[k]]
236	= pars[offset[fast_coord_index[k]] + pd_index[0]];
237	}
238	}
239	if (weight > cutoff) {
240	const double vol_weight = VOLUME_WEIGHT_PRODUCT;
241	const double weighted_vol = vol_weight*form_volume(VOLUME_PARAMTERS);
242	#ifdef USE_OPENMP
243	#pragma omp parallel for
244	#endif
245	for (int i=0; i < Nq; i++) {
246	{
247	const double scattering = Iq(qi, IQ_PARAMETERS);
248	// allow kernels to exclude invalid regions by returning NaN
249	if (!isnan(scattering)) {
250	result[i] += weight*scattering;
251	// can almost get away with only having one constant rather than
252	// one constant per q. Maybe want a "is_valid" test?
253	norm[i] += weight;
254	vol[i] += weighted_vol;
255	norm_vol[i] += vol_weight;
256	}
257	}
258	}
259
260	if (pd_stop == pd_stride[MAX_PD-1]) {}
261	#ifdef USE_OPENMP
262	#pragma omp parallel for
263	#endif
264	for (int i=0; i < Nq; i++) {
265	if (vol[i]*norm_vol[i] != 0.0) {
266	result[i] *= norm_vol[i]/vol[i];
267	}
268	//TODO: Ask Richard if scale and background may be corridanted parameters
269	result[i] = scale*result[i]/norm[i]+background;
270	}
271	}
272	}
273	}

Note: See TracBrowser for help on using the repository browser.

SasView

source: sasmodels/sasmodels/kernel_iq.c @ a10da8b

Download in other formats: