source: sasmodels/doc/developer/calculator.rst @ 9ee2756

core_shell_microgelsmagnetic_modelticket-1257-vesicle-productticket_1156ticket_1265_superballticket_822_more_unit_tests
Last change on this file since 9ee2756 was 9ee2756, checked in by Paul Kienzle <pkienzle@…>, 7 years ago

simplify kernel wrapper code and combine OpenCL with DLL in one file

  • Property mode set to 100644
File size: 13.4 KB
RevLine 
[e822c97]1.. currentmodule:: sasmodels
2
[870a2f4]3.. _Calculator_Interface:
4
[d5ac45f]5Calculator Interface
6====================
7
[e822c97]8This document describes the layer between the form factor kernels and the
[9ee2756]9model calculator which implements the dispersity and magnetic SLD
[e822c97]10calculations.  There are three separate implementations of this layer,
[2e66ef5]11:mod:`kernelcl` for OpenCL, which operates on a single Q value at a time,
12:mod:`kerneldll` for the DLL, which loops over a vector of Q values, and
13:mod:`kernelpy` for python models which operates on vector Q values.
[e822c97]14
15Each implementation provides three different calls *Iq*, *Iqxy* and *Imagnetic*
[9ee2756]16for 1-D, 2-D and 2-D magnetic kernels respectively. The C code is defined
17in *kernel_iq.c*, with the minor differences between OpenCL and DLL handled
18by #ifdef statements.
[e822c97]19
[9ee2756]20The kernel call looks as follows::
[e822c97]21
22  kernel void KERNEL_NAME(
23      int nq,                  // Number of q values in the q vector
[9ee2756]24      int pd_start,            // Starting position in the dispersity loop
25      int pd_stop,             // Ending position in the dispersity loop
26      ProblemDetails *details, // dispersity info
[e822c97]27      double *values,          // Value and weights vector
28      double *q,               // q or (qx,qy) vector
29      double *result,          // returned I(q), with result[nq] = pd_weight
[9ee2756]30      double cutoff)           // dispersity weight cutoff
[e822c97]31
32The details for OpenCL and the python loop are slightly different, but these
33data structures are common.
34
35*nq* indicates the number of q values that will be calculated.
36
[9ee2756]37The *pd_start* and *pd_stop* parameters set the range of the dispersity
38loop to compute for the current kernel call.   Give a dispersity
[e822c97]39calculation with 30 weights for length and 30 weights for radius for example,
40there are a total of 900 calls to the form factor required to compute the
41kernel.  These might be done in chunks of 100, so the first call would start
42at zero and stop after 100 iterations.  The second call would then have to set
43the length index to 3 and the radius index to 10 for a position of 3*30+10=100,
44and could then proceed to position 200.  This allows us to interrupt the
[9ee2756]45calculation in the middle of a long dispersity loop without having to
[e822c97]46do special tricks with the C code.  More importantly, it stops the OpenCL
47kernel in a reasonable time; because the GPU is used by the operating
48system to show its windows, if a GPU kernel runs too long then it will be
49automatically killed and no results will be returned to the caller.
50
51The *ProblemDetails* structure is a direct map of the
[9ee2756]52:class:`details.CallDetails` buffer.  This indicates which parameters have
53dispersity, and where in the values vector the values and weights can be
54found.  For each parameter with dispersity there is a parameter id, the length
55of the dispersity loop for that parameter, the offset of the parameter
[e822c97]56values in the pd value and pd weight vectors and the 'stride' from one index
57to the next, which is used to translate between the position in the
[9ee2756]58dispersity loop and the particular parameter indices.  The *num_eval*
59field is the total size of the dispersity loop.  *num_weights* is the
[e822c97]60number of elements in the pd value and pd weight vectors.  *num_active* is
[9ee2756]61the number of non-trivial pd loops (parameters with dispersity should be ordered
62by decreasing pd vector length, with a length of 1 meaning no dispersity).
[e822c97]63Oriented objects in 2-D need a cos(theta) spherical correction on the angular
64variation in order to preserve the 'surface area' of the weight distribution.
65*theta_par* is the id of the polar coordinate parameter if there is one.
66
67
68The *values* vector consists of the fixed values for the model plus pd value
69and pd weight vectors.  There are *NUM_VALUES* fixed values for the model,
70which includes the initial two slots for scale and background, the NUM_PARS
71values for the kernel parameters, three slots for the applied magnetism, and
72three slots for each of the SLD parameters for the sample magnetiziation
73*(Mx, My, Mz)*.  Sample magnetization is translated from *(M, theta, phi)*
74to *(Mx, My, Mz)* before the kernel is called.   After the fixed values comes
[9ee2756]75the pd value vector, with the dispersity values for each parameter
[e822c97]76stacked one after the other.  The order isn't important since the location
77for each parameter is stored in the *pd_offset* field of the *ProblemDetails*
78structure, but the values do need to be contiguous.  After *num_weights*
79values, the pd weight vector is stored, with the same configuration as the
80pd value vector.  Note that the pd vectors can contain values that are not
[9ee2756]81in the dispersity loop; this is used by :class:`mixture.MixtureKernel`
[e822c97]82to make it easier to call the various component kernels.
83
84The *q* vector contains one value for each q for *Iq* kernels, or a pair
85of *(qx,qy)* values for each q for *Iqxy* and *Imagnetic* kernels.  OpenCL
86pads all vectors to 32 value boundaries just to be safe, including the
87*values* vector and the the *results* vector.
88
89The *results* vector contains one slot for each of the *nq* values, plus
[9ee2756]90one extra slot at the end for the weight normalization accumulated across
91all points in the dispersity mesh.  This is required when the dispersity
92loop is broken across several kernel calls.
[e822c97]93
94*cutoff* is a importance cutoff so that points which contribute negligibly
95to the total scattering can be skipped without calculating them.
96
97:func:`generate.make_source` defines the following C macros:
[d5ac45f]98
99- USE_OPENCL is defined if running in opencl
[9ee2756]100- MAX_PD is the maximum depth of the dispersity loop [model specific]
[e822c97]101- NUM_PARS is the number of parameter values in the kernel.  This may be
102  more than the number of parameters if some of the parameters are vector
103  values.
104- NUM_VALUES is the number of fixed values, which defines the offset in the
[9ee2756]105  value list to the dispersity value and weight vectors.
[e822c97]106- NUM_MAGNETIC is the number of magnetic SLDs
107- MAGNETIC_PARS is a comma separated list of the magnetic SLDs, indicating
108  their locations in the values vector.
[9ee2756]109- MAGNETIC_PAR1, ... are the first three magnetic parameter ids so we can
110  hard code the setting of magnetic values if there are only a few of them.
[e822c97]111- KERNEL_NAME is the name of the function being declared
112- PARAMETER_TABLE is the declaration of the parameters to the kernel:
[d5ac45f]113
[e822c97]114    Cylinder::
[d5ac45f]115
116        #define PARAMETER_TABLE \
117        double length; \
118        double radius; \
119        double sld; \
[e822c97]120        double sld_solvent;
[d5ac45f]121
122    Note: scale and background are never included
123
[e822c97]124    Multi-shell cylinder (10 shell max)::
[d5ac45f]125
126        #define PARAMETER_TABLE \
127        double num_shells; \
128        double length; \
129        double radius[10]; \
130        double sld[10]; \
[e822c97]131        double sld_solvent;
[d5ac45f]132
[e822c97]133- CALL_IQ(q, i, var) is the declaration of a call to the kernel:
[d5ac45f]134
[e822c97]135    Cylinder::
[d5ac45f]136
[03cac08]137        #define CALL_IQ(q, i, var) Iq(q[i], \
[d5ac45f]138        var.length, \
139        var.radius, \
140        var.sld, \
141        var.sld_solvent)
142
[e822c97]143    Multi-shell cylinder::
[d5ac45f]144
[03cac08]145        #define CALL_IQ(q, i, var) Iq(q[i], \
[d5ac45f]146        var.num_shells, \
147        var.length, \
148        var.radius, \
149        var.sld, \
150        var.sld_solvent)
151
[e822c97]152    Cylinder2D::
[03cac08]153
[9ee2756]154        #define CALL_IQ(q, i, var) Iqxy(qa, qc, \
[03cac08]155        var.length, \
156        var.radius, \
157        var.sld, \
[9ee2756]158        var.sld_solvent)
[03cac08]159
[d5ac45f]160- CALL_VOLUME(var) is similar, but for calling the form volume::
161
162        #define CALL_VOLUME(var) \
163        form_volume(var.length, var.radius)
164
[e822c97]165There is an additional macro that can be defined within the model.c file:
[d5ac45f]166
[e822c97]167- INVALID(var) is a test for model parameters in the correct range:
168
169    Cylinder::
[d5ac45f]170
171        #define INVALID(var) 0
172
[e822c97]173    BarBell::
[d5ac45f]174
[6fd8de0]175        #define INVALID(var) (var.bell_radius < var.radius)
[d5ac45f]176
[e822c97]177    Model with complicated constraints::
[d5ac45f]178
179        inline bool constrained(p1, p2, p3) { return expression; }
180        #define INVALID(var) constrained(var.p1, var.p2, var.p3)
181
[9ee2756]182Our design supports a limited number of dispersity loops, wherein
183we need to cycle through the values of the dispersity, calculate
[d5ac45f]184the I(q, p) for each combination of parameters, and perform a normalized
185weighted sum across all the weights.  Parameters may be passed to the
[9ee2756]186underlying calculation engine as scalars or vectors, but the dispersity
[d5ac45f]187calculator treats the parameter set as one long vector.
188
[9ee2756]189Let's assume we have 8 parameters in the model, two of which allow dispersity.
190Since this is a 1-D model the orientation parameters won't be used::
[d5ac45f]191
192    0: scale        {scl = constant}
193    1: background   {bkg = constant}
[e822c97]194    2: radius       {r = vector of 10pts}
195    3: length       {l = vector of 30pts}
[9ee2756]196    4: sld          {s1 = constant/(radius**2*length)}
[6fd8de0]197    5: sld_solvent  {s2 = constant}
[e822c97]198    6: theta        {not used}
199    7: phi          {not used}
200
201This generates the following call to the kernel.  Note that parameters 4 and
[9ee2756]2025 are treated as having dispersity even though they don't --- this is because
[e822c97]203it is harmless to do so and simplifies the looping code::
204
205    MAX_PD = 4
206    NUM_PARS = 6          // kernel parameters only
207    NUM_VALUES = 17       // all values, including scale and background
208    NUM_MAGNETIC = 2      // two parameters might be magnetic
209    MAGNETIC_PARS = 4, 5  // they are sld and sld_solvent
210    MAGNETIC_PAR0 = 4     // sld index
211    MAGNETIC_PAR1 = 5     // solvent index
212
213    details {
214        pd_par = {3, 2, 4, 5}         // parameters *radius* and *length* vary
215        pd_length = {30, 10, 1, 1}    // *length* has more, so it is first
216        pd_offset = {10, 0, 31, 32}   // *length* starts at index 10 in weights
[d5ac45f]217        pd_stride = {1, 30, 300, 300} // cumulative product of pd length
[9ee2756]218        num_eval = 300   // 300 values in the dispersity loop
[e822c97]219        num_weights = 42 // 42 values in the pd vector
220        num_active = 2   // only the first two pd are active
221        theta_var =  6   // spherical correction
[d5ac45f]222    }
223
[e822c97]224    values = { scl, bkg,                                  // universal
[9ee2756]225               r, l, s1, s2, theta, phi,                  // kernel pars
[e822c97]226               in spin, out spin, spin angle,             // applied magnetism
[9ee2756]227               mx s1, my s1, mz s1, mx s2, my s2, mz s2,  // magnetic slds
[e822c97]228               r0, .., r9, l0, .., l29, s, s2,            // pd values
229               r0, .., r9, l0, .., l29, s, s2}            // pd weights
[d5ac45f]230
231    nq = 130
232    q = { q0, q1, ..., q130, x, x }  # pad to 8 element boundary
[e822c97]233    result = {r1, ..., r130, pd_norm, x }
[d5ac45f]234
[9ee2756]235The dispersity parameters are stored in as an array of parameter
236indices, one for each parameter, stored in pd_par[n]. Parameters which do
237not support dispersity do not appear in this array. Each dispersity
[6fd8de0]238parameter has a weight vector whose length is stored in pd_length[n].
[d5ac45f]239The weights are stored in a contiguous vector of weights for all
240parameters, with the starting position for the each parameter stored
241in pd_offset[n].  The values corresponding to the weights are stored
242together in a separate weights[] vector, with offset stored in
[9ee2756]243par_offset[pd_par[n]]. Dispersity parameters should be stored in
[d5ac45f]244decreasing order of length for highest efficiency.
245
[9ee2756]246We limit the number of dispersity dimensions to MAX_PD (currently 4),
247though some models may have fewer if they have fewer dispersity
[e822c97]248parameters.  The main reason for the limit is to reduce code size.
[9ee2756]249Each additional dispersity parameter requires a separate dispersity
250loop.  If more than 4 levels of dispersity are needed, then we need to
251switch to a monte carlo importance sampling algorithm with better
252performance for high-dimensional integrals.
[e822c97]253
254Constraints between parameters are not supported.  Instead users will
255have to define a new model with the constraints built in by making a
256copy of the existing model.  Mac provides OpenCL and we are supplying
257the tinycc compiler for Windows so this won't be a complete limitation,
258but it is rather inconvenient.  The process could perhaps be automated
259so that there is no code copying required, just an alternate CALL_IQ
260macro that implements the constraints.  Think carefully about constraining
261theta since the polar coordinates normalization is tied to this parameter.
[d5ac45f]262
[9ee2756]263If there is no dispersity we pretend that we have a disperisty mesh over
264a single parameter with a single point in the distribution, giving
265pd_start=0 and pd_stop=1.
[d5ac45f]266
[e822c97]267The problem details structure could be allocated and sent in as an integer
268array using the read-only flag.  This would allow us to copy it once per fit
[d5ac45f]269along with the weights vector, since features such as the number of
[9ee2756]270disperity points per pd parameter won't change between function evaluations.
271A new parameter vector must be sent for each I(q) evaluation.  This is
272not currently implemented, and would require some resturcturing of
273the :class:`sasview_model.SasviewModel` interface.
[d5ac45f]274
[9ee2756]275The results array will be initialized to zero for dispersity loop
[d5ac45f]276entry zero, and preserved between calls to [start, stop] so that the
277results accumulate by the time the loop has completed.  Background and
278scale will be applied when the loop reaches the end.  This does require
279that the results array be allocated read-write, which is less efficient
280for the GPU, but it makes the calling sequence much more manageable.
281
[03cac08]282For accuracy we may want to introduce Kahan summation into the integration::
283
284    double accumulated_error = 0.0;
285    ...
286    #if USE_KAHAN_SUMMATION
287        const double y = next - accumulated_error;
288        const double t = ret + y;
289        accumulated_error = (t - ret) - y;
290        ret = t;
291    #else
292        ret += next;
293    #endif
[e822c97]294
295This will require accumulated error for each I(q) value to be preserved
[9ee2756]296between kernel calls to implement this fully.  The *kernel_iq.c* code, which
297loops over q for each parameter set in the dispersity loop, will also need
298the accumulation vector.
Note: See TracBrowser for help on using the repository browser.