[e822c97] | 1 | .. currentmodule:: sasmodels |
---|
| 2 | |
---|
[870a2f4] | 3 | .. _Calculator_Interface: |
---|
| 4 | |
---|
[d5ac45f] | 5 | Calculator Interface |
---|
| 6 | ==================== |
---|
| 7 | |
---|
[e822c97] | 8 | This document describes the layer between the form factor kernels and the |
---|
[9ee2756] | 9 | model calculator which implements the dispersity and magnetic SLD |
---|
[e822c97] | 10 | calculations. There are three separate implementations of this layer, |
---|
[2e66ef5] | 11 | :mod:`kernelcl` for OpenCL, which operates on a single Q value at a time, |
---|
| 12 | :mod:`kerneldll` for the DLL, which loops over a vector of Q values, and |
---|
| 13 | :mod:`kernelpy` for python models which operates on vector Q values. |
---|
[e822c97] | 14 | |
---|
| 15 | Each implementation provides three different calls *Iq*, *Iqxy* and *Imagnetic* |
---|
[9ee2756] | 16 | for 1-D, 2-D and 2-D magnetic kernels respectively. The C code is defined |
---|
| 17 | in *kernel_iq.c*, with the minor differences between OpenCL and DLL handled |
---|
| 18 | by #ifdef statements. |
---|
[e822c97] | 19 | |
---|
[9ee2756] | 20 | The kernel call looks as follows:: |
---|
[e822c97] | 21 | |
---|
| 22 | kernel void KERNEL_NAME( |
---|
| 23 | int nq, // Number of q values in the q vector |
---|
[9ee2756] | 24 | int pd_start, // Starting position in the dispersity loop |
---|
| 25 | int pd_stop, // Ending position in the dispersity loop |
---|
| 26 | ProblemDetails *details, // dispersity info |
---|
[e822c97] | 27 | double *values, // Value and weights vector |
---|
| 28 | double *q, // q or (qx,qy) vector |
---|
| 29 | double *result, // returned I(q), with result[nq] = pd_weight |
---|
[9ee2756] | 30 | double cutoff) // dispersity weight cutoff |
---|
[e822c97] | 31 | |
---|
| 32 | The details for OpenCL and the python loop are slightly different, but these |
---|
| 33 | data structures are common. |
---|
| 34 | |
---|
| 35 | *nq* indicates the number of q values that will be calculated. |
---|
| 36 | |
---|
[9ee2756] | 37 | The *pd_start* and *pd_stop* parameters set the range of the dispersity |
---|
| 38 | loop to compute for the current kernel call. Give a dispersity |
---|
[e822c97] | 39 | calculation with 30 weights for length and 30 weights for radius for example, |
---|
| 40 | there are a total of 900 calls to the form factor required to compute the |
---|
| 41 | kernel. These might be done in chunks of 100, so the first call would start |
---|
| 42 | at zero and stop after 100 iterations. The second call would then have to set |
---|
| 43 | the length index to 3 and the radius index to 10 for a position of 3*30+10=100, |
---|
| 44 | and could then proceed to position 200. This allows us to interrupt the |
---|
[9ee2756] | 45 | calculation in the middle of a long dispersity loop without having to |
---|
[e822c97] | 46 | do special tricks with the C code. More importantly, it stops the OpenCL |
---|
| 47 | kernel in a reasonable time; because the GPU is used by the operating |
---|
| 48 | system to show its windows, if a GPU kernel runs too long then it will be |
---|
| 49 | automatically killed and no results will be returned to the caller. |
---|
| 50 | |
---|
| 51 | The *ProblemDetails* structure is a direct map of the |
---|
[9ee2756] | 52 | :class:`details.CallDetails` buffer. This indicates which parameters have |
---|
| 53 | dispersity, and where in the values vector the values and weights can be |
---|
| 54 | found. For each parameter with dispersity there is a parameter id, the length |
---|
| 55 | of the dispersity loop for that parameter, the offset of the parameter |
---|
[e822c97] | 56 | values in the pd value and pd weight vectors and the 'stride' from one index |
---|
| 57 | to the next, which is used to translate between the position in the |
---|
[9ee2756] | 58 | dispersity loop and the particular parameter indices. The *num_eval* |
---|
| 59 | field is the total size of the dispersity loop. *num_weights* is the |
---|
[e822c97] | 60 | number of elements in the pd value and pd weight vectors. *num_active* is |
---|
[9ee2756] | 61 | the number of non-trivial pd loops (parameters with dispersity should be ordered |
---|
| 62 | by decreasing pd vector length, with a length of 1 meaning no dispersity). |
---|
[e822c97] | 63 | Oriented objects in 2-D need a cos(theta) spherical correction on the angular |
---|
| 64 | variation in order to preserve the 'surface area' of the weight distribution. |
---|
| 65 | *theta_par* is the id of the polar coordinate parameter if there is one. |
---|
| 66 | |
---|
| 67 | |
---|
| 68 | The *values* vector consists of the fixed values for the model plus pd value |
---|
| 69 | and pd weight vectors. There are *NUM_VALUES* fixed values for the model, |
---|
| 70 | which includes the initial two slots for scale and background, the NUM_PARS |
---|
| 71 | values for the kernel parameters, three slots for the applied magnetism, and |
---|
| 72 | three slots for each of the SLD parameters for the sample magnetiziation |
---|
| 73 | *(Mx, My, Mz)*. Sample magnetization is translated from *(M, theta, phi)* |
---|
| 74 | to *(Mx, My, Mz)* before the kernel is called. After the fixed values comes |
---|
[9ee2756] | 75 | the pd value vector, with the dispersity values for each parameter |
---|
[e822c97] | 76 | stacked one after the other. The order isn't important since the location |
---|
| 77 | for each parameter is stored in the *pd_offset* field of the *ProblemDetails* |
---|
| 78 | structure, but the values do need to be contiguous. After *num_weights* |
---|
| 79 | values, the pd weight vector is stored, with the same configuration as the |
---|
| 80 | pd value vector. Note that the pd vectors can contain values that are not |
---|
[9ee2756] | 81 | in the dispersity loop; this is used by :class:`mixture.MixtureKernel` |
---|
[e822c97] | 82 | to make it easier to call the various component kernels. |
---|
| 83 | |
---|
| 84 | The *q* vector contains one value for each q for *Iq* kernels, or a pair |
---|
| 85 | of *(qx,qy)* values for each q for *Iqxy* and *Imagnetic* kernels. OpenCL |
---|
| 86 | pads all vectors to 32 value boundaries just to be safe, including the |
---|
| 87 | *values* vector and the the *results* vector. |
---|
| 88 | |
---|
| 89 | The *results* vector contains one slot for each of the *nq* values, plus |
---|
[9ee2756] | 90 | one extra slot at the end for the weight normalization accumulated across |
---|
| 91 | all points in the dispersity mesh. This is required when the dispersity |
---|
| 92 | loop is broken across several kernel calls. |
---|
[e822c97] | 93 | |
---|
| 94 | *cutoff* is a importance cutoff so that points which contribute negligibly |
---|
| 95 | to the total scattering can be skipped without calculating them. |
---|
| 96 | |
---|
| 97 | :func:`generate.make_source` defines the following C macros: |
---|
[d5ac45f] | 98 | |
---|
| 99 | - USE_OPENCL is defined if running in opencl |
---|
[9ee2756] | 100 | - MAX_PD is the maximum depth of the dispersity loop [model specific] |
---|
[e822c97] | 101 | - NUM_PARS is the number of parameter values in the kernel. This may be |
---|
| 102 | more than the number of parameters if some of the parameters are vector |
---|
| 103 | values. |
---|
| 104 | - NUM_VALUES is the number of fixed values, which defines the offset in the |
---|
[9ee2756] | 105 | value list to the dispersity value and weight vectors. |
---|
[e822c97] | 106 | - NUM_MAGNETIC is the number of magnetic SLDs |
---|
| 107 | - MAGNETIC_PARS is a comma separated list of the magnetic SLDs, indicating |
---|
| 108 | their locations in the values vector. |
---|
| 109 | - KERNEL_NAME is the name of the function being declared |
---|
| 110 | - PARAMETER_TABLE is the declaration of the parameters to the kernel: |
---|
[d5ac45f] | 111 | |
---|
[e822c97] | 112 | Cylinder:: |
---|
[d5ac45f] | 113 | |
---|
| 114 | #define PARAMETER_TABLE \ |
---|
| 115 | double length; \ |
---|
| 116 | double radius; \ |
---|
| 117 | double sld; \ |
---|
[e822c97] | 118 | double sld_solvent; |
---|
[d5ac45f] | 119 | |
---|
| 120 | Note: scale and background are never included |
---|
| 121 | |
---|
[e822c97] | 122 | Multi-shell cylinder (10 shell max):: |
---|
[d5ac45f] | 123 | |
---|
| 124 | #define PARAMETER_TABLE \ |
---|
| 125 | double num_shells; \ |
---|
| 126 | double length; \ |
---|
| 127 | double radius[10]; \ |
---|
| 128 | double sld[10]; \ |
---|
[e822c97] | 129 | double sld_solvent; |
---|
[d5ac45f] | 130 | |
---|
[e822c97] | 131 | - CALL_IQ(q, i, var) is the declaration of a call to the kernel: |
---|
[d5ac45f] | 132 | |
---|
[e822c97] | 133 | Cylinder:: |
---|
[d5ac45f] | 134 | |
---|
[03cac08] | 135 | #define CALL_IQ(q, i, var) Iq(q[i], \ |
---|
[d5ac45f] | 136 | var.length, \ |
---|
| 137 | var.radius, \ |
---|
| 138 | var.sld, \ |
---|
| 139 | var.sld_solvent) |
---|
| 140 | |
---|
[e822c97] | 141 | Multi-shell cylinder:: |
---|
[d5ac45f] | 142 | |
---|
[03cac08] | 143 | #define CALL_IQ(q, i, var) Iq(q[i], \ |
---|
[d5ac45f] | 144 | var.num_shells, \ |
---|
| 145 | var.length, \ |
---|
| 146 | var.radius, \ |
---|
| 147 | var.sld, \ |
---|
| 148 | var.sld_solvent) |
---|
| 149 | |
---|
[e822c97] | 150 | Cylinder2D:: |
---|
[03cac08] | 151 | |
---|
[9ee2756] | 152 | #define CALL_IQ(q, i, var) Iqxy(qa, qc, \ |
---|
[03cac08] | 153 | var.length, \ |
---|
| 154 | var.radius, \ |
---|
| 155 | var.sld, \ |
---|
[9ee2756] | 156 | var.sld_solvent) |
---|
[03cac08] | 157 | |
---|
[d5ac45f] | 158 | - CALL_VOLUME(var) is similar, but for calling the form volume:: |
---|
| 159 | |
---|
| 160 | #define CALL_VOLUME(var) \ |
---|
| 161 | form_volume(var.length, var.radius) |
---|
| 162 | |
---|
[e822c97] | 163 | There is an additional macro that can be defined within the model.c file: |
---|
[d5ac45f] | 164 | |
---|
[e822c97] | 165 | - INVALID(var) is a test for model parameters in the correct range: |
---|
| 166 | |
---|
| 167 | Cylinder:: |
---|
[d5ac45f] | 168 | |
---|
| 169 | #define INVALID(var) 0 |
---|
| 170 | |
---|
[e822c97] | 171 | BarBell:: |
---|
[d5ac45f] | 172 | |
---|
[6fd8de0] | 173 | #define INVALID(var) (var.bell_radius < var.radius) |
---|
[d5ac45f] | 174 | |
---|
[e822c97] | 175 | Model with complicated constraints:: |
---|
[d5ac45f] | 176 | |
---|
| 177 | inline bool constrained(p1, p2, p3) { return expression; } |
---|
| 178 | #define INVALID(var) constrained(var.p1, var.p2, var.p3) |
---|
| 179 | |
---|
[9ee2756] | 180 | Our design supports a limited number of dispersity loops, wherein |
---|
| 181 | we need to cycle through the values of the dispersity, calculate |
---|
[d5ac45f] | 182 | the I(q, p) for each combination of parameters, and perform a normalized |
---|
| 183 | weighted sum across all the weights. Parameters may be passed to the |
---|
[9ee2756] | 184 | underlying calculation engine as scalars or vectors, but the dispersity |
---|
[d5ac45f] | 185 | calculator treats the parameter set as one long vector. |
---|
| 186 | |
---|
[9ee2756] | 187 | Let's assume we have 8 parameters in the model, two of which allow dispersity. |
---|
| 188 | Since this is a 1-D model the orientation parameters won't be used:: |
---|
[d5ac45f] | 189 | |
---|
| 190 | 0: scale {scl = constant} |
---|
| 191 | 1: background {bkg = constant} |
---|
[e822c97] | 192 | 2: radius {r = vector of 10pts} |
---|
| 193 | 3: length {l = vector of 30pts} |
---|
[9ee2756] | 194 | 4: sld {s1 = constant/(radius**2*length)} |
---|
[6fd8de0] | 195 | 5: sld_solvent {s2 = constant} |
---|
[e822c97] | 196 | 6: theta {not used} |
---|
| 197 | 7: phi {not used} |
---|
| 198 | |
---|
| 199 | This generates the following call to the kernel. Note that parameters 4 and |
---|
[9ee2756] | 200 | 5 are treated as having dispersity even though they don't --- this is because |
---|
[e822c97] | 201 | it is harmless to do so and simplifies the looping code:: |
---|
| 202 | |
---|
| 203 | MAX_PD = 4 |
---|
| 204 | NUM_PARS = 6 // kernel parameters only |
---|
| 205 | NUM_VALUES = 17 // all values, including scale and background |
---|
| 206 | NUM_MAGNETIC = 2 // two parameters might be magnetic |
---|
| 207 | MAGNETIC_PARS = 4, 5 // they are sld and sld_solvent |
---|
| 208 | |
---|
| 209 | details { |
---|
| 210 | pd_par = {3, 2, 4, 5} // parameters *radius* and *length* vary |
---|
| 211 | pd_length = {30, 10, 1, 1} // *length* has more, so it is first |
---|
| 212 | pd_offset = {10, 0, 31, 32} // *length* starts at index 10 in weights |
---|
[d5ac45f] | 213 | pd_stride = {1, 30, 300, 300} // cumulative product of pd length |
---|
[9ee2756] | 214 | num_eval = 300 // 300 values in the dispersity loop |
---|
[e822c97] | 215 | num_weights = 42 // 42 values in the pd vector |
---|
| 216 | num_active = 2 // only the first two pd are active |
---|
| 217 | theta_var = 6 // spherical correction |
---|
[d5ac45f] | 218 | } |
---|
| 219 | |
---|
[e822c97] | 220 | values = { scl, bkg, // universal |
---|
[9ee2756] | 221 | r, l, s1, s2, theta, phi, // kernel pars |
---|
[e822c97] | 222 | in spin, out spin, spin angle, // applied magnetism |
---|
[9ee2756] | 223 | mx s1, my s1, mz s1, mx s2, my s2, mz s2, // magnetic slds |
---|
[e822c97] | 224 | r0, .., r9, l0, .., l29, s, s2, // pd values |
---|
| 225 | r0, .., r9, l0, .., l29, s, s2} // pd weights |
---|
[d5ac45f] | 226 | |
---|
| 227 | nq = 130 |
---|
| 228 | q = { q0, q1, ..., q130, x, x } # pad to 8 element boundary |
---|
[e822c97] | 229 | result = {r1, ..., r130, pd_norm, x } |
---|
[d5ac45f] | 230 | |
---|
[9ee2756] | 231 | The dispersity parameters are stored in as an array of parameter |
---|
| 232 | indices, one for each parameter, stored in pd_par[n]. Parameters which do |
---|
| 233 | not support dispersity do not appear in this array. Each dispersity |
---|
[6fd8de0] | 234 | parameter has a weight vector whose length is stored in pd_length[n]. |
---|
[d5ac45f] | 235 | The weights are stored in a contiguous vector of weights for all |
---|
| 236 | parameters, with the starting position for the each parameter stored |
---|
| 237 | in pd_offset[n]. The values corresponding to the weights are stored |
---|
| 238 | together in a separate weights[] vector, with offset stored in |
---|
[9ee2756] | 239 | par_offset[pd_par[n]]. Dispersity parameters should be stored in |
---|
[d5ac45f] | 240 | decreasing order of length for highest efficiency. |
---|
| 241 | |
---|
[9ee2756] | 242 | We limit the number of dispersity dimensions to MAX_PD (currently 4), |
---|
| 243 | though some models may have fewer if they have fewer dispersity |
---|
[e822c97] | 244 | parameters. The main reason for the limit is to reduce code size. |
---|
[9ee2756] | 245 | Each additional dispersity parameter requires a separate dispersity |
---|
| 246 | loop. If more than 4 levels of dispersity are needed, then we need to |
---|
| 247 | switch to a monte carlo importance sampling algorithm with better |
---|
| 248 | performance for high-dimensional integrals. |
---|
[e822c97] | 249 | |
---|
| 250 | Constraints between parameters are not supported. Instead users will |
---|
| 251 | have to define a new model with the constraints built in by making a |
---|
| 252 | copy of the existing model. Mac provides OpenCL and we are supplying |
---|
| 253 | the tinycc compiler for Windows so this won't be a complete limitation, |
---|
| 254 | but it is rather inconvenient. The process could perhaps be automated |
---|
| 255 | so that there is no code copying required, just an alternate CALL_IQ |
---|
| 256 | macro that implements the constraints. Think carefully about constraining |
---|
| 257 | theta since the polar coordinates normalization is tied to this parameter. |
---|
[d5ac45f] | 258 | |
---|
[9ee2756] | 259 | If there is no dispersity we pretend that we have a disperisty mesh over |
---|
| 260 | a single parameter with a single point in the distribution, giving |
---|
| 261 | pd_start=0 and pd_stop=1. |
---|
[d5ac45f] | 262 | |
---|
[e822c97] | 263 | The problem details structure could be allocated and sent in as an integer |
---|
| 264 | array using the read-only flag. This would allow us to copy it once per fit |
---|
[d5ac45f] | 265 | along with the weights vector, since features such as the number of |
---|
[9ee2756] | 266 | disperity points per pd parameter won't change between function evaluations. |
---|
| 267 | A new parameter vector must be sent for each I(q) evaluation. This is |
---|
| 268 | not currently implemented, and would require some resturcturing of |
---|
| 269 | the :class:`sasview_model.SasviewModel` interface. |
---|
[d5ac45f] | 270 | |
---|
[9ee2756] | 271 | The results array will be initialized to zero for dispersity loop |
---|
[d5ac45f] | 272 | entry zero, and preserved between calls to [start, stop] so that the |
---|
| 273 | results accumulate by the time the loop has completed. Background and |
---|
| 274 | scale will be applied when the loop reaches the end. This does require |
---|
| 275 | that the results array be allocated read-write, which is less efficient |
---|
| 276 | for the GPU, but it makes the calling sequence much more manageable. |
---|
| 277 | |
---|
[03cac08] | 278 | For accuracy we may want to introduce Kahan summation into the integration:: |
---|
| 279 | |
---|
| 280 | double accumulated_error = 0.0; |
---|
| 281 | ... |
---|
| 282 | #if USE_KAHAN_SUMMATION |
---|
| 283 | const double y = next - accumulated_error; |
---|
| 284 | const double t = ret + y; |
---|
| 285 | accumulated_error = (t - ret) - y; |
---|
| 286 | ret = t; |
---|
| 287 | #else |
---|
| 288 | ret += next; |
---|
| 289 | #endif |
---|
[e822c97] | 290 | |
---|
| 291 | This will require accumulated error for each I(q) value to be preserved |
---|
[9ee2756] | 292 | between kernel calls to implement this fully. The *kernel_iq.c* code, which |
---|
| 293 | loops over q for each parameter set in the dispersity loop, will also need |
---|
| 294 | the accumulation vector. |
---|