[e822c97] | 1 | .. currentmodule:: sasmodels |
---|
| 2 | |
---|
[d5ac45f] | 3 | Calculator Interface |
---|
| 4 | ==================== |
---|
| 5 | |
---|
[e822c97] | 6 | This document describes the layer between the form factor kernels and the |
---|
| 7 | model calculator which implements the polydispersity and magnetic SLD |
---|
| 8 | calculations. There are three separate implementations of this layer, |
---|
| 9 | *kernelcl.py* for OpenCL, which operates on a single Q value at a time, |
---|
| 10 | *kerneldll.c* for the DLL, which loops over a vector of Q values, and |
---|
| 11 | *kernelpy.py* for python models which operates on vector Q values. |
---|
| 12 | |
---|
| 13 | Each implementation provides three different calls *Iq*, *Iqxy* and *Imagnetic* |
---|
| 14 | for 1-D, 2-D and 2-D magnetic kernels respectively. |
---|
| 15 | |
---|
| 16 | The C code is defined in *kernel_iq.c* and *kernel_iq.cl* for DLL and OpenCL |
---|
| 17 | respectively. The kernel call looks as follows:: |
---|
| 18 | |
---|
| 19 | kernel void KERNEL_NAME( |
---|
| 20 | int nq, // Number of q values in the q vector |
---|
| 21 | int pd_start, // Starting position in the polydispersity loop |
---|
| 22 | int pd_stop, // Ending position in the polydispersity loop |
---|
| 23 | ProblemDetails *details, // Polydispersity info |
---|
| 24 | double *values, // Value and weights vector |
---|
| 25 | double *q, // q or (qx,qy) vector |
---|
| 26 | double *result, // returned I(q), with result[nq] = pd_weight |
---|
| 27 | double cutoff) // polydispersity weight cutoff |
---|
| 28 | |
---|
| 29 | The details for OpenCL and the python loop are slightly different, but these |
---|
| 30 | data structures are common. |
---|
| 31 | |
---|
| 32 | *nq* indicates the number of q values that will be calculated. |
---|
| 33 | |
---|
| 34 | The *pd_start* and *pd_stop* parameters set the range of the polydispersity |
---|
| 35 | loop to compute for the current kernel call. Give a polydispersity |
---|
| 36 | calculation with 30 weights for length and 30 weights for radius for example, |
---|
| 37 | there are a total of 900 calls to the form factor required to compute the |
---|
| 38 | kernel. These might be done in chunks of 100, so the first call would start |
---|
| 39 | at zero and stop after 100 iterations. The second call would then have to set |
---|
| 40 | the length index to 3 and the radius index to 10 for a position of 3*30+10=100, |
---|
| 41 | and could then proceed to position 200. This allows us to interrupt the |
---|
| 42 | calculation in the middle of a long polydispersity loop without having to |
---|
| 43 | do special tricks with the C code. More importantly, it stops the OpenCL |
---|
| 44 | kernel in a reasonable time; because the GPU is used by the operating |
---|
| 45 | system to show its windows, if a GPU kernel runs too long then it will be |
---|
| 46 | automatically killed and no results will be returned to the caller. |
---|
| 47 | |
---|
| 48 | The *ProblemDetails* structure is a direct map of the |
---|
| 49 | :class:`details.CallDetails` buffer. This indicates which parameters are |
---|
| 50 | polydisperse, and where in the values vector the values and weights can be |
---|
| 51 | found. For each polydisperse parameter there is a parameter id, the length |
---|
| 52 | of the polydispersity loop for that parameter, the offset of the parameter |
---|
| 53 | values in the pd value and pd weight vectors and the 'stride' from one index |
---|
| 54 | to the next, which is used to translate between the position in the |
---|
| 55 | polydispersity loop and the particular parameter indices. The *num_eval* |
---|
| 56 | field is the total size of the polydispersity loop. *num_weights* is the |
---|
| 57 | number of elements in the pd value and pd weight vectors. *num_active* is |
---|
| 58 | the number of non-trivial pd loops (polydisperse parameters should be ordered |
---|
| 59 | by decreasing pd vector length, with a length of 1 meaning no polydispersity). |
---|
| 60 | Oriented objects in 2-D need a cos(theta) spherical correction on the angular |
---|
| 61 | variation in order to preserve the 'surface area' of the weight distribution. |
---|
| 62 | *theta_par* is the id of the polar coordinate parameter if there is one. |
---|
| 63 | |
---|
| 64 | |
---|
| 65 | The *values* vector consists of the fixed values for the model plus pd value |
---|
| 66 | and pd weight vectors. There are *NUM_VALUES* fixed values for the model, |
---|
| 67 | which includes the initial two slots for scale and background, the NUM_PARS |
---|
| 68 | values for the kernel parameters, three slots for the applied magnetism, and |
---|
| 69 | three slots for each of the SLD parameters for the sample magnetiziation |
---|
| 70 | *(Mx, My, Mz)*. Sample magnetization is translated from *(M, theta, phi)* |
---|
| 71 | to *(Mx, My, Mz)* before the kernel is called. After the fixed values comes |
---|
| 72 | the pd value vector, with the polydispersity values for each parameter |
---|
| 73 | stacked one after the other. The order isn't important since the location |
---|
| 74 | for each parameter is stored in the *pd_offset* field of the *ProblemDetails* |
---|
| 75 | structure, but the values do need to be contiguous. After *num_weights* |
---|
| 76 | values, the pd weight vector is stored, with the same configuration as the |
---|
| 77 | pd value vector. Note that the pd vectors can contain values that are not |
---|
| 78 | in the polydispersity loop; this is used by :class:`mixture.MixtureKernel` |
---|
| 79 | to make it easier to call the various component kernels. |
---|
| 80 | |
---|
| 81 | The *q* vector contains one value for each q for *Iq* kernels, or a pair |
---|
| 82 | of *(qx,qy)* values for each q for *Iqxy* and *Imagnetic* kernels. OpenCL |
---|
| 83 | pads all vectors to 32 value boundaries just to be safe, including the |
---|
| 84 | *values* vector and the the *results* vector. |
---|
| 85 | |
---|
| 86 | The *results* vector contains one slot for each of the *nq* values, plus |
---|
| 87 | one extra slot at the end for the current polydisperse normalization. This |
---|
| 88 | is required when the polydispersity loop is broken across several kernel |
---|
| 89 | calls. |
---|
| 90 | |
---|
| 91 | *cutoff* is a importance cutoff so that points which contribute negligibly |
---|
| 92 | to the total scattering can be skipped without calculating them. |
---|
| 93 | |
---|
| 94 | :func:`generate.make_source` defines the following C macros: |
---|
[d5ac45f] | 95 | |
---|
| 96 | - USE_OPENCL is defined if running in opencl |
---|
[6fd8de0] | 97 | - MAX_PD is the maximum depth of the polydispersity loop [model specific] |
---|
[e822c97] | 98 | - NUM_PARS is the number of parameter values in the kernel. This may be |
---|
| 99 | more than the number of parameters if some of the parameters are vector |
---|
| 100 | values. |
---|
| 101 | - NUM_VALUES is the number of fixed values, which defines the offset in the |
---|
| 102 | value list to the polydisperse value and weight vectors. |
---|
| 103 | - NUM_MAGNETIC is the number of magnetic SLDs |
---|
| 104 | - MAGNETIC_PARS is a comma separated list of the magnetic SLDs, indicating |
---|
| 105 | their locations in the values vector. |
---|
| 106 | - MAGNETIC_PAR0 to MAGNETIC_PAR2 are the first three magnetic parameter ids |
---|
| 107 | so we can hard code the setting of magnetic values if there are only a |
---|
| 108 | few of them. |
---|
| 109 | - KERNEL_NAME is the name of the function being declared |
---|
| 110 | - PARAMETER_TABLE is the declaration of the parameters to the kernel: |
---|
[d5ac45f] | 111 | |
---|
[e822c97] | 112 | Cylinder:: |
---|
[d5ac45f] | 113 | |
---|
| 114 | #define PARAMETER_TABLE \ |
---|
| 115 | double length; \ |
---|
| 116 | double radius; \ |
---|
| 117 | double sld; \ |
---|
[e822c97] | 118 | double sld_solvent; |
---|
[d5ac45f] | 119 | |
---|
| 120 | Note: scale and background are never included |
---|
| 121 | |
---|
[e822c97] | 122 | Multi-shell cylinder (10 shell max):: |
---|
[d5ac45f] | 123 | |
---|
| 124 | #define PARAMETER_TABLE \ |
---|
| 125 | double num_shells; \ |
---|
| 126 | double length; \ |
---|
| 127 | double radius[10]; \ |
---|
| 128 | double sld[10]; \ |
---|
[e822c97] | 129 | double sld_solvent; |
---|
[d5ac45f] | 130 | |
---|
[e822c97] | 131 | - CALL_IQ(q, i, var) is the declaration of a call to the kernel: |
---|
[d5ac45f] | 132 | |
---|
[e822c97] | 133 | Cylinder:: |
---|
[d5ac45f] | 134 | |
---|
[03cac08] | 135 | #define CALL_IQ(q, i, var) Iq(q[i], \ |
---|
[d5ac45f] | 136 | var.length, \ |
---|
| 137 | var.radius, \ |
---|
| 138 | var.sld, \ |
---|
| 139 | var.sld_solvent) |
---|
| 140 | |
---|
[e822c97] | 141 | Multi-shell cylinder:: |
---|
[d5ac45f] | 142 | |
---|
[03cac08] | 143 | #define CALL_IQ(q, i, var) Iq(q[i], \ |
---|
[d5ac45f] | 144 | var.num_shells, \ |
---|
| 145 | var.length, \ |
---|
| 146 | var.radius, \ |
---|
| 147 | var.sld, \ |
---|
| 148 | var.sld_solvent) |
---|
| 149 | |
---|
[e822c97] | 150 | Cylinder2D:: |
---|
[03cac08] | 151 | |
---|
| 152 | #define CALL_IQ(q, i, var) Iqxy(q[2*i], q[2*i+1], \ |
---|
| 153 | var.length, \ |
---|
| 154 | var.radius, \ |
---|
| 155 | var.sld, \ |
---|
| 156 | var.sld_solvent, \ |
---|
| 157 | var.theta, \ |
---|
| 158 | var.phi) |
---|
| 159 | |
---|
[d5ac45f] | 160 | - CALL_VOLUME(var) is similar, but for calling the form volume:: |
---|
| 161 | |
---|
| 162 | #define CALL_VOLUME(var) \ |
---|
| 163 | form_volume(var.length, var.radius) |
---|
| 164 | |
---|
[e822c97] | 165 | There is an additional macro that can be defined within the model.c file: |
---|
[d5ac45f] | 166 | |
---|
[e822c97] | 167 | - INVALID(var) is a test for model parameters in the correct range: |
---|
| 168 | |
---|
| 169 | Cylinder:: |
---|
[d5ac45f] | 170 | |
---|
| 171 | #define INVALID(var) 0 |
---|
| 172 | |
---|
[e822c97] | 173 | BarBell:: |
---|
[d5ac45f] | 174 | |
---|
[6fd8de0] | 175 | #define INVALID(var) (var.bell_radius < var.radius) |
---|
[d5ac45f] | 176 | |
---|
[e822c97] | 177 | Model with complicated constraints:: |
---|
[d5ac45f] | 178 | |
---|
| 179 | inline bool constrained(p1, p2, p3) { return expression; } |
---|
| 180 | #define INVALID(var) constrained(var.p1, var.p2, var.p3) |
---|
| 181 | |
---|
| 182 | Our design supports a limited number of polydispersity loops, wherein |
---|
| 183 | we need to cycle through the values of the polydispersity, calculate |
---|
| 184 | the I(q, p) for each combination of parameters, and perform a normalized |
---|
| 185 | weighted sum across all the weights. Parameters may be passed to the |
---|
| 186 | underlying calculation engine as scalars or vectors, but the polydispersity |
---|
| 187 | calculator treats the parameter set as one long vector. |
---|
| 188 | |
---|
[e822c97] | 189 | Let's assume we have 8 parameters in the model, with two polydisperse. Since |
---|
| 190 | this is a 1-D model the orientation parameters won't be used:: |
---|
[d5ac45f] | 191 | |
---|
| 192 | 0: scale {scl = constant} |
---|
| 193 | 1: background {bkg = constant} |
---|
[e822c97] | 194 | 2: radius {r = vector of 10pts} |
---|
| 195 | 3: length {l = vector of 30pts} |
---|
[6fd8de0] | 196 | 4: sld {s = constant/(radius**2*length)} |
---|
| 197 | 5: sld_solvent {s2 = constant} |
---|
[e822c97] | 198 | 6: theta {not used} |
---|
| 199 | 7: phi {not used} |
---|
| 200 | |
---|
| 201 | This generates the following call to the kernel. Note that parameters 4 and |
---|
| 202 | 5 are treated as polydisperse even though they are not --- this is because |
---|
| 203 | it is harmless to do so and simplifies the looping code:: |
---|
| 204 | |
---|
| 205 | MAX_PD = 4 |
---|
| 206 | NUM_PARS = 6 // kernel parameters only |
---|
| 207 | NUM_VALUES = 17 // all values, including scale and background |
---|
| 208 | NUM_MAGNETIC = 2 // two parameters might be magnetic |
---|
| 209 | MAGNETIC_PARS = 4, 5 // they are sld and sld_solvent |
---|
| 210 | MAGNETIC_PAR0 = 4 // sld index |
---|
| 211 | MAGNETIC_PAR1 = 5 // solvent index |
---|
| 212 | |
---|
| 213 | details { |
---|
| 214 | pd_par = {3, 2, 4, 5} // parameters *radius* and *length* vary |
---|
| 215 | pd_length = {30, 10, 1, 1} // *length* has more, so it is first |
---|
| 216 | pd_offset = {10, 0, 31, 32} // *length* starts at index 10 in weights |
---|
[d5ac45f] | 217 | pd_stride = {1, 30, 300, 300} // cumulative product of pd length |
---|
[e822c97] | 218 | num_eval = 300 // 300 values in the polydispersity loop |
---|
| 219 | num_weights = 42 // 42 values in the pd vector |
---|
| 220 | num_active = 2 // only the first two pd are active |
---|
| 221 | theta_var = 6 // spherical correction |
---|
[d5ac45f] | 222 | } |
---|
| 223 | |
---|
[e822c97] | 224 | values = { scl, bkg, // universal |
---|
| 225 | r, l, s, s2, theta, phi, // kernel pars |
---|
| 226 | in spin, out spin, spin angle, // applied magnetism |
---|
| 227 | mx s, my s, mz s, mx s2, my s2, mz s2, // magnetic slds |
---|
| 228 | r0, .., r9, l0, .., l29, s, s2, // pd values |
---|
| 229 | r0, .., r9, l0, .., l29, s, s2} // pd weights |
---|
[d5ac45f] | 230 | |
---|
| 231 | nq = 130 |
---|
| 232 | q = { q0, q1, ..., q130, x, x } # pad to 8 element boundary |
---|
[e822c97] | 233 | result = {r1, ..., r130, pd_norm, x } |
---|
[d5ac45f] | 234 | |
---|
| 235 | The polydisperse parameters are stored in as an array of parameter |
---|
| 236 | indices, one for each polydisperse parameter, stored in pd_par[n]. |
---|
| 237 | Non-polydisperse parameters do not appear in this array. Each polydisperse |
---|
[6fd8de0] | 238 | parameter has a weight vector whose length is stored in pd_length[n]. |
---|
[d5ac45f] | 239 | The weights are stored in a contiguous vector of weights for all |
---|
| 240 | parameters, with the starting position for the each parameter stored |
---|
| 241 | in pd_offset[n]. The values corresponding to the weights are stored |
---|
| 242 | together in a separate weights[] vector, with offset stored in |
---|
| 243 | par_offset[pd_par[n]]. Polydisperse parameters should be stored in |
---|
| 244 | decreasing order of length for highest efficiency. |
---|
| 245 | |
---|
[48fbd50] | 246 | We limit the number of polydisperse dimensions to MAX_PD (currently 4), |
---|
| 247 | though some models may have fewer if they have fewer polydisperse |
---|
[e822c97] | 248 | parameters. The main reason for the limit is to reduce code size. |
---|
| 249 | Each additional polydisperse parameter requires a separate polydispersity |
---|
| 250 | loop. If more than 4 levels of polydispersity are needed, then kernel_iq.c |
---|
| 251 | and kernel_iq.cl will need to be extended. |
---|
| 252 | |
---|
| 253 | Constraints between parameters are not supported. Instead users will |
---|
| 254 | have to define a new model with the constraints built in by making a |
---|
| 255 | copy of the existing model. Mac provides OpenCL and we are supplying |
---|
| 256 | the tinycc compiler for Windows so this won't be a complete limitation, |
---|
| 257 | but it is rather inconvenient. The process could perhaps be automated |
---|
| 258 | so that there is no code copying required, just an alternate CALL_IQ |
---|
| 259 | macro that implements the constraints. Think carefully about constraining |
---|
| 260 | theta since the polar coordinates normalization is tied to this parameter. |
---|
[d5ac45f] | 261 | |
---|
| 262 | If there is no polydispersity we pretend that it is polydisperisty with one |
---|
| 263 | parameter, pd_start=0 and pd_stop=1. We may or may not short circuit the |
---|
| 264 | calculation in this case, depending on how much time it saves. |
---|
| 265 | |
---|
[e822c97] | 266 | The problem details structure could be allocated and sent in as an integer |
---|
| 267 | array using the read-only flag. This would allow us to copy it once per fit |
---|
[d5ac45f] | 268 | along with the weights vector, since features such as the number of |
---|
| 269 | polydisperity elements per pd parameter or the coordinated won't change |
---|
[e822c97] | 270 | between function evaluations. A new parameter vector must be sent for |
---|
| 271 | each I(q) evaluation. This is not currently implemented, and would require |
---|
| 272 | some resturcturing of the :class:`sasview_model.SasviewModel` interface. |
---|
[d5ac45f] | 273 | |
---|
| 274 | The results array will be initialized to zero for polydispersity loop |
---|
| 275 | entry zero, and preserved between calls to [start, stop] so that the |
---|
| 276 | results accumulate by the time the loop has completed. Background and |
---|
| 277 | scale will be applied when the loop reaches the end. This does require |
---|
| 278 | that the results array be allocated read-write, which is less efficient |
---|
| 279 | for the GPU, but it makes the calling sequence much more manageable. |
---|
| 280 | |
---|
[03cac08] | 281 | For accuracy we may want to introduce Kahan summation into the integration:: |
---|
| 282 | |
---|
| 283 | double accumulated_error = 0.0; |
---|
| 284 | ... |
---|
| 285 | #if USE_KAHAN_SUMMATION |
---|
| 286 | const double y = next - accumulated_error; |
---|
| 287 | const double t = ret + y; |
---|
| 288 | accumulated_error = (t - ret) - y; |
---|
| 289 | ret = t; |
---|
| 290 | #else |
---|
| 291 | ret += next; |
---|
| 292 | #endif |
---|
[e822c97] | 293 | |
---|
| 294 | This will require accumulated error for each I(q) value to be preserved |
---|
| 295 | between kernel calls to implement this fully. The kernel_iq.c code, which |
---|
| 296 | loops over q for each parameter set in the polydispersity loop, will need |
---|
| 297 | also need the accumalation vector. |
---|