[d5ac45f] | 1 | Calculator Interface |
---|
| 2 | ==================== |
---|
| 3 | |
---|
| 4 | The environment needs to provide the following #defines: |
---|
| 5 | |
---|
| 6 | - USE_OPENCL is defined if running in opencl |
---|
| 7 | - KERNEL declares a function to be available externally |
---|
| 8 | - KERNEL_NAME is the name of the function being declared |
---|
[6fd8de0] | 9 | - MAX_PD is the maximum depth of the polydispersity loop [model specific] |
---|
[d5ac45f] | 10 | - NPARS is the number of parameters in the kernel |
---|
| 11 | - PARAMETER_TABLE is the declaration of the parameters to the kernel:: |
---|
| 12 | |
---|
| 13 | Cylinder: |
---|
| 14 | |
---|
| 15 | #define PARAMETER_TABLE \ |
---|
| 16 | double length; \ |
---|
| 17 | double radius; \ |
---|
| 18 | double sld; \ |
---|
| 19 | double sld_solvent |
---|
| 20 | |
---|
| 21 | Note: scale and background are never included |
---|
| 22 | |
---|
| 23 | Multi-shell cylinder (10 shell max): |
---|
| 24 | |
---|
| 25 | #define PARAMETER_TABLE \ |
---|
| 26 | double num_shells; \ |
---|
| 27 | double length; \ |
---|
| 28 | double radius[10]; \ |
---|
| 29 | double sld[10]; \ |
---|
| 30 | double sld_solvent |
---|
| 31 | |
---|
[6fd8de0] | 32 | - CALL_IQ(q, i, var) is the declaration of a call to the kernel:: |
---|
[d5ac45f] | 33 | |
---|
| 34 | Cylinder: |
---|
| 35 | |
---|
[03cac08] | 36 | #define CALL_IQ(q, i, var) Iq(q[i], \ |
---|
[d5ac45f] | 37 | var.length, \ |
---|
| 38 | var.radius, \ |
---|
| 39 | var.sld, \ |
---|
| 40 | var.sld_solvent) |
---|
| 41 | |
---|
| 42 | Multi-shell cylinder: |
---|
| 43 | |
---|
[03cac08] | 44 | #define CALL_IQ(q, i, var) Iq(q[i], \ |
---|
[d5ac45f] | 45 | var.num_shells, \ |
---|
| 46 | var.length, \ |
---|
| 47 | var.radius, \ |
---|
| 48 | var.sld, \ |
---|
| 49 | var.sld_solvent) |
---|
| 50 | |
---|
[03cac08] | 51 | Cylinder2D: |
---|
| 52 | |
---|
| 53 | #define CALL_IQ(q, i, var) Iqxy(q[2*i], q[2*i+1], \ |
---|
| 54 | var.length, \ |
---|
| 55 | var.radius, \ |
---|
| 56 | var.sld, \ |
---|
| 57 | var.sld_solvent, \ |
---|
| 58 | var.theta, \ |
---|
| 59 | var.phi) |
---|
| 60 | |
---|
[d5ac45f] | 61 | - CALL_VOLUME(var) is similar, but for calling the form volume:: |
---|
| 62 | |
---|
| 63 | #define CALL_VOLUME(var) \ |
---|
| 64 | form_volume(var.length, var.radius) |
---|
| 65 | |
---|
| 66 | - INVALID(var) is a test for model parameters in the correct range:: |
---|
| 67 | |
---|
| 68 | Cylinder: |
---|
| 69 | |
---|
| 70 | #define INVALID(var) 0 |
---|
| 71 | |
---|
| 72 | BarBell: |
---|
| 73 | |
---|
[6fd8de0] | 74 | #define INVALID(var) (var.bell_radius < var.radius) |
---|
[d5ac45f] | 75 | |
---|
| 76 | Model with complicated constraints: |
---|
| 77 | |
---|
| 78 | inline bool constrained(p1, p2, p3) { return expression; } |
---|
| 79 | #define INVALID(var) constrained(var.p1, var.p2, var.p3) |
---|
| 80 | |
---|
| 81 | Our design supports a limited number of polydispersity loops, wherein |
---|
| 82 | we need to cycle through the values of the polydispersity, calculate |
---|
| 83 | the I(q, p) for each combination of parameters, and perform a normalized |
---|
| 84 | weighted sum across all the weights. Parameters may be passed to the |
---|
| 85 | underlying calculation engine as scalars or vectors, but the polydispersity |
---|
| 86 | calculator treats the parameter set as one long vector. |
---|
| 87 | |
---|
| 88 | Let's assume we have 6 parameters in the model, with two polydisperse:: |
---|
| 89 | |
---|
| 90 | 0: scale {scl = constant} |
---|
| 91 | 1: background {bkg = constant} |
---|
[6fd8de0] | 92 | 2: length {l = vector of 30pts} |
---|
| 93 | 3: radius {r = vector of 10pts} |
---|
| 94 | 4: sld {s = constant/(radius**2*length)} |
---|
| 95 | 5: sld_solvent {s2 = constant} |
---|
[d5ac45f] | 96 | |
---|
| 97 | This generates the following call to the kernel (where x stands for an |
---|
| 98 | arbitrary value that is not used by the kernel evaluator):: |
---|
| 99 | |
---|
| 100 | NPARS = 4 // scale and background are in all models |
---|
| 101 | problem { |
---|
[6fd8de0] | 102 | pd_par = {3, 2, x, x} // parameters *radius* and *length* vary |
---|
[d5ac45f] | 103 | pd_length = {30, 10, 0, 0} // *length* has more, so it is first |
---|
| 104 | pd_offset = {10, 0, x, x} // *length* starts at index 10 in weights |
---|
| 105 | pd_stride = {1, 30, 300, 300} // cumulative product of pd length |
---|
[6fd8de0] | 106 | pd_isvol = {True, True, x, x} // true if weight is a volume weight |
---|
[d5ac45f] | 107 | par_offset = {2, 3, 303, 313} // parameter offsets |
---|
| 108 | par_coord = {0, 3, 2, 1} // bitmap of parameter dependencies |
---|
[6fd8de0] | 109 | fast_coord_index = {3, 5, x, x} // radius and sld have fast index |
---|
[d5ac45f] | 110 | fast_coord_count = 2 // two parameters vary with *length* distribution |
---|
| 111 | theta_var = -1 // no spherical correction |
---|
| 112 | fast_theta = 0 // spherical correction angle is not pd 1 |
---|
| 113 | } |
---|
| 114 | |
---|
[6fd8de0] | 115 | weight = { l0, .., l29, r0, .., r9} //length comes first as the longest vec |
---|
[d5ac45f] | 116 | pars = { scl, bkg, l0, ..., l29, r0, r1, ..., r9, |
---|
| 117 | s[l0,r0], ... s[l0,r9], s[l1,r0], ... s[l29,r9] , s2} |
---|
[6fd8de0] | 118 | //where s[x,y] stands for material sld, s2 = solvent sld |
---|
[d5ac45f] | 119 | |
---|
| 120 | nq = 130 |
---|
| 121 | q = { q0, q1, ..., q130, x, x } # pad to 8 element boundary |
---|
| 122 | result = {r1, ..., r130, norm, vol, vol_norm, x, x, x, x, x, x, x} |
---|
| 123 | |
---|
| 124 | |
---|
| 125 | The polydisperse parameters are stored in as an array of parameter |
---|
| 126 | indices, one for each polydisperse parameter, stored in pd_par[n]. |
---|
| 127 | Non-polydisperse parameters do not appear in this array. Each polydisperse |
---|
[6fd8de0] | 128 | parameter has a weight vector whose length is stored in pd_length[n]. |
---|
[d5ac45f] | 129 | The weights are stored in a contiguous vector of weights for all |
---|
| 130 | parameters, with the starting position for the each parameter stored |
---|
| 131 | in pd_offset[n]. The values corresponding to the weights are stored |
---|
| 132 | together in a separate weights[] vector, with offset stored in |
---|
| 133 | par_offset[pd_par[n]]. Polydisperse parameters should be stored in |
---|
| 134 | decreasing order of length for highest efficiency. |
---|
| 135 | |
---|
[48fbd50] | 136 | We limit the number of polydisperse dimensions to MAX_PD (currently 4), |
---|
| 137 | though some models may have fewer if they have fewer polydisperse |
---|
| 138 | parameters. This cuts the size of the structure in half compared to |
---|
| 139 | allowing a separate polydispersity for each parameter. This will |
---|
| 140 | help a little bit for models with large numbers of parameters, such |
---|
| 141 | as the onion model. |
---|
[d5ac45f] | 142 | |
---|
| 143 | Parameters may be coordinated. That is, we may have the value of one |
---|
| 144 | parameter depend on a set of other parameters, some of which may be |
---|
| 145 | polydisperse. For example, if sld is inversely proportional to the |
---|
| 146 | volume of a cylinder, and the length and radius are independently |
---|
| 147 | polydisperse, then for each combination of length and radius we need a |
---|
| 148 | separate value for the sld. The caller must provide a coordination table |
---|
| 149 | for each parameter containing the value for each parameter given the |
---|
| 150 | value of the polydisperse parameters v1, v2, etc. The tables for each |
---|
| 151 | parameter are arranged contiguously in a vector, with offset[k] giving the |
---|
| 152 | starting location of parameter k in the vector. Each parameter defines |
---|
[6fd8de0] | 153 | par_coord[k] as a bit mask indicating which polydispersity parameters the |
---|
[d5ac45f] | 154 | parameter depends upon. Usually this is zero, indicating that the parameter |
---|
| 155 | is independent, but for the cylinder example given, the bits for the |
---|
| 156 | radius and length polydispersity parameters would both be set, the result |
---|
| 157 | being a (#radius x #length) table, or maybe a (#length x #radius) table |
---|
| 158 | if length comes first in the polydispersity table. |
---|
| 159 | |
---|
| 160 | NB: If we can guarantee that a compiler and OpenCL driver are available, |
---|
| 161 | we could instead create the coordination function on the fly for each |
---|
| 162 | parameter, saving memory and transfer time, but requiring a C compiler |
---|
| 163 | as part of the environment. |
---|
| 164 | |
---|
| 165 | In ordering the polydisperse parameters by decreasing length we can |
---|
| 166 | iterate over the longest dispersion weight vector first. All parameters |
---|
| 167 | coordinated with this weight vector (the 'fast' parameters), can be |
---|
| 168 | updated with a simple increment to the next position in the parameter |
---|
| 169 | value table. The indices of these parameters is stored in fast_coord_index[], |
---|
| 170 | with fast_coord_count being the number of fast parameters. A total |
---|
| 171 | of NPARS slots is allocated to allow for the case that all parameters |
---|
| 172 | are coordinated with the fast index, though this will likely be mostly |
---|
| 173 | empty. When the fast increment count reaches the end of the weight |
---|
| 174 | vector, then the index of the second polydisperse parameter must be |
---|
| 175 | incremented, and all of its coordinated parameters updated. Because this |
---|
| 176 | operation is not in the inner loop, a slower algorithm can be used. |
---|
| 177 | |
---|
| 178 | If there is no polydispersity we pretend that it is polydisperisty with one |
---|
| 179 | parameter, pd_start=0 and pd_stop=1. We may or may not short circuit the |
---|
| 180 | calculation in this case, depending on how much time it saves. |
---|
| 181 | |
---|
| 182 | The problem details structure can be allocated and sent in as an integer |
---|
| 183 | array using the read-only flag. This allows us to copy it once per fit |
---|
| 184 | along with the weights vector, since features such as the number of |
---|
| 185 | polydisperity elements per pd parameter or the coordinated won't change |
---|
| 186 | between function evaluations. A new parameter vector is sent for |
---|
| 187 | each I(q) evaluation. |
---|
| 188 | |
---|
| 189 | To protect against expensive evaluations taking all the GPU resource |
---|
| 190 | on large fits, the entire polydispersity will not be computed at once. |
---|
| 191 | Instead, a start and stop location will be sent, indicating where in the |
---|
| 192 | polydispersity loop the calculation should start and where it should |
---|
| 193 | stop. We can do this for arbitrary start/stop points since we have |
---|
| 194 | unwound the nested loop. Instead, we use the same technique as array |
---|
| 195 | index translation, using div and mod to figure out the i,j,k,... |
---|
| 196 | indices in the virtual nested loop. |
---|
| 197 | |
---|
| 198 | The results array will be initialized to zero for polydispersity loop |
---|
| 199 | entry zero, and preserved between calls to [start, stop] so that the |
---|
| 200 | results accumulate by the time the loop has completed. Background and |
---|
| 201 | scale will be applied when the loop reaches the end. This does require |
---|
| 202 | that the results array be allocated read-write, which is less efficient |
---|
| 203 | for the GPU, but it makes the calling sequence much more manageable. |
---|
| 204 | |
---|
| 205 | Scale and background cannot be coordinated with other polydisperse parameters |
---|
| 206 | |
---|
| 207 | Oriented objects in 2-D need a spherical correction on the angular variation |
---|
| 208 | in order to preserve the 'surface area' of the weight distribution. |
---|
| 209 | |
---|
[6fd8de0] | 210 | cutoff parameter limits integration area within polydispersity hypercude, |
---|
| 211 | which speeds calculations |
---|
[03cac08] | 212 | |
---|
| 213 | For accuracy we may want to introduce Kahan summation into the integration:: |
---|
| 214 | |
---|
| 215 | |
---|
| 216 | double accumulated_error = 0.0; |
---|
| 217 | ... |
---|
| 218 | #if USE_KAHAN_SUMMATION |
---|
| 219 | const double y = next - accumulated_error; |
---|
| 220 | const double t = ret + y; |
---|
| 221 | accumulated_error = (t - ret) - y; |
---|
| 222 | ret = t; |
---|
| 223 | #else |
---|
| 224 | ret += next; |
---|
| 225 | #endif |
---|