- Timestamp:
- Aug 30, 2016 6:21:51 PM (8 years ago)
- Branches:
- master, core_shell_microgels, costrafo411, magnetic_model, release_v0.94, release_v0.95, ticket-1257-vesicle-product, ticket_1156, ticket_1265_superball, ticket_822_more_unit_tests
- Children:
- 3a45c2c
- Parents:
- 51241113
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/developer/calculator.rst
r6fd8de0 re822c97 1 .. currentmodule:: sasmodels 2 1 3 Calculator Interface 2 4 ==================== 3 5 4 The environment needs to provide the following #defines: 6 This document describes the layer between the form factor kernels and the 7 model calculator which implements the polydispersity and magnetic SLD 8 calculations. There are three separate implementations of this layer, 9 *kernelcl.py* for OpenCL, which operates on a single Q value at a time, 10 *kerneldll.c* for the DLL, which loops over a vector of Q values, and 11 *kernelpy.py* for python models which operates on vector Q values. 12 13 Each implementation provides three different calls *Iq*, *Iqxy* and *Imagnetic* 14 for 1-D, 2-D and 2-D magnetic kernels respectively. 15 16 The C code is defined in *kernel_iq.c* and *kernel_iq.cl* for DLL and OpenCL 17 respectively. The kernel call looks as follows:: 18 19 kernel void KERNEL_NAME( 20 int nq, // Number of q values in the q vector 21 int pd_start, // Starting position in the polydispersity loop 22 int pd_stop, // Ending position in the polydispersity loop 23 ProblemDetails *details, // Polydispersity info 24 double *values, // Value and weights vector 25 double *q, // q or (qx,qy) vector 26 double *result, // returned I(q), with result[nq] = pd_weight 27 double cutoff) // polydispersity weight cutoff 28 29 The details for OpenCL and the python loop are slightly different, but these 30 data structures are common. 31 32 *nq* indicates the number of q values that will be calculated. 33 34 The *pd_start* and *pd_stop* parameters set the range of the polydispersity 35 loop to compute for the current kernel call. Give a polydispersity 36 calculation with 30 weights for length and 30 weights for radius for example, 37 there are a total of 900 calls to the form factor required to compute the 38 kernel. These might be done in chunks of 100, so the first call would start 39 at zero and stop after 100 iterations. The second call would then have to set 40 the length index to 3 and the radius index to 10 for a position of 3*30+10=100, 41 and could then proceed to position 200. This allows us to interrupt the 42 calculation in the middle of a long polydispersity loop without having to 43 do special tricks with the C code. More importantly, it stops the OpenCL 44 kernel in a reasonable time; because the GPU is used by the operating 45 system to show its windows, if a GPU kernel runs too long then it will be 46 automatically killed and no results will be returned to the caller. 47 48 The *ProblemDetails* structure is a direct map of the 49 :class:`details.CallDetails` buffer. This indicates which parameters are 50 polydisperse, and where in the values vector the values and weights can be 51 found. For each polydisperse parameter there is a parameter id, the length 52 of the polydispersity loop for that parameter, the offset of the parameter 53 values in the pd value and pd weight vectors and the 'stride' from one index 54 to the next, which is used to translate between the position in the 55 polydispersity loop and the particular parameter indices. The *num_eval* 56 field is the total size of the polydispersity loop. *num_weights* is the 57 number of elements in the pd value and pd weight vectors. *num_active* is 58 the number of non-trivial pd loops (polydisperse parameters should be ordered 59 by decreasing pd vector length, with a length of 1 meaning no polydispersity). 60 Oriented objects in 2-D need a cos(theta) spherical correction on the angular 61 variation in order to preserve the 'surface area' of the weight distribution. 62 *theta_par* is the id of the polar coordinate parameter if there is one. 63 64 65 The *values* vector consists of the fixed values for the model plus pd value 66 and pd weight vectors. There are *NUM_VALUES* fixed values for the model, 67 which includes the initial two slots for scale and background, the NUM_PARS 68 values for the kernel parameters, three slots for the applied magnetism, and 69 three slots for each of the SLD parameters for the sample magnetiziation 70 *(Mx, My, Mz)*. Sample magnetization is translated from *(M, theta, phi)* 71 to *(Mx, My, Mz)* before the kernel is called. After the fixed values comes 72 the pd value vector, with the polydispersity values for each parameter 73 stacked one after the other. The order isn't important since the location 74 for each parameter is stored in the *pd_offset* field of the *ProblemDetails* 75 structure, but the values do need to be contiguous. After *num_weights* 76 values, the pd weight vector is stored, with the same configuration as the 77 pd value vector. Note that the pd vectors can contain values that are not 78 in the polydispersity loop; this is used by :class:`mixture.MixtureKernel` 79 to make it easier to call the various component kernels. 80 81 The *q* vector contains one value for each q for *Iq* kernels, or a pair 82 of *(qx,qy)* values for each q for *Iqxy* and *Imagnetic* kernels. OpenCL 83 pads all vectors to 32 value boundaries just to be safe, including the 84 *values* vector and the the *results* vector. 85 86 The *results* vector contains one slot for each of the *nq* values, plus 87 one extra slot at the end for the current polydisperse normalization. This 88 is required when the polydispersity loop is broken across several kernel 89 calls. 90 91 *cutoff* is a importance cutoff so that points which contribute negligibly 92 to the total scattering can be skipped without calculating them. 93 94 :func:`generate.make_source` defines the following C macros: 5 95 6 96 - USE_OPENCL is defined if running in opencl 7 - KERNEL declares a function to be available externally 97 - MAX_PD is the maximum depth of the polydispersity loop [model specific] 98 - NUM_PARS is the number of parameter values in the kernel. This may be 99 more than the number of parameters if some of the parameters are vector 100 values. 101 - NUM_VALUES is the number of fixed values, which defines the offset in the 102 value list to the polydisperse value and weight vectors. 103 - NUM_MAGNETIC is the number of magnetic SLDs 104 - MAGNETIC_PARS is a comma separated list of the magnetic SLDs, indicating 105 their locations in the values vector. 106 - MAGNETIC_PAR0 to MAGNETIC_PAR2 are the first three magnetic parameter ids 107 so we can hard code the setting of magnetic values if there are only a 108 few of them. 8 109 - KERNEL_NAME is the name of the function being declared 9 - MAX_PD is the maximum depth of the polydispersity loop [model specific] 10 - NPARS is the number of parameters in the kernel 11 - PARAMETER_TABLE is the declaration of the parameters to the kernel:: 12 13 Cylinder: 110 - PARAMETER_TABLE is the declaration of the parameters to the kernel: 111 112 Cylinder:: 14 113 15 114 #define PARAMETER_TABLE \ … … 17 116 double radius; \ 18 117 double sld; \ 19 double sld_solvent 118 double sld_solvent; 20 119 21 120 Note: scale and background are never included 22 121 23 Multi-shell cylinder (10 shell max): 122 Multi-shell cylinder (10 shell max):: 24 123 25 124 #define PARAMETER_TABLE \ … … 28 127 double radius[10]; \ 29 128 double sld[10]; \ 30 double sld_solvent 31 32 - CALL_IQ(q, i, var) is the declaration of a call to the kernel: :33 34 Cylinder: 129 double sld_solvent; 130 131 - CALL_IQ(q, i, var) is the declaration of a call to the kernel: 132 133 Cylinder:: 35 134 36 135 #define CALL_IQ(q, i, var) Iq(q[i], \ … … 40 139 var.sld_solvent) 41 140 42 Multi-shell cylinder: 141 Multi-shell cylinder:: 43 142 44 143 #define CALL_IQ(q, i, var) Iq(q[i], \ … … 49 148 var.sld_solvent) 50 149 51 Cylinder2D: 150 Cylinder2D:: 52 151 53 152 #define CALL_IQ(q, i, var) Iqxy(q[2*i], q[2*i+1], \ … … 64 163 form_volume(var.length, var.radius) 65 164 66 - INVALID(var) is a test for model parameters in the correct range:: 67 68 Cylinder: 165 There is an additional macro that can be defined within the model.c file: 166 167 - INVALID(var) is a test for model parameters in the correct range: 168 169 Cylinder:: 69 170 70 171 #define INVALID(var) 0 71 172 72 BarBell: 173 BarBell:: 73 174 74 175 #define INVALID(var) (var.bell_radius < var.radius) 75 176 76 Model with complicated constraints: 177 Model with complicated constraints:: 77 178 78 179 inline bool constrained(p1, p2, p3) { return expression; } … … 86 187 calculator treats the parameter set as one long vector. 87 188 88 Let's assume we have 6 parameters in the model, with two polydisperse:: 189 Let's assume we have 8 parameters in the model, with two polydisperse. Since 190 this is a 1-D model the orientation parameters won't be used:: 89 191 90 192 0: scale {scl = constant} 91 193 1: background {bkg = constant} 92 2: length {l = vector of 30pts}93 3: radius {r = vector of 10pts}194 2: radius {r = vector of 10pts} 195 3: length {l = vector of 30pts} 94 196 4: sld {s = constant/(radius**2*length)} 95 197 5: sld_solvent {s2 = constant} 96 97 This generates the following call to the kernel (where x stands for an 98 arbitrary value that is not used by the kernel evaluator):: 99 100 NPARS = 4 // scale and background are in all models 101 problem { 102 pd_par = {3, 2, x, x} // parameters *radius* and *length* vary 103 pd_length = {30, 10, 0, 0} // *length* has more, so it is first 104 pd_offset = {10, 0, x, x} // *length* starts at index 10 in weights 198 6: theta {not used} 199 7: phi {not used} 200 201 This generates the following call to the kernel. Note that parameters 4 and 202 5 are treated as polydisperse even though they are not --- this is because 203 it is harmless to do so and simplifies the looping code:: 204 205 MAX_PD = 4 206 NUM_PARS = 6 // kernel parameters only 207 NUM_VALUES = 17 // all values, including scale and background 208 NUM_MAGNETIC = 2 // two parameters might be magnetic 209 MAGNETIC_PARS = 4, 5 // they are sld and sld_solvent 210 MAGNETIC_PAR0 = 4 // sld index 211 MAGNETIC_PAR1 = 5 // solvent index 212 213 details { 214 pd_par = {3, 2, 4, 5} // parameters *radius* and *length* vary 215 pd_length = {30, 10, 1, 1} // *length* has more, so it is first 216 pd_offset = {10, 0, 31, 32} // *length* starts at index 10 in weights 105 217 pd_stride = {1, 30, 300, 300} // cumulative product of pd length 106 pd_isvol = {True, True, x, x} // true if weight is a volume weight 107 par_offset = {2, 3, 303, 313} // parameter offsets 108 par_coord = {0, 3, 2, 1} // bitmap of parameter dependencies 109 fast_coord_index = {3, 5, x, x} // radius and sld have fast index 110 fast_coord_count = 2 // two parameters vary with *length* distribution 111 theta_var = -1 // no spherical correction 112 fast_theta = 0 // spherical correction angle is not pd 1 218 num_eval = 300 // 300 values in the polydispersity loop 219 num_weights = 42 // 42 values in the pd vector 220 num_active = 2 // only the first two pd are active 221 theta_var = 6 // spherical correction 113 222 } 114 223 115 weight = { l0, .., l29, r0, .., r9} //length comes first as the longest vec 116 pars = { scl, bkg, l0, ..., l29, r0, r1, ..., r9, 117 s[l0,r0], ... s[l0,r9], s[l1,r0], ... s[l29,r9] , s2} 118 //where s[x,y] stands for material sld, s2 = solvent sld 224 values = { scl, bkg, // universal 225 r, l, s, s2, theta, phi, // kernel pars 226 in spin, out spin, spin angle, // applied magnetism 227 mx s, my s, mz s, mx s2, my s2, mz s2, // magnetic slds 228 r0, .., r9, l0, .., l29, s, s2, // pd values 229 r0, .., r9, l0, .., l29, s, s2} // pd weights 119 230 120 231 nq = 130 121 232 q = { q0, q1, ..., q130, x, x } # pad to 8 element boundary 122 result = {r1, ..., r130, norm, vol, vol_norm, x, x, x, x, x, x, x} 123 233 result = {r1, ..., r130, pd_norm, x } 124 234 125 235 The polydisperse parameters are stored in as an array of parameter … … 136 246 We limit the number of polydisperse dimensions to MAX_PD (currently 4), 137 247 though some models may have fewer if they have fewer polydisperse 138 parameters. This cuts the size of the structure in half compared to 139 allowing a separate polydispersity for each parameter. This will 140 help a little bit for models with large numbers of parameters, such 141 as the onion model. 142 143 Parameters may be coordinated. That is, we may have the value of one 144 parameter depend on a set of other parameters, some of which may be 145 polydisperse. For example, if sld is inversely proportional to the 146 volume of a cylinder, and the length and radius are independently 147 polydisperse, then for each combination of length and radius we need a 148 separate value for the sld. The caller must provide a coordination table 149 for each parameter containing the value for each parameter given the 150 value of the polydisperse parameters v1, v2, etc. The tables for each 151 parameter are arranged contiguously in a vector, with offset[k] giving the 152 starting location of parameter k in the vector. Each parameter defines 153 par_coord[k] as a bit mask indicating which polydispersity parameters the 154 parameter depends upon. Usually this is zero, indicating that the parameter 155 is independent, but for the cylinder example given, the bits for the 156 radius and length polydispersity parameters would both be set, the result 157 being a (#radius x #length) table, or maybe a (#length x #radius) table 158 if length comes first in the polydispersity table. 159 160 NB: If we can guarantee that a compiler and OpenCL driver are available, 161 we could instead create the coordination function on the fly for each 162 parameter, saving memory and transfer time, but requiring a C compiler 163 as part of the environment. 164 165 In ordering the polydisperse parameters by decreasing length we can 166 iterate over the longest dispersion weight vector first. All parameters 167 coordinated with this weight vector (the 'fast' parameters), can be 168 updated with a simple increment to the next position in the parameter 169 value table. The indices of these parameters is stored in fast_coord_index[], 170 with fast_coord_count being the number of fast parameters. A total 171 of NPARS slots is allocated to allow for the case that all parameters 172 are coordinated with the fast index, though this will likely be mostly 173 empty. When the fast increment count reaches the end of the weight 174 vector, then the index of the second polydisperse parameter must be 175 incremented, and all of its coordinated parameters updated. Because this 176 operation is not in the inner loop, a slower algorithm can be used. 248 parameters. The main reason for the limit is to reduce code size. 249 Each additional polydisperse parameter requires a separate polydispersity 250 loop. If more than 4 levels of polydispersity are needed, then kernel_iq.c 251 and kernel_iq.cl will need to be extended. 252 253 Constraints between parameters are not supported. Instead users will 254 have to define a new model with the constraints built in by making a 255 copy of the existing model. Mac provides OpenCL and we are supplying 256 the tinycc compiler for Windows so this won't be a complete limitation, 257 but it is rather inconvenient. The process could perhaps be automated 258 so that there is no code copying required, just an alternate CALL_IQ 259 macro that implements the constraints. Think carefully about constraining 260 theta since the polar coordinates normalization is tied to this parameter. 177 261 178 262 If there is no polydispersity we pretend that it is polydisperisty with one … … 180 264 calculation in this case, depending on how much time it saves. 181 265 182 The problem details structure c anbe allocated and sent in as an integer183 array using the read-only flag. This allowsus to copy it once per fit266 The problem details structure could be allocated and sent in as an integer 267 array using the read-only flag. This would allow us to copy it once per fit 184 268 along with the weights vector, since features such as the number of 185 269 polydisperity elements per pd parameter or the coordinated won't change 186 between function evaluations. A new parameter vector is sent for 187 each I(q) evaluation. 188 189 To protect against expensive evaluations taking all the GPU resource 190 on large fits, the entire polydispersity will not be computed at once. 191 Instead, a start and stop location will be sent, indicating where in the 192 polydispersity loop the calculation should start and where it should 193 stop. We can do this for arbitrary start/stop points since we have 194 unwound the nested loop. Instead, we use the same technique as array 195 index translation, using div and mod to figure out the i,j,k,... 196 indices in the virtual nested loop. 270 between function evaluations. A new parameter vector must be sent for 271 each I(q) evaluation. This is not currently implemented, and would require 272 some resturcturing of the :class:`sasview_model.SasviewModel` interface. 197 273 198 274 The results array will be initialized to zero for polydispersity loop … … 203 279 for the GPU, but it makes the calling sequence much more manageable. 204 280 205 Scale and background cannot be coordinated with other polydisperse parameters206 207 Oriented objects in 2-D need a spherical correction on the angular variation208 in order to preserve the 'surface area' of the weight distribution.209 210 cutoff parameter limits integration area within polydispersity hypercude,211 which speeds calculations212 213 281 For accuracy we may want to introduce Kahan summation into the integration:: 214 215 282 216 283 double accumulated_error = 0.0; … … 224 291 ret += next; 225 292 #endif 293 294 This will require accumulated error for each I(q) value to be preserved 295 between kernel calls to implement this fully. The kernel_iq.c code, which 296 loops over q for each parameter set in the polydispersity loop, will need 297 also need the accumalation vector.
Note: See TracChangeset
for help on using the changeset viewer.