id summary reporter owner description type status priority milestone component resolution keywords cc workpackage 1230 parallelize polydispersity loops pkienzle "There is unexploited parallelism in the polydispersity calculation. This limits the speedup available on high-end graphics cards operating on 1-D datasets, since the current implementation uses only one processor per q value. A card with 5000 separate processors will be mostly idle. This is particularly important for mcSAS, which needs to evaluate {{{ I(q_j) = sum_{i=1}^m w_i P(q_j, r_i) }}} where P(q,r) is the sphere form factor, q has length n and the distribution w,r has length m, for a total of m x n evaluations. The current sasmodels code does this in parallel over q_j, with the sum over w_i, r_i running in serial. Instead, we could break up the loop into p non-intersecting stripes, first computing {{{ I_k(q_j) = sum_{i=0}^{m/p-1} w_{k+p*i} P(q_j, r_{k+p*i}) }}} for k = 1..p. This keeps n x p processors busy at the same time, at the cost of storing n x p intermediate results. With 256 q points, the 5120 processors on an nvidia V100 can compute 20 stripes in parallel, using minimal extra memory (256 x 20 x 8 bytes). It may be faster to use p=16 so that memory accesses align better. Next, turn the problem on its side and compute {{{ I(q_j) = sum_{k=1}^p I_k(q_j) }}} with one processor for each q value. We can perhaps do better by summing pairs in parallel, then pairs of pairs, requiring four cycles for p=16 rather than 16, though the overhead of managing this may outweigh any benefit. Looking at the graphs on page 5 of the following: https://www.cl.cam.ac.uk/teaching/1617/AdvGraph/07_OpenCL.pdf I'm guessing that a reduction over 4k values is too small to warrant a fast algorithm. The existing kernel_iq.c would benefit from this, at least for the inner polydispersity loop, if you are willing to tackle it. 
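A minimal sketch of the two-stage scheme in plain Python, intended only for checking the stripe indexing against the serial sum. It is not the sasmodels kernel: the names sphere_form, serial_iq and striped_iq are hypothetical, the form factor is the normalized sphere form with volume and contrast scaling left out, and the nested loops stand in for what would be independent GPU work items. Indices are 0-based here, so stripe k takes elements k, k+p, k+2p, ...

```python
import math

def sphere_form(q, r):
    # Normalized sphere form factor (assumption: no volume or contrast scaling).
    x = q * r
    if x == 0.0:
        return 1.0
    amplitude = 3.0 * (math.sin(x) - x * math.cos(x)) / x**3
    return amplitude * amplitude

def serial_iq(q, w, r):
    # Reference scheme: one work item per q_j, serial sum over the distribution.
    return [sum(wi * sphere_form(qj, ri) for wi, ri in zip(w, r)) for qj in q]

def striped_iq(q, w, r, p):
    m = len(w)
    # Stage 1: n x p independent partial sums I_k(q_j).
    # Each (j, k) pair would be a separate work item on the card.
    partial = [
        [sum(w[i] * sphere_form(qj, r[i]) for i in range(k, m, p))
         for k in range(p)]
        for qj in q
    ]
    # Stage 2: one work item per q_j adds up its p partial sums.
    return [sum(row) for row in partial]
```

Because the stripes are built with range(k, m, p), the sketch also covers the case where p does not divide m; the result should match the serial sum up to floating point rounding, since only the order of summation changes.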
Generating a specialized kernel for the particular problem of a distribution of spheres in mcSAS will probably be easier. " defect new major SasView 4.3.0 SasView McSAS Integration Project