id,summary,reporter,owner,description,type,status,priority,milestone,component,resolution,keywords,cc,workpackage
1230,parallelize polydispersity loops,pkienzle,,"There is unexploited parallelism in the polydispersity calculation. This limits the speedup available on high-end graphics cards operating on 1-D datasets, since the current implementation uses only one processor per q value. A card with 5000 separate processors will be mostly idle.

This is particularly important for mcSAS, which needs to evaluate
{{{
I(q_j) = sum_{i=1}^m w_i P(q_j, r_i)
}}}
where P(q,r) is the sphere form factor, q has length n and the distribution (w,r) has length m, for a total of m x n evaluations. The current sasmodels code does this in parallel over q_j, with the sum over w_i, r_i running in serial.

Instead, we could break the loop into p non-intersecting stripes, first computing
{{{
I_k(q_j) = sum_{i=0}^{m/p-1} w_{k+p*i} P(q_j, r_{k+p*i})
}}}
for k = 1, ..., p (assuming p divides m). This keeps n x p processors busy at the same time, at the cost of storing n x p intermediate results. With 256 q points, the 5120 processors on an NVIDIA V100 can compute 20 stripes in parallel, using minimal extra memory (256 x 20 x 8 bytes). It may be faster to use p=16 so that memory accesses align better.

Next, turn the problem on its side and compute
{{{
I(q_j) = sum_{k=1}^p I_k(q_j)
}}}
with one processor for each q value. We could perhaps do better by summing pairs in parallel, then pairs of pairs, requiring four cycles for p=16 rather than 16, though the overhead of managing this may outweigh any benefit. Looking at the graphs on page 5 of the following:
https://www.cl.cam.ac.uk/teaching/1617/AdvGraph/07_OpenCL.pdf
I'm guessing that a 4k reduction is too small to warrant a fast algorithm.

The existing kernel_iq.c would benefit from this, at least for the inner polydispersity loop, if you are willing to tackle it.
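
As a rough illustration of the striping and the pairwise reduction, here is a plain-Python sketch. It is not sasmodels code: the function names are made up for this example, the sphere form factor is the normalized textbook expression without volume or contrast scaling, and the stripes are numbered from 0 rather than 1. On a GPU each (k, j) entry of the first stage would be its own work item; here they are just nested loops.

```python
import math

def sphere_form(q, r):
    # Normalized sphere form factor [3 (sin x - x cos x) / x^3]^2 with x = q*r.
    # Assumption: no volume or contrast scaling, unlike a real model kernel.
    x = q * r
    if x == 0.0:
        return 1.0
    amp = 3.0 * (math.sin(x) - x * math.cos(x)) / x**3
    return amp * amp

def serial_iq(q, w, r):
    # Reference scheme: one worker per q_j, serial sum over the distribution.
    return [sum(wi * sphere_form(qj, ri) for wi, ri in zip(w, r)) for qj in q]

def striped_partials(q, w, r, p):
    # Stage 1: partials[k][j] = I_k(q_j), summing every p-th distribution
    # point starting at k. All n x p entries are independent of each other.
    return [[sum(wi * sphere_form(qj, ri) for wi, ri in zip(w[k::p], r[k::p]))
             for qj in q]
            for k in range(p)]

def tree_reduce(partials):
    # Stage 2: add stripes in pairs, then pairs of pairs, until one row is left.
    # Returns the summed row and the number of rounds that were needed.
    rows = [list(row) for row in partials]
    rounds = 0
    while len(rows) > 1:
        merged = [[a + b for a, b in zip(rows[i], rows[i + 1])]
                  for i in range(0, len(rows) - 1, 2)]
        if len(rows) % 2:
            merged.append(rows[-1])  # odd stripe carries over unchanged
        rows = merged
        rounds += 1
    return rows[0], rounds

# Small made-up problem: n = 8 q points, m = 64 radii, p = 16 stripes.
q = [0.001 * (j + 1) for j in range(8)]
r = [10.0 + 0.5 * i for i in range(64)]
w = [1.0 / len(r)] * len(r)
partials = striped_partials(q, w, r, p=16)
iq, rounds = tree_reduce(partials)
```

The striped result should match serial_iq(q, w, r) up to floating-point summation order. Whether the pairwise stage is worth it over a simple serial sum of the p stripes is exactly the open question above.
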
Generating a specialized kernel for the particular problem of a distribution of spheres in mcSAS will probably be easier. ",defect,new,major,SasView 4.3.0,SasView,,,,McSAS Integration Project