Opened 10 years ago
Closed 8 years ago
#363 closed defect (fixed)
break OpenCL calculations into smaller chunks.
Reported by: | pkienzle | Owned by: | pkienzle |
---|---|---|---|
Priority: | major | Milestone: | SasView 4.0.0 |
Component: | SasView | Keywords: | |
Cc: | | Work Package: | SasModels Redesign |
Description
OpenCL fails if the kernel takes too long to execute. Need to restructure the kernel so that it does the polydispersity calculation in chunks.
The technique will be similar to what is used in the Python kernel: step through the polydispersity hypercube item by item, and determine the cube coordinates of each item with combinations of div and mod. Add an offset and a length parameter to the kernel call, and put a for loop around the call, stepping offset by N with a length of N until the entire set of coordinates has been traversed. Instead of setting ret[i] each time, increment it with the partial result. When offset is 0, the kernel sets ret[i]=0 before its loop, so the call with offset 0 must run first to initialize the result. Each kernel call should be queued with wait_for set to the previous kernel call so that the updates to ret[i] do not collide.
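The scheme described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the sasmodels implementation: `toy_model`, `cube_coords`, `partial_kernel` and `evaluate_in_chunks` are hypothetical names, and the serial loop stands in for the queued OpenCL calls (running the chunks one after another plays the role of `wait_for`).

```python
def toy_model(q, radius, length):
    # Hypothetical stand-in for a real scattering kernel.
    return radius * length / (1.0 + q * q)


def cube_coords(index, shape):
    """Convert a flat hypercube index into coordinates using div and mod."""
    coords = []
    for n in shape:
        coords.append(index % n)
        index //= n
    return coords


def partial_kernel(ret, q, values, weights, offset, length):
    """One 'kernel call': accumulate `length` hypercube points from `offset`."""
    shape = [len(w) for w in weights]
    total = 1
    for n in shape:
        total *= n
    if offset == 0:
        # The first call initializes the result.
        for i in range(len(ret)):
            ret[i] = 0.0
    for index in range(offset, min(offset + length, total)):
        coords = cube_coords(index, shape)
        pars = [v[c] for v, c in zip(values, coords)]
        weight = 1.0
        for w, c in zip(weights, coords):
            weight *= w[c]
        for i, qi in enumerate(q):
            # Increment with the partial result instead of overwriting.
            ret[i] += weight * toy_model(qi, *pars)


def evaluate_in_chunks(q, values, weights, chunk):
    """Traverse the whole hypercube in chunks of at most `chunk` points."""
    total = 1
    for w in weights:
        total *= len(w)
    ret = [float("nan")] * len(q)  # deliberately uninitialized
    # Offset 0 runs first; each later call follows the previous one.
    for offset in range(0, total, chunk):
        partial_kernel(ret, q, values, weights, offset, chunk)
    return ret
```

Because every chunk adds into the same `ret`, the chunk size changes only how the work is split, not the result.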
Probably an afternoon of programming.
Attachments (1)
Change History (11)
comment:1 Changed 10 years ago by butler
- Milestone changed from SasView 3.1 to SasView Next Release +1
comment:2 Changed 9 years ago by butler
- Milestone changed from SasView Next Release +1 to SasView 4.0.0
comment:3 Changed 9 years ago by ajj
- Owner set to pkienzle
- Status changed from new to assigned
comment:4 Changed 9 years ago by butler
- Priority changed from critical to blocker
part of requirements for 4.0 so make blocker
comment:5 Changed 8 years ago by pkienzle
- Resolution set to fixed
- Status changed from assigned to closed
comment:6 Changed 8 years ago by pkienzle
- Priority changed from blocker to major
- Resolution fixed deleted
- Status changed from closed to reopened
Despite feeding only 100 PD steps at a time in sasmodels/kernelcl, the 2-D barbell with 35x35 polydispersity on radius and length caused my MacBook Pro to reboot.
sasmodels/kernelcl.py(566) Maybe we could add last_call.wait() at the end of the loop in GpuKernel.call()?
comment:7 Changed 8 years ago by pkienzle
The above patch sleeps for 50ms every 500ms, leaving time for OS X to deal with other window events.
Checking every 50 steps introduces a 10% performance penalty for the 1D cylinder model with 35x35 polydispersity on my machine (107 ms to 118 ms); when running barbell 2D with this step size, the machine is still very laggy. Dropping step from 50 to 10 increases the 1D performance penalty to over 50% (107 ms to 168 ms), but makes other windows much more responsive.
Would you rather have a faster fit, or responsive behaviour on a model that is otherwise unusable? The top end gaming card calculates the full model in 500 ms with no problems, even without the built-in naps, so the advice to those doing serious 2-D fitting is to get a better machine.
We could probably make step a function of self.q_input.nq and avoid this problem. Even better would be to have an estimate of the compute cost of the kernel, so we could tune this value to be close to 100 ms without having to check the clock each cycle.
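A minimal sketch of the napping scheme discussed in this comment, assuming each chunk call blocks until its work has finished. The function name `run_steps` and its parameters are hypothetical; the clock and sleep functions are injectable only so that the behaviour can be checked without real delays.

```python
import time


def run_steps(run_chunk, num_eval, step,
              clock=time.monotonic, sleep=time.sleep,
              nap_interval=0.5, nap_length=0.05):
    """Run num_eval evaluations in chunks of `step`, napping periodically.

    Returns the number of naps taken.
    """
    naps = 0
    last_nap = clock()
    for start in range(0, num_eval, step):
        stop = min(start + step, num_eval)
        run_chunk(start, stop)  # assumed to block until the chunk is done
        if stop < num_eval:
            # More work remains: give other processes a chance to run
            # if enough time has passed since the last nap.
            now = clock()
            if now - last_nap > nap_interval:
                sleep(nap_length)
                last_nap = now
                naps += 1
    return naps
```

A smaller `step` means the clock is checked more often, which is where the trade-off between fit speed and desktop responsiveness comes from.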
comment:8 Changed 8 years ago by pkienzle
Indeed, setting
step = 1000000//self.q_input.nq + 1
gives the best of both worlds. The patch needs to be updated.
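To see how this choice scales, here is a small sketch that assumes the expression is an integer division of a fixed budget by the number of q points. The helper names `pd_step` and `num_kernel_calls` are hypothetical, as are the example q counts.

```python
def pd_step(nq, budget=1000000):
    """Polydispersity points per kernel call, shrinking as nq grows."""
    return budget // nq + 1


def num_kernel_calls(num_eval, nq):
    """Number of kernel calls needed to cover num_eval hypercube points."""
    step = pd_step(nq)
    return (num_eval + step - 1) // step  # ceiling division
```

For a 35x35 polydispersity mesh (1225 points), a 1D curve with 200 q points fits in a single call, while a 128x128 2D pattern is split into 20 calls, so the clock is only checked often when each call is expensive.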
comment:9 Changed 8 years ago by pkienzle
Turned into a pull request:
comment:10 Changed 8 years ago by ajj
- Resolution set to fixed
- Status changed from reopened to closed
This is part of the sasmodels project, so it should move to that release.