Opened 4 years ago

Closed 2 years ago

#363 closed defect (fixed)

Break OpenCL calculations into smaller chunks.

Reported by: pkienzle Owned by: pkienzle
Priority: major Milestone: SasView 4.0.0
Component: SasView Keywords:
Cc: Work Package: SasModels Redesign

Description

OpenCL fails if the kernel takes too long to execute. Need to restructure the kernel so that it does the polydispersity calculation in chunks.

The technique will be similar to what is used in the Python kernel: increment through the polydispersity hypercube item by item, and determine the cube coordinates with combinations of div and mod. Add offset and length parameters to the kernel call, and put a for loop around the call that steps offset by N with a length of N until the entire set of coordinates has been traversed. Instead of setting ret[i] on each call, increment it with the partial result. If offset is 0, the kernel should set ret[i]=0 before its loop over the polydispersity items. Each kernel call should be queued to wait_for the previous kernel call so that there are no collisions when updating ret[i]. In particular, the call with offset 0 must run first to initialize the result.
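
The scheme described above can be sketched in pure Python as a stand-in for the OpenCL kernel (no GPU required). This is a minimal illustration under stated assumptions, not the sasmodels code: pd_coords, pd_kernel_chunk and compute_chunked are hypothetical names, the loop over q points plays the role of the OpenCL work items, and form is a toy model function.

```python
import math


def pd_coords(index, shape):
    """Map a flat hypercube index to per-axis coordinates using div and mod.
    The last axis varies fastest."""
    coords = []
    for n in reversed(shape):
        index, k = divmod(index, n)
        coords.append(k)
    return tuple(reversed(coords))


def pd_kernel_chunk(ret, q, offset, length, shape, weights, form):
    """Stand-in for one kernel call: add the contribution of hypercube items
    offset .. offset+length-1 into ret[i] for every q point i.
    In OpenCL, each i would be a separate work item."""
    stop = min(offset + length, math.prod(shape))
    for i, qi in enumerate(q):
        if offset == 0:
            ret[i] = 0.0  # the first chunk initializes the accumulator
        partial = 0.0
        for index in range(offset, stop):
            coords = pd_coords(index, shape)
            w = math.prod(weights[axis][k] for axis, k in enumerate(coords))
            partial += w * form(qi, coords)
        ret[i] += partial  # increment with the partial result, never overwrite


def compute_chunked(q, shape, weights, form, step):
    """Host loop: step the offset by `step` until the hypercube is covered."""
    ret = [math.nan] * len(q)  # uninitialized; the offset-0 call resets it
    for offset in range(0, math.prod(shape), step):
        # With OpenCL, each call would be queued with wait_for=[previous event]
        # so that chunks never update ret[i] concurrently.
        pd_kernel_chunk(ret, q, offset, step, shape, weights, form)
    return ret


if __name__ == "__main__":
    q = [0.01, 0.02, 0.05]
    shape = (3, 4)  # e.g. 3 radius points x 4 length points
    weights = [[0.2, 0.5, 0.3], [0.1, 0.2, 0.3, 0.4]]
    form = lambda qi, c: qi * (1 + c[0]) + c[1]  # toy stand-in for a model
    print(compute_chunked(q, shape, weights, form, step=5))
```

Whatever step is chosen, the result matches a single pass over the whole hypercube (up to floating-point summation order); only the amount of work per kernel call changes.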

Probably an afternoon of programming.

Attachments (1)

large_kernel.patch (1.5 KB) - added by pkienzle 2 years ago.
possible fix, but with a 10% performance penalty


Change History (11)

comment:1 Changed 4 years ago by butler

  • Milestone changed from SasView 3.1 to SasView Next Release +1

This is part of the sasmodels project, so it should move to that release.

comment:2 Changed 3 years ago by butler

  • Milestone changed from SasView Next Release +1 to SasView 4.0.0

comment:3 Changed 3 years ago by ajj

  • Owner set to pkienzle
  • Status changed from new to assigned

comment:4 Changed 3 years ago by butler

  • Priority changed from critical to blocker

Part of the requirements for 4.0, so make it a blocker.

comment:5 Changed 2 years ago by pkienzle

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:6 Changed 2 years ago by pkienzle

  • Priority changed from blocker to major
  • Resolution fixed deleted
  • Status changed from closed to reopened

Despite feeding only 100 PD steps at a time in sasmodels/kernelcl, the 2-D barbell with 35x35 polydispersity on radius and length caused my macbook pro to reboot.

In sasmodels/kernelcl.py(566): maybe we could add last_call.wait() at the end of the loop in GpuKernel.call()?

Changed 2 years ago by pkienzle

  • Attachment large_kernel.patch added

possible fix, but with a 10% performance penalty

comment:7 Changed 2 years ago by pkienzle

The above patch sleeps for 50ms every 500ms, leaving time for OS X to deal with other window events.

Checking every 50 steps introduces a 10% performance penalty for the 1D cylinder model with 35x35 polydispersity on my machine (107 ms to 118 ms); when running the 2D barbell with this step size, performance is still very laggy. Dropping step to 10 rather than 50 increases the 1D performance penalty to over 50% (107 ms to 168 ms), but makes other windows much more responsive.
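
The patch itself is not reproduced here. As a rough sketch of the napping idea, assuming that step is the number of polydispersity points per kernel call (as in the step formula of comment:8), the host loop below reads the clock after each chunk and sleeps for 50 ms once more than 500 ms of continuous work has passed. run_with_naps and its parameters are hypothetical names, and the clock and sleep functions are injectable so the logic can be exercised without a GPU. With a real OpenCL queue, kernel launches return immediately, so each chunk must be waited on (for example with the last_call.wait() suggested in comment:6) before the clock reading means anything.

```python
import time


def run_with_naps(n_pd, step, run_chunk, busy_limit=0.5, nap=0.05,
                  clock=time.perf_counter, sleep=time.sleep):
    """Call run_chunk(offset, length) over n_pd polydispersity points in
    chunks of `step`, sleeping for `nap` seconds whenever more than
    `busy_limit` seconds have elapsed since the last nap.
    Returns the number of naps taken."""
    naps = 0
    last_nap = clock()
    for offset in range(0, n_pd, step):
        # run_chunk is assumed to block until the chunk has finished
        # (with OpenCL: wait on the kernel's event before returning).
        run_chunk(offset, min(step, n_pd - offset))
        if clock() - last_nap > busy_limit:
            sleep(nap)  # give the OS time to service other windows
            naps += 1
            last_nap = clock()
    return naps
```

A smaller step lets the naps land closer to the 500 ms mark, which helps responsiveness, but it also means more kernel launches and waits for the same amount of work.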

Would you rather avoid the 50% slowdown in fitting, or have responsive behaviour on a model that is otherwise unusable? A top-end gaming card calculates the full model in 500 ms with no problems, even without the built-in naps, so the advice to those doing serious 2-D fitting is to get a better machine.

We could probably make step a function of self.q_input.nq and avoid this problem. Even better would be to have an estimate of the compute cost of the kernel, so that we could tune this value to be close to 100 ms without having to check the clock on each cycle.
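
One way to read the cost-estimate idea: given an estimated cost per (q point, polydispersity point) evaluation, choose the step so that a single kernel call takes about 100 ms, with no clock reads during the run. This is a minimal sketch; step_from_cost is a hypothetical helper, and seconds_per_eval is an assumed, model-specific number that would have to come from somewhere such as a benchmark.

```python
def step_from_cost(nq, seconds_per_eval, target=0.1):
    """Polydispersity points per kernel call so that one call costs roughly
    `target` seconds. seconds_per_eval is a hypothetical estimate of the cost
    of evaluating one (q point, PD point) pair on the device."""
    if nq <= 0 or seconds_per_eval <= 0:
        raise ValueError("nq and seconds_per_eval must be positive")
    # Always process at least one polydispersity point per call.
    return max(1, int(target / (seconds_per_eval * nq)))


if __name__ == "__main__":
    # Illustrative costs only: a cheap 1D model vs. an expensive 2D model.
    print(step_from_cost(nq=1000, seconds_per_eval=1e-8))
    print(step_from_cost(nq=128 * 128, seconds_per_eval=1e-7))
```

For a fixed per-evaluation cost, this reduces to step being proportional to 1/nq.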

comment:8 Changed 2 years ago by pkienzle

Indeed, setting

    step = 1000000//self.q_input.nq + 1

gives the best of both worlds. The patch needs to be updated.
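
Read literally, the formula keeps each kernel call near one million (q point × polydispersity point) evaluations. Small 1-D problems therefore run in one or two calls, while large 2-D problems are split more finely. The sketch below simply evaluates the formula. pd_step and n_kernel_calls are hypothetical names, and the q-vector sizes are illustrative assumptions rather than values from the ticket. For example, with a 128x128 2-D detector (nq = 16384), step = 62, so 35x35 = 1225 polydispersity points need 20 kernel calls.

```python
import math

N_EVALS_PER_CALL = 1000000  # constant from the step formula above


def pd_step(nq):
    """Polydispersity points per kernel call; the +1 keeps step >= 1
    even when nq exceeds one million."""
    return N_EVALS_PER_CALL // nq + 1


def n_kernel_calls(nq, n_pd):
    """Number of kernel launches needed to cover n_pd polydispersity points."""
    return math.ceil(n_pd / pd_step(nq))


if __name__ == "__main__":
    n_pd = 35 * 35  # 35x35 polydispersity, as in the timings above
    for nq in (100, 1000, 128 * 128, 2000000):  # illustrative q-vector sizes
        print(nq, pd_step(nq), n_kernel_calls(nq, n_pd))
```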

Last edited 2 years ago by pkienzle

comment:9 Changed 2 years ago by pkienzle

comment:10 Changed 2 years ago by ajj

  • Resolution set to fixed
  • Status changed from reopened to closed