Changeset 8b31efa in sasmodels


Timestamp: Oct 15, 2018 1:27:14 PM
Author: pkienzle
Branches: beta_approx, cuda-test, py3, ticket-1015-gpu-mem-error, ticket-1157, ticket-608-user-defined-weights, ticket_1156
Children: 508475a, d5ce7fa
Parents: 4de14584
Message:

document cuda device selection; fix cuda speed issue

Files: 3 edited

  • doc/guide/gpu_setup.rst

    --- doc/guide/gpu_setup.rst (r63602b1)
    +++ doc/guide/gpu_setup.rst (r8b31efa)
    @@ -94,4 +94,6 @@
     Device Selection
     ================
    +**OpenCL drivers**
    +
     If you have multiple GPU devices you can tell the program which device to use.
     By default, the program looks for one GPU and one CPU device from available
    @@ -104,8 +106,5 @@
     was used to run the model.
     
    -**If you don't want to use OpenCL, you can set** *SAS_OPENCL=None*
    -**in your environment settings, and it will only use normal programs.**
    -
    -If you want to use one of the other devices, you can run the following
    +If you want to use a specific driver and devices, you can run the following
     from the python console::
     
    @@ -115,5 +114,47 @@
     This will provide a menu of different OpenCL drivers available.
     When one is selected, it will say "set PYOPENCL_CTX=..."
    -Use that value as the value of *SAS_OPENCL*.
    +Use that value as the value of *SAS_OPENCL=driver:device*.
    +
    +To use the default OpenCL device (rather than CUDA or None),
    +set *SAS_OPENCL=opencl*.
    +
    +In batch queues, you may need to set *XDG_CACHE_HOME=~/.cache*
    +(Linux only) to a different directory, depending on how the filesystem
    +is configured.  You should also set *SAS_DLL_PATH* for CPU-only modules.
    +
    +    -DSAS_MODELPATH=path sets directory containing custom models
    +    -DSAS_OPENCL=vendor:device|cuda:device|none sets the target GPU device
    +    -DXDG_CACHE_HOME=~/.cache sets the pyopencl cache root (linux only)
    +    -DSAS_COMPILER=tinycc|msvc|mingw|unix sets the DLL compiler
    +    -DSAS_OPENMP=1 turns on OpenMP for the DLLs
    +    -DSAS_DLL_PATH=path sets the path to the compiled modules
    +
    +
    +**CUDA drivers**
    +
    +If OpenCL drivers are not available on your system, but NVidia CUDA
    +drivers are available, then set *SAS_OPENCL=cuda* or
    +*SAS_OPENCL=cuda:n* for a particular device number *n*.  If no device
    +number is specified, then the CUDA drivers looks for look for
    +*CUDA_DEVICE=n* or a file ~/.cuda-device containing n for the device number.
    +
    +In batch queues, the SLURM command *sbatch --gres=gpu:1 ...* will set
    +*CUDA_VISIBLE_DEVICES=n*, which ought to set the correct device
    +number for *SAS_OPENCL=cuda*.  If not, then set
    +*CUDA_DEVICE=$CUDA_VISIBLE_DEVICES* within the batch script.  You may
    +need to set the CUDA cache directory to a folder accessible across the
    +cluster with *PYCUDA_CACHE_DIR* (or *PYCUDA_DISABLE_CACHE* to disable
    +caching), and you may need to set environment specific compiler flags
    +with *PYCUDA_DEFAULT_NVCC_FLAGS*.  You should also set *SAS_DLL_PATH*
    +for CPU-only modules.
    +
    +**No GPU support**
    +
    +If you don't want to use OpenCL or CUDA, you can set *SAS_OPENCL=None*
    +in your environment settings, and it will only use normal programs.
    +
    +In batch queues, you may need to set *SAS_DLL_PATH* to a directory
    +accessible on the compute node.
    +
     
     Device Testing
    @@ -154,3 +195,3 @@
     *Document History*
     
    -| 2017-09-27 Paul Kienzle
    +| 2018-10-15 Paul Kienzle
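The CUDA lookup order described in the new "CUDA drivers" section can be summarized as a short Python sketch. The helper name resolve_cuda_device is hypothetical (it is not part of sasmodels or PyCUDA), and the sketch assumes that CUDA_DEVICE is consulted before ~/.cuda-device, which the guide does not state explicitly. Returning None stands for "no device number given; let the driver choose".

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_cuda_device(env: Mapping[str, str], home: Path) -> Optional[int]:
    """Hypothetical helper: pick a CUDA device number for SAS_OPENCL=cuda[:n].

    Assumed lookup order: explicit cuda:n, then CUDA_DEVICE, then a
    ~/.cuda-device file.  Returns None if no device number is found.
    """
    setting = env.get("SAS_OPENCL", "")
    kind, _, number = setting.partition(":")
    if kind.lower() != "cuda":
        raise ValueError(f"SAS_OPENCL={setting!r} does not select CUDA")
    # Explicit device number, e.g. SAS_OPENCL=cuda:1
    if number:
        return int(number)
    # Assumed first fallback: CUDA_DEVICE=n in the environment
    if "CUDA_DEVICE" in env:
        return int(env["CUDA_DEVICE"])
    # Assumed second fallback: a file ~/.cuda-device containing n
    device_file = home / ".cuda-device"
    if device_file.exists():
        return int(device_file.read_text().strip())
    # Nothing specified: leave the choice to the driver
    return None
```

Passing the environment and home directory in as arguments keeps the sketch free of side effects; in use, they would come from os.environ and Path.home().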
  • sasmodels/kernelcl.py

    --- sasmodels/kernelcl.py (rb0de252)
    +++ sasmodels/kernelcl.py (r8b31efa)
    @@ -227,6 +227,14 @@
             self.context = None
             if 'SAS_OPENCL' in os.environ:
    -            #Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context
    -            os.environ["PYOPENCL_CTX"] = os.environ["SAS_OPENCL"]
    +            # Set the PyOpenCL environment variable PYOPENCL_CTX
    +            # from SAS_OPENCL=driver:device.  Ignore the generic
    +            # SAS_OPENCL=opencl, which is used to select the default
    +            # OpenCL device.  Don't need to check for "none" or
    +            # "cuda" since use_opencl() would return False if they
    +            # were defined, and we wouldn't get here.
    +            dev_str = os.environ["SAS_OPENCL"]
    +            if dev_str and dev_str.lower() != "opencl":
    +                os.environ["PYOPENCL_CTX"] = dev_str
    +
             if 'PYOPENCL_CTX' in os.environ:
                 self._create_some_context()
    @@ -568,5 +576,5 @@
                     current_time = time.clock()
                     if current_time - last_nap > 0.5:
    -                    time.sleep(0.05)
    +                    time.sleep(0.001)
                         last_nap = current_time
             cl.enqueue_copy(self.queue, self.result, self.result_b)
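The new SAS_OPENCL handling in kernelcl.py can be exercised outside the kernel class with a small pure function. The name sas_opencl_to_pyopencl_ctx is hypothetical, and unlike the committed code it returns a modified copy of the environment instead of writing to os.environ, so that it can be tested in isolation. The "0:1" values in the usage below are illustrative driver:device strings.

```python
from typing import Dict, Mapping


def sas_opencl_to_pyopencl_ctx(env: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical side-effect-free version of the SAS_OPENCL handling.

    Returns a copy of env in which PYOPENCL_CTX is set from
    SAS_OPENCL=driver:device.  An empty value or the generic "opencl"
    (any case) leaves PYOPENCL_CTX as it was, so the default OpenCL
    device selection applies.  As in kernelcl.py, "none" and "cuda" are
    not handled here; the caller is assumed to have filtered them out.
    """
    result = dict(env)  # copy: never mutate the caller's mapping
    dev_str = result.get("SAS_OPENCL")
    if dev_str and dev_str.lower() != "opencl":
        result["PYOPENCL_CTX"] = dev_str
    return result


print(sas_opencl_to_pyopencl_ctx({"SAS_OPENCL": "0:1"}))
```

Note that SAS_OPENCL=opencl leaves any PYOPENCL_CTX already present untouched, so in the committed code a pre-existing PYOPENCL_CTX would still be used to create the context.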
  • sasmodels/kernelcuda.py

    --- sasmodels/kernelcuda.py (r74e9b5f)
    +++ sasmodels/kernelcuda.py (r8b31efa)
    @@ -444,5 +444,5 @@
             self.q_input = q_input # allocated by GpuInput above
     
    -        self._need_release = [self.result_b, self.q_input]
    +        self._need_release = [self.result_b]
             self.real = (np.float32 if dtype == generate.F32
                          else np.float64 if dtype == generate.F64
    @@ -467,5 +467,5 @@
             # Call kernel and retrieve results
             last_nap = time.clock()
    -        step = 1000000//self.q_input.nq + 1
    +        step = 100000000//self.q_input.nq + 1
             #step = 1000000000
             for start in range(0, call_details.num_eval, step):
    @@ -479,5 +479,5 @@
                     current_time = time.clock()
                     if current_time - last_nap > 0.5:
    -                    time.sleep(0.05)
    +                    time.sleep(0.001)
                         last_nap = current_time
             sync()
    @@ -500,7 +500,7 @@
             Release resources associated with the kernel.
             """
    -        if self.result_b is not None:
    -            self.result_b.free()
    -            self.result_b = None
    +        for p in self._need_release:
    +            p.free()
    +        self._need_release = []
     
         def __del__(self):
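The change to step in kernelcuda.py plausibly bears on the "cuda speed issue" in the commit message: a larger step means fewer passes through the call loop. Assuming each iteration of that loop launches the kernel once (the loop body is not shown in this changeset), the number of launches can be counted as below. kernel_launches is a hypothetical helper, and the problem sizes are illustrative, not taken from sasmodels.

```python
def kernel_launches(num_eval: int, nq: int, budget: int) -> int:
    """Hypothetical helper: count iterations of the chunked CUDA call loop.

    Mirrors `step = budget//nq + 1` followed by
    `for start in range(0, num_eval, step)`; budget is 1000000 before
    this changeset and 100000000 after it.
    """
    step = budget // nq + 1
    return len(range(0, num_eval, step))


# Illustrative sizes only: 1000 q points, 10**6 polydispersity evaluations.
before = kernel_launches(10**6, 1000, 1_000_000)
after = kernel_launches(10**6, 1000, 100_000_000)
print(before, after)  # 1000 10
```

With these sizes each batch grows from 1001 to 100001 evaluations, so the loop makes 100 times fewer kernel calls.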