Changeset b0de252 in sasmodels for sasmodels/kernelcl.py


Timestamp: Oct 12, 2018 5:31:24 PM (6 years ago)
Author: pkienzle
Branches: master, core_shell_microgels, magnetic_model, ticket-1257-vesicle-product, ticket_1156, ticket_1265_superball, ticket_822_more_unit_tests
Children: 74e9b5f
Parents: 47fb816
Message: improve control over cuda context
File: 1 edited

  • sasmodels/kernelcl.py

--- sasmodels/kernelcl.py (rd86f0fc)
+++ sasmodels/kernelcl.py (rb0de252)
@@ -1,4 +1,6 @@
 """
 GPU driver for C kernels
+
+TODO: docs are out of date
 
 There should be a single GPU environment running on the system.  This
@@ -59,5 +61,5 @@
 
 
-# Attempt to setup opencl. This may fail if the opencl package is not
+# Attempt to setup opencl. This may fail if the pyopencl package is not
 # installed or if it is installed but there are no devices available.
 try:
@@ -131,5 +133,6 @@
 
 def use_opencl():
-    return HAVE_OPENCL and os.environ.get("SAS_OPENCL", "").lower() != "none"
+    env = os.environ.get("SAS_OPENCL", "").lower()
+    return HAVE_OPENCL and env != "none" and not env.startswith("cuda")
 
 ENV = None
@@ -179,27 +182,4 @@
         cl.kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
         queue.device)
-
-def _stretch_input(vector, dtype, extra=1e-3, boundary=32):
-    # type: (np.ndarray, np.dtype, float, int) -> np.ndarray
-    """
-    Stretch an input vector to the correct boundary.
-
-    Performance on the kernels can drop by a factor of two or more if the
-    number of values to compute does not fall on a nice power of two
-    boundary.   The trailing additional vector elements are given a
-    value of *extra*, and so f(*extra*) will be computed for each of
-    them.  The returned array will thus be a subset of the computed array.
-
-    *boundary* should be a power of 2 which is at least 32 for good
-    performance on current platforms (as of Jan 2015).  It should
-    probably be the max of get_warp(kernel,queue) and
-    device.min_data_type_align_size//4.
-    """
-    remainder = vector.size % boundary
-    if remainder != 0:
-        size = vector.size + (boundary - remainder)
-        vector = np.hstack((vector, [extra] * (size - vector.size)))
-    return np.ascontiguousarray(vector, dtype=dtype)
-
 
 def compile_model(context, source, dtype, fast=False):
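
The behavioural change in use_opencl() is easiest to see by running the old and new checks side by side. The following is only a minimal sketch: the two function bodies are copied from the diff, but the names use_opencl_old, use_opencl_new and compare are hypothetical, HAVE_OPENCL is hard-coded to True so the sketch runs without pyopencl, and the sample values "cuda:1" and "0:1" are illustrative rather than a statement of which SAS_OPENCL settings sasmodels accepts.

```python
import os

# Assumption: in kernelcl.py HAVE_OPENCL is set by the guarded pyopencl
# import; it is hard-coded here so the sketch runs without pyopencl.
HAVE_OPENCL = True


def use_opencl_old():
    """Check from parent d86f0fc: only "none" disables OpenCL."""
    return HAVE_OPENCL and os.environ.get("SAS_OPENCL", "").lower() != "none"


def use_opencl_new():
    """Check from b0de252: "none" or any value starting with "cuda" disables OpenCL."""
    env = os.environ.get("SAS_OPENCL", "").lower()
    return HAVE_OPENCL and env != "none" and not env.startswith("cuda")


def compare(value):
    """Hypothetical helper: return (old, new) with SAS_OPENCL temporarily set."""
    saved = os.environ.get("SAS_OPENCL")
    os.environ["SAS_OPENCL"] = value
    try:
        return use_opencl_old(), use_opencl_new()
    finally:
        # Restore the caller's environment.
        if saved is None:
            del os.environ["SAS_OPENCL"]
        else:
            os.environ["SAS_OPENCL"] = saved


if __name__ == "__main__":
    for value in ["", "none", "CUDA", "cuda:1", "0:1"]:
        old, new = compare(value)
        print(f"SAS_OPENCL={value!r}: old={old} new={new}")
```

Under these assumptions the two checks differ only for values that begin with "cuda" (case-insensitively): the old check would still select the OpenCL path, while the new one declines it, presumably so that a CUDA driver can take over.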