Changeset b0de252 in sasmodels for sasmodels/kernelcl.py
- Timestamp: Oct 12, 2018 5:31:24 PM
- Branches: master, core_shell_microgels, magnetic_model, ticket-1257-vesicle-product, ticket_1156, ticket_1265_superball, ticket_822_more_unit_tests
- Children: 74e9b5f
- Parents: 47fb816
- File: 1 edited
sasmodels/kernelcl.py
--- sasmodels/kernelcl.py (rd86f0fc)
+++ sasmodels/kernelcl.py (rb0de252)
@@ -1,4 +1,6 @@
 """
 GPU driver for C kernels
+
+TODO: docs are out of date
 
 There should be a single GPU environment running on the system. This
@@ -59,5 +61,5 @@
 
 
-# Attempt to setup opencl. This may fail if the opencl package is not
+# Attempt to setup opencl. This may fail if the pyopencl package is not
 # installed or if it is installed but there are no devices available.
 try:
@@ -131,5 +133,6 @@
 
 def use_opencl():
-    return HAVE_OPENCL and os.environ.get("SAS_OPENCL", "").lower() != "none"
+    env = os.environ.get("SAS_OPENCL", "").lower()
+    return HAVE_OPENCL and env != "none" and not env.startswith("cuda")
 
 ENV = None
@@ -179,27 +182,4 @@
         cl.kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
         queue.device)
-
-def _stretch_input(vector, dtype, extra=1e-3, boundary=32):
-    # type: (np.ndarray, np.dtype, float, int) -> np.ndarray
-    """
-    Stretch an input vector to the correct boundary.
-
-    Performance on the kernels can drop by a factor of two or more if the
-    number of values to compute does not fall on a nice power of two
-    boundary. The trailing additional vector elements are given a
-    value of *extra*, and so f(*extra*) will be computed for each of
-    them. The returned array will thus be a subset of the computed array.
-
-    *boundary* should be a power of 2 which is at least 32 for good
-    performance on current platforms (as of Jan 2015). It should
-    probably be the max of get_warp(kernel,queue) and
-    device.min_data_type_align_size//4.
-    """
-    remainder = vector.size % boundary
-    if remainder != 0:
-        size = vector.size + (boundary - remainder)
-        vector = np.hstack((vector, [extra] * (size - vector.size)))
-    return np.ascontiguousarray(vector, dtype=dtype)
-
 
 def compile_model(context, source, dtype, fast=False):
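
The behavioral change in this changeset is the new condition in use_opencl(): a SAS_OPENCL value that starts with "cuda" now disables the OpenCL path, in addition to the existing "none" value. The sketch below restates that check as a standalone function so it can be tried without pyopencl installed. The parameters have_opencl and environ are hypothetical stand-ins for the module-level HAVE_OPENCL flag and os.environ, and the sample values in the loop are illustrative only, not a list of settings that sasmodels documents.

```python
def use_opencl(have_opencl, environ):
    """Standalone restatement of the new check (parameters are hypothetical).

    have_opencl stands in for the module-level HAVE_OPENCL flag and
    environ stands in for os.environ.
    """
    env = environ.get("SAS_OPENCL", "").lower()
    # OpenCL is used only when it is available, not explicitly disabled
    # with "none", and the setting does not begin with "cuda".
    return have_opencl and env != "none" and not env.startswith("cuda")


if __name__ == "__main__":
    # Illustrative values only; the comparison is case-insensitive.
    for value in ("", "none", "None", "cuda", "CUDA:1", "0:1"):
        print(repr(value), use_opencl(True, {"SAS_OPENCL": value}))
```

Because the value is lowercased before the comparison, "None" and "CUDA:1" are treated the same as "none" and "cuda:1". An unset variable is read as the empty string, which leaves OpenCL enabled whenever it is available.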