source: sasmodels/sasmodels/alignment.py @ 14de349

Last change on this file since 14de349 was 14de349, checked in by Paul Kienzle <pkienzle@…>, 10 years ago

add autogenerated polydispersity loops

  • Property mode set to 100644
File size: 1.7 KB
"""
GPU data alignment.

Some online sources suggest that maximizing OpenCL performance requires
aligning data on certain memory boundaries.

:func:`data_alignment` queries all devices in the OpenCL context, returning
the most restrictive alignment.

:func:`align_data` returns the contents of an existing array with the
requested dtype and alignment.

:func:`align_empty` creates a new, uninitialized array with the correct alignment.

Note: This code is unused. So far, tests have not demonstrated any
improvement from forcing correct alignment. The tests should
be repeated with arrays forced away from the target boundaries
to decide whether alignment is really required.
"""

import numpy as np
import pyopencl as cl

def data_alignment(context):
    """
    Return the desired byte alignment across all devices.
    """
    # Relies on each device alignment being a power of two (2^k): the
    # largest one is then a multiple of all the others and satisfies them all.
    return max(d.min_data_type_align_size for d in context.devices)

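def align_empty(shape, dtype, alignment=128):
    """
    Return an uninitialized array of *shape* and *dtype* whose data
    pointer is a multiple of *alignment* bytes.

    *alignment* must be a multiple of the dtype item size.
    """
    dtype = np.dtype(dtype)
    # int() because np.prod(()) is the float 1.0
    size = int(np.prod(shape))
    # allocate array with extra space for alignment; the worst case needs
    # alignment - itemsize extra bytes to reach the next boundary
    extra = alignment//dtype.itemsize - 1
    result = np.empty(size+extra, dtype)
    # build a view into allocated array which starts on a boundary:
    # -addr % alignment is the number of bytes up to the next boundary
    offset = (-result.ctypes.data % alignment)//dtype.itemsize
    view = np.reshape(result[offset:offset+size], shape)
    return view
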
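def align_data(v, dtype, alignment=128):
    """
    Return *v* as a C-contiguous array of *dtype* aligned to *alignment* bytes.
    """
    v = np.asarray(v)
    dtype = np.dtype(dtype)
    # if v is contiguous, aligned, and of the correct type then just return v
    if (v.dtype == dtype and v.flags.c_contiguous
            and v.ctypes.data % alignment == 0):
        return v
    view = align_empty(v.shape, dtype, alignment=alignment)
    # [...] rather than [:] so that 0-d arrays are copied as well
    view[...] = v
    return view
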
def work_group_size(context, kernel):
    """
    Return the preferred work-group size multiple for *kernel*, taking
    the largest value across all devices in the context.
    """
    return max(kernel.get_work_group_info(
        cl.kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE, d)
        for d in context.devices)

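The offset that `align_empty` needs is the number of bytes from the start of the allocation to the next multiple of `alignment`. The sketch below works through that arithmetic on made-up addresses rather than real allocations; the helper name `bytes_to_boundary` is hypothetical and not part of sasmodels. Note that the skip is `-address % alignment`, not `address % alignment`: skipping `address % alignment` bytes lands on a boundary only when the address is already aligned or sits exactly halfway between two boundaries.

```python
def bytes_to_boundary(address: int, alignment: int) -> int:
    """Bytes to skip so that address + result is a multiple of alignment.

    Hypothetical helper for illustration; alignment is assumed to be a
    positive integer (a power of two for OpenCL devices).
    """
    # Python's % returns a value in [0, alignment) for a positive modulus,
    # so this is 0 when already aligned, else alignment - address % alignment.
    return -address % alignment


# Hypothetical start addresses, 16-byte aligned as malloc typically returns.
for address in (0x1000, 0x1010, 0x1040, 0x1070):
    skip = bytes_to_boundary(address, 128)
    aligned = address + skip
    print(hex(address), skip, hex(aligned), aligned % 128 == 0)
```

Dividing the skip by `dtype.itemsize` turns it into an element offset. That division is exact as long as the base address is itemsize-aligned and `alignment` is a multiple of the item size, and the worst-case skip is `alignment - itemsize` bytes, which is why `alignment//itemsize - 1` spare elements are enough.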
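The module note proposes repeating the benchmarks with arrays forced away from the target boundaries. One way to build such inputs, sketched here with NumPy only, is to over-allocate, find the first aligned element, and then step a chosen number of elements past it. The name `misaligned_empty` and its `shift` parameter are hypothetical, and the benchmark itself is not shown.

```python
import numpy as np


def misaligned_empty(shape, dtype, alignment=128, shift=1):
    """Return an uninitialized array whose data pointer is exactly
    shift*itemsize bytes past an alignment boundary.

    Hypothetical helper; assumes alignment is a multiple of the item size.
    """
    dtype = np.dtype(dtype)
    size = int(np.prod(shape))
    # The shift must stay strictly inside one block, or the result is aligned.
    if not 0 < shift * dtype.itemsize < alignment:
        raise ValueError("shift must land strictly inside one alignment block")
    # Room to reach the first boundary plus room for the shift.
    extra = alignment // dtype.itemsize - 1 + shift
    buf = np.empty(size + extra, dtype)
    # Element index of the first aligned position, then step past it.
    start = (-buf.ctypes.data % alignment) // dtype.itemsize + shift
    return buf[start:start + size].reshape(shape)


a = misaligned_empty((4, 5), np.float32, alignment=128, shift=3)
print(a.shape, a.ctypes.data % 128)  # expected: (4, 5) 12
```

Sweeping `shift` over `1 .. alignment//itemsize - 1` and timing the same kernel on each input, alongside an aligned array, would show whether misalignment costs anything on a given device.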