Changeset 599993b9 in sasmodels

Timestamp:

Oct 30, 2018 10:51:20 AM

Author:

Paul Kienzle <pkienzle@…>

Branches:

master, core_shell_microgels, magnetic_model, ticket-1257-vesicle-product, ticket_1156, ticket_1265_superball, ticket_822_more_unit_tests

Parents:

2a12d8d8, c6084f1
Note: this is a merge changeset; the changes displayed below correspond to the merge itself, not to the full set of changes relative to each parent.

Message:

Merge branch 'master' into ticket-1015-gpu-mem-error

Files:

1 added, 7 edited

doc/guide/plugin.rst (modified) (6 diffs)
explore/precision.py (modified) (10 diffs)
sasmodels/kernelpy.py (modified) (1 diff)
sasmodels/model_test.py (modified) (8 diffs)
sasmodels/models/lib/sas_gammainc.c (added)
sasmodels/sasview_model.py (modified) (1 diff)
sasmodels/special.py (modified) (2 diffs)
sasmodels/kernelcl.py (modified) (15 diffs)

doc/guide/plugin.rst

-                      r2015f02
+                      r57c609b
         def random():
         ...
 This function provides a model-specific random parameter set which shows model
 features in the USANS to SANS range.  For example, core-shell sphere sets the
 outer radius of the sphere logarithmically in `[20, 20,000]`, which sets the Q
 value for the transition from flat to falling.  It then uses a beta distribution
 to set the percentage of the shape which is shell, giving a preference for very
 thin or very thick shells (but never 0% or 100%).  Using `-sets=10` in sascomp
 should show a reasonable variety of curves over the default sascomp q range.
 The parameter set is returned as a dictionary of `{parameter: value, ...}`.
 Any model parameters not included in the dictionary will default according to
+This function provides a model-specific random parameter set which shows model
+features in the USANS to SANS range.  For example, core-shell sphere sets the
+outer radius of the sphere logarithmically in `[20, 20,000]`, which sets the Q
+value for the transition from flat to falling.  It then uses a beta distribution
+to set the percentage of the shape which is shell, giving a preference for very
+thin or very thick shells (but never 0% or 100%).  Using `-sets=10` in sascomp
+should show a reasonable variety of curves over the default sascomp q range.
+The parameter set is returned as a dictionary of `{parameter: value, ...}`.
+Any model parameters not included in the dictionary will default according to
 the code in the `_randomize_one()` function from sasmodels/compare.py.
 …
     erf, erfc, tgamma, lgamma:  **do not use**
         Special functions that should be part of the standard, but are missing
         or inaccurate on some platforms. Use sas_erf, sas_erfc and sas_gamma
         instead (see below). Note: lgamma(x) has not yet been tested.
+        or inaccurate on some platforms. Use sas_erf, sas_erfc, sas_gamma
+        and sas_lgamma instead (see below).
 Some non-standard constants and functions are also provided:
 …
         Gamma function sas_gamma\ $(x) = \Gamma(x)$.
         The standard math function, tgamma(x) is unstable for $x < 1$
+        The standard math function, tgamma(x), is unstable for $x < 1$
         on some platforms.
         :code:`source = ["lib/sas_gamma.c", ...]`
         (`sas_gamma.c <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib/sas_gamma.c>`_)
+    sas_gammaln(x):
+        log gamma function sas_gammaln\ $(x) = \log \Gamma(|x|)$.
+        The standard math function, lgamma(x), is incorrect for single
+        precision on some platforms.
+        :code:`source = ["lib/sas_gammainc.c", ...]`
+        (`sas_gammainc.c <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib/sas_gammainc.c>`_)
+    sas_gammainc(a, x), sas_gammaincc(a, x):
+        Incomplete gamma function
+        sas_gammainc\ $(a, x) = \int_0^x t^{a-1}e^{-t}\,dt / \Gamma(a)$
+        and complementary incomplete gamma function
+        sas_gammaincc\ $(a, x) = \int_x^\infty t^{a-1}e^{-t}\,dt / \Gamma(a)$
+        :code:`source = ["lib/sas_gammainc.c", ...]`
+        (`sas_gammainc.c <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib/sas_gammainc.c>`_)
     sas_erf(x), sas_erfc(x):
 …
         If $n$ = 0 or 1, it uses sas_J0($x$) or sas_J1($x$), respectively.
+        Warning: JN(n,x) can be very inaccurate (0.1%) for x not in [0.1, 100].
         The standard math function jn(n, x) is not available on all platforms.
 …
         Sine integral Si\ $(x) = \int_0^x \tfrac{\sin t}{t}\,dt$.
+        Warning: Si(x) can be very inaccurate (0.1%) for x in [0.1, 100].
         This function uses Taylor series for small and large arguments:
         For large arguments,
+        For large arguments use the following Taylor series,
         .. math::
 …
              - \frac{\sin(x)}{x}\left(\frac{1}{x} - \frac{3!}{x^3} + \frac{5!}{x^5} - \frac{7!}{x^7}\right)
         For small arguments,
+        For small arguments ,
         .. math::
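
The incomplete gamma definitions in the plugin guide can be checked numerically. The sketch below is a minimal cross-check, assuming SciPy is installed: it integrates $t^{a-1}e^{-t}$ with `scipy.integrate.quad` and compares the result against `scipy.special.gammainc` and `gammaincc`, which `sasmodels/special.py` uses as the Python stand-ins for `sas_gammainc` and `sas_gammaincc` in this changeset. The helper names are illustrative, not part of sasmodels, and only $a \ge 1$ is used so the integrand stays finite at $t = 0$.

```python
import math

from scipy.integrate import quad
from scipy.special import gamma, gammainc, gammaincc


def gammainc_by_quadrature(a, x):
    """Regularized lower incomplete gamma: int_0^x t^(a-1) e^(-t) dt / Gamma(a)."""
    integral, _abserr = quad(lambda t: t**(a - 1) * math.exp(-t), 0.0, x)
    return integral / gamma(a)


def gammaincc_by_quadrature(a, x):
    """Regularized upper incomplete gamma: int_x^inf t^(a-1) e^(-t) dt / Gamma(a)."""
    integral, _abserr = quad(lambda t: t**(a - 1) * math.exp(-t), x, math.inf)
    return integral / gamma(a)


if __name__ == "__main__":
    for a, x in [(1.0, 0.5), (2.5, 3.0), (7.0, 10.0)]:
        print(a, x,
              gammainc_by_quadrature(a, x), gammainc(a, x),
              gammaincc_by_quadrature(a, x), gammaincc(a, x))
```

Because the two regularized functions are complementary, `gammainc(a, x) + gammaincc(a, x)` should equal 1 to within rounding. With $a = 1$ (the default `A` in `explore/precision.py`) the lower function reduces to $1 - e^{-x}$.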

explore/precision.py

-                      r2a7e20e
+                      rfba9ca0
             neg:    [-100,100]
+        For arbitrary range use "start:stop:steps:scale" where scale is
+        one of log, lin, or linear.
         *diff* is "relative", "absolute" or "none"
 …
         linear = not xrange.startswith("log")
         if xrange == "zoom":
             lin_min, lin_max, lin_steps = 1000, 1010, 2000
+            start, stop, steps = 1000, 1010, 2000
         elif xrange == "neg":
             lin_min, lin_max, lin_steps = -100.1, 100.1, 2000
+            start, stop, steps = -100.1, 100.1, 2000
         elif xrange == "linear":
             lin_min, lin_max, lin_steps = 1, 1000, 2000
             lin_min, lin_max, lin_steps = 0.001, 2, 2000
+            start, stop, steps = 1, 1000, 2000
+            start, stop, steps = 0.001, 2, 2000
         elif xrange == "log":
             log_min, log_max, log_steps = -3, 5, 400
+            start, stop, steps = -3, 5, 400
         elif xrange == "logq":
+            log_min, log_max, log_steps = -4, 1, 400
+            start, stop, steps = -4, 1, 400
+        elif ':' in xrange:
+            parts = xrange.split(':')
+            linear = parts[3] != "log" if len(parts) == 4 else True
+            steps = int(parts[2]) if len(parts) > 2 else 400
+            start = float(parts[0])
+            stop = float(parts[1])
         else:
             raise ValueError("unknown range "+xrange)
 …
             # value to x in the given precision.
             if linear:
                 lin_min = max(lin_min, self.limits[0])
                 lin_max = min(lin_max, self.limits[1])
                 qrf = np.linspace(lin_min, lin_max, lin_steps, dtype='single')
                 #qrf = np.linspace(lin_min, lin_max, lin_steps, dtype='double')
+                start = max(start, self.limits[0])
+                stop = min(stop, self.limits[1])
+                qrf = np.linspace(start, stop, steps, dtype='single')
+                #qrf = np.linspace(start, stop, steps, dtype='double')
                 qr = [mp.mpf(float(v)) for v in qrf]
                 #qr = mp.linspace(lin_min, lin_max, lin_steps)
+                #qr = mp.linspace(start, stop, steps)
             else:
                 log_min = np.log10(max(10**log_min, self.limits[0]))
                 log_max = np.log10(min(10**log_max, self.limits[1]))
                 qrf = np.logspace(log_min, log_max, log_steps, dtype='single')
                 #qrf = np.logspace(log_min, log_max, log_steps, dtype='double')
+                start = np.log10(max(10**start, self.limits[0]))
+                stop = np.log10(min(10**stop, self.limits[1]))
+                qrf = np.logspace(start, stop, steps, dtype='single')
+                #qrf = np.logspace(start, stop, steps, dtype='double')
                 qr = [mp.mpf(float(v)) for v in qrf]
                 #qr = [10**v for v in mp.linspace(log_min, log_max, log_steps)]
+                #qr = [10**v for v in mp.linspace(start, stop, steps)]
         target = self.call_mpmath(qr, bits=500)
 …
     """
     if diff == "relative":
         err = np.array([abs((t-a)/t) for t, a in zip(target, actual)], 'd')
+        err = np.array([(abs((t-a)/t) if t != 0 else a) for t, a in zip(target, actual)], 'd')
         #err = np.clip(err, 0, 1)
         pylab.loglog(x, err, '-', label=label)
 …
     return model_info
+# Hack to allow second parameter A in two parameter functions
+A = 1
+def parse_extra_pars():
+    global A
+    A_str = str(A)
+    pop = []
+    for k, v in enumerate(sys.argv[1:]):
+        if v.startswith("A="):
+            A_str = v[2:]
+            pop.append(k+1)
+    if pop:
+        sys.argv = [v for k, v in enumerate(sys.argv) if k not in pop]
+        A = float(A_str)
+parse_extra_pars()
 # =============== FUNCTION DEFINITIONS ================
 …
     ocl_function=make_ocl("return sas_gamma(q);", "sas_gamma", ["lib/sas_gamma.c"]),
     limits=(-3.1, 10),
+)
+add_function(
+    name="gammaln(x)",
+    mp_function=mp.loggamma,
+    np_function=scipy.special.gammaln,
+    ocl_function=make_ocl("return sas_gammaln(q);", "sas_gammaln", ["lib/sas_gammainc.c"]),
+    #ocl_function=make_ocl("return lgamma(q);", "sas_gammaln"),
+)
+add_function(
+    name="gammainc(x)",
+    mp_function=lambda x, a=A: mp.gammainc(a, a=0, b=x)/mp.gamma(a),
+    np_function=lambda x, a=A: scipy.special.gammainc(a, x),
+    ocl_function=make_ocl("return sas_gammainc(%.15g,q);"%A, "sas_gammainc", ["lib/sas_gammainc.c"]),
+)
+add_function(
+    name="gammaincc(x)",
+    mp_function=lambda x, a=A: mp.gammainc(a, a=x, b=mp.inf)/mp.gamma(a),
+    np_function=lambda x, a=A: scipy.special.gammaincc(a, x),
+    ocl_function=make_ocl("return sas_gammaincc(%.15g,q);"%A, "sas_gammaincc", ["lib/sas_gammainc.c"]),
+)
 add_function(
 …
 lanczos_gamma = """\
     const double coeff[] = {
 .18009172947146,     -86.50532032941677,
 .01409824083091,     -1.231739572450155,
+.18009172947146, -86.50532032941677,
+.01409824083091, -1.231739572450155,
 .1208650973866179e-2,-0.5395239384953e-5
             };
 …
 """
 add_function(
     name="log gamma(x)",
+    name="loggamma(x)",
     mp_function=mp.loggamma,
     np_function=scipy.special.gammaln,
 …
 ALL_FUNCTIONS = set(FUNCTIONS.keys())
 ALL_FUNCTIONS.discard("loggamma")  # OCL version not ready yet
+ALL_FUNCTIONS.discard("loggamma")  # use cephes-based gammaln instead
 ALL_FUNCTIONS.discard("3j1/x:taylor")
 ALL_FUNCTIONS.discard("3j1/x:trig")
 …
     -r indicates that the relative error should be plotted (default),
     -x<range> indicates the steps in x, where <range> is one of the following
+      log indicates log stepping in [10^-3, 10^5] (default)
+      logq indicates log stepping in [10^-4, 10^1]
+      linear indicates linear stepping in [1, 1000]
+      zoom indicates linear stepping in [1000, 1010]
+      neg indicates linear stepping in [-100.1, 100.1]
+and name is "all" or one of:
+        log indicates log stepping in [10^-3, 10^5] (default)
+        logq indicates log stepping in [10^-4, 10^1]
+        linear indicates linear stepping in [1, 1000]
+        zoom indicates linear stepping in [1000, 1010]
+        neg indicates linear stepping in [-100.1, 100.1]
+        start:stop:n[:stepping] indicates an n-step plot in [start, stop]
+            or [10^start, 10^stop] if stepping is "log" (default n=400)
+Some functions (notably gammainc/gammaincc) have an additional parameter A
+which can be set from the command line as A=value.  Default is A=1.
+Name is one of:
     """+names)
     sys.exit(1)
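
The new `start:stop:n[:stepping]` range syntax in `explore/precision.py` can be exercised on its own. This is a minimal sketch assuming only NumPy: `parse_xrange` follows the colon-splitting logic shown in the diff, the named ranges use the values listed in the usage text, and `sample_points` clips to a function's limits and builds single-precision points the way the updated sampling code does. Both function names are hypothetical, not part of the script.

```python
import numpy as np

# Named ranges from the usage text: (start, stop, steps, linear).
# For log ranges, start and stop are base-10 exponents.
NAMED_RANGES = {
    "zoom": (1000, 1010, 2000, True),
    "neg": (-100.1, 100.1, 2000, True),
    "linear": (1, 1000, 2000, True),
    "log": (-3, 5, 400, False),
    "logq": (-4, 1, 400, False),
}


def parse_xrange(xrange):
    """Return (start, stop, steps, linear) for a named or start:stop[:n[:scale]] range."""
    if xrange in NAMED_RANGES:
        return NAMED_RANGES[xrange]
    if ':' in xrange:
        parts = xrange.split(':')
        # Only an explicit fourth field of "log" selects log stepping.
        linear = parts[3] != "log" if len(parts) == 4 else True
        steps = int(parts[2]) if len(parts) > 2 else 400
        return float(parts[0]), float(parts[1]), steps, linear
    raise ValueError("unknown range " + xrange)


def sample_points(xrange, limits=(-np.inf, np.inf)):
    """Single-precision evaluation points, clipped to the function's limits."""
    start, stop, steps, linear = parse_xrange(xrange)
    if linear:
        start, stop = max(start, limits[0]), min(stop, limits[1])
        return np.linspace(start, stop, steps, dtype='single')
    # Clip in linear space, then convert back to exponents.
    start = np.log10(max(10**start, limits[0]))
    stop = np.log10(min(10**stop, limits[1]))
    return np.logspace(start, stop, steps, dtype='single')
```

For example, `"-2:2:5:log"` parses to `(-2.0, 2.0, 5, False)`, i.e. five points spaced logarithmically between $10^{-2}$ and $10^{2}$; omitting the fourth field selects linear spacing.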

sasmodels/kernelpy.py

-                      r91bd550
+                      r12eec1e
         self.info = model_info
         self.dtype = np.dtype('d')
+        logger.info("make python model " + self.info.name)

     def make_kernel(self, q_vectors):

sasmodels/model_test.py

-                      r012cd34
+                      r12eec1e
 import sys
 import unittest
+import traceback
 try:
 …
 # pylint: enable=unused-import
 def make_suite(loaders, models):
     # type: (List[str], List[str]) -> unittest.TestSuite
 …
     *models* is the list of models to test, or *["all"]* to test all models.
     """
-    ModelTestCase = _hide_model_case_from_nose()
     suite = unittest.TestSuite()
 …
         skip = []
     for model_name in models:
+        if model_name in skip:
+            continue
+        model_info = load_model_info(model_name)
+        #print('------')
+        #print('found tests in', model_name)
+        #print('------')
+        # if ispy then use the dll loader to call pykernel
+        # don't try to call cl kernel since it will not be
+        # available in some environmentes.
+        is_py = callable(model_info.Iq)
+        # Some OpenCL drivers seem to be flaky, and are not producing the
+        # expected result.  Since we don't have known test values yet for
+        # all of our models, we are instead going to compare the results
+        # for the 'smoke test' (that is, evaluation at q=0.1 for the default
+        # parameters just to see that the model runs to completion) between
+        # the OpenCL and the DLL.  To do this, we define a 'stash' which is
+        # shared between OpenCL and DLL tests.  This is just a list.  If the
+        # list is empty (which it will be when DLL runs, if the DLL runs
+        # first), then the results are appended to the list.  If the list
+        # is not empty (which it will be when OpenCL runs second), the results
+        # are compared to the results stored in the first element of the list.
+        # This is a horrible stateful hack which only makes sense because the
+        # test suite is thrown away after being run once.
+        stash = []
+        if is_py:  # kernel implemented in python
+            test_name = "%s-python"%model_name
+            test_method_name = "test_%s_python" % model_info.id
+        if model_name not in skip:
+            model_info = load_model_info(model_name)
+            _add_model_to_suite(loaders, suite, model_info)
+    return suite
+def _add_model_to_suite(loaders, suite, model_info):
+    ModelTestCase = _hide_model_case_from_nose()
+    #print('------')
+    #print('found tests in', model_name)
+    #print('------')
+    # if ispy then use the dll loader to call pykernel
+    # don't try to call cl kernel since it will not be
+    # available in some environmentes.
+    is_py = callable(model_info.Iq)
+    # Some OpenCL drivers seem to be flaky, and are not producing the
+    # expected result.  Since we don't have known test values yet for
+    # all of our models, we are instead going to compare the results
+    # for the 'smoke test' (that is, evaluation at q=0.1 for the default
+    # parameters just to see that the model runs to completion) between
+    # the OpenCL and the DLL.  To do this, we define a 'stash' which is
+    # shared between OpenCL and DLL tests.  This is just a list.  If the
+    # list is empty (which it will be when DLL runs, if the DLL runs
+    # first), then the results are appended to the list.  If the list
+    # is not empty (which it will be when OpenCL runs second), the results
+    # are compared to the results stored in the first element of the list.
+    # This is a horrible stateful hack which only makes sense because the
+    # test suite is thrown away after being run once.
+    stash = []
+    if is_py:  # kernel implemented in python
+        test_name = "%s-python"%model_info.name
+        test_method_name = "test_%s_python" % model_info.id
+        test = ModelTestCase(test_name, model_info,
+                                test_method_name,
+                                platform="dll",  # so that
+                                dtype="double",
+                                stash=stash)
+        suite.addTest(test)
+    else:   # kernel implemented in C
+        # test using dll if desired
+        if 'dll' in loaders or not use_opencl():
+            test_name = "%s-dll"%model_info.name
+            test_method_name = "test_%s_dll" % model_info.id
             test = ModelTestCase(test_name, model_info,
                                  test_method_name,
                                  platform="dll",  # so that
                                  dtype="double",
                                  stash=stash)
+                                    test_method_name,
+                                    platform="dll",
+                                    dtype="double",
+                                    stash=stash)
             suite.addTest(test)
+        else:   # kernel implemented in C
+            # test using dll if desired
+            if 'dll' in loaders or not use_opencl():
+                test_name = "%s-dll"%model_name
+                test_method_name = "test_%s_dll" % model_info.id
+                test = ModelTestCase(test_name, model_info,
+                                     test_method_name,
+                                     platform="dll",
+                                     dtype="double",
+                                     stash=stash)
+                suite.addTest(test)
+            # test using opencl if desired and available
+            if 'opencl' in loaders and use_opencl():
+                test_name = "%s-opencl"%model_name
+                test_method_name = "test_%s_opencl" % model_info.id
+                # Using dtype=None so that the models that are only
+                # correct for double precision are not tested using
+                # single precision.  The choice is determined by the
+                # presence of *single=False* in the model file.
+                test = ModelTestCase(test_name, model_info,
+                                     test_method_name,
+                                     platform="ocl", dtype=None,
+                                     stash=stash)
+                #print("defining", test_name)
+                suite.addTest(test)
+    return suite
+        # test using opencl if desired and available
+        if 'opencl' in loaders and use_opencl():
+            test_name = "%s-opencl"%model_info.name
+            test_method_name = "test_%s_opencl" % model_info.id
+            # Using dtype=None so that the models that are only
+            # correct for double precision are not tested using
+            # single precision.  The choice is determined by the
+            # presence of *single=False* in the model file.
+            test = ModelTestCase(test_name, model_info,
+                                    test_method_name,
+                                    platform="ocl", dtype=None,
+                                    stash=stash)
+            #print("defining", test_name)
+            suite.addTest(test)
 def _hide_model_case_from_nose():
 …
     return abs(target-actual)/shift < 1.5*10**-digits
+def run_one(model):
+    # type: (str) -> str
+    """
+    Run the tests for a single model, printing the results to stdout.
+    *model* can by a python file, which is handy for checking user defined
+    plugin models.
+# CRUFT: old interface; should be deprecated and removed
+def run_one(model_name):
+    # msg = "use check_model(model_info) rather than run_one(model_name)"
+    # warnings.warn(msg, category=DeprecationWarning, stacklevel=2)
+    try:
+        model_info = load_model_info(model_name)
+    except Exception:
+        output = traceback.format_exc()
+        return output
+    success, output = check_model(model_info)
+    return output
+def check_model(model_info):
+    # type: (ModelInfo) -> str
+    """
+    Run the tests for a single model, capturing the output.
+    Returns success status and the output string.
     """
     # Note that running main() directly did not work from within the
 …
     # Build a test suite containing just the model
     loaders = ['opencl'] if use_opencl() else ['dll']
+    models = [model]
+    try:
+        suite = make_suite(loaders, models)
+    except Exception:
+        import traceback
+        stream.writeln(traceback.format_exc())
+        return
+    suite = unittest.TestSuite()
+    _add_model_to_suite(loaders, suite, model_info)
     # Warn if there are no user defined tests.
 …
     for test in suite:
         if not test.info.tests:
             stream.writeln("Note: %s has no user defined tests."%model)
+            stream.writeln("Note: %s has no user defined tests."%model_info.name)
         break
     else:
 …
     output = stream.getvalue()
     stream.close()
     return output
+    return result.wasSuccessful(), output
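
The stash comment in `model_test.py` and the new `check_model` return value describe two small mechanisms that can be shown with the standard library alone. In this sketch (all names hypothetical, not the sasmodels API), `make_smoke_test` builds test cases that share a `stash` list: the first one to run records its value and later ones must agree with it, standing in for the DLL-then-OpenCL comparison. `run_suite_captured` runs a suite into an in-memory stream and returns `(success, output)`, the same shape `check_model` now returns.

```python
import io
import unittest


def make_smoke_test(evaluate, stash):
    """Build a TestCase: the first run stores its value, later runs compare to it."""
    class SmokeTest(unittest.TestCase):
        def runTest(self):
            value = evaluate()
            if not stash:
                # First backend to run (e.g. DLL) records the reference value.
                stash.append(value)
            else:
                # Later backends (e.g. OpenCL) must agree with the first one.
                self.assertAlmostEqual(value, stash[0], places=5)
    return SmokeTest()


def run_suite_captured(suite):
    """Run *suite* without printing; return (success, captured text)."""
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    output = stream.getvalue()
    stream.close()
    return result.wasSuccessful(), output


if __name__ == "__main__":
    shared = []
    suite = unittest.TestSuite([make_smoke_test(lambda: 1.0, shared),
                                make_smoke_test(lambda: 1.000001, shared)])
    ok, text = run_suite_captured(suite)
    print(ok)
    print(text)
```

As the original comment warns, this only works because the suite is built fresh and run once: the stash carries state between tests and depends on their execution order.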

sasmodels/sasview_model.py

-                      rbd547d0
+                      rce1eed5
             return value, [value], [1.0]
+    @classmethod
+    def runTests(cls):
+        """
+        Run any tests built into the model and captures the test output.
+        Returns success flag and output
+        """
+        from .model_test import check_model
+        return check_model(cls._model_info)
 def test_cylinder():
     # type: () -> float

sasmodels/special.py

-                      rdf69efa
+                      rfba9ca0
         The standard math function, tgamma(x) is unstable for $x < 1$
         on some platforms.
+    sas_gammaln(x):
+        log gamma function sas_gammaln\ $(x) = \log \Gamma(|x|)$.
+        The standard math function, lgamma(x), is incorrect for single
+        precision on some platforms.
+    sas_gammainc(a, x), sas_gammaincc(a, x):
+        Incomplete gamma function
+        sas_gammainc\ $(a, x) = \int_0^x t^{a-1}e^{-t}\,dt / \Gamma(a)$
+        and complementary incomplete gamma function
+        sas_gammaincc\ $(a, x) = \int_x^\infty t^{a-1}e^{-t}\,dt / \Gamma(a)$
     sas_erf(x), sas_erfc(x):
 …
 from numpy import pi, nan, inf
 from scipy.special import gamma as sas_gamma
+from scipy.special import gammaln as sas_gammaln
+from scipy.special import gammainc as sas_gammainc
+from scipy.special import gammaincc as sas_gammaincc
 from scipy.special import erf as sas_erf
 from scipy.special import erfc as sas_erfc
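
`special.py` now maps `sas_gammaln` to `scipy.special.gammaln`. SciPy documents that function as the logarithm of the absolute value of the gamma function, $\log|\Gamma(x)|$, which differs from $\log\Gamma(|x|)$ for negative $x$ (for example, $|\Gamma(-1/2)| = 2\sqrt{\pi}$ while $\Gamma(1/2) = \sqrt{\pi}$). The short check below assumes SciPy is installed and compares it with Python's `math.lgamma`, which is also defined as $\log|\Gamma(x)|$. The helper `compare_loggamma` is illustrative only.

```python
import math

from scipy.special import gamma, gammaln


def compare_loggamma(xs):
    """Return (x, scipy gammaln, math.lgamma, log|gamma(x)|) for each x."""
    return [(x, float(gammaln(x)), math.lgamma(x), math.log(abs(gamma(x))))
            for x in xs]


if __name__ == "__main__":
    for row in compare_loggamma([-2.5, -0.5, 0.5, 3.0, 50.0]):
        print(row)
```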

sasmodels/kernelcl.py

-                      rd86f0fc
+                      r95f62aa
 from . import generate
+from .generate import F32, F64
 from .kernel import KernelModel, Kernel
 …
     Return true if device supports the requested precision.
     """
     if dtype == generate.F32:
+    if dtype == F32:
         return True
     elif dtype == generate.F64:
 …
     """
     GPU context, with possibly many devices, and one queue per device.
+    Because the environment can be reset during a live program (e.g., if the
+    user changes the active GPU device in the GUI), everything associated
+    with the device context must be cached in the environment and recreated
+    if the environment changes.  The *cache* attribute is a simple dictionary
+    which holds keys and references to objects, such as compiled kernels and
+    allocated buffers.  The running program should check in the cache for
+    long lived objects and create them if they are not there.  The program
+    should not hold onto cached objects, but instead only keep them active
+    for the duration of a function call.  When the environment is destroyed
+    then the *release* method for each active cache item is called before
+    the environment is freed.  This means that each cl buffer should be
+    in its own cache entry.
     """
     def __init__(self):
         # type: () -> None
         # find gpu context
+        #self.context = cl.create_some_context()
+        self.context = None
+        if 'SAS_OPENCL' in os.environ:
+            #Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context
+            os.environ["PYOPENCL_CTX"] = os.environ["SAS_OPENCL"]
+        if 'PYOPENCL_CTX' in os.environ:
+            self._create_some_context()
+        if not self.context:
+            self.context = _get_default_context()
+        context_list = _create_some_context()
+        # Find a context for F32 and for F64 (maybe the same one).
+        # F16 isn't good enough.
+        self.context = {}
+        for dtype in (F32, F64):
+            for context in context_list:
+                if has_type(context.devices[0], dtype):
+                    self.context[dtype] = context
+                    break
+            else:
+                self.context[dtype] = None
+        # Build a queue for each context
+        self.queue = {}
+        context = self.context[F32]
+        self.queue[F32] = cl.CommandQueue(context, context.devices[0])
+        if self.context[F64] == self.context[F32]:
+            self.queue[F64] = self.queue[F32]
+        else:
+            context = self.context[F64]
+            self.queue[F64] = cl.CommandQueue(context, context.devices[0])
         # Byte boundary for data alignment
         #self.data_boundary = max(d.min_data_type_align_size
         #                         for d in self.context.devices)
+        self.queues = [cl.CommandQueue(context, context.devices[0])
                        for context in self.context]
+        #self.data_boundary = max(context.devices[0].min_data_type_align_size
+        #                         for context in self.context.values())
+        # Cache for compiled programs, and for items in context
         self.compiled = {}
+        self.cache = {}
     def has_type(self, dtype):
 …
         Return True if all devices support a given type.
         """
+        return any(has_type(d, dtype)
+                   for context in self.context
+                   for d in context.devices)
+    def get_queue(self, dtype):
+        # type: (np.dtype) -> cl.CommandQueue
+        """
+        Return a command queue for the kernels of type dtype.
+        """
+        for context, queue in zip(self.context, self.queues):
+            if all(has_type(d, dtype) for d in context.devices):
+                return queue
+    def get_context(self, dtype):
+        # type: (np.dtype) -> cl.Context
+        """
+        Return a OpenCL context for the kernels of type dtype.
+        """
+        for context in self.context:
+            if all(has_type(d, dtype) for d in context.devices):
+                return context
+    def _create_some_context(self):
+        # type: () -> cl.Context
+        """
+        Protected call to cl.create_some_context without interactivity.  Use
+        this if SAS_OPENCL is set in the environment.  Sets the *context*
+        attribute.
+        """
+        try:
+            self.context = [cl.create_some_context(interactive=False)]
+        except Exception as exc:
+            warnings.warn(str(exc))
+            warnings.warn("pyopencl.create_some_context() failed")
+            warnings.warn("the environment variable 'SAS_OPENCL' might not be set correctly")
+        return self.context.get(dtype, None) is not None
     def compile_program(self, name, source, dtype, fast, timestamp):
 …
             del self.compiled[key]
         if key not in self.compiled:
             context = self.get_context(dtype)
+            context = self.context[dtype]
             logging.info("building %s for OpenCL %s", key,
                          context.devices[0].name.strip())
             program = compile_model(self.get_context(dtype),
+            program = compile_model(self.context[dtype],
                                     str(source), dtype, fast)
             self.compiled[key] = (program, timestamp)
         return program
+    def free_buffer(self, key):
+        if key in self.cache:
+            self.cache[key].release()
+            del self.cache[key]
+    def __del__(self):
+        for v in self.cache.values():
+            release = getattr(v, 'release', lambda: None)
+            release()
+        self.cache = {}
+_CURRENT_ID = 0
+def unique_id():
+    global _CURRENT_ID
+    _CURRENT_ID += 1
+    return _CURRENT_ID
+def _create_some_context():
+    # type: () -> cl.Context
+    """
+    Protected call to cl.create_some_context without interactivity.
+    Uses SAS_OPENCL or PYOPENCL_CTX if they are set in the environment,
+    otherwise scans for the most appropriate device using
+    :func:`_get_default_context`
+    """
+    if 'SAS_OPENCL' in os.environ:
+        #Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context
+        os.environ["PYOPENCL_CTX"] = os.environ["SAS_OPENCL"]
+    if 'PYOPENCL_CTX' in os.environ:
+        try:
+            return [cl.create_some_context(interactive=False)]
+        except Exception as exc:
+            warnings.warn(str(exc))
+            warnings.warn("pyopencl.create_some_context() failed")
+            warnings.warn("the environment variable 'SAS_OPENCL' or 'PYOPENCL_CTX' might not be set correctly")
+    return _get_default_context()
 def _get_default_context():
 …
         self.dtype = dtype
         self.fast = fast
         self.program = None # delay program creation
         self._kernels = None
+        self.timestamp = generate.ocl_timestamp(self.info)
+        self._cache_key = unique_id()
     def __getstate__(self):
 …
         # type: (Tuple[ModelInfo, str, np.dtype, bool]) -> None
         self.info, self.source, self.dtype, self.fast = state
-        self.program = None
     def make_kernel(self, q_vectors):
         # type: (List[np.ndarray]) -> "GpuKernel"
+        if self.program is None:
+            compile_program = environment().compile_program
+            timestamp = generate.ocl_timestamp(self.info)
+            self.program = compile_program(
+        return GpuKernel(self, q_vectors)
+    @property
+    def Iq(self):
+        return self._fetch_kernel('Iq')
+    def fetch_kernel(self, name):
+        # type: (str) -> cl.Kernel
+        """
+        Fetch the kernel from the environment by name, compiling it if it
+        does not already exist.
+        """
+        gpu = environment()
+        key = self._cache_key
+        if key not in gpu.cache:
+            program = gpu.compile_program(
                 self.info.name,
                 self.source['opencl'],
                 self.dtype,
                 self.fast,
                 timestamp)
+                self.timestamp)
             variants = ['Iq', 'Iqxy', 'Imagnetic']
             names = [generate.kernel_name(self.info, k) for k in variants]
             kernels = [getattr(self.program, k) for k in names]
             self._kernels = dict((k, v) for k, v in zip(variants, kernels))
         is_2d = len(q_vectors) == 2
         if is_2d:
             kernel = [self._kernels['Iqxy'], self._kernels['Imagnetic']]
+            kernels = [getattr(program, k) for k in names]
+            data = dict((k, v) for k, v in zip(variants, kernels))
+            # keep a handle to program so GC doesn't collect
+            data['program'] = program
+            gpu.cache[key] = data
         else:
+            kernel = [self._kernels['Iq']]*2
+        return GpuKernel(kernel, self.dtype, self.info, q_vectors)
+    def release(self):
+        # type: () -> None
+        """
+        Free the resources associated with the model.
+        """
+        if self.program is not None:
+            self.program = None
+    def __del__(self):
+        # type: () -> None
+        self.release()
+            data = gpu.cache[key]
+        return data[name]
 # TODO: check that we don't need a destructor for buffers which go out of scope
 …
         # type: (List[np.ndarray], np.dtype) -> None
         # TODO: do we ever need double precision q?
-        env = environment()
         self.nq = q_vectors[0].size
         self.dtype = np.dtype(dtype)
 …
             self.q[:self.nq] = q_vectors[0]
         self.global_size = [self.q.shape[0]]
+        context = env.get_context(self.dtype)
+        #print("creating inputs of size", self.global_size)
+        self.q_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR,
+                             hostbuf=self.q)
+        self._cache_key = unique_id()
+    @property
+    def q_b(self):
+        """Lazy creation of q buffer so it can survive context reset"""
+        env = environment()
+        key = self._cache_key
+        if key not in env.cache:
+            context = env.context[self.dtype]
+            #print("creating inputs of size", self.global_size)
+            buffer = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR,
+                               hostbuf=self.q)
+            env.cache[key] = buffer
+        return env.cache[key]
     def release(self):
         # type: () -> None
         """
+        Free the memory.
+        """
+        if self.q_b is not None:
+            self.q_b.release()
+            self.q_b = None
+        Free the buffer associated with the q value
+        """
+        environment().free_buffer(id(self))
     def __del__(self):
 …
     Callable SAS kernel.
     *kernel* is the GpuKernel object to call
     *model_info* is the module information
     *q_vectors* is the q vectors at which the kernel should be evaluated
+    *model* is the GpuModel object to call
+    The following attributes are defined:
+    *info* is the module information
     *dtype* is the kernel precision
+    *dim* is '1d' or '2d'
+    *result* is a vector to contain the results of the call
     The resulting call method takes the *pars*, a list of values for
 …
     Call :meth:`release` when done with the kernel instance.
     """
     def __init__(self, kernel, dtype, model_info, q_vectors):
+    def __init__(self, model, q_vectors):
         # type: (cl.Kernel, np.dtype, ModelInfo, List[np.ndarray]) -> None
+        q_input = GpuInput(q_vectors, dtype)
+        self.kernel = kernel
+        self.info = model_info
+        self.dtype = dtype
+        self.dim = '2d' if q_input.is_2d else '1d'
+        # plus three for the normalization values
+        self.result = np.empty(q_input.nq+1, dtype)
+        # Inputs and outputs for each kernel call
+        # Note: res may be shorter than res_b if global_size != nq
+        dtype = model.dtype
+        self.q_input = GpuInput(q_vectors, dtype)
+        self._model = model
+        self._as_dtype = (np.float32 if dtype == generate.F32
+                          else np.float64 if dtype == generate.F64
+                          else np.float16 if dtype == generate.F16
+                          else np.float32)  # will never get here, so use np.float32
+        self._cache_key = unique_id()
+        # attributes accessed from the outside
+        self.dim = '2d' if self.q_input.is_2d else '1d'
+        self.info = model.info
+        self.dtype = model.dtype
+        # holding place for the returned value
+        # plus one for the normalization values
+        self.result = np.empty(self.q_input.nq+1, dtype)
+    @property
+    def _result_b(self):
+        """Lazy creation of result buffer so it can survive context reset"""
         env = environment()
+        self.queue = env.get_queue(dtype)
+        self.result_b = cl.Buffer(self.queue.context, mf.READ_WRITE,
+                                  q_input.global_size[0] * dtype.itemsize)
+        self.q_input = q_input # allocated by GpuInput above
+        self._need_release = [self.result_b, self.q_input]
+        self.real = (np.float32 if dtype == generate.F32
+                     else np.float64 if dtype == generate.F64
+                     else np.float16 if dtype == generate.F16
+                     else np.float32)  # will never get here, so use np.float32
+        key = self._cache_key
+        if key not in env.cache:
+            context = env.context[self.dtype]
+            #print("creating inputs of size", self.global_size)
+            buffer = cl.Buffer(context, mf.READ_WRITE,
+                               self.q_input.global_size[0] * self.dtype.itemsize)
+            env.cache[key] = buffer
+        return env.cache[key]
     def __call__(self, call_details, values, cutoff, magnetic):
         # type: (CallDetails, np.ndarray, np.ndarray, float, bool) -> np.ndarray
+        context = self.queue.context
+        # Arrange data transfer to card
+        env = environment()
+        queue = env.queue[self._model.dtype]
+        context = queue.context
+        # Arrange data transfer to/from card
+        q_b = self.q_input.q_b
+        result_b = self._result_b
         details_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR,
                               hostbuf=call_details.buffer)
 …
                              hostbuf=values)
+        kernel = self.kernel[1 if magnetic else 0]
+        args = [
+        name = 'Iq' if self.dim == '1d' else 'Imagnetic' if magnetic else 'Iqxy'
+        kernel = self._model.fetch_kernel(name)
+        kernel_args = [
             np.uint32(self.q_input.nq), None, None,
             details_b, values_b, self.q_input.q_b, self.result_b,
             self.real(cutoff),
+            details_b, values_b, q_b, result_b,
+            self._as_dtype(cutoff),
+        ]
         #print("Calling OpenCL")
 …
             stop = min(start + step, call_details.num_eval)
             #print("queuing",start,stop)
             args[1:3] = [np.int32(start), np.int32(stop)]
             wait_for = [kernel(self.queue, self.q_input.global_size, None,
                                *args, wait_for=wait_for)]
+            kernel_args[1:3] = [np.int32(start), np.int32(stop)]
+            wait_for = [kernel(queue, self.q_input.global_size, None,
+                               *kernel_args, wait_for=wait_for)]
             if stop < call_details.num_eval:
                 # Allow other processes to run
 …
                     time.sleep(0.05)
                     last_nap = current_time
         cl.enqueue_copy(self.queue, self.result, self.result_b)
+        cl.enqueue_copy(queue, self.result, result_b, wait_for=wait_for)
         #print("result", self.result)
 …
         Release resources associated with the kernel.
         """
+        for v in self._need_release:
+            v.release()
+        self._need_release = []
+        environment().free_buffer(id(self))
+        self.q_input.release()
     def __del__(self):
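
The new `q_b` property on `GpuInput` and the `_result_b` property on `GpuKernel` create their OpenCL buffers lazily. Each object stores its buffer in `env.cache` under its own `_cache_key`, so if that cache is cleared (presumably when the context is reset), the next access rebuilds the buffer. Below is a minimal sketch of that pattern with no OpenCL dependency. `FakeEnvironment`, `make_buffer`, `reset`, `LazyInput` and this `unique_id` are hypothetical stand-ins, not sasmodels APIs, and plain lists take the place of `cl.Buffer`.

```python
import itertools

# Hypothetical stand-in for unique_id(); the changeset calls it, but its
# implementation is not shown in this diff.
_ids = itertools.count()


def unique_id():
    """Return a fresh integer key that is never reused in this process."""
    return next(_ids)


class FakeEnvironment:
    """Hypothetical stand-in for the OpenCL environment.

    Holds device resources in a dict keyed by id. reset() simulates the
    context being torn down, which drops every cached resource.
    """
    def __init__(self):
        self.cache = {}
        self.allocations = 0  # number of "buffers" built so far

    def make_buffer(self, host_data):
        # The real code would build cl.Buffer(context, flags, hostbuf=...).
        self.allocations += 1
        return list(host_data)

    def free_buffer(self, key):
        # Drop the resource if it exists; unknown keys are ignored.
        self.cache.pop(key, None)

    def reset(self):
        self.cache.clear()


class LazyInput:
    """Keeps host data; the device copy is created on first access."""
    def __init__(self, env, q):
        self.env = env
        self.q = list(q)
        self._cache_key = unique_id()

    @property
    def q_b(self):
        """Return the cached device buffer, rebuilding it if it is missing."""
        key = self._cache_key
        if key not in self.env.cache:
            self.env.cache[key] = self.env.make_buffer(self.q)
        return self.env.cache[key]

    def release(self):
        self.env.free_buffer(self._cache_key)


env = FakeEnvironment()
inp = LazyInput(env, [0.01, 0.02, 0.03])
first = inp.q_b         # allocates a buffer
again = inp.q_b         # cache hit, nothing new is allocated
env.reset()             # simulate a context reset
rebuilt = inp.q_b       # transparently allocated again
print(env.allocations)  # 2
```

Callers keep holding the `LazyInput` object rather than the buffer itself. That is why the buffer can be discarded and rebuilt without the caller noticing.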

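The `__call__` hunk splits the evaluation into blocks. On each pass it sets `kernel_args[1:3]` to the current `start`/`stop` range, enqueues the kernel, and between blocks sometimes calls `time.sleep(0.05)` so that other processes can run. The diff elides the step size and the condition that triggers a nap. The sketch below is a hypothetical pure-Python helper, `run_in_chunks`, that assumes a one-second interval between naps and takes the kernel as an ordinary callable.

```python
import time


def run_in_chunks(kernel, num_eval, step, nap_every=1.0, nap=0.05):
    """Call kernel(start, stop) over [0, num_eval) in blocks of `step`.

    Between blocks, sleep for `nap` seconds once at least `nap_every`
    seconds have passed since the last nap, so other work can run.
    `step` and `nap_every` are illustrative assumptions, not the values
    used in kernelcl.py.
    """
    last_nap = time.perf_counter()
    results = []
    for start in range(0, num_eval, step):
        stop = min(start + step, num_eval)  # final block may be shorter
        results.append(kernel(start, stop))
        if stop < num_eval:
            current_time = time.perf_counter()
            if current_time - last_nap > nap_every:
                time.sleep(nap)
                last_nap = current_time
    return results
```

With a trivial callable, `run_in_chunks(lambda a, b: (a, b), 10, 4)` returns `[(0, 4), (4, 8), (8, 10)]`. The `min` shortens the last block.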