Changeset 599993b9 in sasmodels


Ignore:
Timestamp:
Oct 30, 2018 8:51:20 AM (13 days ago)
Author:
Paul Kienzle <pkienzle@…>
Branches:
ticket-1015-gpu-mem-error
Children:
63d4dd1
Parents:
2a12d8d8 (diff), c6084f1 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' into ticket-1015-gpu-mem-error

Files:
1 added
7 edited

Legend:

Unmodified
Added
Removed
  • doc/guide/plugin.rst

    r2015f02 r57c609b  
    428428        def random(): 
    429429        ... 
    430          
    431 This function provides a model-specific random parameter set which shows model  
    432 features in the USANS to SANS range.  For example, core-shell sphere sets the  
    433 outer radius of the sphere logarithmically in `[20, 20,000]`, which sets the Q  
    434 value for the transition from flat to falling.  It then uses a beta distribution  
    435 to set the percentage of the shape which is shell, giving a preference for very  
    436 thin or very thick shells (but never 0% or 100%).  Using `-sets=10` in sascomp  
    437 should show a reasonable variety of curves over the default sascomp q range.   
    438 The parameter set is returned as a dictionary of `{parameter: value, ...}`.   
    439 Any model parameters not included in the dictionary will default according to  
     430 
     431This function provides a model-specific random parameter set which shows model 
     432features in the USANS to SANS range.  For example, core-shell sphere sets the 
     433outer radius of the sphere logarithmically in `[20, 20,000]`, which sets the Q 
     434value for the transition from flat to falling.  It then uses a beta distribution 
     435to set the percentage of the shape which is shell, giving a preference for very 
     436thin or very thick shells (but never 0% or 100%).  Using `-sets=10` in sascomp 
     437should show a reasonable variety of curves over the default sascomp q range. 
     438The parameter set is returned as a dictionary of `{parameter: value, ...}`. 
     439Any model parameters not included in the dictionary will default according to 
    440440the code in the `_randomize_one()` function from sasmodels/compare.py. 
    441441 
     
    701701    erf, erfc, tgamma, lgamma:  **do not use** 
    702702        Special functions that should be part of the standard, but are missing 
    703         or inaccurate on some platforms. Use sas_erf, sas_erfc and sas_gamma 
    704         instead (see below). Note: lgamma(x) has not yet been tested. 
     703        or inaccurate on some platforms. Use sas_erf, sas_erfc, sas_gamma 
     704        and sas_lgamma instead (see below). 
    705705 
    706706Some non-standard constants and functions are also provided: 
     
    769769        Gamma function sas_gamma\ $(x) = \Gamma(x)$. 
    770770 
    771         The standard math function, tgamma(x) is unstable for $x < 1$ 
     771        The standard math function, tgamma(x), is unstable for $x < 1$ 
    772772        on some platforms. 
    773773 
    774774        :code:`source = ["lib/sas_gamma.c", ...]` 
    775775        (`sas_gamma.c <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib/sas_gamma.c>`_) 
     776 
     777    sas_gammaln(x): 
     778        log gamma function sas_gammaln\ $(x) = \log \Gamma(|x|)$. 
     779 
     780        The standard math function, lgamma(x), is incorrect for single 
     781        precision on some platforms. 
     782 
     783        :code:`source = ["lib/sas_gammainc.c", ...]` 
     784        (`sas_gammainc.c <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib/sas_gammainc.c>`_) 
     785 
     786    sas_gammainc(a, x), sas_gammaincc(a, x): 
     787        Incomplete gamma function 
     788        sas_gammainc\ $(a, x) = \int_0^x t^{a-1}e^{-t}\,dt / \Gamma(a)$ 
     789        and complementary incomplete gamma function 
     790        sas_gammaincc\ $(a, x) = \int_x^\infty t^{a-1}e^{-t}\,dt / \Gamma(a)$ 
     791 
     792        :code:`source = ["lib/sas_gammainc.c", ...]` 
     793        (`sas_gammainc.c <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib/sas_gammainc.c>`_) 
    776794 
    777795    sas_erf(x), sas_erfc(x): 
     
    811829        If $n$ = 0 or 1, it uses sas_J0($x$) or sas_J1($x$), respectively. 
    812830 
     831        Warning: JN(n,x) can be very inaccurate (0.1%) for x not in [0.1, 100]. 
     832 
    813833        The standard math function jn(n, x) is not available on all platforms. 
    814834 
     
    819839        Sine integral Si\ $(x) = \int_0^x \tfrac{\sin t}{t}\,dt$. 
    820840 
     841        Warning: Si(x) can be very inaccurate (0.1%) for x in [0.1, 100]. 
     842 
    821843        This function uses Taylor series for small and large arguments: 
    822844 
    823         For large arguments, 
     845        For large arguments use the following Taylor series, 
    824846 
    825847        .. math:: 
     
    829851             - \frac{\sin(x)}{x}\left(\frac{1}{x} - \frac{3!}{x^3} + \frac{5!}{x^5} - \frac{7!}{x^7}\right) 
    830852 
    831         For small arguments, 
     853        For small arguments , 
    832854 
    833855        .. math:: 
  • explore/precision.py

    r2a7e20e rfba9ca0  
    9595            neg:    [-100,100] 
    9696 
     97        For arbitrary range use "start:stop:steps:scale" where scale is 
     98        one of log, lin, or linear. 
     99 
    97100        *diff* is "relative", "absolute" or "none" 
    98101 
     
    102105        linear = not xrange.startswith("log") 
    103106        if xrange == "zoom": 
    104             lin_min, lin_max, lin_steps = 1000, 1010, 2000 
     107            start, stop, steps = 1000, 1010, 2000 
    105108        elif xrange == "neg": 
    106             lin_min, lin_max, lin_steps = -100.1, 100.1, 2000 
     109            start, stop, steps = -100.1, 100.1, 2000 
    107110        elif xrange == "linear": 
    108             lin_min, lin_max, lin_steps = 1, 1000, 2000 
    109             lin_min, lin_max, lin_steps = 0.001, 2, 2000 
     111            start, stop, steps = 1, 1000, 2000 
     112            start, stop, steps = 0.001, 2, 2000 
    110113        elif xrange == "log": 
    111             log_min, log_max, log_steps = -3, 5, 400 
     114            start, stop, steps = -3, 5, 400 
    112115        elif xrange == "logq": 
    113             log_min, log_max, log_steps = -4, 1, 400 
     116            start, stop, steps = -4, 1, 400 
     117        elif ':' in xrange: 
     118            parts = xrange.split(':') 
     119            linear = parts[3] != "log" if len(parts) == 4 else True 
     120            steps = int(parts[2]) if len(parts) > 2 else 400 
     121            start = float(parts[0]) 
     122            stop = float(parts[1]) 
     123 
    114124        else: 
    115125            raise ValueError("unknown range "+xrange) 
     
    121131            # value to x in the given precision. 
    122132            if linear: 
    123                 lin_min = max(lin_min, self.limits[0]) 
    124                 lin_max = min(lin_max, self.limits[1]) 
    125                 qrf = np.linspace(lin_min, lin_max, lin_steps, dtype='single') 
    126                 #qrf = np.linspace(lin_min, lin_max, lin_steps, dtype='double') 
     133                start = max(start, self.limits[0]) 
     134                stop = min(stop, self.limits[1]) 
     135                qrf = np.linspace(start, stop, steps, dtype='single') 
     136                #qrf = np.linspace(start, stop, steps, dtype='double') 
    127137                qr = [mp.mpf(float(v)) for v in qrf] 
    128                 #qr = mp.linspace(lin_min, lin_max, lin_steps) 
     138                #qr = mp.linspace(start, stop, steps) 
    129139            else: 
    130                 log_min = np.log10(max(10**log_min, self.limits[0])) 
    131                 log_max = np.log10(min(10**log_max, self.limits[1])) 
    132                 qrf = np.logspace(log_min, log_max, log_steps, dtype='single') 
    133                 #qrf = np.logspace(log_min, log_max, log_steps, dtype='double') 
     140                start = np.log10(max(10**start, self.limits[0])) 
     141                stop = np.log10(min(10**stop, self.limits[1])) 
     142                qrf = np.logspace(start, stop, steps, dtype='single') 
     143                #qrf = np.logspace(start, stop, steps, dtype='double') 
    134144                qr = [mp.mpf(float(v)) for v in qrf] 
    135                 #qr = [10**v for v in mp.linspace(log_min, log_max, log_steps)] 
     145                #qr = [10**v for v in mp.linspace(start, stop, steps)] 
    136146 
    137147        target = self.call_mpmath(qr, bits=500) 
     
    176186    """ 
    177187    if diff == "relative": 
    178         err = np.array([abs((t-a)/t) for t, a in zip(target, actual)], 'd') 
     188        err = np.array([(abs((t-a)/t) if t != 0 else a) for t, a in zip(target, actual)], 'd') 
    179189        #err = np.clip(err, 0, 1) 
    180190        pylab.loglog(x, err, '-', label=label) 
     
    197207    return model_info 
    198208 
     209# Hack to allow second parameter A in two parameter functions 
     210A = 1 
     211def parse_extra_pars(): 
     212    global A 
     213 
     214    A_str = str(A) 
     215    pop = [] 
     216    for k, v in enumerate(sys.argv[1:]): 
     217        if v.startswith("A="): 
     218            A_str = v[2:] 
     219            pop.append(k+1) 
     220    if pop: 
     221        sys.argv = [v for k, v in enumerate(sys.argv) if k not in pop] 
     222        A = float(A_str) 
     223 
     224parse_extra_pars() 
     225 
    199226 
    200227# =============== FUNCTION DEFINITIONS ================ 
     
    297324    ocl_function=make_ocl("return sas_gamma(q);", "sas_gamma", ["lib/sas_gamma.c"]), 
    298325    limits=(-3.1, 10), 
     326) 
     327add_function( 
     328    name="gammaln(x)", 
     329    mp_function=mp.loggamma, 
     330    np_function=scipy.special.gammaln, 
     331    ocl_function=make_ocl("return sas_gammaln(q);", "sas_gammaln", ["lib/sas_gammainc.c"]), 
     332    #ocl_function=make_ocl("return lgamma(q);", "sas_gammaln"), 
     333) 
     334add_function( 
     335    name="gammainc(x)", 
     336    mp_function=lambda x, a=A: mp.gammainc(a, a=0, b=x)/mp.gamma(a), 
     337    np_function=lambda x, a=A: scipy.special.gammainc(a, x), 
     338    ocl_function=make_ocl("return sas_gammainc(%.15g,q);"%A, "sas_gammainc", ["lib/sas_gammainc.c"]), 
     339) 
     340add_function( 
     341    name="gammaincc(x)", 
     342    mp_function=lambda x, a=A: mp.gammainc(a, a=x, b=mp.inf)/mp.gamma(a), 
     343    np_function=lambda x, a=A: scipy.special.gammaincc(a, x), 
     344    ocl_function=make_ocl("return sas_gammaincc(%.15g,q);"%A, "sas_gammaincc", ["lib/sas_gammainc.c"]), 
    299345) 
    300346add_function( 
     
    463509lanczos_gamma = """\ 
    464510    const double coeff[] = { 
    465             76.18009172947146,     -86.50532032941677, 
    466             24.01409824083091,     -1.231739572450155, 
     511            76.18009172947146, -86.50532032941677, 
     512            24.01409824083091, -1.231739572450155, 
    467513            0.1208650973866179e-2,-0.5395239384953e-5 
    468514            }; 
     
    475521""" 
    476522add_function( 
    477     name="log gamma(x)", 
     523    name="loggamma(x)", 
    478524    mp_function=mp.loggamma, 
    479525    np_function=scipy.special.gammaln, 
     
    599645 
    600646ALL_FUNCTIONS = set(FUNCTIONS.keys()) 
    601 ALL_FUNCTIONS.discard("loggamma")  # OCL version not ready yet 
     647ALL_FUNCTIONS.discard("loggamma")  # use cephes-based gammaln instead 
    602648ALL_FUNCTIONS.discard("3j1/x:taylor") 
    603649ALL_FUNCTIONS.discard("3j1/x:trig") 
     
    615661    -r indicates that the relative error should be plotted (default), 
    616662    -x<range> indicates the steps in x, where <range> is one of the following 
    617       log indicates log stepping in [10^-3, 10^5] (default) 
    618       logq indicates log stepping in [10^-4, 10^1] 
    619       linear indicates linear stepping in [1, 1000] 
    620       zoom indicates linear stepping in [1000, 1010] 
    621       neg indicates linear stepping in [-100.1, 100.1] 
    622 and name is "all" or one of: 
     663        log indicates log stepping in [10^-3, 10^5] (default) 
     664        logq indicates log stepping in [10^-4, 10^1] 
     665        linear indicates linear stepping in [1, 1000] 
     666        zoom indicates linear stepping in [1000, 1010] 
     667        neg indicates linear stepping in [-100.1, 100.1] 
     668        start:stop:n[:stepping] indicates an n-step plot in [start, stop] 
     669            or [10^start, 10^stop] if stepping is "log" (default n=400) 
     670Some functions (notably gammainc/gammaincc) have an additional parameter A 
     671which can be set from the command line as A=value.  Default is A=1. 
     672 
     673Name is one of: 
    623674    """+names) 
    624675    sys.exit(1) 
  • sasmodels/kernelpy.py

    r91bd550 r12eec1e  
    3737        self.info = model_info 
    3838        self.dtype = np.dtype('d') 
     39        logger.info("make python model " + self.info.name) 
    3940 
    4041    def make_kernel(self, q_vectors): 
  • sasmodels/model_test.py

    r012cd34 r12eec1e  
    4747import sys 
    4848import unittest 
     49import traceback 
    4950 
    5051try: 
     
    7475# pylint: enable=unused-import 
    7576 
    76  
    7777def make_suite(loaders, models): 
    7878    # type: (List[str], List[str]) -> unittest.TestSuite 
     
    8686    *models* is the list of models to test, or *["all"]* to test all models. 
    8787    """ 
    88     ModelTestCase = _hide_model_case_from_nose() 
    8988    suite = unittest.TestSuite() 
    9089 
     
    9594        skip = [] 
    9695    for model_name in models: 
    97         if model_name in skip: 
    98             continue 
    99         model_info = load_model_info(model_name) 
    100  
    101         #print('------') 
    102         #print('found tests in', model_name) 
    103         #print('------') 
    104  
    105         # if ispy then use the dll loader to call pykernel 
    106         # don't try to call cl kernel since it will not be 
    107         # available in some environmentes. 
    108         is_py = callable(model_info.Iq) 
    109  
    110         # Some OpenCL drivers seem to be flaky, and are not producing the 
    111         # expected result.  Since we don't have known test values yet for 
    112         # all of our models, we are instead going to compare the results 
    113         # for the 'smoke test' (that is, evaluation at q=0.1 for the default 
    114         # parameters just to see that the model runs to completion) between 
    115         # the OpenCL and the DLL.  To do this, we define a 'stash' which is 
    116         # shared between OpenCL and DLL tests.  This is just a list.  If the 
    117         # list is empty (which it will be when DLL runs, if the DLL runs 
    118         # first), then the results are appended to the list.  If the list 
    119         # is not empty (which it will be when OpenCL runs second), the results 
    120         # are compared to the results stored in the first element of the list. 
    121         # This is a horrible stateful hack which only makes sense because the 
    122         # test suite is thrown away after being run once. 
    123         stash = [] 
    124  
    125         if is_py:  # kernel implemented in python 
    126             test_name = "%s-python"%model_name 
    127             test_method_name = "test_%s_python" % model_info.id 
     96        if model_name not in skip: 
     97            model_info = load_model_info(model_name) 
     98            _add_model_to_suite(loaders, suite, model_info) 
     99 
     100    return suite 
     101 
     102def _add_model_to_suite(loaders, suite, model_info): 
     103    ModelTestCase = _hide_model_case_from_nose() 
     104 
     105    #print('------') 
     106    #print('found tests in', model_name) 
     107    #print('------') 
     108 
     109    # if ispy then use the dll loader to call pykernel 
     110    # don't try to call cl kernel since it will not be 
     111    # available in some environmentes. 
     112    is_py = callable(model_info.Iq) 
     113 
     114    # Some OpenCL drivers seem to be flaky, and are not producing the 
     115    # expected result.  Since we don't have known test values yet for 
     116    # all of our models, we are instead going to compare the results 
     117    # for the 'smoke test' (that is, evaluation at q=0.1 for the default 
     118    # parameters just to see that the model runs to completion) between 
     119    # the OpenCL and the DLL.  To do this, we define a 'stash' which is 
     120    # shared between OpenCL and DLL tests.  This is just a list.  If the 
     121    # list is empty (which it will be when DLL runs, if the DLL runs 
     122    # first), then the results are appended to the list.  If the list 
     123    # is not empty (which it will be when OpenCL runs second), the results 
     124    # are compared to the results stored in the first element of the list. 
     125    # This is a horrible stateful hack which only makes sense because the 
     126    # test suite is thrown away after being run once. 
     127    stash = [] 
     128 
     129    if is_py:  # kernel implemented in python 
     130        test_name = "%s-python"%model_info.name 
     131        test_method_name = "test_%s_python" % model_info.id 
     132        test = ModelTestCase(test_name, model_info, 
     133                                test_method_name, 
     134                                platform="dll",  # so that 
     135                                dtype="double", 
     136                                stash=stash) 
     137        suite.addTest(test) 
     138    else:   # kernel implemented in C 
     139 
     140        # test using dll if desired 
     141        if 'dll' in loaders or not use_opencl(): 
     142            test_name = "%s-dll"%model_info.name 
     143            test_method_name = "test_%s_dll" % model_info.id 
    128144            test = ModelTestCase(test_name, model_info, 
    129                                  test_method_name, 
    130                                  platform="dll",  # so that 
    131                                  dtype="double", 
    132                                  stash=stash) 
     145                                    test_method_name, 
     146                                    platform="dll", 
     147                                    dtype="double", 
     148                                    stash=stash) 
    133149            suite.addTest(test) 
    134         else:   # kernel implemented in C 
    135  
    136             # test using dll if desired 
    137             if 'dll' in loaders or not use_opencl(): 
    138                 test_name = "%s-dll"%model_name 
    139                 test_method_name = "test_%s_dll" % model_info.id 
    140                 test = ModelTestCase(test_name, model_info, 
    141                                      test_method_name, 
    142                                      platform="dll", 
    143                                      dtype="double", 
    144                                      stash=stash) 
    145                 suite.addTest(test) 
    146  
    147             # test using opencl if desired and available 
    148             if 'opencl' in loaders and use_opencl(): 
    149                 test_name = "%s-opencl"%model_name 
    150                 test_method_name = "test_%s_opencl" % model_info.id 
    151                 # Using dtype=None so that the models that are only 
    152                 # correct for double precision are not tested using 
    153                 # single precision.  The choice is determined by the 
    154                 # presence of *single=False* in the model file. 
    155                 test = ModelTestCase(test_name, model_info, 
    156                                      test_method_name, 
    157                                      platform="ocl", dtype=None, 
    158                                      stash=stash) 
    159                 #print("defining", test_name) 
    160                 suite.addTest(test) 
    161  
    162     return suite 
     150 
     151        # test using opencl if desired and available 
     152        if 'opencl' in loaders and use_opencl(): 
     153            test_name = "%s-opencl"%model_info.name 
     154            test_method_name = "test_%s_opencl" % model_info.id 
     155            # Using dtype=None so that the models that are only 
     156            # correct for double precision are not tested using 
     157            # single precision.  The choice is determined by the 
     158            # presence of *single=False* in the model file. 
     159            test = ModelTestCase(test_name, model_info, 
     160                                    test_method_name, 
     161                                    platform="ocl", dtype=None, 
     162                                    stash=stash) 
     163            #print("defining", test_name) 
     164            suite.addTest(test) 
     165 
    163166 
    164167def _hide_model_case_from_nose(): 
     
    348351    return abs(target-actual)/shift < 1.5*10**-digits 
    349352 
    350 def run_one(model): 
    351     # type: (str) -> str 
    352     """ 
    353     Run the tests for a single model, printing the results to stdout. 
    354  
    355     *model* can by a python file, which is handy for checking user defined 
    356     plugin models. 
     353# CRUFT: old interface; should be deprecated and removed 
     354def run_one(model_name): 
     355    # msg = "use check_model(model_info) rather than run_one(model_name)" 
     356    # warnings.warn(msg, category=DeprecationWarning, stacklevel=2) 
     357    try: 
     358        model_info = load_model_info(model_name) 
     359    except Exception: 
     360        output = traceback.format_exc() 
     361        return output 
     362 
     363    success, output = check_model(model_info) 
     364    return output 
     365 
     366def check_model(model_info): 
     367    # type: (ModelInfo) -> str 
     368    """ 
     369    Run the tests for a single model, capturing the output. 
     370 
     371    Returns success status and the output string. 
    357372    """ 
    358373    # Note that running main() directly did not work from within the 
     
    369384    # Build a test suite containing just the model 
    370385    loaders = ['opencl'] if use_opencl() else ['dll'] 
    371     models = [model] 
    372     try: 
    373         suite = make_suite(loaders, models) 
    374     except Exception: 
    375         import traceback 
    376         stream.writeln(traceback.format_exc()) 
    377         return 
     386    suite = unittest.TestSuite() 
     387    _add_model_to_suite(loaders, suite, model_info) 
    378388 
    379389    # Warn if there are no user defined tests. 
     
    390400    for test in suite: 
    391401        if not test.info.tests: 
    392             stream.writeln("Note: %s has no user defined tests."%model) 
     402            stream.writeln("Note: %s has no user defined tests."%model_info.name) 
    393403        break 
    394404    else: 
     
    406416    output = stream.getvalue() 
    407417    stream.close() 
    408     return output 
     418    return result.wasSuccessful(), output 
    409419 
    410420 
  • sasmodels/sasview_model.py

    rbd547d0 rce1eed5  
    803803            return value, [value], [1.0] 
    804804 
     805    @classmethod 
     806    def runTests(cls): 
     807        """ 
     808        Run any tests built into the model and captures the test output. 
     809 
     810        Returns success flag and output 
     811        """ 
     812        from .model_test import check_model 
     813        return check_model(cls._model_info) 
     814 
    805815def test_cylinder(): 
    806816    # type: () -> float 
  • sasmodels/special.py

    rdf69efa rfba9ca0  
    113113        The standard math function, tgamma(x) is unstable for $x < 1$ 
    114114        on some platforms. 
     115 
     116    sas_gammaln(x): 
     117        log gamma function sas_gammaln\ $(x) = \log \Gamma(|x|)$. 
     118 
     119        The standard math function, lgamma(x), is incorrect for single 
     120        precision on some platforms. 
     121 
     122    sas_gammainc(a, x), sas_gammaincc(a, x): 
     123        Incomplete gamma function 
     124        sas_gammainc\ $(a, x) = \int_0^x t^{a-1}e^{-t}\,dt / \Gamma(a)$ 
     125        and complementary incomplete gamma function 
     126        sas_gammaincc\ $(a, x) = \int_x^\infty t^{a-1}e^{-t}\,dt / \Gamma(a)$ 
    115127 
    116128    sas_erf(x), sas_erfc(x): 
     
    207219from numpy import pi, nan, inf 
    208220from scipy.special import gamma as sas_gamma 
     221from scipy.special import gammaln as sas_gammaln 
     222from scipy.special import gammainc as sas_gammainc 
     223from scipy.special import gammaincc as sas_gammaincc 
    209224from scipy.special import erf as sas_erf 
    210225from scipy.special import erfc as sas_erfc 
  • sasmodels/kernelcl.py

    rd86f0fc r95f62aa  
    7474 
    7575from . import generate 
     76from .generate import F32, F64 
    7677from .kernel import KernelModel, Kernel 
    7778 
     
    162163    Return true if device supports the requested precision. 
    163164    """ 
    164     if dtype == generate.F32: 
     165    if dtype == F32: 
    165166        return True 
    166167    elif dtype == generate.F64: 
     
    239240    """ 
    240241    GPU context, with possibly many devices, and one queue per device. 
     242 
     243    Because the environment can be reset during a live program (e.g., if the 
     244    user changes the active GPU device in the GUI), everything associated 
     245    with the device context must be cached in the environment and recreated 
     246    if the environment changes.  The *cache* attribute is a simple dictionary 
     247    which holds keys and references to objects, such as compiled kernels and 
     248    allocated buffers.  The running program should check in the cache for 
     249    long lived objects and create them if they are not there.  The program 
     250    should not hold onto cached objects, but instead only keep them active 
     251    for the duration of a function call.  When the environment is destroyed 
     252    then the *release* method for each active cache item is called before 
     253    the environment is freed.  This means that each cl buffer should be 
     254    in its own cache entry. 
    241255    """ 
    242256    def __init__(self): 
    243257        # type: () -> None 
    244258        # find gpu context 
    245         #self.context = cl.create_some_context() 
    246  
    247         self.context = None 
    248         if 'SAS_OPENCL' in os.environ: 
    249             #Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context 
    250             os.environ["PYOPENCL_CTX"] = os.environ["SAS_OPENCL"] 
    251         if 'PYOPENCL_CTX' in os.environ: 
    252             self._create_some_context() 
    253  
    254         if not self.context: 
    255             self.context = _get_default_context() 
     259        context_list = _create_some_context() 
     260 
     261        # Find a context for F32 and for F64 (maybe the same one). 
     262        # F16 isn't good enough. 
     263        self.context = {} 
     264        for dtype in (F32, F64): 
     265            for context in context_list: 
     266                if has_type(context.devices[0], dtype): 
     267                    self.context[dtype] = context 
     268                    break 
     269            else: 
     270                self.context[dtype] = None 
     271 
     272        # Build a queue for each context 
     273        self.queue = {} 
     274        context = self.context[F32] 
     275        self.queue[F32] = cl.CommandQueue(context, context.devices[0]) 
     276        if self.context[F64] == self.context[F32]: 
     277            self.queue[F64] = self.queue[F32] 
     278        else: 
     279            context = self.context[F64] 
     280            self.queue[F64] = cl.CommandQueue(context, context.devices[0]) 
    256281 
    257282        # Byte boundary for data alignment 
    258         #self.data_boundary = max(d.min_data_type_align_size 
    259         #                         for d in self.context.devices) 
    260         self.queues = [cl.CommandQueue(context, context.devices[0]) 
    261                        for context in self.context] 
     283        #self.data_boundary = max(context.devices[0].min_data_type_align_size 
     284        #                         for context in self.context.values()) 
     285 
     286        # Cache for compiled programs, and for items in context 
    262287        self.compiled = {} 
     288        self.cache = {} 
    263289 
    264290    def has_type(self, dtype): 
     
    267293        Return True if all devices support a given type. 
    268294        """ 
    269         return any(has_type(d, dtype) 
    270                    for context in self.context 
    271                    for d in context.devices) 
    272  
    273     def get_queue(self, dtype): 
    274         # type: (np.dtype) -> cl.CommandQueue 
    275         """ 
    276         Return a command queue for the kernels of type dtype. 
    277         """ 
    278         for context, queue in zip(self.context, self.queues): 
    279             if all(has_type(d, dtype) for d in context.devices): 
    280                 return queue 
    281  
    282     def get_context(self, dtype): 
    283         # type: (np.dtype) -> cl.Context 
    284         """ 
    285         Return a OpenCL context for the kernels of type dtype. 
    286         """ 
    287         for context in self.context: 
    288             if all(has_type(d, dtype) for d in context.devices): 
    289                 return context 
    290  
    291     def _create_some_context(self): 
    292         # type: () -> cl.Context 
    293         """ 
    294         Protected call to cl.create_some_context without interactivity.  Use 
    295         this if SAS_OPENCL is set in the environment.  Sets the *context* 
    296         attribute. 
    297         """ 
    298         try: 
    299             self.context = [cl.create_some_context(interactive=False)] 
    300         except Exception as exc: 
    301             warnings.warn(str(exc)) 
    302             warnings.warn("pyopencl.create_some_context() failed") 
    303             warnings.warn("the environment variable 'SAS_OPENCL' might not be set correctly") 
     295        return self.context.get(dtype, None) is not None 
    304296 
    305297    def compile_program(self, name, source, dtype, fast, timestamp): 
     
    318310            del self.compiled[key] 
    319311        if key not in self.compiled: 
    320             context = self.get_context(dtype) 
     312            context = self.context[dtype] 
    321313            logging.info("building %s for OpenCL %s", key, 
    322314                         context.devices[0].name.strip()) 
    323             program = compile_model(self.get_context(dtype), 
     315            program = compile_model(self.context[dtype], 
    324316                                    str(source), dtype, fast) 
    325317            self.compiled[key] = (program, timestamp) 
    326318        return program 
     319 
     320    def free_buffer(self, key): 
     321        if key in self.cache: 
     322            self.cache[key].release() 
     323            del self.cache[key] 
     324 
     325    def __del__(self): 
     326        for v in self.cache.values(): 
     327            release = getattr(v, 'release', lambda: None) 
     328            release() 
     329        self.cache = {} 
     330 
     331_CURRENT_ID = 0 
     332def unique_id(): 
     333    global _CURRENT_ID 
     334    _CURRENT_ID += 1 
     335    return _CURRENT_ID 
     336 
     337def _create_some_context(): 
     338    # type: () -> cl.Context 
     339    """ 
     340    Protected call to cl.create_some_context without interactivity. 
     341 
     342    Uses SAS_OPENCL or PYOPENCL_CTX if they are set in the environment, 
     343    otherwise scans for the most appropriate device using 
     344    :func:`_get_default_context` 
     345    """ 
     346    if 'SAS_OPENCL' in os.environ: 
     347        #Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context 
     348        os.environ["PYOPENCL_CTX"] = os.environ["SAS_OPENCL"] 
     349 
     350    if 'PYOPENCL_CTX' in os.environ: 
     351        try: 
     352            return [cl.create_some_context(interactive=False)] 
     353        except Exception as exc: 
     354            warnings.warn(str(exc)) 
     355            warnings.warn("pyopencl.create_some_context() failed") 
     356            warnings.warn("the environment variable 'SAS_OPENCL' or 'PYOPENCL_CTX' might not be set correctly") 
     357 
     358    return _get_default_context() 
    327359 
    328360def _get_default_context(): 
     
    404436        self.dtype = dtype 
    405437        self.fast = fast 
    406         self.program = None # delay program creation 
    407         self._kernels = None 
     438        self.timestamp = generate.ocl_timestamp(self.info) 
     439        self._cache_key = unique_id() 
    408440 
    409441    def __getstate__(self): 
     
    414446        # type: (Tuple[ModelInfo, str, np.dtype, bool]) -> None 
    415447        self.info, self.source, self.dtype, self.fast = state 
    416         self.program = None 
    417448 
    418449    def make_kernel(self, q_vectors): 
    419450        # type: (List[np.ndarray]) -> "GpuKernel" 
    420         if self.program is None: 
    421             compile_program = environment().compile_program 
    422             timestamp = generate.ocl_timestamp(self.info) 
    423             self.program = compile_program( 
     451        return GpuKernel(self, q_vectors) 
     452 
     453    @property 
     454    def Iq(self): 
     455        return self._fetch_kernel('Iq') 
     456 
     457    def fetch_kernel(self, name): 
     458        # type: (str) -> cl.Kernel 
     459        """ 
     460        Fetch the kernel from the environment by name, compiling it if it 
     461        does not already exist. 
     462        """ 
     463        gpu = environment() 
     464        key = self._cache_key 
     465        if key not in gpu.cache: 
     466            program = gpu.compile_program( 
    424467                self.info.name, 
    425468                self.source['opencl'], 
    426469                self.dtype, 
    427470                self.fast, 
    428                 timestamp) 
     471                self.timestamp) 
    429472            variants = ['Iq', 'Iqxy', 'Imagnetic'] 
    430473            names = [generate.kernel_name(self.info, k) for k in variants] 
    431             kernels = [getattr(self.program, k) for k in names] 
    432             self._kernels = dict((k, v) for k, v in zip(variants, kernels)) 
    433         is_2d = len(q_vectors) == 2 
    434         if is_2d: 
    435             kernel = [self._kernels['Iqxy'], self._kernels['Imagnetic']] 
     474            kernels = [getattr(program, k) for k in names] 
     475            data = dict((k, v) for k, v in zip(variants, kernels)) 
     476            # keep a handle to program so GC doesn't collect 
     477            data['program'] = program 
     478            gpu.cache[key] = data 
    436479        else: 
    437             kernel = [self._kernels['Iq']]*2 
    438         return GpuKernel(kernel, self.dtype, self.info, q_vectors) 
    439  
    440     def release(self): 
    441         # type: () -> None 
    442         """ 
    443         Free the resources associated with the model. 
    444         """ 
    445         if self.program is not None: 
    446             self.program = None 
    447  
    448     def __del__(self): 
    449         # type: () -> None 
    450         self.release() 
     480            data = gpu.cache[key] 
     481        return data[name] 
    451482 
    452483# TODO: check that we don't need a destructor for buffers which go out of scope 
     
    473504        # type: (List[np.ndarray], np.dtype) -> None 
    474505        # TODO: do we ever need double precision q? 
    475         env = environment() 
    476506        self.nq = q_vectors[0].size 
    477507        self.dtype = np.dtype(dtype) 
     
    493523            self.q[:self.nq] = q_vectors[0] 
    494524        self.global_size = [self.q.shape[0]] 
    495         context = env.get_context(self.dtype) 
    496         #print("creating inputs of size", self.global_size) 
    497         self.q_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 
    498                              hostbuf=self.q) 
     525        self._cache_key = unique_id() 
     526 
     527    @property 
     528    def q_b(self): 
     529        """Lazy creation of q buffer so it can survive context reset""" 
     530        env = environment() 
     531        key = self._cache_key 
     532        if key not in env.cache: 
     533            context = env.context[self.dtype] 
     534            #print("creating inputs of size", self.global_size) 
     535            buffer = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 
     536                               hostbuf=self.q) 
     537            env.cache[key] = buffer 
     538        return env.cache[key] 
    499539 
    500540    def release(self): 
    501541        # type: () -> None 
    502542        """ 
    503         Free the memory. 
    504         """ 
    505         if self.q_b is not None: 
    506             self.q_b.release() 
    507             self.q_b = None 
     543        Free the buffer associated with the q value 
     544        """ 
     545        environment().free_buffer(id(self)) 
    508546 
    509547    def __del__(self): 
     
    515553    Callable SAS kernel. 
    516554 
    517     *kernel* is the GpuKernel object to call 
    518  
    519     *model_info* is the module information 
    520  
    521     *q_vectors* is the q vectors at which the kernel should be evaluated 
     555    *model* is the GpuModel object to call 
     556 
     557    The following attributes are defined: 
     558 
     559    *info* is the module information 
    522560 
    523561    *dtype* is the kernel precision 
     562 
     563    *dim* is '1d' or '2d' 
     564 
     565    *result* is a vector to contain the results of the call 
    524566 
    525567    The resulting call method takes the *pars*, a list of values for 
     
    531573    Call :meth:`release` when done with the kernel instance. 
    532574    """ 
    533     def __init__(self, kernel, dtype, model_info, q_vectors): 
     575    def __init__(self, model, q_vectors): 
    534576        # type: (cl.Kernel, np.dtype, ModelInfo, List[np.ndarray]) -> None 
    535         q_input = GpuInput(q_vectors, dtype) 
    536         self.kernel = kernel 
    537         self.info = model_info 
    538         self.dtype = dtype 
    539         self.dim = '2d' if q_input.is_2d else '1d' 
    540         # plus three for the normalization values 
    541         self.result = np.empty(q_input.nq+1, dtype) 
    542  
    543         # Inputs and outputs for each kernel call 
    544         # Note: res may be shorter than res_b if global_size != nq 
     577        dtype = model.dtype 
     578        self.q_input = GpuInput(q_vectors, dtype) 
     579        self._model = model 
     580        self._as_dtype = (np.float32 if dtype == generate.F32 
     581                          else np.float64 if dtype == generate.F64 
     582                          else np.float16 if dtype == generate.F16 
     583                          else np.float32)  # will never get here, so use np.float32 
     584        self._cache_key = unique_id() 
     585 
     586        # attributes accessed from the outside 
     587        self.dim = '2d' if self.q_input.is_2d else '1d' 
     588        self.info = model.info 
     589        self.dtype = model.dtype 
     590 
     591        # holding place for the returned value 
     592        # plus one for the normalization values 
     593        self.result = np.empty(self.q_input.nq+1, dtype) 
     594 
     595    @property 
     596    def _result_b(self): 
     597        """Lazy creation of result buffer so it can survive context reset""" 
    545598        env = environment() 
    546         self.queue = env.get_queue(dtype) 
    547  
    548         self.result_b = cl.Buffer(self.queue.context, mf.READ_WRITE, 
    549                                   q_input.global_size[0] * dtype.itemsize) 
    550         self.q_input = q_input # allocated by GpuInput above 
    551  
    552         self._need_release = [self.result_b, self.q_input] 
    553         self.real = (np.float32 if dtype == generate.F32 
    554                      else np.float64 if dtype == generate.F64 
    555                      else np.float16 if dtype == generate.F16 
    556                      else np.float32)  # will never get here, so use np.float32 
     599        key = self._cache_key 
     600        if key not in env.cache: 
     601            context = env.context[self.dtype] 
     602            #print("creating inputs of size", self.global_size) 
     603            buffer = cl.Buffer(context, mf.READ_WRITE, 
     604                               self.q_input.global_size[0] * self.dtype.itemsize) 
     605            env.cache[key] = buffer 
     606        return env.cache[key] 
    557607 
    558608    def __call__(self, call_details, values, cutoff, magnetic): 
    559609        # type: (CallDetails, np.ndarray, np.ndarray, float, bool) -> np.ndarray 
    560         context = self.queue.context 
    561         # Arrange data transfer to card 
     610        env = environment() 
     611        queue = env.queue[self._model.dtype] 
     612        context = queue.context 
     613 
     614        # Arrange data transfer to/from card 
     615        q_b = self.q_input.q_b 
     616        result_b = self._result_b 
    562617        details_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 
    563618                              hostbuf=call_details.buffer) 
     
    565620                             hostbuf=values) 
    566621 
    567         kernel = self.kernel[1 if magnetic else 0] 
    568         args = [ 
     622        name = 'Iq' if self.dim == '1d' else 'Imagnetic' if magnetic else 'Iqxy' 
     623        kernel = self._model.fetch_kernel(name) 
     624        kernel_args = [ 
    569625            np.uint32(self.q_input.nq), None, None, 
    570             details_b, values_b, self.q_input.q_b, self.result_b, 
    571             self.real(cutoff), 
     626            details_b, values_b, q_b, result_b, 
     627            self._as_dtype(cutoff), 
    572628        ] 
    573629        #print("Calling OpenCL") 
     
    580636            stop = min(start + step, call_details.num_eval) 
    581637            #print("queuing",start,stop) 
    582             args[1:3] = [np.int32(start), np.int32(stop)] 
    583             wait_for = [kernel(self.queue, self.q_input.global_size, None, 
    584                                *args, wait_for=wait_for)] 
     638            kernel_args[1:3] = [np.int32(start), np.int32(stop)] 
     639            wait_for = [kernel(queue, self.q_input.global_size, None, 
     640                               *kernel_args, wait_for=wait_for)] 
    585641            if stop < call_details.num_eval: 
    586642                # Allow other processes to run 
     
    590646                    time.sleep(0.05) 
    591647                    last_nap = current_time 
    592         cl.enqueue_copy(self.queue, self.result, self.result_b) 
     648        cl.enqueue_copy(queue, self.result, result_b, wait_for=wait_for) 
    593649        #print("result", self.result) 
    594650 
     
    610666        Release resources associated with the kernel. 
    611667        """ 
    612         for v in self._need_release: 
    613             v.release() 
    614         self._need_release = [] 
     668        environment().free_buffer(id(self)) 
     669        self.q_input.release() 
    615670 
    616671    def __del__(self): 
Note: See TracChangeset for help on using the changeset viewer.