Changeset 36a2418 in sasmodels
- Timestamp:
- Mar 6, 2019 3:16:13 PM (6 years ago)
- Branches:
- master, core_shell_microgels, magnetic_model, ticket-1257-vesicle-product, ticket_1156, ticket_1265_superball, ticket_822_more_unit_tests
- Children:
- da3638f
- Parents:
- fa26e78 (diff), e5bbe64 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent. - Files:
-
- 13 edited
- 6 moved
Legend:
- Unmodified
- Added
- Removed
-
README.rst
re30d645 r2a64722 10 10 is available. 11 11 12 Example 12 Install 13 13 ------- 14 15 The easiest way to use sasmodels is from `SasView <http://www.sasview.org/>`_. 16 17 You can also install sasmodels as a standalone package in python. Use 18 `miniconda <https://docs.conda.io/en/latest/miniconda.html>`_ 19 or `anaconda <https://www.anaconda.com/>`_ 20 to create a python environment with the sasmodels dependencies:: 21 22 $ conda create -n sasmodels -c conda-forge numpy scipy matplotlib pyopencl 23 24 The option ``-n sasmodels`` names the environment sasmodels, and the option 25 ``-c conda-forge`` selects the conda-forge package channel because pyopencl 26 is not part of the base anaconda distribution. 27 28 Activate the environment and install sasmodels:: 29 30 $ conda activate sasmodels 31 (sasmodels) $ pip install sasmodels 32 33 Install `bumps <https://github.com/bumps/bumps>`_ if you want to use it to fit 34 your data:: 35 36 (sasmodels) $ pip install bumps 37 38 Usage 39 ----- 40 41 Check that the works:: 42 43 (sasmodels) $ python -m sasmodels.compare cylinder 44 45 To show the orientation explorer:: 46 47 (sasmodels) $ python -m sasmodels.jitter 48 49 Documentation is available online as part of the SasView 50 `fitting perspective <http://www.sasview.org/docs/index.html>`_ 51 as well as separate pages for 52 `individual models <http://www.sasview.org/docs/user/sasgui/perspectives/fitting/models/index.html>`_. 53 Programming details for sasmodels are available in the 54 `developer documentation <http://www.sasview.org/docs/dev/dev.html>`_. 55 56 57 Fitting Example 58 --------------- 14 59 15 60 The example directory contains a radial+tangential data set for an oriented 16 61 rod-like shape. 17 62 18 The data is loaded by sas.dataloader from the sasview package, so sasview 19 is needed to run the example. 63 To load the example data, you will need the SAS data loader from the sasview 64 package. This is not yet available on PyPI, so you will need a copy of the 65 SasView source code to run it. Create a directory somewhere to hold the 66 sasview and sasmodels source code, which we will refer to as $SOURCE. 20 67 21 To run the example, you need sasview, sasmodels and bumps. Assuming these 22 repositories are installed side by side, change to the sasmodels/example 23 directory and enter:: 68 Use the following to install sasview, and the sasmodels examples:: 24 69 25 PYTHONPATH=..:../../sasview/src ../../bumps/run.py fit.py \ 26 cylinder --preview 70 (sasmodels) $ cd $SOURCE 71 (sasmodels) $ conda install git 72 (sasmodels) $ git clone https://github.com/sasview/sasview.git 73 (sasmodels) $ git clone https://github.com/sasview/sasmodels.git 27 74 28 See bumps documentation for instructions on running the fit. With the 29 python packages installed, e.g., into a virtual environment, then the 30 python path need not be set, and the command would be:: 75 Set the path to the sasview source on your python path within the sasmodels 76 environment. On Windows, this will be:: 31 77 32 bumps fit.py cylinder --preview 78 (sasmodels)> set PYTHONPATH="$SOURCE\sasview\src" 79 (sasmodels)> cd $SOURCE/sasmodels/example 80 (sasmodels)> python -m bumps.cli fit.py cylinder --preview 81 82 On Mac/Linux with the standard shell this will be:: 83 84 (sasmodels) $ export PYTHONPATH="$SOURCE/sasview/src" 85 (sasmodels) $ cd $SOURCE/sasmodels/example 86 (sasmodels) $ bumps fit.py cylinder --preview 33 87 34 88 The fit.py model accepts up to two arguments. The first argument is the … … 38 92 both radial and tangential simultaneously, use the word "both". 39 93 40 Notes 41 ----- 42 43 cylinder.c + cylinder.py is the cylinder model with renamed variables and 44 sld scaled by 1e6 so the numbers are nicer. The model name is "cylinder" 45 46 lamellar.py is an example of a single file model with embedded C code. 94 See `bumps documentation <https://bumps.readthedocs.io/>`_ for detailed 95 instructions on running the fit. 47 96 48 97 |TravisStatus|_ -
explore/beta/sasfit_compare.py
r2a12351b r119073a 505 505 } 506 506 507 Q, IQ = load_sasfit(data_file(' richard_test.txt'))508 Q, IQSD = load_sasfit(data_file(' richard_test2.txt'))509 Q, IQBD = load_sasfit(data_file(' richard_test3.txt'))507 Q, IQ = load_sasfit(data_file('sasfit_sphere_schulz_IQD.txt')) 508 Q, IQSD = load_sasfit(data_file('sasfit_sphere_schulz_IQSD.txt')) 509 Q, IQBD = load_sasfit(data_file('sasfit_sphere_schulz_IQBD.txt')) 510 510 target = Theory(Q=Q, F1=None, F2=None, P=IQ, S=None, I=IQSD, Seff=None, Ibeta=IQBD) 511 511 actual = sphere_r(Q, norm="sasfit", **pars) … … 526 526 } 527 527 528 Q, IQ = load_sasfit(data_file(' richard_test4.txt'))529 Q, IQSD = load_sasfit(data_file(' richard_test5.txt'))530 Q, IQBD = load_sasfit(data_file(' richard_test6.txt'))528 Q, IQ = load_sasfit(data_file('sasfit_ellipsoid_shulz_IQD.txt')) 529 Q, IQSD = load_sasfit(data_file('sasfit_ellipsoid_shulz_IQSD.txt')) 530 Q, IQBD = load_sasfit(data_file('sasfit_ellipsoid_shulz_IQBD.txt')) 531 531 target = Theory(Q=Q, F1=None, F2=None, P=IQ, S=None, I=IQSD, Seff=None, Ibeta=IQBD) 532 532 actual = ellipsoid_pe(Q, norm="sasfit", **pars) -
explore/precision.py
raa8c6e0 rcd28947 207 207 return model_info 208 208 209 # Hack to allow second parameter A in two parameter functions 209 # Hack to allow second parameter A in the gammainc and gammaincc functions. 210 # Create a 2-D variant of the precision test if we need to handle other two 211 # parameter functions. 210 212 A = 1 211 213 def parse_extra_pars(): 214 """ 215 Parse the command line looking for the second parameter "A=..." for the 216 gammainc/gammaincc functions. 217 """ 212 218 global A 213 219 … … 333 339 ) 334 340 add_function( 341 # Note: "a" is given as A=... on the command line via parse_extra_pars 335 342 name="gammainc(x)", 336 343 mp_function=lambda x, a=A: mp.gammainc(a, a=0, b=x)/mp.gamma(a), … … 339 346 ) 340 347 add_function( 348 # Note: "a" is given as A=... on the command line via parse_extra_pars 341 349 name="gammaincc(x)", 342 350 mp_function=lambda x, a=A: mp.gammainc(a, a=x, b=mp.inf)/mp.gamma(a), -
sasmodels/generate.py
r39a06c9 rcd28947 703 703 """ 704 704 for code in source: 705 m = _FQ_PATTERN.search(code) 706 if m is not None: 705 if _FQ_PATTERN.search(code) is not None: 707 706 return True 708 707 return False … … 712 711 # type: (List[str]) -> bool 713 712 """ 714 Return True if C source defines " void Fq(".713 Return True if C source defines "double shell_volume(". 715 714 """ 716 715 for code in source: 717 m = _SHELL_VOLUME_PATTERN.search(code) 718 if m is not None: 716 if _SHELL_VOLUME_PATTERN.search(code) is not None: 719 717 return True 720 718 return False -
sasmodels/kernel.py
r3199b17 r36a2418 135 135 nout = 2 if self.info.have_Fq and self.dim == '1d' else 1 136 136 total_weight = self.result[nout*self.q_input.nq + 0] 137 # Note: total_weight = sum(weight > cutoff), with cutoff >= 0, so it 138 # is okay to test directly against zero. If weight is zero then I(q), 139 # etc. must also be zero. 137 140 if total_weight == 0.: 138 141 total_weight = 1. -
sasmodels/models/pearl_necklace.c
r4453136 r9b5fd42 40 40 const double si = sas_sinx_x(q*A_s); 41 41 const double omsi = 1.0 - si; 42 const double pow_si = pow (si, num_pearls);42 const double pow_si = pown(si, num_pearls); 43 43 44 44 // form factor for num_pearls -
sasmodels/sasview_model.py
r5024a56 r3a1afed 695 695 return self._calculate_Iq(qx, qy) 696 696 697 def _calculate_Iq(self, qx, qy=None , Fq=False, effective_radius_type=1):697 def _calculate_Iq(self, qx, qy=None): 698 698 if self._model is None: 699 699 self._model = core.build_model(self._model_info) … … 715 715 #print("values", values) 716 716 #print("is_mag", is_magnetic) 717 if Fq:718 result = calculator.Fq(call_details, values, cutoff=self.cutoff,719 magnetic=is_magnetic,720 effective_radius_type=effective_radius_type)721 717 result = calculator(call_details, values, cutoff=self.cutoff, 722 718 magnetic=is_magnetic) … … 736 732 Calculate the effective radius for P(q)*S(q) 737 733 734 *mode* is the R_eff type, which defaults to 1 to match the ER 735 calculation for sasview models from version 3.x. 736 738 737 :return: the value of the effective radius 739 738 """ 740 Fq = self._calculate_Iq([0.1], True, mode) 741 return Fq[2] 739 # ER and VR are only needed for old multiplication models, based on 740 # sas.sascalc.fit.MultiplicationModel. Fail for now. If we want to 741 # continue supporting them then add some test cases so that the code 742 # is exercised. We can access ER/VR using the kernel Fq function by 743 # extending _calculate_Iq so that it calls: 744 # if er_mode > 0: 745 # res = calculator.Fq(call_details, values, cutoff=self.cutoff, 746 # magnetic=False, effective_radius_type=mode) 747 # R_eff, form_shell_ratio = res[2], res[4] 748 # return R_eff, form_shell_ratio 749 # Then use the following in calculate_ER: 750 # ER, VR = self._calculate_Iq(q=[0.1], er_mode=mode) 751 # return ER 752 # Similarly, for calculate_VR: 753 # ER, VR = self._calculate_Iq(q=[0.1], er_mode=1) 754 # return VR 755 # Obviously a combined calculate_ER_VR method would be better, but 756 # we only need them to support very old models, so ignore the 2x 757 # performance hit. 758 raise NotImplementedError("ER function is no longer available.") 742 759 743 760 def calculate_VR(self): … … 748 765 :return: the value of the form:shell volume ratio 749 766 """ 750 Fq = self._calculate_Iq([0.1], True, mode)751 r eturn Fq[4]767 # See comments in calculate_ER. 768 raise NotImplementedError("VR function is no longer available.") 752 769 753 770 def set_dispersion(self, parameter, dispersion): -
sasmodels/kernelcl.py
rf872fd1 r3199b17 61 61 62 62 63 # Attempt to setup opencl. This may fail if the pyopencl package is not63 # Attempt to setup OpenCL. This may fail if the pyopencl package is not 64 64 # installed or if it is installed but there are no devices available. 65 65 try: … … 67 67 from pyopencl import mem_flags as mf 68 68 from pyopencl.characterize import get_fast_inaccurate_build_options 69 # Ask OpenCL for the default context so that we know that one exists 69 # Ask OpenCL for the default context so that we know that one exists. 70 70 cl.create_some_context(interactive=False) 71 71 HAVE_OPENCL = True … … 88 88 # pylint: enable=unused-import 89 89 90 # CRUFT: pyopencl < 2017.1 (as of June 2016 needs quotes around include path) 90 91 # CRUFT: pyopencl < 2017.1 (as of June 2016 needs quotes around include path). 91 92 def quote_path(v): 92 93 """ … … 99 100 return '"'+v+'"' if v and ' ' in v and not v[0] in "\"'-" else v 100 101 102 101 103 def fix_pyopencl_include(): 102 104 """ … … 105 107 import pyopencl as cl 106 108 if hasattr(cl, '_DEFAULT_INCLUDE_OPTIONS'): 107 cl._DEFAULT_INCLUDE_OPTIONS = [quote_path(v) for v in cl._DEFAULT_INCLUDE_OPTIONS] 109 cl._DEFAULT_INCLUDE_OPTIONS = [ 110 quote_path(v) for v in cl._DEFAULT_INCLUDE_OPTIONS 111 ] 112 108 113 109 114 if HAVE_OPENCL: … … 118 123 MAX_LOOPS = 2048 119 124 120 121 125 # Pragmas for enable OpenCL features. Be sure to protect them so that they 122 126 # still compile even if OpenCL is not present. … … 133 137 """ 134 138 139 135 140 def use_opencl(): 136 141 sas_opencl = os.environ.get("SAS_OPENCL", "OpenCL").lower() 137 142 return HAVE_OPENCL and sas_opencl != "none" and not sas_opencl.startswith("cuda") 138 143 144 139 145 ENV = None 140 146 def reset_environment(): … … 144 150 global ENV 145 151 ENV = GpuEnvironment() if use_opencl() else None 152 146 153 147 154 def environment(): … … 161 168 return ENV 162 169 170 163 171 def has_type(device, dtype): 164 172 # type: (cl.Device, np.dtype) -> bool … … 171 179 return "cl_khr_fp64" in device.extensions 172 180 else: 173 # Not supporting F16 type since it isn't accurate enough 181 # Not supporting F16 type since it isn't accurate enough. 174 182 return False 183 175 184 176 185 def get_warp(kernel, queue): … … 182 191 cl.kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE, 183 192 queue.device) 193 184 194 185 195 def compile_model(context, source, dtype, fast=False): … … 203 213 source_list.insert(0, _F64_PRAGMA) 204 214 205 # Note: USE_SINCOS makes the intel cpu slower under opencl215 # Note: USE_SINCOS makes the Intel CPU slower under OpenCL. 206 216 if context.devices[0].type == cl.device_type.GPU: 207 217 source_list.insert(0, "#define USE_SINCOS\n") … … 210 220 source = "\n".join(source_list) 211 221 program = cl.Program(context, source).build(options=options) 222 212 223 #print("done with "+program) 213 224 return program 214 225 215 226 216 # for now, this returns one device in the context217 # TODO: create a context that contains all devices on all platforms227 # For now, this returns one device in the context. 228 # TODO: Create a context that contains all devices on all platforms. 218 229 class GpuEnvironment(object): 219 230 """ 220 GPU context, with possibly many devices, and one queue per device. 221 222 Because the environment can be reset during a live program (e.g., if the 223 user changes the active GPU device in the GUI), everything associated 224 with the device context must be cached in the environment and recreated 225 if the environment changes. The *cache* attribute is a simple dictionary 226 which holds keys and references to objects, such as compiled kernels and 227 allocated buffers. The running program should check in the cache for 228 long lived objects and create them if they are not there. The program 229 should not hold onto cached objects, but instead only keep them active 230 for the duration of a function call. When the environment is destroyed 231 then the *release* method for each active cache item is called before 232 the environment is freed. This means that each cl buffer should be 233 in its own cache entry. 231 GPU context for OpenCL, with possibly many devices and one queue per device. 234 232 """ 235 233 def __init__(self): 236 234 # type: () -> None 237 # find gpu context235 # Find gpu context. 238 236 context_list = _create_some_context() 239 237 … … 249 247 self.context[dtype] = None 250 248 251 # Build a queue for each context 249 # Build a queue for each context. 252 250 self.queue = {} 253 251 context = self.context[F32] … … 259 257 self.queue[F64] = cl.CommandQueue(context, context.devices[0]) 260 258 261 # Byte boundary for data alignment259 ## Byte boundary for data alignment. 262 260 #self.data_boundary = max(context.devices[0].min_data_type_align_size 263 261 # for context in self.context.values()) 264 262 265 # Cache for compiled programs, and for items in context 263 # Cache for compiled programs, and for items in context. 266 264 self.compiled = {} 267 self.cache = {}268 265 269 266 def has_type(self, dtype): … … 280 277 """ 281 278 # Note: PyOpenCL caches based on md5 hash of source, options and device 282 # so we don't really need to cache things for ourselves. I'll do so 283 # anyway just to save some data munging time. 279 # but I'll do so as well just to save some data munging time. 284 280 tag = generate.tag_source(source) 285 281 key = "%s-%s-%s%s"%(name, dtype, tag, ("-fast" if fast else "")) 286 # Check timestamp on program 282 # Check timestamp on program. 287 283 program, program_timestamp = self.compiled.get(key, (None, np.inf)) 288 284 if program_timestamp < timestamp: … … 297 293 return program 298 294 299 def free_buffer(self, key):300 if key in self.cache:301 self.cache[key].release()302 del self.cache[key]303 304 def __del__(self):305 for v in self.cache.values():306 release = getattr(v, 'release', lambda: None)307 release()308 self.cache = {}309 310 _CURRENT_ID = 0311 def unique_id():312 global _CURRENT_ID313 _CURRENT_ID += 1314 return _CURRENT_ID315 295 316 296 def _create_some_context(): … … 325 305 which one (and not a CUDA device, or no GPU). 326 306 """ 327 # Assume we do not get here if SAS_OPENCL is None or CUDA 307 # Assume we do not get here if SAS_OPENCL is None or CUDA. 328 308 sas_opencl = os.environ.get('SAS_OPENCL', 'opencl') 329 309 if sas_opencl.lower() != 'opencl': 330 # Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context 310 # Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context. 331 311 os.environ["PYOPENCL_CTX"] = sas_opencl 332 312 … … 336 316 except Exception as exc: 337 317 warnings.warn(str(exc)) 338 warnings.warn("pyopencl.create_some_context() failed") 339 warnings.warn("the environment variable 'SAS_OPENCL' or 'PYOPENCL_CTX' might not be set correctly") 318 warnings.warn("pyopencl.create_some_context() failed. The " 319 "environment variable 'SAS_OPENCL' or 'PYOPENCL_CTX' might " 320 "not be set correctly") 340 321 341 322 return _get_default_context() 323 342 324 343 325 def _get_default_context(): … … 352 334 # is running may increase throughput. 353 335 # 354 # Mac book pro, base install:336 # MacBook Pro, base install: 355 337 # {'Apple': [Intel CPU, NVIDIA GPU]} 356 # Mac book pro, base install:338 # MacBook Pro, base install: 357 339 # {'Apple': [Intel CPU, Intel GPU]} 358 # 2 x nvidia 295 with Intel and NVIDIA opencl drivers installed340 # 2 x NVIDIA 295 with Intel and NVIDIA opencl drivers install: 359 341 # {'Intel': [CPU], 'NVIDIA': [GPU, GPU, GPU, GPU]} 360 342 gpu, cpu = None, None … … 379 361 else: 380 362 # System has cl.device_type.ACCELERATOR or cl.device_type.CUSTOM 381 # Intel Phi for example registers as an accelerator 363 # Intel Phi for example registers as an accelerator. 382 364 # Since the user installed a custom device on their system 383 365 # and went through the pain of sorting out OpenCL drivers for … … 386 368 gpu = device 387 369 388 # order the devices by gpu then by cpu; when searching for an available370 # Order the devices by gpu then by cpu; when searching for an available 389 371 # device by data type they will be checked in this order, which means 390 372 # that if the gpu supports double then the cpu will never be used (though … … 413 395 that the compiler is allowed to take shortcuts. 414 396 """ 397 info = None # type: ModelInfo 398 source = "" # type: str 399 dtype = None # type: np.dtype 400 fast = False # type: bool 401 _program = None # type: cl.Program 402 _kernels = None # type: Dict[str, cl.Kernel] 403 415 404 def __init__(self, source, model_info, dtype=generate.F32, fast=False): 416 405 # type: (Dict[str,str], ModelInfo, np.dtype, bool) -> None … … 419 408 self.dtype = dtype 420 409 self.fast = fast 421 self.timestamp = generate.ocl_timestamp(self.info)422 self._cache_key = unique_id()423 410 424 411 def __getstate__(self): … … 429 416 # type: (Tuple[ModelInfo, str, np.dtype, bool]) -> None 430 417 self.info, self.source, self.dtype, self.fast = state 418 self._program = self._kernels = None 431 419 432 420 def make_kernel(self, q_vectors): … … 434 422 return GpuKernel(self, q_vectors) 435 423 436 @property 437 def Iq(self): 438 return self._fetch_kernel('Iq') 439 440 def fetch_kernel(self, name): 424 def get_function(self, name): 441 425 # type: (str) -> cl.Kernel 442 426 """ … … 444 428 does not already exist. 445 429 """ 446 gpu = environment() 447 key = self._cache_key 448 if key not in gpu.cache: 449 program = gpu.compile_program( 450 self.info.name, 451 self.source['opencl'], 452 self.dtype, 453 self.fast, 454 self.timestamp) 455 variants = ['Iq', 'Iqxy', 'Imagnetic'] 456 names = [generate.kernel_name(self.info, k) for k in variants] 457 kernels = [getattr(program, k) for k in names] 458 data = dict((k, v) for k, v in zip(variants, kernels)) 459 # keep a handle to program so GC doesn't collect 460 data['program'] = program 461 gpu.cache[key] = data 462 else: 463 data = gpu.cache[key] 464 return data[name] 465 466 # TODO: check that we don't need a destructor for buffers which go out of scope 430 if self._program is None: 431 self._prepare_program() 432 return self._kernels[name] 433 434 def _prepare_program(self): 435 # type: (str) -> None 436 env = environment() 437 timestamp = generate.ocl_timestamp(self.info) 438 program = env.compile_program( 439 self.info.name, 440 self.source['opencl'], 441 self.dtype, 442 self.fast, 443 timestamp) 444 variants = ['Iq', 'Iqxy', 'Imagnetic'] 445 names = [generate.kernel_name(self.info, k) for k in variants] 446 functions = [getattr(program, k) for k in names] 447 self._kernels = {k: v for k, v in zip(variants, functions)} 448 # Keep a handle to program so GC doesn't collect. 449 self._program = program 450 451 452 # TODO: Check that we don't need a destructor for buffers which go out of scope. 467 453 class GpuInput(object): 468 454 """ … … 486 472 def __init__(self, q_vectors, dtype=generate.F32): 487 473 # type: (List[np.ndarray], np.dtype) -> None 488 # TODO: do we ever need double precision q?474 # TODO: Do we ever need double precision q? 489 475 self.nq = q_vectors[0].size 490 476 self.dtype = np.dtype(dtype) 491 477 self.is_2d = (len(q_vectors) == 2) 492 # TODO: stretch input based on get_warp()493 # not doing it now since warp depends on kernel, which is not known478 # TODO: Stretch input based on get_warp(). 479 # Not doing it now since warp depends on kernel, which is not known 494 480 # at this point, so instead using 32, which is good on the set of 495 481 # architectures tested so far. … … 504 490 self.q[:self.nq] = q_vectors[0] 505 491 self.global_size = [self.q.shape[0]] 506 self._cache_key = unique_id() 507 508 @property 509 def q_b(self): 510 """Lazy creation of q buffer so it can survive context reset""" 492 #print("creating inputs of size", self.global_size) 493 494 # Transfer input value to GPU. 511 495 env = environment() 512 key = self._cache_key 513 if key not in env.cache: 514 context = env.context[self.dtype] 515 #print("creating inputs of size", self.global_size) 516 buffer = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 517 hostbuf=self.q) 518 env.cache[key] = buffer 519 return env.cache[key] 496 context = env.context[self.dtype] 497 self.q_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 498 hostbuf=self.q) 520 499 521 500 def release(self): 522 501 # type: () -> None 523 502 """ 524 Free the buffer associated with the q value 525 """ 526 environment().free_buffer(id(self)) 503 Free the buffer associated with the q value. 504 """ 505 if self.q_b is not None: 506 self.q_b.release() 507 self.q_b = None 527 508 528 509 def __del__(self): … … 530 511 self.release() 531 512 513 532 514 class GpuKernel(Kernel): 533 515 """ … … 536 518 *model* is the GpuModel object to call 537 519 538 The following attributes are defined: 539 540 *info* is the module information 541 542 *dtype* is the kernel precision 543 544 *dim* is '1d' or '2d' 545 546 *result* is a vector to contain the results of the call 547 548 The resulting call method takes the *pars*, a list of values for 549 the fixed parameters to the kernel, and *pd_pars*, a list of (value,weight) 550 vectors for the polydisperse parameters. *cutoff* determines the 551 integration limits: any points with combined weight less than *cutoff* 552 will not be calculated. 520 The kernel is derived from :class:`Kernel`, providing the 521 :meth:`call_kernel` method to evaluate the kernel for a given set of 522 parameters. Because of the need to move the q values to the GPU before 523 evaluation, the kernel is instantiated for a particular set of q vectors, 524 and can be called many times without transfering q each time. 553 525 554 526 Call :meth:`release` when done with the kernel instance. 555 527 """ 528 #: SAS model information structure. 529 info = None # type: ModelInfo 530 #: Kernel precision. 531 dtype = None # type: np.dtype 532 #: Kernel dimensions (1d or 2d). 533 dim = "" # type: str 534 #: Calculation results, updated after each call to :meth:`_call_kernel`. 535 result = None # type: np.ndarray 536 556 537 def __init__(self, model, q_vectors): 557 # type: ( cl.Kernel, np.dtype, ModelInfo, List[np.ndarray]) -> None538 # type: (GpuModel, List[np.ndarray]) -> None 558 539 dtype = model.dtype 559 540 self.q_input = GpuInput(q_vectors, dtype) 560 541 self._model = model 561 # F16 isn't sufficient, so don't support it 562 self._as_dtype = np.float64 if dtype == generate.F64 else np.float32 563 self._cache_key = unique_id() 564 565 # attributes accessed from the outside 542 543 # Attributes accessed from the outside. 566 544 self.dim = '2d' if self.q_input.is_2d else '1d' 567 545 self.info = model.info 568 self.dtype = model.dtype 569 570 # holding place for the returned value 546 self.dtype = dtype 547 548 # Converter to translate input to target type. 549 self._as_dtype = np.float64 if dtype == generate.F64 else np.float32 550 551 # Holding place for the returned value. 571 552 nout = 2 if self.info.have_Fq and self.dim == '1d' else 1 572 extra_q = 4 # total weight, form volume, shell volume and R_eff 573 self.result = np.empty(self.q_input.nq*nout+extra_q, dtype) 574 575 @property 576 def _result_b(self): 577 """Lazy creation of result buffer so it can survive context reset""" 553 extra_q = 4 # Total weight, form volume, shell volume and R_eff. 554 self.result = np.empty(self.q_input.nq*nout + extra_q, dtype) 555 556 # Allocate result value on GPU. 578 557 env = environment() 579 key = self._cache_key 580 if key not in env.cache: 581 context = env.context[self.dtype] 582 width = ((self.result.size+31)//32)*32 * self.dtype.itemsize 583 buffer = cl.Buffer(context, mf.READ_WRITE, width) 584 env.cache[key] = buffer 585 return env.cache[key] 586 587 def _call_kernel(self, call_details, values, cutoff, magnetic, effective_radius_type): 588 # type: (CallDetails, np.ndarray, np.ndarray, float, bool) -> np.ndarray 558 context = env.context[self.dtype] 559 width = ((self.result.size+31)//32)*32 * self.dtype.itemsize 560 self._result_b = cl.Buffer(context, mf.READ_WRITE, width) 561 562 def _call_kernel(self, call_details, values, cutoff, magnetic, 563 effective_radius_type): 564 # type: (CallDetails, np.ndarray, float, bool, int) -> np.ndarray 589 565 env = environment() 590 566 queue = env.queue[self._model.dtype] 591 567 context = queue.context 592 568 593 # Arrange data transfer to/from card 594 q_b = self.q_input.q_b 595 result_b = self._result_b 569 # Arrange data transfer to card. 596 570 details_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 597 571 hostbuf=call_details.buffer) … … 599 573 hostbuf=values) 600 574 575 # Setup kernel function and arguments. 601 576 name = 'Iq' if self.dim == '1d' else 'Imagnetic' if magnetic else 'Iqxy' 602 kernel = self._model. fetch_kernel(name)577 kernel = self._model.get_function(name) 603 578 kernel_args = [ 604 np.uint32(self.q_input.nq), None, None, 605 details_b, values_b, q_b, result_b, 606 self._as_dtype(cutoff), 607 np.uint32(effective_radius_type), 579 np.uint32(self.q_input.nq), # Number of inputs. 580 None, # Placeholder for pd_start. 581 None, # Placeholder for pd_stop. 582 details_b, # Problem definition. 583 values_b, # Parameter values. 584 self.q_input.q_b, # Q values. 585 self._result_b, # Result storage. 586 self._as_dtype(cutoff), # Probability cutoff. 587 np.uint32(effective_radius_type), # R_eff mode. 608 588 ] 589 590 # Call kernel and retrieve results. 609 591 #print("Calling OpenCL") 610 592 #call_details.show(values) 611 #Call kernel and retrieve results612 593 wait_for = None 613 594 last_nap = time.clock() … … 620 601 *kernel_args, wait_for=wait_for)] 621 602 if stop < call_details.num_eval: 622 # Allow other processes to run 603 # Allow other processes to run. 623 604 wait_for[0].wait() 624 605 current_time = time.clock() … … 626 607 time.sleep(0.001) 627 608 last_nap = current_time 628 cl.enqueue_copy(queue, self.result, result_b, wait_for=wait_for)609 cl.enqueue_copy(queue, self.result, self._result_b, wait_for=wait_for) 629 610 #print("result", self.result) 630 611 631 # Free buffers 632 for v in (details_b, values_b): 633 if v is not None: 634 v.release() 612 # Free buffers. 613 details_b.release() 614 values_b.release() 635 615 636 616 def release(self): … … 639 619 Release resources associated with the kernel. 640 620 """ 641 environment().free_buffer(id(self))642 621 self.q_input.release() 622 if self._result_b is not None: 623 self._result_b.release() 624 self._result_b = None 643 625 644 626 def __del__(self): -
sasmodels/kernelcuda.py
rf872fd1 rfa26e78 63 63 import time 64 64 import re 65 import atexit 65 66 66 67 import numpy as np # type: ignore 67 68 68 69 69 # Attempt to setup cuda. This may fail if the pycuda package is not70 # Attempt to setup CUDA. This may fail if the pycuda package is not 70 71 # installed or if it is installed but there are no devices available. 71 72 try: … … 107 108 MAX_LOOPS = 2048 108 109 110 109 111 def use_cuda(): 110 env = os.environ.get("SAS_OPENCL", "").lower() 111 return HAVE_CUDA and (env == "" or env.startswith("cuda")) 112 sas_opencl = os.environ.get("SAS_OPENCL", "CUDA").lower() 113 return HAVE_CUDA and sas_opencl.startswith("cuda") 114 112 115 113 116 ENV = None … … 121 124 ENV.release() 122 125 ENV = GpuEnvironment() if use_cuda() else None 126 123 127 124 128 def environment(): … … 138 142 return ENV 139 143 144 145 # PyTest is not freeing ENV, so make sure it gets freed. 146 atexit.register(lambda: ENV.release() if ENV is not None else None) 147 148 140 149 def has_type(dtype): 141 150 # type: (np.dtype) -> bool … … 143 152 Return true if device supports the requested precision. 144 153 """ 145 # Assume the nvidiacard supports 32-bit and 64-bit floats.146 # TODO: check if pycuda support F16154 # Assume the NVIDIA card supports 32-bit and 64-bit floats. 155 # TODO: Check if pycuda support F16. 147 156 return dtype in (generate.F32, generate.F64) 148 157 149 158 150 159 FUNCTION_PATTERN = re.compile(r"""^ 151 (?P<space>\s*) # initial space152 (?P<qualifiers>^(?:\s*\b\w+\b\s*)+) # one or more qualifiers before function153 (?P<function>\s*\b\w+\b\s*[(]) # function name plus open parens160 (?P<space>\s*) # Initial space. 161 (?P<qualifiers>^(?:\s*\b\w+\b\s*)+) # One or more qualifiers before function. 162 (?P<function>\s*\b\w+\b\s*[(]) # Function name plus open parens. 154 163 """, re.VERBOSE|re.MULTILINE) 155 164 … … 158 167 """, re.VERBOSE|re.MULTILINE) 159 168 169 160 170 def _add_device_tag(match): 161 171 # type: (None) -> str 162 # Note: should be re.Match, but that isn't a simple type172 # Note: Should be re.Match, but that isn't a simple type. 163 173 """ 164 174 replace qualifiers with __device__ qualifiers if needed … … 173 183 return "".join((space, "__device__ ", qualifiers, function)) 174 184 185 175 186 def mark_device_functions(source): 176 187 # type: (str) -> str … … 179 190 """ 180 191 return FUNCTION_PATTERN.sub(_add_device_tag, source) 192 181 193 182 194 def show_device_functions(source): … … 188 200 print(match.group('qualifiers').replace('\n',r'\n'), match.group('function'), '(') 189 201 return source 202 190 203 191 204 def compile_model(source, dtype, fast=False): … … 212 225 #options = ['--verbose', '-E'] 213 226 options = ['--use_fast_math'] if fast else None 214 program = SourceModule(source, no_extern_c=True, options=options) # include_dirs=[...]227 program = SourceModule(source, no_extern_c=True, options=options) #, include_dirs=[...]) 215 228 216 229 #print("done with "+program) … … 218 231 219 232 220 # for now, this returns one device in the context221 # TODO: create a context that contains all devices on all platforms233 # For now, this returns one device in the context. 234 # TODO: Create a context that contains all devices on all platforms. 222 235 class GpuEnvironment(object): 223 236 """ 224 GPU context , with possibly many devices, and one queue per device.237 GPU context for CUDA. 225 238 """ 226 239 context = None # type: cuda.Context 227 240 def __init__(self, devnum=None): 228 241 # type: (int) -> None 229 # Byte boundary for data alignment230 #self.data_boundary = max(d.min_data_type_align_size231 # for d in self.context.devices)232 self.compiled = {}233 242 env = os.environ.get("SAS_OPENCL", "").lower() 234 243 if devnum is None and env.startswith("cuda:"): 235 244 devnum = int(env[5:]) 245 236 246 # Set the global context to the particular device number if one is 237 247 # given, otherwise use the default context. Perhaps this will be set … … 242 252 self.context = make_default_context() 243 253 254 ## Byte boundary for data alignment. 255 #self.data_boundary = max(d.min_data_type_align_size 256 # for d in self.context.devices) 257 258 # Cache for compiled programs, and for items in context. 259 self.compiled = {} 260 244 261 def release(self): 245 262 if self.context is not None: … … 262 279 Compile the program for the device in the given context. 263 280 """ 264 # Note: PyOpenCL caches based on md5 hash of source, options and device 265 # so we don't really need to cache things for ourselves. I'll do so 266 # anyway just to save some data munging time. 281 # Note: PyCuda (probably) caches but I'll do so as well just to 282 # save some data munging time. 267 283 tag = generate.tag_source(source) 268 284 key = "%s-%s-%s%s"%(name, dtype, tag, ("-fast" if fast else "")) 269 # Check timestamp on program 285 # Check timestamp on program. 270 286 program, program_timestamp = self.compiled.get(key, (None, np.inf)) 271 287 if program_timestamp < timestamp: … … 277 293 return program 278 294 295 279 296 class GpuModel(KernelModel): 280 297 """ … … 292 309 that the compiler is allowed to take shortcuts. 293 310 """ 294 info = None # type: ModelInfo295 source = "" # type: str296 dtype = None # type: np.dtype297 fast = False # type: bool298 program = None# type: SourceModule299 _kernels = None # type: List[cuda.Function]311 info = None # type: ModelInfo 312 source = "" # type: str 313 dtype = None # type: np.dtype 314 fast = False # type: bool 315 _program = None # type: SourceModule 316 _kernels = None # type: Dict[str, cuda.Function] 300 317 301 318 def __init__(self, source, model_info, dtype=generate.F32, fast=False): … … 305 322 self.dtype = dtype 306 323 self.fast = fast 307 self.program = None # delay program creation308 self._kernels = None309 324 310 325 def __getstate__(self): … … 315 330 # type: (Tuple[ModelInfo, str, np.dtype, bool]) -> None 316 331 self.info, self.source, self.dtype, self.fast = state 317 self. program= None332 self._program = self._kernels = None 318 333 319 334 def make_kernel(self, q_vectors): 320 335 # type: (List[np.ndarray]) -> "GpuKernel" 321 if self.program is None: 322 compile_program = environment().compile_program 323 timestamp = generate.ocl_timestamp(self.info) 324 self.program = compile_program( 325 self.info.name, 326 self.source['opencl'], 327 self.dtype, 328 self.fast, 329 timestamp) 330 variants = ['Iq', 'Iqxy', 'Imagnetic'] 331 names = [generate.kernel_name(self.info, k) for k in variants] 332 kernels = [self.program.get_function(k) for k in names] 333 self._kernels = dict((k, v) for k, v in zip(variants, kernels)) 334 is_2d = len(q_vectors) == 2 335 if is_2d: 336 kernel = [self._kernels['Iqxy'], self._kernels['Imagnetic']] 337 else: 338 kernel = [self._kernels['Iq']]*2 339 return GpuKernel(kernel, self.dtype, self.info, q_vectors) 340 341 def release(self): 342 # type: () -> None 343 """ 344 Free the resources associated with the model. 345 """ 346 if self.program is not None: 347 self.program = None 348 349 def __del__(self): 350 # type: () -> None 351 self.release() 352 353 # TODO: check that we don't need a destructor for buffers which go out of scope 336 return GpuKernel(self, q_vectors) 337 338 def get_function(self, name): 339 # type: (str) -> cuda.Function 340 """ 341 Fetch the kernel from the environment by name, compiling it if it 342 does not already exist. 343 """ 344 if self._program is None: 345 self._prepare_program() 346 return self._kernels[name] 347 348 def _prepare_program(self): 349 # type: (str) -> None 350 env = environment() 351 timestamp = generate.ocl_timestamp(self.info) 352 program = env.compile_program( 353 self.info.name, 354 self.source['opencl'], 355 self.dtype, 356 self.fast, 357 timestamp) 358 variants = ['Iq', 'Iqxy', 'Imagnetic'] 359 names = [generate.kernel_name(self.info, k) for k in variants] 360 functions = [program.get_function(k) for k in names] 361 self._kernels = {k: v for k, v in zip(variants, functions)} 362 # Keep a handle to program so GC doesn't collect. 363 self._program = program 364 365 366 # TODO: Check that we don't need a destructor for buffers which go out of scope. 354 367 class GpuInput(object): 355 368 """ … … 373 386 def __init__(self, q_vectors, dtype=generate.F32): 374 387 # type: (List[np.ndarray], np.dtype) -> None 375 # TODO: do we ever need double precision q?388 # TODO: Do we ever need double precision q? 376 389 self.nq = q_vectors[0].size 377 390 self.dtype = np.dtype(dtype) 378 391 self.is_2d = (len(q_vectors) == 2) 379 # TODO: stretch input based on get_warp()380 # not doing it now since warp depends on kernel, which is not known392 # TODO: Stretch input based on get_warp(). 393 # Not doing it now since warp depends on kernel, which is not known 381 394 # at this point, so instead using 32, which is good on the set of 382 395 # architectures tested so far. 383 396 if self.is_2d: 384 # Note: 16 rather than 15 because result is 1 longer than input. 385 width = ((self.nq+16)//16)*16 397 width = ((self.nq+15)//16)*16 386 398 self.q = np.empty((width, 2), dtype=dtype) 387 399 self.q[:self.nq, 0] = q_vectors[0] 388 400 self.q[:self.nq, 1] = q_vectors[1] 389 401 else: 390 # Note: 32 rather than 31 because result is 1 longer than input. 391 width = ((self.nq+32)//32)*32 402 width = ((self.nq+31)//32)*32 392 403 self.q = np.empty(width, dtype=dtype) 393 404 self.q[:self.nq] = q_vectors[0] 394 405 self.global_size = [self.q.shape[0]] 395 406 #print("creating inputs of size", self.global_size) 407 408 # Transfer input value to GPU. 396 409 self.q_b = cuda.to_device(self.q) 397 410 … … 399 412 # type: () -> None 400 413 """ 401 Free the memory.414 Free the buffer associated with the q value. 402 415 """ 403 416 if self.q_b is not None: … … 409 422 self.release() 410 423 424 411 425 class GpuKernel(Kernel): 412 426 """ 413 427 Callable SAS kernel. 414 428 415 *kernel* is the GpuKernel object to call 416 417 *model_info* is the module information 418 419 *q_vectors* is the q vectors at which the kernel should be evaluated 420 421 *dtype* is the kernel precision 422 423 The resulting call method takes the *pars*, a list of values for 424 the fixed parameters to the kernel, and *pd_pars*, a list of (value,weight) 425 vectors for the polydisperse parameters. *cutoff* determines the 426 integration limits: any points with combined weight less than *cutoff* 427 will not be calculated. 429 *model* is the GpuModel object to call 430 431 The kernel is derived from :class:`Kernel`, providing the 432 :meth:`call_kernel` method to evaluate the kernel for a given set of 433 parameters. Because of the need to move the q values to the GPU before 434 evaluation, the kernel is instantiated for a particular set of q vectors, 435 and can be called many times without transfering q each time. 428 436 429 437 Call :meth:`release` when done with the kernel instance. 430 438 """ 431 def __init__(self, kernel, dtype, model_info, q_vectors): 432 # type: (cl.Kernel, np.dtype, ModelInfo, List[np.ndarray]) -> None 439 #: SAS model information structure. 440 info = None # type: ModelInfo 441 #: Kernel precision. 442 dtype = None # type: np.dtype 443 #: Kernel dimensions (1d or 2d). 444 dim = "" # type: str 445 #: Calculation results, updated after each call to :meth:`_call_kernel`. 446 result = None # type: np.ndarray 447 448 def __init__(self, model, q_vectors): 449 # type: (GpuModel, List[np.ndarray]) -> None 450 dtype = model.dtype 433 451 self.q_input = GpuInput(q_vectors, dtype) 434 self.kernel = kernel 435 # F16 isn't sufficient, so don't support it 452 self._model = model 453 454 # Attributes accessed from the outside. 455 self.dim = '2d' if self.q_input.is_2d else '1d' 456 self.info = model.info 457 self.dtype = dtype 458 459 # Converter to translate input to target type. 436 460 self._as_dtype = np.float64 if dtype == generate.F64 else np.float32 437 461 438 # attributes accessed from the outside 439 self.dim = '2d' if self.q_input.is_2d else '1d' 440 self.info = model_info 441 self.dtype = dtype 442 443 # holding place for the returned value 462 # Holding place for the returned value. 444 463 nout = 2 if self.info.have_Fq and self.dim == '1d' else 1 445 extra_q = 4 # total weight, form volume, shell volume and R_eff 446 self.result = np.empty(self.q_input.nq*nout+extra_q, dtype) 447 448 # Inputs and outputs for each kernel call 449 # Note: res may be shorter than res_b if global_size != nq 464 extra_q = 4 # Total weight, form volume, shell volume and R_eff. 465 self.result = np.empty(self.q_input.nq*nout + extra_q, dtype) 466 467 # Allocate result value on GPU. 450 468 width = ((self.result.size+31)//32)*32 * self.dtype.itemsize 451 self.result_b = cuda.mem_alloc(width) 452 self._need_release = [self.result_b] 453 454 def _call_kernel(self, call_details, values, cutoff, magnetic, effective_radius_type): 455 # type: (CallDetails, np.ndarray, np.ndarray, float, bool) -> np.ndarray 456 # Arrange data transfer to card 469 self._result_b = cuda.mem_alloc(width) 470 471 def _call_kernel(self, call_details, values, cutoff, magnetic, 472 effective_radius_type): 473 # type: (CallDetails, np.ndarray, float, bool, int) -> np.ndarray 474 475 # Arrange data transfer to card. 457 476 details_b = cuda.to_device(call_details.buffer) 458 477 values_b = cuda.to_device(values) 459 478 460 kernel = self.kernel[1 if magnetic else 0] 461 args = [ 462 np.uint32(self.q_input.nq), None, None, 463 details_b, values_b, self.q_input.q_b, self.result_b, 464 self._as_dtype(cutoff), 465 np.uint32(effective_radius_type), 479 # Setup kernel function and arguments. 480 name = 'Iq' if self.dim == '1d' else 'Imagnetic' if magnetic else 'Iqxy' 481 kernel = self._model.get_function(name) 482 kernel_args = [ 483 np.uint32(self.q_input.nq), # Number of inputs. 484 None, # Placeholder for pd_start. 485 None, # Placeholder for pd_stop. 486 details_b, # Problem definition. 487 values_b, # Parameter values. 488 self.q_input.q_b, # Q values. 489 self._result_b, # Result storage. 490 self._as_dtype(cutoff), # Probability cutoff. 491 np.uint32(effective_radius_type), # R_eff mode. 466 492 ] 467 493 grid = partition(self.q_input.nq) 468 #print("Calling OpenCL") 494 495 # Call kernel and retrieve results. 496 #print("Calling CUDA") 469 497 #call_details.show(values) 470 # Call kernel and retrieve results471 498 last_nap = time.clock() 472 499 step = 100000000//self.q_input.nq + 1 … … 475 502 stop = min(start + step, call_details.num_eval) 476 503 #print("queuing",start,stop) 477 args[1:3] = [np.int32(start), np.int32(stop)]478 kernel(* args, **grid)504 kernel_args[1:3] = [np.int32(start), np.int32(stop)] 505 kernel(*kernel_args, **grid) 479 506 if stop < call_details.num_eval: 480 507 sync() 481 # Allow other processes to run 508 # Allow other processes to run. 482 509 current_time = time.clock() 483 510 if current_time - last_nap > 0.5: … … 485 512 last_nap = current_time 486 513 sync() 487 cuda.memcpy_dtoh(self.result, self. result_b)514 cuda.memcpy_dtoh(self.result, self._result_b) 488 515 #print("result", self.result) 489 516 … … 496 523 Release resources associated with the kernel. 497 524 """ 498 for p in self._need_release: 499 p.free() 500 self._need_release = [] 525 self.q_input.release() 526 if self._result_b is not None: 527 self._result_b.free() 528 self._result_b = None 501 529 502 530 def __del__(self): … … 512 540 Note: Maybe context.synchronize() is sufficient. 513 541 """ 514 #return # The following works in C++; don't know what pycuda is doing 515 # Create an event with which to synchronize 542 # Create an event with which to synchronize. 516 543 done = cuda.Event() 517 544 … … 519 546 done.record() 520 547 521 # line added to not hog resources548 # Make sure we don't hog resource while waiting to sync. 522 549 while not done.query(): 523 550 time.sleep(0.01) … … 525 552 # Block until the GPU executes the kernel. 526 553 done.synchronize() 554 527 555 # Clean up the event; I don't think they can be reused. 528 556 del done -
sasmodels/kerneldll.py
re44432d r3199b17 100 100 # pylint: enable=unused-import 101 101 102 # Compiler output is a byte stream that needs to be decode in python 3 102 # Compiler output is a byte stream that needs to be decode in python 3. 103 103 decode = (lambda s: s) if sys.version_info[0] < 3 else (lambda s: s.decode('utf8')) 104 104 … … 115 115 COMPILER = "tinycc" 116 116 elif "VCINSTALLDIR" in os.environ: 117 # If vcvarsall.bat has been called, then VCINSTALLDIR is in the environment 118 # and we can use the MSVC compiler. Otherwise, if tinycc is available 119 # the use it. Otherwise, hope that mingw is available. 117 # If vcvarsall.bat has been called, then VCINSTALLDIR is in the 118 # environment and we can use the MSVC compiler. Otherwise, if 119 # tinycc is available then use it. Otherwise, hope that mingw 120 # is available. 120 121 COMPILER = "msvc" 121 122 else: … … 124 125 COMPILER = "unix" 125 126 126 ARCH = "" if ct.sizeof(ct.c_void_p) > 4 else "x86" # 4 byte pointers on x86 127 ARCH = "" if ct.sizeof(ct.c_void_p) > 4 else "x86" # 4 byte pointers on x86. 127 128 if COMPILER == "unix": 128 # Generic unix compile 129 # On mac users will need the X code command line tools installed129 # Generic unix compile. 130 # On Mac users will need the X code command line tools installed. 130 131 #COMPILE = "gcc-mp-4.7 -shared -fPIC -std=c99 -fopenmp -O2 -Wall %s -o %s -lm -lgomp" 131 132 CC = "cc -shared -fPIC -std=c99 -O2 -Wall".split() 132 # add openmp support if not running on a mac133 # Add OpenMP support if not running on a Mac. 133 134 if sys.platform != "darwin": 134 # OpenMP seems to be broken on gcc 5.4.0 (ubuntu 16.04.9) 135 # OpenMP seems to be broken on gcc 5.4.0 (ubuntu 16.04.9). 135 136 # Shut it off for all unix until we can investigate. 136 137 #CC.append("-fopenmp") … … 144 145 # vcomp90.dll on the path. One may be found here: 145 146 # C:/Windows/winsxs/x86_microsoft.vc90.openmp*/vcomp90.dll 146 # Copy this to the python directory and uncomment the OpenMP COMPILE 147 # TODO: remove intermediate OBJ file created in the directory148 # TODO: maybe don't use randomized name for the c file149 # TODO: maybe ask distutils to find MSVC147 # Copy this to the python directory and uncomment the OpenMP COMPILE. 148 # TODO: Remove intermediate OBJ file created in the directory. 149 # TODO: Maybe don't use randomized name for the c file. 150 # TODO: Maybe ask distutils to find MSVC. 150 151 CC = "cl /nologo /Ox /MD /W3 /GS- /DNDEBUG".split() 151 152 if "SAS_OPENMP" in os.environ: … … 172 173 ALLOW_SINGLE_PRECISION_DLLS = True 173 174 175 174 176 def compile(source, output): 175 177 # type: (str, str) -> None … … 183 185 logging.info(command_str) 184 186 try: 185 # need shell=True on windows to keep console box from popping up187 # Need shell=True on windows to keep console box from popping up. 186 188 shell = (os.name == 'nt') 187 189 subprocess.check_output(command, shell=shell, stderr=subprocess.STDOUT) … … 192 194 raise RuntimeError("compile failed. File is in %r"%source) 193 195 196 194 197 def dll_name(model_info, dtype): 195 198 # type: (ModelInfo, np.dtype) -> str … … 202 205 basename += ARCH + ".so" 203 206 204 # Hack to find precompiled dlls 207 # Hack to find precompiled dlls. 205 208 path = joinpath(generate.DATA_PATH, '..', 'compiled_models', basename) 206 209 if os.path.exists(path): … … 242 245 raise ValueError("16 bit floats not supported") 243 246 if dtype == F32 and not ALLOW_SINGLE_PRECISION_DLLS: 244 dtype = F64 # Force 64-bit dll 245 # Note: dtype may be F128 for long double precision 247 dtype = F64 # Force 64-bit dll. 248 # Note: dtype may be F128 for long double precision. 246 249 247 250 dll = dll_path(model_info, dtype) … … 254 257 need_recompile = dll_time < newest_source 255 258 if need_recompile: 256 # Make sure the DLL path exists 259 # Make sure the DLL path exists. 257 260 if not os.path.exists(SAS_DLL_PATH): 258 261 os.makedirs(SAS_DLL_PATH) … … 263 266 file_handle.write(source) 264 267 compile(source=filename, output=dll) 265 # comment the following to keep the generated c file266 # Note: if there is a syntax error then compile raises an error268 # Comment the following to keep the generated C file. 269 # Note: If there is a syntax error then compile raises an error 267 270 # and the source file will not be deleted. 268 271 os.unlink(filename) … … 303 306 self.dllpath = dllpath 304 307 self._dll = None # type: ct.CDLL 305 self._kernels = None # type: List[Callable, Callable]308 self._kernels = None # type: List[Callable, Callable] 306 309 self.dtype = np.dtype(dtype) 307 310 … … 338 341 # type: (List[np.ndarray]) -> DllKernel 339 342 q_input = PyInput(q_vectors, self.dtype) 340 # Note: pickle not supported for DllKernel343 # Note: DLL is lazy loaded. 341 344 if self._dll is None: 342 345 self._load_dll() … … 358 361 self._dll = None 359 362 363 360 364 class DllKernel(Kernel): 361 365 """ … … 379 383 def __init__(self, kernel, model_info, q_input): 380 384 # type: (Callable[[], np.ndarray], ModelInfo, PyInput) -> None 381 #,model_info,q_input) 385 dtype = q_input.dtype 386 self.q_input = q_input 382 387 self.kernel = kernel 388 389 # Attributes accessed from the outside. 390 self.dim = '2d' if q_input.is_2d else '1d' 383 391 self.info = model_info 384 self.q_input = q_input 385 self.dtype = q_input.dtype 386 self.dim = '2d' if q_input.is_2d else '1d' 387 # leave room for f1/f2 results in case we need to compute beta for 1d models 392 self.dtype = dtype 393 394 # Converter to translate input to target type. 395 self._as_dtype = (np.float32 if dtype == generate.F32 396 else np.float64 if dtype == generate.F64 397 else np.float128) 398 399 # Holding place for the returned value. 388 400 nout = 2 if self.info.have_Fq else 1 389 # +4 for total weight, shell volume, effective radius, form volume390 self.result = np.empty( q_input.nq*nout + 4, self.dtype)391 self.real = (np.float32 if self.q_input.dtype == generate.F32 392 else np.float64 if self.q_input.dtype == generate.F64393 e lse np.float128)394 395 def _call_kernel(self, call_details, values, cutoff, magnetic, effective_radius_type): 396 # type: (CallDetails, np.ndarray, np.ndarray, float, bool, int) -> np.ndarray401 extra_q = 4 # Total weight, form volume, shell volume and R_eff. 402 self.result = np.empty(self.q_input.nq*nout + extra_q, dtype) 403 404 def _call_kernel(self, call_details, values, cutoff, magnetic, 405 effective_radius_type): 406 # type: (CallDetails, np.ndarray, float, bool, int) -> np.ndarray 407 408 # Setup kernel function and arguments. 397 409 kernel = self.kernel[1 if magnetic else 0] 398 args = [399 self.q_input.nq, # nq400 None, # pd_start401 None, # pd_stop pd_stride[MAX_PD]402 call_details.buffer.ctypes.data, # problem403 values.ctypes.data, # pars404 self.q_input.q.ctypes.data, # q405 self.result.ctypes.data, # results406 self. real(cutoff), # cutoff407 effective_radius_type, # cutoff410 kernel_args = [ 411 self.q_input.nq, # Number of inputs. 412 None, # Placeholder for pd_start. 413 None, # Placeholder for pd_stop. 414 call_details.buffer.ctypes.data, # Problem definition. 415 values.ctypes.data, # Parameter values. 416 self.q_input.q.ctypes.data, # Q values. 417 self.result.ctypes.data, # Result storage. 418 self._as_dtype(cutoff), # Probability cutoff. 419 effective_radius_type, # R_eff mode. 408 420 ] 421 422 # Call kernel and retrieve results. 409 423 #print("Calling DLL") 410 424 #call_details.show(values) 411 425 step = 100 426 # TODO: Do we need the explicit sleep like the OpenCL and CUDA loops? 412 427 for start in range(0, call_details.num_eval, step): 413 428 stop = min(start + step, call_details.num_eval) 414 args[1:3] = [start, stop]415 kernel(* args) # type: ignore429 kernel_args[1:3] = [start, stop] 430 kernel(*kernel_args) # type: ignore 416 431 417 432 def release(self): 418 433 # type: () -> None 419 434 """ 420 Release anyresources associated with the kernel.435 Release resources associated with the kernel. 421 436 """ 422 self.q_input.release() 437 # TODO: OpenCL/CUDA allocate q_input in __init__ and free it in release. 438 # Should we be doing the same for DLL? 439 #self.q_input.release() 440 pass 441 442 def __del__(self): 443 # type: () -> None 444 self.release() -
sasmodels/kernelpy.py
raa8c6e0 r3199b17 33 33 logger = logging.getLogger(__name__) 34 34 35 35 36 class PyModel(KernelModel): 36 37 """ … … 38 39 """ 39 40 def __init__(self, model_info): 40 # Make sure Iq is available and vectorized 41 # Make sure Iq is available and vectorized. 41 42 _create_default_functions(model_info) 42 43 self.info = model_info … … 53 54 """ 54 55 pass 56 55 57 56 58 class PyInput(object): … … 91 93 self.q = None 92 94 95 93 96 class PyKernel(Kernel): 94 97 """ … … 131 134 parameter_vector = np.empty(len(partable.call_parameters)-2, 'd') 132 135 133 # Create views into the array to hold the arguments 136 # Create views into the array to hold the arguments. 134 137 offset = 0 135 138 kernel_args, volume_args = [], [] … … 174 177 else (lambda mode: 1.0)) 175 178 176 177 178 179 def _call_kernel(self, call_details, values, cutoff, magnetic, effective_radius_type): 179 180 # type: (CallDetails, np.ndarray, np.ndarray, float, bool) -> np.ndarray … … 195 196 self.q_input.release() 196 197 self.q_input = None 198 197 199 198 200 def _loops(parameters, # type: np.ndarray … … 254 256 total = np.zeros(nq, 'd') 255 257 for loop_index in range(call_details.num_eval): 256 # update polydispersity parameter values258 # Update polydispersity parameter values. 257 259 if p0_index == p0_length: 258 260 pd_index = (loop_index//pd_stride)%pd_length … … 265 267 p0_index += 1 266 268 if weight > cutoff: 267 # Call the scattering function 269 # Call the scattering function. 268 270 # Assume that NaNs are only generated if the parameters are bad; 269 271 # exclude all q for that NaN. Even better would be to have an … … 273 275 continue 274 276 275 # update value and norm277 # Update value and norm. 276 278 total += weight * Iq 277 279 weight_norm += weight … … 293 295 any functions that are not already marked as vectorized. 294 296 """ 295 # Note: must call create_vector_Iq before create_vector_Iqxy297 # Note: Must call create_vector_Iq before create_vector_Iqxy. 296 298 _create_vector_Iq(model_info) 297 299 _create_vector_Iqxy(model_info) -
sasmodels/model_test.py
r5024a56 r00afc15 167 167 # test using cuda if desired and available 168 168 if 'cuda' in loaders and use_cuda(): 169 test_name = "%s-cuda" %model_name169 test_name = "%s-cuda" % model_info.id 170 170 test_method_name = "test_%s_cuda" % model_info.id 171 171 # Using dtype=None so that the models that are only -
sasmodels/models/rpa.c
r71b751d r19dc29e7 25 25 double S0ba,Pbb,S0bb,Pbc,S0bc,Pbd,S0bd; 26 26 double S0ca,S0cb,Pcc,S0cc,Pcd,S0cd; 27 double S0da,S0db,S0dc;27 //double S0da,S0db,S0dc; 28 28 double Pdd,S0dd; 29 29 double Kaa,Kbb,Kcc; 30 30 double Kba,Kca,Kcb; 31 double Kda,Kdb,Kdc,Kdd;31 //double Kda,Kdb,Kdc,Kdd; 32 32 double Zaa,Zab,Zac,Zba,Zbb,Zbc,Zca,Zcb,Zcc; 33 33 double DenT,T11,T12,T13,T21,T22,T23,T31,T32,T33; … … 36 36 double N11,N12,N13,N21,N22,N23,N31,N32,N33; 37 37 double M11,M12,M13,M21,M22,M23,M31,M32,M33; 38 double S11,S12,S13,S14,S21,S22,S23,S24; 39 double S31,S32,S33,S34,S41,S42,S43,S44; 38 double S11,S12,S22,S23,S13,S33; 39 //double S21,S31,S32,S44; 40 //double S14,S24,S34,S41,S42,S43; 40 41 double Lad,Lbd,Lcd,Nav,Intg; 41 42 … … 115 116 S0cd=(Phicd*vcd*Ncd)*Pcd; 116 117 117 S0da=S0ad;118 S0db=S0bd;119 S0dc=S0cd;118 //S0da=S0ad; 119 //S0db=S0bd; 120 //S0dc=S0cd; 120 121 Pdd=2.0*(exp(-Xd)-1.0+Xd)/(Xd*Xd); // free D chain 121 122 S0dd=N[3]*Phi[3]*v[3]*Pdd; … … 198 199 S0ca=S0ac; 199 200 S0cb=S0bc; 200 S0da=S0ad;201 S0db=S0bd;202 S0dc=S0cd;201 //S0da=S0ad; 202 //S0db=S0bd; 203 //S0dc=S0cd; 203 204 204 205 // self chi parameter is 0 ... of course … … 206 207 Kbb=0.0; 207 208 Kcc=0.0; 208 Kdd=0.0;209 //Kdd=0.0; 209 210 210 211 Kba=Kab; 211 212 Kca=Kac; 212 213 Kcb=Kbc; 213 Kda=Kad;214 Kdb=Kbd;215 Kdc=Kcd;214 //Kda=Kad; 215 //Kdb=Kbd; 216 //Kdc=Kcd; 216 217 217 218 Zaa=Kaa-Kad-Kad; … … 294 295 S12= Q12*S0aa + Q22*S0ab + Q32*S0ac; 295 296 S13= Q13*S0aa + Q23*S0ab + Q33*S0ac; 296 S14=-S11-S12-S13;297 S21= Q11*S0ba + Q21*S0bb + Q31*S0bc;298 297 S22= Q12*S0ba + Q22*S0bb + Q32*S0bc; 299 298 S23= Q13*S0ba + Q23*S0bb + Q33*S0bc; 300 S24=-S21-S22-S23;301 S31= Q11*S0ca + Q21*S0cb + Q31*S0cc;302 S32= Q12*S0ca + Q22*S0cb + Q32*S0cc;303 299 S33= Q13*S0ca + Q23*S0cb + Q33*S0cc; 304 S34=-S31-S32-S33; 305 S41=S14; 306 S42=S24; 307 S43=S34; 308 S44=S11+S22+S33+2.0*S12+2.0*S13+2.0*S23; 300 //S21= Q11*S0ba + Q21*S0bb + Q31*S0bc; 301 //S31= Q11*S0ca + Q21*S0cb + Q31*S0cc; 302 //S32= Q12*S0ca + Q22*S0cb + Q32*S0cc; 303 //S44=S11+S22+S33+2.0*S12+2.0*S13+2.0*S23; 304 //S14=-S11-S12-S13; 305 //S24=-S21-S22-S23; 306 //S34=-S31-S32-S33; 307 //S41=S14; 308 //S42=S24; 309 //S43=S34; 309 310 310 311 //calculate contrast where L[i] is the scattering length of i and D is the matrix
Note: See TracChangeset
for help on using the changeset viewer.