Changes in / [31fc4ad:07646b6] in sasmodels


Files: 1 added, 13 edited

  • doc/guide/gpu_setup.rst

    r63602b1 r8b31efa  
    9494Device Selection 
    9595================ 
     96**OpenCL drivers** 
     97 
    9698If you have multiple GPU devices you can tell the program which device to use. 
    9799By default, the program looks for one GPU and one CPU device from available 
     
    104106was used to run the model. 
    105107 
    106 **If you don't want to use OpenCL, you can set** *SAS_OPENCL=None* 
    107 **in your environment settings, and it will only use normal programs.** 
    108  
    109 If you want to use one of the other devices, you can run the following 
     108If you want to use a specific driver and devices, you can run the following 
    110109from the python console:: 
    111110 
     
    115114This will provide a menu of different OpenCL drivers available. 
    116115When one is selected, it will say "set PYOPENCL_CTX=..." 
    117 Use that value as the value of *SAS_OPENCL*. 
     116Use that value for *SAS_OPENCL* (it has the form *driver:device*). 
     117 
     118To use the default OpenCL device (rather than CUDA or None), 
     119set *SAS_OPENCL=opencl*. 
     120 
     121In batch queues you may need to set *XDG_CACHE_HOME* (Linux only; it 
     122defaults to *~/.cache*) to a different directory, depending on how the 
     123filesystem is configured.  You should also set *SAS_DLL_PATH* for CPU-only modules. 
     124 
     125    -DSAS_MODELPATH=path sets directory containing custom models 
     126    -DSAS_OPENCL=vendor:device|cuda:device|none sets the target GPU device 
     127    -DXDG_CACHE_HOME=~/.cache sets the pyopencl cache root (linux only) 
     128    -DSAS_COMPILER=tinycc|msvc|mingw|unix sets the DLL compiler 
     129    -DSAS_OPENMP=1 turns on OpenMP for the DLLs 
     130    -DSAS_DLL_PATH=path sets the path to the compiled modules 
     131 
     132 
     133**CUDA drivers** 
     134 
     135If OpenCL drivers are not available on your system, but NVidia CUDA 
     136drivers are available, then set *SAS_OPENCL=cuda* or 
     137*SAS_OPENCL=cuda:n* for a particular device number *n*.  If no device 
     138number is specified, then the CUDA driver looks for 
     139*CUDA_DEVICE=n* or a file ~/.cuda-device containing n for the device number. 
     140 
     141In batch queues, the SLURM command *sbatch --gres=gpu:1 ...* will set 
     142*CUDA_VISIBLE_DEVICES=n*, which ought to set the correct device 
     143number for *SAS_OPENCL=cuda*.  If not, then set 
     144*CUDA_DEVICE=$CUDA_VISIBLE_DEVICES* within the batch script.  You may 
     145need to set the CUDA cache directory to a folder accessible across the 
     146cluster with *PYCUDA_CACHE_DIR* (or *PYCUDA_DISABLE_CACHE* to disable 
     147caching), and you may need to set environment specific compiler flags 
     148with *PYCUDA_DEFAULT_NVCC_FLAGS*.  You should also set *SAS_DLL_PATH*  
     149for CPU-only modules. 
     150 
     151**No GPU support** 
     152 
     153If you don't want to use OpenCL or CUDA, you can set *SAS_OPENCL=None* 
     154in your environment settings; the models will then be compiled and run on the CPU only. 
     155 
     156In batch queues, you may need to set *SAS_DLL_PATH* to a directory 
     157accessible on the compute node. 
     158 
    118159 
    119160Device Testing 
     
    154195*Document History* 
    155196 
    156 | 2017-09-27 Paul Kienzle 
     197| 2018-10-15 Paul Kienzle 
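
    As a quick illustration of the device-selection workflow described above
    (a minimal sketch; the exact prompt text depends on the installed pyopencl,
    and the driver:device strings below are placeholders)::

        # List the available OpenCL drivers/devices and note the
        # "PYOPENCL_CTX=..." value that pyopencl prints for your choice.
        import pyopencl
        pyopencl.create_some_context(interactive=True)

        # Reuse that value for SAS_OPENCL in the environment that runs
        # sasmodels (set it before sasmodels is imported).
        import os
        os.environ["SAS_OPENCL"] = "0:1"       # hypothetical driver:device
        #os.environ["SAS_OPENCL"] = "cuda:0"   # a specific CUDA device
        #os.environ["SAS_OPENCL"] = "none"     # CPU-only (DLL) models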
  • doc/guide/magnetism/magnetism.rst

    rbefe905 rdf87acf  
    8989 
    9090===========   ================================================================ 
    91  M0:sld       $D_M M_0$ 
    92  mtheta:sld   $\theta_M$ 
    93  mphi:sld     $\phi_M$ 
    94  up:angle     $\theta_\mathrm{up}$ 
    95  up:frac_i    $u_i$ = (spin up)/(spin up + spin down) *before* the sample 
    96  up:frac_f    $u_f$ = (spin up)/(spin up + spin down) *after* the sample 
     91 sld_M0       $D_M M_0$ 
     92 sld_mtheta   $\theta_M$ 
     93 sld_mphi     $\phi_M$ 
     94 up_frac_i    $u_i$ = (spin up)/(spin up + spin down) *before* the sample 
     95 up_frac_f    $u_f$ = (spin up)/(spin up + spin down) *after* the sample 
     96 up_angle     $\theta_\mathrm{up}$ 
    9797===========   ================================================================ 
    9898 
    9999.. note:: 
    100     The values of the 'up:frac_i' and 'up:frac_f' must be in the range 0 to 1. 
     100    The values of the 'up_frac_i' and 'up_frac_f' must be in the range 0 to 1. 
    101101 
    102102*Document History* 
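
    The table above renames the magnetic parameters from colon-separated to
    underscore-separated names.  A minimal sketch of a parameter set using the
    new names, for a model whose scattering-length-density parameter is called
    *sld* (the numeric values are made up for illustration)::

        pars = {
            "sld_M0": 8.0,        # D_M M_0
            "sld_mtheta": 0.0,    # theta_M
            "sld_mphi": 0.0,      # phi_M
            "up_frac_i": 0.5,     # spin-up fraction before the sample, 0 to 1
            "up_frac_f": 0.5,     # spin-up fraction after the sample, 0 to 1
            "up_angle": 0.0,      # theta_up
        }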
  • sasmodels/compare.py

    r610ef23 r4de14584  
    4141from . import kerneldll 
    4242from . import kernelcl 
     43from . import kernelcuda 
    4344from .data import plot_theory, empty_data1D, empty_data2D, load_data 
    4445from .direct_model import DirectModel, get_mesh 
     
    115116    === environment variables === 
    116117    -DSAS_MODELPATH=path sets directory containing custom models 
    117     -DSAS_OPENCL=vendor:device|none sets the target OpenCL device 
     118    -DSAS_OPENCL=vendor:device|cuda:device|none sets the target GPU device 
    118119    -DXDG_CACHE_HOME=~/.cache sets the pyopencl cache root (linux only) 
    119120    -DSAS_COMPILER=tinycc|msvc|mingw|unix sets the DLL compiler 
     
    724725        set_integration_size(model_info, ngauss) 
    725726 
    726     if dtype != "default" and not dtype.endswith('!') and not kernelcl.use_opencl(): 
     727    if (dtype != "default" and not dtype.endswith('!')  
     728            and not (kernelcl.use_opencl() or kernelcuda.use_cuda())): 
    727729        raise RuntimeError("OpenCL not available " + kernelcl.OPENCL_ERROR) 
    728730 
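
    The updated check in compare.py treats CUDA as an acceptable GPU target
    alongside OpenCL.  A minimal sketch of the same test, usable on its own
    (the helper name is ours, not part of sasmodels)::

        from sasmodels import kernelcl, kernelcuda

        def gpu_available():
            # True if either the OpenCL or the CUDA driver can be used,
            # mirroring the fallback test added to compare.py above.
            return kernelcl.use_opencl() or kernelcuda.use_cuda()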
  • sasmodels/core.py

    ree60aa7 rb0de252  
    2121from . import mixture 
    2222from . import kernelpy 
     23from . import kernelcuda 
    2324from . import kernelcl 
    2425from . import kerneldll 
     
    209210        #print("building dll", numpy_dtype) 
    210211        return kerneldll.load_dll(source['dll'], model_info, numpy_dtype) 
     212    elif platform == "cuda": 
     213        return kernelcuda.GpuModel(source, model_info, numpy_dtype, fast=fast) 
    211214    else: 
    212215        #print("building ocl", numpy_dtype) 
     
    244247    # type: (ModelInfo, str, str) -> (np.dtype, bool, str) 
    245248    """ 
    246     Interpret dtype string, returning np.dtype and fast flag. 
     249    Interpret dtype string, returning np.dtype, fast flag and platform. 
    247250 
    248251    Possible types include 'half', 'single', 'double' and 'quad'.  If the 
     
    252255    default for the model and platform. 
    253256 
    254     Platform preference can be specfied ("ocl" vs "dll"), with the default 
    255     being OpenCL if it is availabe.  If the dtype name ends with '!' then 
    256     platform is forced to be DLL rather than OpenCL. 
     257    Platform preference can be specified ("ocl", "cuda", "dll"), with the 
     258    default being OpenCL or CUDA if available, otherwise DLL.  If the dtype 
     259    name ends with '!' then platform is forced to be DLL rather than GPU. 
     260    The default platform is set by the environment variable SAS_OPENCL, 
     261    SAS_OPENCL=driver:device for OpenCL, SAS_OPENCL=cuda:device for CUDA 
     262    or SAS_OPENCL=none for DLL. 
    257263 
    258264    This routine ignores the preferences within the model definition.  This 
     
    266272    if platform is None: 
    267273        platform = "ocl" 
    268     if not kernelcl.use_opencl() or not model_info.opencl: 
    269         platform = "dll" 
    270274 
    271275    # Check if type indicates dll regardless of which platform is given 
     
    273277        platform = "dll" 
    274278        dtype = dtype[:-1] 
     279 
     280    # Make sure model allows opencl/gpu 
     281    if not model_info.opencl: 
     282        platform = "dll" 
     283 
     284    # Make sure opencl is available, or fallback to cuda then to dll 
     285    if platform == "ocl" and not kernelcl.use_opencl(): 
     286        platform = "cuda" if kernelcuda.use_cuda() else "dll" 
    275287 
    276288    # Convert special type names "half", "fast", and "quad" 
     
    283295        dtype = "float16" 
    284296 
    285     # Convert dtype string to numpy dtype. 
     297    # Convert dtype string to numpy dtype.  Use single precision for GPU 
     298    # if model allows it, otherwise use double precision. 
    286299    if dtype is None or dtype == "default": 
    287         numpy_dtype = (generate.F32 if platform == "ocl" and model_info.single 
     300        numpy_dtype = (generate.F32 if model_info.single and platform in ("ocl", "cuda") 
    288301                       else generate.F64) 
    289302    else: 
    290303        numpy_dtype = np.dtype(dtype) 
    291304 
    292     # Make sure that the type is supported by opencl, otherwise use dll 
     305    # Make sure that the type is supported by GPU, otherwise use dll 
    293306    if platform == "ocl": 
    294307        env = kernelcl.environment() 
    295         if not env.has_type(numpy_dtype): 
    296             platform = "dll" 
    297             if dtype is None: 
    298                 numpy_dtype = generate.F64 
     308    elif platform == "cuda": 
     309        env = kernelcuda.environment() 
     310    else: 
     311        env = None 
     312    if env is not None and not env.has_type(numpy_dtype): 
     313        platform = "dll" 
     314        if dtype is None: 
     315            numpy_dtype = generate.F64 
    299316 
    300317    return numpy_dtype, fast, platform 
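
    A rough sketch of the platform fallback that parse_dtype() now applies
    (simplified; names follow the diff above, and the helper itself is not
    part of sasmodels)::

        from sasmodels import kernelcl, kernelcuda

        def choose_platform(dtype, model_allows_gpu, platform="ocl"):
            if dtype.endswith('!'):        # "single!" etc. force the DLL path
                return "dll"
            if not model_allows_gpu:       # model_info.opencl is False
                return "dll"
            if platform == "ocl" and not kernelcl.use_opencl():
                return "cuda" if kernelcuda.use_cuda() else "dll"
            return platform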
  • sasmodels/kernel_header.c

    r296c52b r74e9b5f  
    11#ifdef __OPENCL_VERSION__ 
    22# define USE_OPENCL 
     3#elif defined(__CUDACC__) 
     4# define USE_CUDA 
    35#elif defined(_OPENMP) 
    46# define USE_OPENMP 
    57#endif 
     8 
     9// Use SAS_DOUBLE to force the use of double even for float kernels 
     10#define SAS_DOUBLE dou ## ble 
    611 
    712// If opencl is not available, then we are compiling a C function 
    813// Note: if using a C++ compiler, then define kernel as extern "C" 
    914#ifdef USE_OPENCL 
     15 
     16   #define USE_GPU 
     17   #define pglobal global 
     18   #define pconstant constant 
     19 
    1020   typedef int int32_t; 
    11 #  if defined(USE_SINCOS) 
    12 #    define SINCOS(angle,svar,cvar) svar=sincos(angle,&cvar) 
    13 #  else 
    14 #    define SINCOS(angle,svar,cvar) do {const double _t_=angle; svar=sin(_t_);cvar=cos(_t_);} while (0) 
    15 #  endif 
     21 
     22   #if defined(USE_SINCOS) 
     23   #  define SINCOS(angle,svar,cvar) svar=sincos(angle,&cvar) 
     24   #else 
     25   #  define SINCOS(angle,svar,cvar) do {const double _t_=angle; svar=sin(_t_);cvar=cos(_t_);} while (0) 
     26   #endif 
    1627   // Intel CPU on Mac gives strange values for erf(); on the verified 
    1728   // platforms (intel, nvidia, amd), the cephes erf() is significantly 
     
    2435   #  define erfcf erfc 
    2536   #endif 
    26 #else // !USE_OPENCL 
    27 // Use SAS_DOUBLE to force the use of double even for float kernels 
    28 #  define SAS_DOUBLE dou ## ble 
    29 #  ifdef __cplusplus 
     37 
     38#elif defined(USE_CUDA) 
     39 
     40   #define USE_GPU 
     41   #define local __shared__ 
     42   #define pglobal 
     43   #define constant __constant__ 
     44   #define pconstant const 
     45   #define kernel extern "C" __global__ 
     46 
     47   // OpenCL powr(a,b) = C99 pow(a,b), b >= 0 
     48   // OpenCL pown(a,b) = C99 pow(a,b), b integer 
     49   #define powr(a,b) pow(a,b) 
     50   #define pown(a,b) pow(a,b) 
     51   //typedef int int32_t; 
     52   #if defined(USE_SINCOS) 
     53   #  define SINCOS(angle,svar,cvar) sincos(angle,&svar,&cvar) 
     54   #else 
     55   #  define SINCOS(angle,svar,cvar) do {const double _t_=angle; svar=sin(_t_);cvar=cos(_t_);} while (0) 
     56   #endif 
     57 
     58#else // !USE_OPENCL && !USE_CUDA 
     59 
     60   #define local 
     61   #define pglobal 
     62   #define constant const 
     63   #define pconstant const 
     64 
     65   #ifdef __cplusplus 
    3066      #include <cstdio> 
    3167      #include <cmath> 
     
    5187     #endif 
    5288     inline void SINCOS(double angle, double &svar, double &cvar) { svar=sin(angle); cvar=cos(angle); } 
    53 else // !__cplusplus 
     89   #else // !__cplusplus 
    5490     #include <inttypes.h>  // C99 guarantees that int32_t types is here 
    5591     #include <stdio.h> 
     
    77113     #define kernel 
    78114     #define SINCOS(angle,svar,cvar) do {const double _t_=angle; svar=sin(_t_);cvar=cos(_t_);} while (0) 
    79 #  endif  // !__cplusplus 
    80 #  define global 
    81 #  define local 
    82 #  define constant const 
    83 // OpenCL powr(a,b) = C99 pow(a,b), b >= 0 
    84 // OpenCL pown(a,b) = C99 pow(a,b), b integer 
    85 #  define powr(a,b) pow(a,b) 
    86 #  define pown(a,b) pow(a,b) 
     115   #endif  // !__cplusplus 
     116   // OpenCL powr(a,b) = C99 pow(a,b), b >= 0 
     117   // OpenCL pown(a,b) = C99 pow(a,b), b integer 
     118   #define powr(a,b) pow(a,b) 
     119   #define pown(a,b) pow(a,b) 
     120 
    87121#endif // !USE_OPENCL 
    88122 
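
    When reading the kernels it may help to see, in one place, how the new
    portable qualifiers expand for each target.  The summary below is assembled
    from the diff above (plain-C column shown; the C++ branch differs slightly)::

        # pglobal, pconstant, local and kernel per compile target.
        QUALIFIERS = {
            "opencl": ("global", "constant", "local", "kernel"),
            "cuda":   ("", "const", "__shared__", 'extern "C" __global__'),
            "dll":    ("", "const", "", ""),
        }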
  • sasmodels/kernel_iq.c

    re44432d re44432d  
    277277kernel 
    278278void KERNEL_NAME( 
    279     int32_t nq,                 // number of q values 
    280     const int32_t pd_start,     // where we are in the dispersity loop 
    281     const int32_t pd_stop,      // where we are stopping in the dispersity loop 
    282     global const ProblemDetails *details, 
    283     global const double *values, 
    284     global const double *q, // nq q values, with padding to boundary 
    285     global double *result,  // nq+1 return values, again with padding 
    286     const double cutoff,     // cutoff in the dispersity weight product 
     279    int32_t nq,                   // number of q values 
     280    const int32_t pd_start,       // where we are in the dispersity loop 
     281    const int32_t pd_stop,        // where we are stopping in the dispersity loop 
     282    pglobal const ProblemDetails *details, 
     283    pglobal const double *values, // parameter values and distributions 
     284    pglobal const double *q,      // nq q values, with padding to boundary 
     285    pglobal double *result,       // nq+1 return values, again with padding 
     286    const double cutoff,          // cutoff in the dispersity weight product 
    287287    int32_t effective_radius_type // which effective radius to compute 
    288288    ) 
    289289{ 
    290 #ifdef USE_OPENCL 
     290#if defined(USE_GPU) 
    291291  // who we are and what element we are working with 
     292  #if defined(USE_OPENCL) 
    292293  const int q_index = get_global_id(0); 
     294  #else // USE_CUDA 
     295  const int q_index = threadIdx.x + blockIdx.x * blockDim.x; 
     296  #endif 
    293297  if (q_index >= nq) return; 
    294298#else 
     
    341345  // 
    342346  // The code differs slightly between opencl and dll since opencl is only 
    343   // seeing one q value (stored in the variable "this_result") while the dll 
     347  // seeing one q value (stored in the variable "this_F2") while the dll 
    344348  // version must loop over all q. 
    345   #ifdef USE_OPENCL 
     349  #if defined(CALL_FQ) 
     350    double weight_norm = (pd_start == 0 ? 0.0 : result[2*nq]); 
     351    double weighted_form = (pd_start == 0 ? 0.0 : result[2*nq+1]); 
     352    double weighted_shell = (pd_start == 0 ? 0.0 : result[2*nq+2]); 
     353    double weighted_radius = (pd_start == 0 ? 0.0 : result[2*nq+3]); 
     354  #else 
     355    double weight_norm = (pd_start == 0 ? 0.0 : result[nq]); 
     356    double weighted_form = (pd_start == 0 ? 0.0 : result[nq+1]); 
     357    double weighted_shell = (pd_start == 0 ? 0.0 : result[nq+2]); 
     358    double weighted_radius = (pd_start == 0 ? 0.0 : result[nq+3]); 
     359  #endif 
     360  #if defined(USE_GPU) 
    346361    #if defined(CALL_FQ) 
    347       double weight_norm = (pd_start == 0 ? 0.0 : result[2*nq]); 
    348       double weighted_form = (pd_start == 0 ? 0.0 : result[2*nq+1]); 
    349       double weighted_shell = (pd_start == 0 ? 0.0 : result[2*nq+2]); 
    350       double weighted_radius = (pd_start == 0 ? 0.0 : result[2*nq+3]); 
    351362      double this_F2 = (pd_start == 0 ? 0.0 : result[2*q_index+0]); 
    352363      double this_F1 = (pd_start == 0 ? 0.0 : result[2*q_index+1]); 
    353364    #else 
    354       double weight_norm = (pd_start == 0 ? 0.0 : result[nq]); 
    355       double weighted_form = (pd_start == 0 ? 0.0 : result[nq+1]); 
    356       double weighted_shell = (pd_start == 0 ? 0.0 : result[nq+2]); 
    357       double weighted_radius = (pd_start == 0 ? 0.0 : result[nq+3]); 
    358       double this_result = (pd_start == 0 ? 0.0 : result[q_index]); 
     365      double this_F2 = (pd_start == 0 ? 0.0 : result[q_index]); 
    359366    #endif 
    360   #else // !USE_OPENCL 
    361     #if defined(CALL_FQ) 
    362       double weight_norm = (pd_start == 0 ? 0.0 : result[2*nq]); 
    363       double weighted_form = (pd_start == 0 ? 0.0 : result[2*nq+1]); 
    364       double weighted_shell = (pd_start == 0 ? 0.0 : result[2*nq+2]); 
    365       double weighted_radius = (pd_start == 0 ? 0.0 : result[2*nq+3]); 
    366     #else 
    367       double weight_norm = (pd_start == 0 ? 0.0 : result[nq]); 
    368       double weighted_form = (pd_start == 0 ? 0.0 : result[nq+1]); 
    369       double weighted_shell = (pd_start == 0 ? 0.0 : result[nq+2]); 
    370       double weighted_radius = (pd_start == 0 ? 0.0 : result[nq+3]); 
    371     #endif 
     367  #else // !USE_GPU 
    372368    if (pd_start == 0) { 
    373369      #ifdef USE_OPENMP 
     
    381377      #endif 
    382378    } 
    383     //if (q_index==0) printf("start %d %g %g\n", pd_start, weighted_shell, result[0]); 
    384 #endif // !USE_OPENCL 
     379    //if (q_index==0) printf("start %d %g %g\n", pd_start, pd_norm, result[0]); 
     380#endif // !USE_GPU 
    385381 
    386382 
     
    405401  const int n4 = pd_length[4]; 
    406402  const int p4 = pd_par[4]; 
    407   global const double *v4 = pd_value + pd_offset[4]; 
    408   global const double *w4 = pd_weight + pd_offset[4]; 
     403  pglobal const double *v4 = pd_value + pd_offset[4]; 
     404  pglobal const double *w4 = pd_weight + pd_offset[4]; 
    409405  int i4 = (pd_start/pd_stride[4])%n4;  // position in level 4 at pd_start 
    410406 
     
    441437          FETCH_Q         // set qx,qy from the q input vector 
    442438          APPLY_ROTATION  // convert qx,qy to qa,qb,qc 
    443           CALL_KERNEL     // scattering = Iqxy(qa, qb, qc, p1, p2, ...) 
     439          CALL_KERNEL     // F2 = Iqxy(qa, qb, qc, p1, p2, ...) 
    444440 
    445441      ++step;  // increment counter representing position in dispersity mesh 
     
    613609  const int n##_LOOP = details->pd_length[_LOOP]; \ 
    614610  const int p##_LOOP = details->pd_par[_LOOP]; \ 
    615   global const double *v##_LOOP = pd_value + details->pd_offset[_LOOP]; \ 
    616   global const double *w##_LOOP = pd_weight + details->pd_offset[_LOOP]; \ 
     611  pglobal const double *v##_LOOP = pd_value + details->pd_offset[_LOOP]; \ 
     612  pglobal const double *w##_LOOP = pd_weight + details->pd_offset[_LOOP]; \ 
    617613  int i##_LOOP = (pd_start/details->pd_stride[_LOOP])%n##_LOOP; 
    618614 
     
    638634// Pointers to the start of the dispersity and weight vectors, if needed. 
    639635#if MAX_PD>0 
    640   global const double *pd_value = values + NUM_VALUES; 
    641   global const double *pd_weight = pd_value + details->num_weights; 
     636  pglobal const double *pd_value = values + NUM_VALUES; 
     637  pglobal const double *pd_weight = pd_value + details->num_weights; 
    642638#endif 
    643639 
     
    706702      BUILD_ROTATION(); 
    707703 
    708 #ifndef USE_OPENCL 
     704#if !defined(USE_GPU) 
    709705      // DLL needs to explicitly loop over the q values. 
    710706      #ifdef USE_OPENMP 
     
    712708      #endif 
    713709      for (q_index=0; q_index<nq; q_index++) 
    714 #endif // !USE_OPENCL 
     710#endif // !USE_GPU 
    715711      { 
    716712 
     
    721717        #if defined(MAGNETIC) && NUM_MAGNETIC > 0 
    722718          // Compute the scattering from the magnetic cross sections. 
    723           double scattering = 0.0; 
     719          double F2 = 0.0; 
    724720          const double qsq = qx*qx + qy*qy; 
    725721          if (qsq > 1.e-16) { 
     
    746742//  q_index, qx, qy, xs, sk, local_values.vector[sld_index], px, py, mx, my, mz); 
    747743                } 
    748                 scattering += xs_weight * CALL_KERNEL(); 
     744                F2 += xs_weight * CALL_KERNEL(); 
    749745              } 
    750746            } 
     
    754750            CALL_KERNEL(); // sets F1 and F2 by reference 
    755751          #else 
    756             const double scattering = CALL_KERNEL(); 
     752            const double F2 = CALL_KERNEL(); 
    757753          #endif 
    758754        #endif // !MAGNETIC 
    759 //printf("q_index:%d %g %g %g %g\n", q_index, scattering, weight0); 
    760  
    761         #ifdef USE_OPENCL 
     755//printf("q_index:%d %g %g %g %g\n", q_index, F2, weight0); 
     756 
     757        #if defined(USE_GPU) 
    762758          #if defined(CALL_FQ) 
    763759            this_F2 += weight * F2; 
    764760            this_F1 += weight * F1; 
    765761          #else 
    766             this_result += weight * scattering; 
     762            this_F2 += weight * F2; 
    767763          #endif 
    768764        #else // !USE_OPENCL 
     
    771767            result[2*q_index+1] += weight * F1; 
    772768          #else 
    773             result[q_index] += weight * scattering; 
     769            result[q_index] += weight * F2; 
    774770          #endif 
    775771        #endif // !USE_OPENCL 
     
    795791#endif 
    796792 
    797 // Remember the current result and the updated norm. 
    798 #ifdef USE_OPENCL 
     793// Remember the results and the updated norm. 
     794#if defined(USE_GPU) 
    799795  #if defined(CALL_FQ) 
    800     result[2*q_index+0] = this_F2; 
    801     result[2*q_index+1] = this_F1; 
    802     if (q_index == 0) { 
    803       result[2*nq+0] = weight_norm; 
    804       result[2*nq+1] = weighted_form; 
    805       result[2*nq+3] = weighted_shell; 
    806       result[2*nq+3] = weighted_radius; 
    807     } 
     796  result[2*q_index+0] = this_F2; 
     797  result[2*q_index+1] = this_F1; 
    808798  #else 
    809     result[q_index] = this_result; 
    810     if (q_index == 0) { 
    811       result[nq+0] = weight_norm; 
    812       result[nq+1] = weighted_form; 
    813       result[nq+2] = weighted_shell; 
    814       result[nq+3] = weighted_radius; 
    815     } 
     799  result[q_index] = this_F2; 
    816800  #endif 
    817  
    818 //if (q_index == 0) printf("res: %g/%g\n", result[0], weighted_shell); 
    819 #else // !USE_OPENCL 
    820   #if defined(CALL_FQ) 
     801  if (q_index == 0) 
     802#endif 
     803  { 
     804#if defined(CALL_FQ) 
    821805    result[2*nq] = weight_norm; 
    822806    result[2*nq+1] = weighted_form; 
    823807    result[2*nq+2] = weighted_shell; 
    824808    result[2*nq+3] = weighted_radius; 
    825   #else 
     809#else 
    826810    result[nq] = weight_norm; 
    827811    result[nq+1] = weighted_form; 
    828812    result[nq+2] = weighted_shell; 
    829813    result[nq+3] = weighted_radius; 
    830   #endif 
    831 //printf("res: %g/%g\n", result[0], weighted_shell); 
    832 #endif // !USE_OPENCL 
     814#endif 
     815  } 
    833816 
    834817// ** clear the macros in preparation for the next kernel ** 
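
    The result vector written by the kernel carries the accumulated weights
    after the per-q values.  A sketch of how a caller might unpack it (layout
    inferred from the diff above; the function is illustrative, not part of
    sasmodels)::

        def unpack_result(result, nq, have_fq):
            if have_fq:                    # CALL_FQ kernels store F^2 and F for each q
                F2, F1 = result[0:2*nq:2], result[1:2*nq:2]
                extras = result[2*nq:2*nq+4]
                values = (F1, F2)
            else:                          # plain I(q) accumulator
                values = (result[:nq],)
                extras = result[nq:nq+4]
            weight_norm, weighted_form, weighted_shell, weighted_radius = extras
            return values, weight_norm, weighted_form, weighted_shell, weighted_radius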
  • sasmodels/kernelcl.py

    re44432d r95f62aa  
    11""" 
    22GPU driver for C kernels 
     3 
     4TODO: docs are out of date 
    35 
    46There should be a single GPU environment running on the system.  This 
     
    5961 
    6062 
    61 # Attempt to setup opencl. This may fail if the opencl package is not 
     63# Attempt to setup opencl. This may fail if the pyopencl package is not 
    6264# installed or if it is installed but there are no devices available. 
    6365try: 
     
    7476 
    7577from . import generate 
     78from .generate import F32, F64 
    7679from .kernel import KernelModel, Kernel 
    7780 
     
    131134 
    132135def use_opencl(): 
    133     return HAVE_OPENCL and os.environ.get("SAS_OPENCL", "").lower() != "none" 
     136    sas_opencl = os.environ.get("SAS_OPENCL", "OpenCL").lower() 
     137    return HAVE_OPENCL and sas_opencl != "none" and not sas_opencl.startswith("cuda") 
    134138 
    135139ENV = None 
     
    162166    Return true if device supports the requested precision. 
    163167    """ 
    164     if dtype == generate.F32: 
     168    if dtype == F32: 
    165169        return True 
    166170    elif dtype == generate.F64: 
     
    179183        cl.kernel_work_group_info.PREFERRED_WORK_GROUP_SIZE_MULTIPLE, 
    180184        queue.device) 
    181  
    182 def _stretch_input(vector, dtype, extra=1e-3, boundary=32): 
    183     # type: (np.ndarray, np.dtype, float, int) -> np.ndarray 
    184     """ 
    185     Stretch an input vector to the correct boundary. 
    186  
    187     Performance on the kernels can drop by a factor of two or more if the 
    188     number of values to compute does not fall on a nice power of two 
    189     boundary.   The trailing additional vector elements are given a 
    190     value of *extra*, and so f(*extra*) will be computed for each of 
    191     them.  The returned array will thus be a subset of the computed array. 
    192  
    193     *boundary* should be a power of 2 which is at least 32 for good 
    194     performance on current platforms (as of Jan 2015).  It should 
    195     probably be the max of get_warp(kernel,queue) and 
    196     device.min_data_type_align_size//4. 
    197     """ 
    198     remainder = vector.size % boundary 
    199     if remainder != 0: 
    200         size = vector.size + (boundary - remainder) 
    201         vector = np.hstack((vector, [extra] * (size - vector.size))) 
    202     return np.ascontiguousarray(vector, dtype=dtype) 
    203  
    204185 
    205186def compile_model(context, source, dtype, fast=False): 
     
    239220    """ 
    240221    GPU context, with possibly many devices, and one queue per device. 
     222 
     223    Because the environment can be reset during a live program (e.g., if the 
     224    user changes the active GPU device in the GUI), everything associated 
     225    with the device context must be cached in the environment and recreated 
     226    if the environment changes.  The *cache* attribute is a simple dictionary 
     227    which holds keys and references to objects, such as compiled kernels and 
     228    allocated buffers.  The running program should check in the cache for 
     229    long lived objects and create them if they are not there.  The program 
     230    should not hold onto cached objects, but instead only keep them active 
     231    for the duration of a function call.  When the environment is destroyed 
     232    then the *release* method for each active cache item is called before 
     233    the environment is freed.  This means that each cl buffer should be 
     234    in its own cache entry. 
    241235    """ 
    242236    def __init__(self): 
    243237        # type: () -> None 
    244238        # find gpu context 
    245         #self.context = cl.create_some_context() 
    246  
    247         self.context = None 
    248         if 'SAS_OPENCL' in os.environ: 
    249             #Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context 
    250             os.environ["PYOPENCL_CTX"] = os.environ["SAS_OPENCL"] 
    251         if 'PYOPENCL_CTX' in os.environ: 
    252             self._create_some_context() 
    253  
    254         if not self.context: 
    255             self.context = _get_default_context() 
     239        context_list = _create_some_context() 
     240 
     241        # Find a context for F32 and for F64 (maybe the same one). 
     242        # F16 isn't good enough. 
     243        self.context = {} 
     244        for dtype in (F32, F64): 
     245            for context in context_list: 
     246                if has_type(context.devices[0], dtype): 
     247                    self.context[dtype] = context 
     248                    break 
     249            else: 
     250                self.context[dtype] = None 
     251 
     252        # Build a queue for each context 
     253        self.queue = {} 
     254        context = self.context[F32] 
     255        self.queue[F32] = cl.CommandQueue(context, context.devices[0]) 
     256        if self.context[F64] == self.context[F32]: 
     257            self.queue[F64] = self.queue[F32] 
     258        else: 
     259            context = self.context[F64] 
     260            self.queue[F64] = cl.CommandQueue(context, context.devices[0]) 
    256261 
    257262        # Byte boundary for data alignment 
    258         #self.data_boundary = max(d.min_data_type_align_size 
    259         #                         for d in self.context.devices) 
    260         self.queues = [cl.CommandQueue(context, context.devices[0]) 
    261                        for context in self.context] 
     263        #self.data_boundary = max(context.devices[0].min_data_type_align_size 
     264        #                         for context in self.context.values()) 
     265 
     266        # Cache for compiled programs, and for items in context 
    262267        self.compiled = {} 
     268        self.cache = {} 
    263269 
    264270    def has_type(self, dtype): 
     
    267273        Return True if all devices support a given type. 
    268274        """ 
    269         return any(has_type(d, dtype) 
    270                    for context in self.context 
    271                    for d in context.devices) 
    272  
    273     def get_queue(self, dtype): 
    274         # type: (np.dtype) -> cl.CommandQueue 
    275         """ 
    276         Return a command queue for the kernels of type dtype. 
    277         """ 
    278         for context, queue in zip(self.context, self.queues): 
    279             if all(has_type(d, dtype) for d in context.devices): 
    280                 return queue 
    281  
    282     def get_context(self, dtype): 
    283         # type: (np.dtype) -> cl.Context 
    284         """ 
    285         Return a OpenCL context for the kernels of type dtype. 
    286         """ 
    287         for context in self.context: 
    288             if all(has_type(d, dtype) for d in context.devices): 
    289                 return context 
    290  
    291     def _create_some_context(self): 
    292         # type: () -> cl.Context 
    293         """ 
    294         Protected call to cl.create_some_context without interactivity.  Use 
    295         this if SAS_OPENCL is set in the environment.  Sets the *context* 
    296         attribute. 
    297         """ 
    298         try: 
    299             self.context = [cl.create_some_context(interactive=False)] 
    300         except Exception as exc: 
    301             warnings.warn(str(exc)) 
    302             warnings.warn("pyopencl.create_some_context() failed") 
    303             warnings.warn("the environment variable 'SAS_OPENCL' might not be set correctly") 
     275        return self.context.get(dtype, None) is not None 
    304276 
    305277    def compile_program(self, name, source, dtype, fast, timestamp): 
     
    318290            del self.compiled[key] 
    319291        if key not in self.compiled: 
    320             context = self.get_context(dtype) 
     292            context = self.context[dtype] 
    321293            logging.info("building %s for OpenCL %s", key, 
    322294                         context.devices[0].name.strip()) 
    323             program = compile_model(self.get_context(dtype), 
     295            program = compile_model(self.context[dtype], 
    324296                                    str(source), dtype, fast) 
    325297            self.compiled[key] = (program, timestamp) 
    326298        return program 
     299 
     300    def free_buffer(self, key): 
     301        if key in self.cache: 
     302            self.cache[key].release() 
     303            del self.cache[key] 
     304 
     305    def __del__(self): 
     306        for v in self.cache.values(): 
     307            release = getattr(v, 'release', lambda: None) 
     308            release() 
     309        self.cache = {} 
     310 
     311_CURRENT_ID = 0 
     312def unique_id(): 
     313    global _CURRENT_ID 
     314    _CURRENT_ID += 1 
     315    return _CURRENT_ID 
     316 
     317def _create_some_context(): 
     318    # type: () -> cl.Context 
     319    """ 
     320    Protected call to cl.create_some_context without interactivity. 
     321 
     322    Uses SAS_OPENCL or PYOPENCL_CTX if they are set in the environment, 
     323    otherwise scans for the most appropriate device using 
     324    :func:`_get_default_context`.  Ignore *SAS_OPENCL=OpenCL*, which 
     325    indicates that an OpenCL device should be used without specifying 
     326    which one (and not a CUDA device, or no GPU). 
     327    """ 
     328    # Assume we do not get here if SAS_OPENCL is None or CUDA 
     329    sas_opencl = os.environ.get('SAS_OPENCL', 'opencl') 
     330    if sas_opencl.lower() != 'opencl': 
     331        # Setting PYOPENCL_CTX as a SAS_OPENCL to create cl context 
     332        os.environ["PYOPENCL_CTX"] = sas_opencl 
     333 
     334    if 'PYOPENCL_CTX' in os.environ: 
     335        try: 
     336            return [cl.create_some_context(interactive=False)] 
     337        except Exception as exc: 
     338            warnings.warn(str(exc)) 
     339            warnings.warn("pyopencl.create_some_context() failed") 
     340            warnings.warn("the environment variable 'SAS_OPENCL' or 'PYOPENCL_CTX' might not be set correctly") 
     341 
     342    return _get_default_context() 
    327343 
    328344def _get_default_context(): 
     
    404420        self.dtype = dtype 
    405421        self.fast = fast 
    406         self.program = None # delay program creation 
    407         self._kernels = None 
     422        self.timestamp = generate.ocl_timestamp(self.info) 
     423        self._cache_key = unique_id() 
    408424 
    409425    def __getstate__(self): 
     
    414430        # type: (Tuple[ModelInfo, str, np.dtype, bool]) -> None 
    415431        self.info, self.source, self.dtype, self.fast = state 
    416         self.program = None 
    417432 
    418433    def make_kernel(self, q_vectors): 
    419434        # type: (List[np.ndarray]) -> "GpuKernel" 
    420         if self.program is None: 
    421             compile_program = environment().compile_program 
    422             timestamp = generate.ocl_timestamp(self.info) 
    423             self.program = compile_program( 
     435        return GpuKernel(self, q_vectors) 
     436 
     437    @property 
     438    def Iq(self): 
     439        return self._fetch_kernel('Iq') 
     440 
     441    def fetch_kernel(self, name): 
     442        # type: (str) -> cl.Kernel 
     443        """ 
     444        Fetch the kernel from the environment by name, compiling it if it 
     445        does not already exist. 
     446        """ 
     447        gpu = environment() 
     448        key = self._cache_key 
     449        if key not in gpu.cache: 
     450            program = gpu.compile_program( 
    424451                self.info.name, 
    425452                self.source['opencl'], 
    426453                self.dtype, 
    427454                self.fast, 
    428                 timestamp) 
     455                self.timestamp) 
    429456            variants = ['Iq', 'Iqxy', 'Imagnetic'] 
    430457            names = [generate.kernel_name(self.info, k) for k in variants] 
    431             kernels = [getattr(self.program, k) for k in names] 
    432             self._kernels = dict((k, v) for k, v in zip(variants, kernels)) 
    433         is_2d = len(q_vectors) == 2 
    434         if is_2d: 
    435             kernel = [self._kernels['Iqxy'], self._kernels['Imagnetic']] 
     458            kernels = [getattr(program, k) for k in names] 
     459            data = dict((k, v) for k, v in zip(variants, kernels)) 
     460            # keep a handle to program so GC doesn't collect 
     461            data['program'] = program 
     462            gpu.cache[key] = data 
    436463        else: 
    437             kernel = [self._kernels['Iq']]*2 
    438         return GpuKernel(kernel, self.dtype, self.info, q_vectors) 
    439  
    440     def release(self): 
    441         # type: () -> None 
    442         """ 
    443         Free the resources associated with the model. 
    444         """ 
    445         if self.program is not None: 
    446             self.program = None 
    447  
    448     def __del__(self): 
    449         # type: () -> None 
    450         self.release() 
     464            data = gpu.cache[key] 
     465        return data[name] 
    451466 
    452467# TODO: check that we don't need a destructor for buffers which go out of scope 
     
    473488        # type: (List[np.ndarray], np.dtype) -> None 
    474489        # TODO: do we ever need double precision q? 
    475         env = environment() 
    476490        self.nq = q_vectors[0].size 
    477491        self.dtype = np.dtype(dtype) 
     
    481495        # at this point, so instead using 32, which is good on the set of 
    482496        # architectures tested so far. 
    483         extra_q = 3  # total weight, weighted volume and weighted radius 
    484497        if self.is_2d: 
    485             width = ((self.nq+15+extra_q)//16)*16 
     498            width = ((self.nq+15)//16)*16 
    486499            self.q = np.empty((width, 2), dtype=dtype) 
    487500            self.q[:self.nq, 0] = q_vectors[0] 
    488501            self.q[:self.nq, 1] = q_vectors[1] 
    489502        else: 
    490             width = ((self.nq+31+extra_q)//32)*32 
     503            width = ((self.nq+31)//32)*32 
    491504            self.q = np.empty(width, dtype=dtype) 
    492505            self.q[:self.nq] = q_vectors[0] 
    493506        self.global_size = [self.q.shape[0]] 
    494         context = env.get_context(self.dtype) 
    495         #print("creating inputs of size", self.global_size) 
    496         self.q_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 
    497                              hostbuf=self.q) 
     507        self._cache_key = unique_id() 
     508 
     509    @property 
     510    def q_b(self): 
     511        """Lazy creation of q buffer so it can survive context reset""" 
     512        env = environment() 
     513        key = self._cache_key 
     514        if key not in env.cache: 
     515            context = env.context[self.dtype] 
     516            #print("creating inputs of size", self.global_size) 
     517            buffer = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 
     518                               hostbuf=self.q) 
     519            env.cache[key] = buffer 
     520        return env.cache[key] 
    498521 
    499522    def release(self): 
    500523        # type: () -> None 
    501524        """ 
    502         Free the memory. 
    503         """ 
    504         if self.q_b is not None: 
    505             self.q_b.release() 
    506             self.q_b = None 
     525        Free the buffer associated with the q value 
     526        """ 
     527        environment().free_buffer(id(self)) 
    507528 
    508529    def __del__(self): 
     
    514535    Callable SAS kernel. 
    515536 
    516     *kernel* is the GpuKernel object to call 
    517  
    518     *model_info* is the module information 
    519  
    520     *q_vectors* is the q vectors at which the kernel should be evaluated 
     537    *model* is the GpuModel object to call 
     538 
     539    The following attributes are defined: 
     540 
     541    *info* is the module information 
    521542 
    522543    *dtype* is the kernel precision 
     544 
     545    *dim* is '1d' or '2d' 
     546 
     547    *result* is a vector to contain the results of the call 
    523548 
    524549    The resulting call method takes the *pars*, a list of values for 
     
    530555    Call :meth:`release` when done with the kernel instance. 
    531556    """ 
    532     def __init__(self, kernel, dtype, model_info, q_vectors): 
     557    def __init__(self, model, q_vectors): 
    533558        # type: (cl.Kernel, np.dtype, ModelInfo, List[np.ndarray]) -> None 
    534         q_input = GpuInput(q_vectors, dtype) 
    535         self.kernel = kernel 
    536         self.info = model_info 
    537         self.dtype = dtype 
    538         self.dim = '2d' if q_input.is_2d else '1d' 
    539         # leave room for f1/f2 results in case we need to compute beta for 1d models 
     559        dtype = model.dtype 
     560        self.q_input = GpuInput(q_vectors, dtype) 
     561        self._model = model 
     562        self._as_dtype = (np.float32 if dtype == generate.F32 
     563                          else np.float64 if dtype == generate.F64 
     564                          else np.float16 if dtype == generate.F16 
     565                          else np.float32)  # will never get here, so use np.float32 
     566        self._cache_key = unique_id() 
     567 
     568        # attributes accessed from the outside 
     569        self.dim = '2d' if self.q_input.is_2d else '1d' 
     570        self.info = model.info 
     571        self.dtype = model.dtype 
     572 
     573        # holding place for the returned value 
    540574        nout = 2 if self.info.have_Fq and self.dim == '1d' else 1 
    541         # +4 for total weight, shell volume, effective radius, form volume 
    542         self.result = np.empty(q_input.nq*nout + 4, self.dtype) 
    543  
    544         # Inputs and outputs for each kernel call 
    545         # Note: res may be shorter than res_b if global_size != nq 
     575        extra_q = 4  # total weight, form volume, shell volume and R_eff 
     576        self.result = np.empty(self.q_input.nq*nout+extra_q, dtype) 
     577 
     578    @property 
     579    def _result_b(self): 
     580        """Lazy creation of result buffer so it can survive context reset""" 
    546581        env = environment() 
    547         self.queue = env.get_queue(dtype) 
    548  
    549         self.result_b = cl.Buffer(self.queue.context, mf.READ_WRITE, 
    550                                   q_input.global_size[0] * nout * dtype.itemsize) 
    551         self.q_input = q_input # allocated by GpuInput above 
    552  
    553         self._need_release = [self.result_b, self.q_input] 
    554         self.real = (np.float32 if dtype == generate.F32 
    555                      else np.float64 if dtype == generate.F64 
    556                      else np.float16 if dtype == generate.F16 
    557                      else np.float32)  # will never get here, so use np.float32 
     582        key = self._cache_key 
     583        if key not in env.cache: 
     584            context = env.context[self.dtype] 
     585            width = ((self.result.size+31)//32)*32 * self.dtype.itemsize 
     586            buffer = cl.Buffer(context, mf.READ_WRITE, width) 
     587            env.cache[key] = buffer 
     588        return env.cache[key] 
    558589 
    559590    def _call_kernel(self, call_details, values, cutoff, magnetic, effective_radius_type): 
    560591        # type: (CallDetails, np.ndarray, np.ndarray, float, bool) -> np.ndarray 
    561         context = self.queue.context 
    562         # Arrange data transfer to card 
     592        env = environment() 
     593        queue = env.queue[self._model.dtype] 
     594        context = queue.context 
     595 
     596        # Arrange data transfer to/from card 
     597        q_b = self.q_input.q_b 
     598        result_b = self._result_b 
    563599        details_b = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, 
    564600                              hostbuf=call_details.buffer) 
     
    566602                             hostbuf=values) 
    567603 
    568         kernel = self.kernel[1 if magnetic else 0] 
    569         args = [ 
     604        name = 'Iq' if self.dim == '1d' else 'Imagnetic' if magnetic else 'Iqxy' 
     605        kernel = self._model.fetch_kernel(name) 
     606        kernel_args = [ 
    570607            np.uint32(self.q_input.nq), None, None, 
    571             details_b, values_b, self.q_input.q_b, self.result_b, 
    572             self.real(cutoff), 
     608            details_b, values_b, q_b, result_b, 
     609            self._as_dtype(cutoff), 
    573610            np.uint32(effective_radius_type), 
    574611        ] 
     
    582619            stop = min(start + step, call_details.num_eval) 
    583620            #print("queuing",start,stop) 
    584             args[1:3] = [np.int32(start), np.int32(stop)] 
    585             wait_for = [kernel(self.queue, self.q_input.global_size, None, 
    586                                *args, wait_for=wait_for)] 
     621            kernel_args[1:3] = [np.int32(start), np.int32(stop)] 
     622            wait_for = [kernel(queue, self.q_input.global_size, None, 
     623                               *kernel_args, wait_for=wait_for)] 
    587624            if stop < call_details.num_eval: 
    588625                # Allow other processes to run 
     
    590627                current_time = time.clock() 
    591628                if current_time - last_nap > 0.5: 
    592                     time.sleep(0.05) 
     629                    time.sleep(0.001) 
    593630                    last_nap = current_time 
    594         cl.enqueue_copy(self.queue, self.result, self.result_b) 
     631        cl.enqueue_copy(queue, self.result, result_b, wait_for=wait_for) 
    595632        #print("result", self.result) 
    596633 
     
    605642        Release resources associated with the kernel. 
    606643        """ 
    607         for v in self._need_release: 
    608             v.release() 
    609         self._need_release = [] 
     644        environment().free_buffer(id(self)) 
     645        self.q_input.release() 
    610646 
    611647    def __del__(self): 
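
    The GpuModel/GpuInput/GpuKernel changes all follow the same pattern:
    device objects are kept in the environment cache under a unique key so
    they can be rebuilt if the OpenCL context is reset.  A minimal sketch of
    that pattern (not the actual classes)::

        _CURRENT_ID = 0
        def unique_id():
            global _CURRENT_ID
            _CURRENT_ID += 1
            return _CURRENT_ID

        class CachedBuffer:
            def __init__(self, env, make_buffer):
                self._key = unique_id()
                self._env = env           # has a .cache dict and .free_buffer()
                self._make = make_buffer  # callable returning e.g. a cl.Buffer
            @property
            def buffer(self):
                if self._key not in self._env.cache:
                    self._env.cache[self._key] = self._make()
                return self._env.cache[self._key]
            def release(self):
                self._env.free_buffer(self._key)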
  • sasmodels/model_test.py

    r012cd34 r74e9b5f  
    55Usage:: 
    66 
    7     python -m sasmodels.model_test [opencl|dll|opencl_and_dll] model1 model2 ... 
     7    python -m sasmodels.model_test [opencl|cuda|dll] model1 model2 ... 
    88 
    99    if model1 is 'all', then all except the remaining models will be tested 
     
    6363from .modelinfo import expand_pars 
    6464from .kernelcl import use_opencl 
     65from .kernelcuda import use_cuda 
    6566 
    6667# pylint: disable=unused-import 
     
    8081    Construct the pyunit test suite. 
    8182 
    82     *loaders* is the list of kernel drivers to use, which is one of 
    83     *["dll", "opencl"]*, *["dll"]* or *["opencl"]*.  For python models, 
    84     the python driver is always used. 
     83    *loaders* is the list of kernel drivers to use (dll, opencl or cuda). 
      83    For python models the python driver is always used. 
    8585 
    8686    *models* is the list of models to test, or *["all"]* to test all models. 
     
    135135 
    136136            # test using dll if desired 
    137             if 'dll' in loaders or not use_opencl(): 
     137            if 'dll' in loaders: 
    138138                test_name = "%s-dll"%model_name 
    139139                test_method_name = "test_%s_dll" % model_info.id 
     
    156156                                     test_method_name, 
    157157                                     platform="ocl", dtype=None, 
     158                                     stash=stash) 
     159                #print("defining", test_name) 
     160                suite.addTest(test) 
     161 
     162            # test using cuda if desired and available 
     163            if 'cuda' in loaders and use_cuda(): 
     164                test_name = "%s-cuda"%model_name 
     165                test_method_name = "test_%s_cuda" % model_info.id 
     166                # Using dtype=None so that the models that are only 
     167                # correct for double precision are not tested using 
     168                # single precision.  The choice is determined by the 
     169                # presence of *single=False* in the model file. 
     170                test = ModelTestCase(test_name, model_info, 
     171                                     test_method_name, 
     172                                     platform="cuda", dtype=None, 
    158173                                     stash=stash) 
    159174                #print("defining", test_name) 
     
    220235 
    221236                # Check for missing tests.  Only do so for the "dll" tests 
    222                 # to reduce noise from both opencl and dll, and because 
     237                # to reduce noise from both opencl and cuda, and because 
    223238                # python kernels use platform="dll". 
    224239                if self.platform == "dll": 
     
    368383 
    369384    # Build a test suite containing just the model 
    370     loaders = ['opencl'] if use_opencl() else ['dll'] 
     385    loader = 'opencl' if use_opencl() else 'cuda' if use_cuda() else 'dll' 
    371386    models = [model] 
    372387    try: 
    373         suite = make_suite(loaders, models) 
     388        suite = make_suite([loader], models) 
    374389    except Exception: 
    375390        import traceback 
     
    434449        loaders = ['opencl'] 
    435450        models = models[1:] 
     451    elif models and models[0] == 'cuda': 
     452        if not use_cuda(): 
     453            print("cuda is not available") 
     454            return 1 
     455        loaders = ['cuda'] 
     456        models = models[1:] 
    436457    elif models and models[0] == 'dll': 
    437458        # TODO: test if compiler is available? 
    438459        loaders = ['dll'] 
    439460        models = models[1:] 
    440     elif models and models[0] == 'opencl_and_dll': 
    441         loaders = ['opencl', 'dll'] if use_opencl() else ['dll'] 
    442         models = models[1:] 
    443461    else: 
    444         loaders = ['opencl', 'dll'] if use_opencl() else ['dll'] 
     462        loaders = ['dll'] 
     463        if use_opencl(): 
     464            loaders.append('opencl') 
     465        if use_cuda(): 
     466            loaders.append('cuda') 
    445467    if not models: 
    446468        print("""\ 
    447469usage: 
    448   python -m sasmodels.model_test [-v] [opencl|dll] model1 model2 ... 
     470  python -m sasmodels.model_test [-v] [opencl|cuda|dll] model1 model2 ... 
    449471 
    450472If -v is included on the command line, then use verbose output. 
    451473 
    452 If neither opencl nor dll is specified, then models will be tested with 
    453 both OpenCL and dll; the compute target is ignored for pure python models. 
     474If no platform is specified, then models will be tested with dll, and 
     475if available, OpenCL and CUDA; the compute target is ignored for pure python models. 
    454476 
    455477If model1 is 'all', then all except the remaining models will be tested. 
     
    471493    Run "nosetests sasmodels" on the command line to invoke it. 
    472494    """ 
    473     loaders = ['opencl', 'dll'] if use_opencl() else ['dll'] 
     495    loaders = ['dll'] 
     496    if use_opencl(): 
     497        loaders.append('opencl') 
     498    if use_cuda(): 
     499        loaders.append('cuda') 
    474500    tests = make_suite(loaders, ['all']) 
    475501    def build_test(test): 
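
    The default loader list is now built incrementally: the DLL path is always
    tested, with OpenCL and CUDA added when their drivers are usable.  A short
    sketch of driving the suite directly from Python::

        from sasmodels.model_test import make_suite
        from sasmodels.kernelcl import use_opencl
        from sasmodels.kernelcuda import use_cuda

        loaders = ['dll']
        if use_opencl():
            loaders.append('opencl')
        if use_cuda():
            loaders.append('cuda')
        suite = make_suite(loaders, ['all'])   # or a list of model names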
  • sasmodels/models/lib/gauss76.c

    r99b84ec r74e9b5f  
    1111 
    1212// Gaussians 
    13 constant double Gauss76Wt[76]={ 
     13constant double Gauss76Wt[76] = { 
    1414        .00126779163408536,             //0 
    1515        .00294910295364247, 
     
    9090}; 
    9191 
    92 constant double Gauss76Z[76]={ 
     92constant double Gauss76Z[76] = { 
    9393        -.999505948362153,              //0 
    9494        -.997397786355355, 
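
    The Gauss76Wt/Gauss76Z tables appear to be the standard 76-point
    Gauss-Legendre weights and abscissae; if needed they can be cross-checked
    against numpy (ordering and rounding may differ slightly)::

        import numpy as np
        z, wt = np.polynomial.legendre.leggauss(76)   # abscissae, weights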
  • sasmodels/models/lib/polevl.c

    r447e9aa r74e9b5f  
    5151*/ 
    5252 
    53 double polevl( double x, constant double *coef, int N ); 
    54 double polevl( double x, constant double *coef, int N ) 
     53static 
     54double polevl( double x, pconstant double *coef, int N ) 
    5555{ 
    5656 
     
    7272 */ 
    7373 
    74 double p1evl( double x, constant double *coef, int N ); 
    75 double p1evl( double x, constant double *coef, int N ) 
     74static 
     75double p1evl( double x, pconstant double *coef, int N ) 
    7676{ 
    7777    int i=0; 
  • sasmodels/models/lib/sas_J1.c

    r5181ccc r74e9b5f  
    4242#if FLOAT_SIZE>4 
    4343//Cephes double pression function 
    44 double cephes_j1(double x); 
    4544 
    4645constant double RPJ1[8] = { 
     
    106105    0.0 }; 
    107106 
     107static 
    108108double cephes_j1(double x) 
    109109{ 
     
    155155#else 
    156156//Single precission version of cephes 
    157 float cephes_j1f(float x); 
    158  
    159157constant float JPJ1[8] = { 
    160158    -4.878788132172128E-009, 
     
    190188    }; 
    191189 
     190static 
    192191float cephes_j1f(float xx) 
    193192{ 
     
    240239 
    241240//Finally J1c function that equals 2*J1(x)/x 
    242 double sas_2J1x_x(double x); 
     241static 
    243242double sas_2J1x_x(double x) 
    244243{ 
  • sasmodels/models/spinodal.py

    r475ff58 r93fe8a1  
    1212where $x=q/q_0$, $q_0$ is the peak position, $I_{max}$ is the intensity  
    1313at $q_0$ (parameterised as the $scale$ parameter), and $B$ is a flat  
    14 background. The spinodal wavelength is given by $2\pi/q_0$.  
     14background. The spinodal wavelength, $\Lambda$, is given by $2\pi/q_0$.  
     15 
     16The definition of $I_{max}$ in the literature varies. Hashimoto *et al* (1991)  
     17define it as  
     18 
     19.. math:: 
     20    I_{max} = \Lambda^3\Delta\rho^2 
     21     
     22whereas Meier & Strobl (1987) give  
     23 
     24.. math:: 
     25    I_{max} = V_z\Delta\rho^2 
     26     
     27where $V_z$ is the volume per monomer unit. 
    1528 
    1629The exponent $\gamma$ is equal to $d+1$ for off-critical concentration  
     
    2841 
    2942H. Furukawa. Dynamics-scaling theory for phase-separating unmixing mixtures: 
    30 Growth rates of droplets and scaling properties of autocorrelation functions. 
    31 Physica A 123,497 (1984). 
     43Growth rates of droplets and scaling properties of autocorrelation functions.  
     44Physica A 123, 497 (1984). 
     45 
     46H. Meier & G. Strobl. Small-Angle X-ray Scattering Study of Spinodal  
     47Decomposition in Polystyrene/Poly(styrene-co-bromostyrene) Blends.  
     48Macromolecules 20, 649-654 (1987). 
     49 
     50T. Hashimoto, M. Takenaka & H. Jinnai. Scattering Studies of Self-Assembling  
     51Processes of Polymer Blends in Spinodal Decomposition.  
     52J. Appl. Cryst. 24, 457-466 (1991). 
    3253 
    3354Revision History 
     
    3556 
    3657* **Author:**  Dirk Honecker **Date:** Oct 7, 2016 
    37 * **Revised:** Steve King    **Date:** Sep 7, 2018 
     58* **Revised:** Steve King    **Date:** Oct 25, 2018 
    3859""" 
    3960 
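
    A quick numerical illustration of the two I_max conventions now cited in
    the docstring (all values below are made up; keep the units consistent)::

        import numpy as np
        q0 = 0.01                              # peak position
        Lambda = 2*np.pi/q0                    # spinodal wavelength
        drho = 1e-6                            # contrast, Delta rho
        Imax_hashimoto = Lambda**3 * drho**2   # Hashimoto et al. (1991)
        Vz = 100.0                             # hypothetical volume per monomer unit
        Imax_meier_strobl = Vz * drho**2       # Meier & Strobl (1987)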
  • setup.py

    r1f991d6 r783e76f  
    2929                return version[1:-1] 
    3030    raise RuntimeError("Could not read version from %s/__init__.py"%package) 
     31 
     32install_requires = ['numpy', 'scipy'] 
     33 
     34if sys.platform=='win32' or sys.platform=='cygwin': 
     35    install_requires.append('tinycc') 
    3136 
    3237setup( 
     
    6166        'sasmodels': ['*.c', '*.cl'], 
    6267    }, 
    63     install_requires=[ 
    64     ], 
     68    install_requires=install_requires, 
    6569    extras_require={ 
     70        'full': ['docutils', 'bumps', 'matplotlib'], 
     71        'server': ['bumps'], 
    6672        'OpenCL': ["pyopencl"], 
    67         'Bumps': ["bumps"], 
    68         'TinyCC': ["tinycc"], 
    6973    }, 
    7074    build_requires=['setuptools'], 