Changeset af6de50 in sasview


Ignore:
Timestamp:
Oct 25, 2016 5:21:44 PM (7 years ago)
Author:
Paul Kienzle <pkienzle@…>
Branches:
master, ESS_GUI, ESS_GUI_Docs, ESS_GUI_batch_fitting, ESS_GUI_bumps_abstraction, ESS_GUI_iss1116, ESS_GUI_iss879, ESS_GUI_iss959, ESS_GUI_opencl, ESS_GUI_ordering, ESS_GUI_sync_sascalc, costrafo411, magnetic_scatt, release-4.1.1, release-4.1.2, release-4.2.2, ticket-1009, ticket-1094-headless, ticket-1242-2d-resolution, ticket-1243, ticket-1249, ticket885, unittest-saveload
Children:
907186d
Parents:
cf1910f
Message:

update C model docs with warnings about GPU programming

File:
1 edited

Legend:

Unmodified
Added
Removed
  • src/sas/sasgui/perspectives/fitting/media/plugin.rst

    r3e1c9e5 raf6de50  
    1111There are essentially three ways to generate new fitting models for SasView: 
    1212 
    13 * Using the SasView :ref:`New_Plugin_Model` helper dialog (best for beginners and/or relatively simple models) 
    14 * By copying/editing an existing model (this can include models generated by the *New Plugin Model* dialog) in the :ref:`Python_shell` or :ref:`Advanced_Plugin_Editor` as described below (suitable for all use cases) 
    15 * By writing a model from scratch outside of SasView (only recommended for code monkeys!) 
     13* Using the SasView :ref:`New_Plugin_Model` helper dialog (best for beginners 
     14  and/or relatively simple models) 
     15* By copying/editing an existing model (this can include models generated by 
     16  the *New Plugin Model* dialog) in the :ref:`Python_shell` or 
     17  :ref:`Advanced_Plugin_Editor` as described below (suitable for all use cases) 
     18* By writing a model from scratch outside of SasView (only recommended for 
     19  code monkeys!) 
    1620 
    1721Overview 
     
    479483    } 
    480484 
    481 The C model operates on a single $q$ value at a time.  The code will be 
    482 run in parallel across different $q$ values, either on the graphics card 
    483 or the processor. 
    484  
    485 Rather than returning NAN from Iq, you must define the *INVALID(v)*.  The 
    486 *v* parameter lets you access all the parameters in the model using 
    487 *v.par1*, *v.par2*, etc. For example:: 
    488  
    489     #define INVALID(v) (v.bell_radius < v.radius) 
    490  
    491485*Iqxy* is similar to *Iq*, except it uses parameters *qx, qy* instead of *q*, 
    492 and it includes orientation parameters. As in python models, *form_volume* 
    493 includes only the volume parameters.  *Iqxy* will default to 
    494 *Iq(sqrt(qx**2 + qy**2), par1, ...)* and *form_volume* will default to 1.0. 
    495  
    496 The C code follows the C99 standard, including the usual math functions, 
    497 as defined in 
    498 `OpenCL <https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/mathFunctions.html>`_. 
    499  
    500 The standard constants and functions include the following:: 
    501  
    502     M_PI = pi 
    503     M_PI_2 = pi/2 
    504     M_PI_4 = pi/4 
    505     M_E = e 
    506     M_SQRT1_2 = 1/sqrt(2) 
    507     NAN = NaN 
    508     INFINITY = 1/0 
    509     erf(x) = error function 
    510     erfc(x) = 1-erf(x) 
    511     expm1(x) = exp(x) - 1 
    512     tgamma(x) = gamma function 
    513  
    514 Some non-standard constants and functions are also provided:: 
    515  
    516     M_PI_180 = pi/180 
    517     M_4PI_3 = 4pi/3 
    518     square(x) = x*x 
    519     cube(x) = x*x*x 
    520     sinc(x) = sin(x)/x, with sin(0)/0 -> 1 
    521     SINCOS(x, s, c) sets s=sin(angle) and c=cos(angle) 
    522     powr(x, y) = x^y for x >= 0 
    523     pown(x, n) = x^n for n integer 
    524  
    525 **source=['lib/fn.c', ...]** includes the listed C source files in the 
     486and it includes orientation parameters. 
     487 
     488*form_volume* defines the volume of the shape. As in python models, 
     489includes only the volume parameters. 
     490 
     491*Iqxy* will default to *Iq(sqrt(qx**2 + qy**2), par1, ...)* and 
     492*form_volume* will default to 1.0. 
     493 
     494**source=['fn.c', ...]** includes the listed C source files in the 
    526495program before *Iq* and *Iqxy* are defined. This allows you to extend the 
    527 library of available C functions. Additional special functions and 
    528 scattering calculations are defined in 
    529 `sasmodels/models/lib <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib>`_, 
    530 including:: 
    531  
    532     sph_j1c(x) = 3 j1(x)/x = 3 (sin(x) - x cos(x))/x^3  [spherical bessel function] 
    533     sas_J1c(x) = 2 J1(x)/x  [bessel function of the first kind] 
    534     sas_gamma(x) = gamma function  [tgamma is unstable below 1] 
    535     sas_erf(x) = error function [erf is broken on some Intel OpenCL drivers] 
    536     sas_erfc(x) = 1-erf(x) 
    537     sas_J0(x) = J0(x) 
    538     sas_J1(x) = J1(x) 
    539     sas_JN(x) = JN(x) 
    540     Si(x) = integral sin(z)/z from 0 to x 
    541     Gauss76Wt = gaussian quadrature weights for 76 point integral 
    542     Gauss76Z = gaussian quadrature values for 76 point integral 
    543  
    544 These functions have been tuned to be fast and numerically stable down 
    545 to $q=0$ even in single precision.  In some cases they work around bugs 
    546 which appear on some platforms but not others. So use them where needed!!! 
     496library of C functions available to your model. 
    547497 
    548498Models are defined using double precision declarations for the 
    549 parameters and return values.  Declarations and constants will be converted 
    550 to float or long double depending on the precision requested. 
     499parameters and return values.  When a model is run using single 
     500precision or long double precision, each variable is converted 
     501to the target type, depending on the precision requested. 
    551502 
    552503**Floating point constants must include the decimal point.**  This allows us 
     
    558509use the builtin constant M_PI rather than 4*atan(1); it is faster and smaller! 
    559510 
    560 FLOAT_SIZE is the number of bytes in the converted variables. If your 
    561 algorithm depends on precision (which is not uncommon for numerical 
    562 algorithms), use the following:: 
    563  
    564     #if FLOAT_SIZE>4 
    565     ... code for double precision ... 
    566     #else 
    567     ... code for single precision ... 
    568     #endif 
    569  
    570 A value defined as SAS_DOUBLE will stay double precision; this should 
    571 not be used since some graphics cards do not support double precision. 
    572  
     511The C model operates on a single $q$ value at a time.  The code will be 
     512run in parallel across different $q$ values, either on the graphics card 
     513or the processor. 
     514 
     515Rather than returning NAN from Iq, you must define the *INVALID(v)*.  The 
     516*v* parameter lets you access all the parameters in the model using 
     517*v.par1*, *v.par2*, etc. For example:: 
     518 
     519    #define INVALID(v) (v.bell_radius < v.radius) 
     520 
     521Special Functions 
     522................. 
     523 
     524The C code follows the C99 standard, with the usual math functions, 
     525as defined in 
     526`OpenCL <https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/mathFunctions.html>`_. 
     527This includes the following: 
     528 
     529    M_PI, M_PI_2, M_PI_4, M_SQRT1_2, M_E: 
     530        $\pi$, $\pi/2$, $\pi/4$, $1/\sqrt{2}$ and Euler's constant $e$ 
     531    exp, log, pow(x,y), expm1, sqrt: 
     532        Power functions $e^x$, $\ln x$, $x^y$, $e^x - 1$, $\sqrt{x}$. 
     533        The function expm1(x) is accurate across all $x$, including $x$ 
     534        very close to zero. 
     535    sin, cos, tan, asin, acos, atan: 
     536        Trigonometry functions and inverses, operating on radians. 
     537    sinh, cos, tanh, asinh, acosh, atanh: 
     538        Hyperbolic trigonometry functions. 
     539    atan2(y,x): 
     540        Angle from the $x$\ -axis to the point $(x,y)$, which is equal to 
     541        $\tan^{-1}(y/x)$ corrected for quadrant.  That is, if $x$ and $y$ are 
     542        both negative, then atan2(y,x) returns a value in quadrant III where 
     543        atan(y/x) would return a value in quadrant I. Similarly for 
     544        quadrants II and IV when $x$ and $y$ have opposite sign. 
     545    fmin(x,y), fmax(x,y), trunc, rint: 
     546        Floating point functions.  rint(x) returns the nearest integer. 
     547    NAN: 
     548        NaN, Not a Number, $0/0$.  Use isnan(x) to test for NaN.  Note that 
     549        you cannot use :code:`x == NAN` to test for NaN values since that 
     550        will always return false.  NAN does not equal NAN! 
     551    INFINITY: 
     552        $\infty, 1/0$.  Use isinf(x) to test for infinity, or isfinite(x) 
     553        to test for finite and not NaN. 
     554    erf, erfc, tgamma, lgamma:  **do not use** 
     555        Special functions that should be part of the standard, but are missing 
     556        or inaccurate on some platforms. Use sas_erf, sas_erfc and sas_gamma 
     557        instead (see below). Note: lgamma(x) has not yet been tested. 
     558 
     559Some non-standard constants and functions are also provided: 
     560 
     561    M_PI_180, M_4PI_3: 
     562        $\pi/{180}$, $\tfrac{4}{3}\pi$ 
     563    SINCOS(x, s, c): 
     564        Macro which sets s=sin(x) and c=cos(x). The variables *c* and *s* 
     565        must be declared first. 
     566    square(x): 
     567        $x^2$ 
     568    cube(x): 
     569        $x^3$ 
     570    sinc(x): 
     571        $\sin(x)/x$, with limit $\sin(0)/0 = 1$. 
     572    powr(x, y): 
     573        $x^y$ for $x \ge 0$; this is faster than general $x^y$ on some GPUs. 
     574    pown(x, n): 
     575        $x^n$ for $n$ integer; this is faster than general $x^n$ on some GPUs. 
     576    FLOAT_SIZE: 
     577        The number of bytes in a floating point value.  Even though all 
     578        variables are declared double, they may be converted to single 
     579        precision float before running. If your algorithm depends on 
     580        precision (which is not uncommon for numerical algorithms), use 
     581        the following:: 
     582 
     583            #if FLOAT_SIZE>4 
     584            ... code for double precision ... 
     585            #else 
     586            ... code for single precision ... 
     587            #endif 
     588    SAS_DOUBLE: 
     589        A replacement for :code:`double` so that the declared variable will 
     590        stay double precision; this should generally not be used since some 
     591        graphics cards do not support double precision.  There is no provision 
     592        for forcing a constant to stay double precision. 
     593 
     594The following special functions and scattering calculations are defined in 
     595`sasmodels/models/lib <https://github.com/SasView/sasmodels/tree/master/sasmodels/models/lib>`_. 
     596These functions have been tuned to be fast and numerically stable down 
     597to $q=0$ even in single precision.  In some cases they work around bugs 
     598which appear on some platforms but not others. So use them where needed!!! 
     599 
     600    polevl(x, c, n): 
     601        Polynomial evaluation $p(x) = \sum_{i=0}^n c_i x^{n-i}$ using Horner's 
     602        method so it is faster and more accurate. 
     603 
     604        :code:`source = ["lib/polevl.c", ...]` 
     605 
     606    sas_gamma: 
     607        Gamma function $\text{sas_gamma}(x) = \Gamma(x)$.  The standard math 
     608        library gamma function, tgamma(x) is unstable below 1 on some platforms. 
     609 
     610        :code:`source = ["lib/sasgamma.c", ...]` 
     611 
     612    erf, erfc: 
     613        Error function 
     614        $\text{erf}(x) = \frac{1}{\sqrt\pi}\int_0^x e^{-t^2}\,dt$ 
     615        and complementary error function 
     616        $\text{erfc}(x) = \frac{1}{\sqrt\pi}\int_x^\inf e^{-t^2}\,dt$. 
     617        The standard manth library erf and erfc are slower and broken 
     618        on some platforms. 
     619 
     620        :code:`source = ["lib/polevl.c", "lib/sas_erf.c", ...]` 
     621 
     622    sas_J0: 
     623        Bessel functions of the first kind where 
     624        $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x\sin(\tau))\,d\tau$. 
     625 
     626        :code:`source = ["lib/polevl.c", "lib/sas_J0.c", ...]` 
     627 
     628    sas_J1: 
     629        Bessel functions of the first kind where 
     630        $J_1(x) = \frac{1}{\pi}\int_0^\pi \cos(\tau - x\sin(\tau))\,d\tau$. 
     631 
     632        :code:`source = ["lib/polevl.c", "lib/sas_J1.c", ...]` 
     633 
     634    sas_JN: 
     635        Bessel functions of the first kind where 
     636        $J_n(x) = \frac{1}{\pi}\int_0^\pi \cos(n\tau - x\sin(\tau))\,d\tau$. 
     637 
     638        :code:`source = ["lib/polevl.c", "lib/sas_J0.c", "lib/sas_J1.c", "lib/sas_JN.c", ...]` 
     639 
     640    Si: 
     641        Sine integral $\text{Si}(x) = \int_0^x \tfrac{\sin t}{t}\,dt$. 
     642 
     643        :code:`soure = ["lib/Si.c", ...]` 
     644 
     645    sph_j1c(qr): 
     646        Spherical Bessel form 
     647        $F(qr) = 3 j_1(qr)/(qr) = 3 (\sin(qr) - qr \cos(qr))/{(qr)^3}$, 
     648        with a limiting value of 1 at $qr=0$.  This function uses a Taylor 
     649        series for small $qr$ for numerical accuracy. 
     650 
     651        :code:`source = ["lib/sph_j1c.c", ...]` 
     652 
     653    sas_J1c(qr): 
     654        Bessel form $F(qr) = 2 J_1(qr)/{(qr)}$, with a limiting value of 1 at $qr=0$. 
     655 
     656        :code:`source = ["lib/polevl.c", "lib/sas_J1c.c", ...]` 
     657 
     658    Gauss76z[i], Gauss76Wt[i]: 
     659        Points $z_i$ and weights $w_i$ for 76-point Gaussian quadrature, 
     660        computing $\int_{-1}^1 f(z)\,dz \approx \sum_{i=1}^{76} w_i f(z_i)$. 
     661        Similar arrays are available in :code:`gauss20.c` for 20 point 
     662        quadrature and in :code:`gauss150.c` for 150 point quadrature. 
     663 
     664        :code:`source = ["gauss76.c", ...]` 
     665 
     666Problems with C models 
     667...................... 
     668 
     669The graphics processor (GPU) in your computer is a specialized computer tuned 
     670for certain kinds of problems.  This leads to strange restrictions that you 
     671need to be aware of.  Your code may work fine on some platforms or for some 
     672models, but then return bad values on other platforms.  Some examples of 
     673particular problems: 
     674 
     675  (1) Code is too complex, or uses too much memory.  GPU devices only have a 
     676  limited amount of memory available for each processor.  If you run programs 
     677  which take too much memory, then rather than running multiple values in parallel 
     678  as it usually does, the GPU may only run a single version of the code at a 
     679  time, making it slower than running on the CPU.  It may fail to run on 
     680  some platforms, or worse, cause the screen to go blank or the system to reboot. 
     681 
     682  (2) Code takes too long.  Because GPU devices are used for the computer 
     683  display, the OpenCL drivers are very careful about the amount of time they 
     684  will allow any code to run.  For example, on OS X, the model will stop running 
     685  after 5 seconds regardless if the computation is complete.  You may end up 
     686  with only some of your 2-D array defined, with the rest containing random 
     687  data. Or it may cause the screen to go blank or the system to reboot. 
     688 
     689  (3) Memory is not *aligned*.  The GPU hardware is specialized to operate on 
     690  multiple values simultaneously.  To keep the GPU simpler the values in memory 
     691  must be aligned with the different GPU compute engines.  Not following these 
     692  rules can lead to unexpected values being loaded into memory, and wrong answers 
     693  computed.  The conclusion from a very long and strange debugging session was 
     694  that any arrays that you declare in your model should be a multiple of four. 
     695  For example 
     696 
     697      double Iq(q, p1, p2, ...) 
     698      { 
     699          double vector[8];  // Only going to use seven slots, but declare 8 
     700          ... 
     701      } 
     702 
     703The first step when your model is behaving strangely is to set **single=False**. 
     704This automatically restricts the model to only run on the CPU, or on high end 
     705GPU cards.  There can still be problems even on high end cards, so you can force 
     706the model off the GPU by setting **opencl=False**.  This runs the model 
     707as a normal C program without any GPU restrictions so you know that 
     708strange results are probably from your code rather than the environment.  Once 
     709the code is debugged, you can compare your output to the output on the GPU. 
     710 
     711Although it can be difficult to get your model to work on the GPU, the reward 
     712can be a model that runs 1000x faster on a good card.  Even your laptop may 
     713show a 50x improvement or more over the equivalent pure python model. 
    573714 
    574715External C Models 
     
    577718External C models are very much like embedded C models, except that 
    578719*Iq*, *Iqxy* and *form_volume* are defined in an external source file 
    579 loaded using the *source=[...]*  method. You need to supply the function 
     720loaded using the *source=[...]* statement. You need to supply the function 
    580721declarations for each of these that you need instead of building them 
    581722automatically from the parameter table. 
Note: See TracChangeset for help on using the changeset viewer.