.. default-role:: code

.. role:: bash(code)
   :language: bash

.. _cookbook:

==================
|galario| Cookbook
==================

.. _cookbook_GPU_CPU_version:

Using the GPU and CPU version
-----------------------------
The CPU version of |galario| is always compiled, even on a system without a CUDA-enabled GPU.
In this case you can import the double and single precision CPU versions of the library with:

.. code-block:: python

    from galario import double
    from galario import single

If built on a machine with a CUDA-enabled GPU, |galario| is also compiled for the GPU.
You can still import the CPU version as above, and the GPU version as follows:

.. code-block:: python

    from galario import double_cuda
    from galario import single_cuda

To check programmatically whether the GPU version is available, you can read the global variable
:data:`galario.HAVE_CUDA`. The following snippet imports the GPU version of |galario| if it is
available, otherwise it imports the CPU version:

.. code-block:: python

    import galario

    if galario.HAVE_CUDA:
        from galario import double_cuda as g_double
        from galario import single_cuda as g_single
    else:
        from galario import double as g_double
        from galario import single as g_single

This snippet simplifies the development of portable code. Since the functions in `double`,
`double_cuda`, `single` and `single_cuda` have the same interfaces, by adding this snippet to the
imports of your module you can develop and test your code on a machine without a GPU, then move to
a machine with a GPU and run the same code on the GPU without any change.

Selecting the GPU
-----------------
|galario| can be used on machines with one or more CUDA-capable GPUs. The number of GPUs available
on the machine can be obtained with the :func:`ngpus()` function:

.. code-block:: python

    double_cuda.ngpus()  # or single_cuda.ngpus()

which returns an integer number. It is possible to tell |galario| to use a particular GPU for the
computation with the :func:`use_gpu()` function:

.. code-block:: python

    double_cuda.use_gpu(ID)

where `ID` is an integer number representing the GPU ID. By default, |galario| uses the GPU with
`ID=0`. This means that on machines with only one CUDA-capable GPU it is not necessary to call
`double_cuda.use_gpu(0)`, as this is the default behaviour.

.. note::

    The `ID` to be used in :func:`use_gpu()` might differ from the device ID reported by the
    `nvidia-smi` command. See the documentation of :func:`use_gpu()` for more details.

Parallelization: setting the threads
------------------------------------

On the CPU
~~~~~~~~~~
The CPU version of |galario| uses OpenMP to parallelize its operations by distributing the workload
over multiple threads. Once you have imported the CPU version of |galario| (either double or single
precision), you can set the number of threads with the :func:`threads()` function:

.. code-block:: python

    double.threads(N_OMP)

where `N_OMP` is the number of threads to be used. Calling `double.threads(1)` ensures a serial
execution of |galario|.

By default, if `double.threads(N_OMP)` is not called by the user, |galario| does not set the number
of OpenMP threads. This means that, at runtime, OpenMP is free to set the number of threads
dynamically: some OpenMP implementations default to one thread, others to as many threads as there
are physical cores.

It is possible to retrieve the number of threads used by |galario| by calling `double.threads()`
without arguments.
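For example, a minimal sketch that sets the number of OpenMP threads explicitly and then reads the
current setting back (the value `4` is arbitrary and should be adapted to the number of physical
cores of your machine):

.. code-block:: python

    from galario import double

    double.threads(4)        # request 4 OpenMP threads (arbitrary choice)
    print(double.threads())  # no argument: retrieves the number of threads currently set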
.. note::

    By setting `N_OMP` larger than the number of **physical** cores one can exploit Hyper-Threading,
    which can give anything from a moderate boost to a significant performance penalty, depending on
    the image size, the memory latency and bandwidth, and other parameters: experiment to find what
    works best. In most cases a considerable speedup can be obtained by setting `N_OMP` equal to the
    number of **physical** cores, which for matrix sizes up to 4k x 4k yields an almost linear
    scaling in our benchmark results.

    **Suggestion**: if |galario| is to be used in a code that already uses **MPI** to parallelize
    the tasks over multiple processes, setting `double.threads(1)` might give a better overall
    performance.

On the GPU
~~~~~~~~~~
It is possible to change the number of threads per block used to launch 1D and 2D kernels on the
GPU with:

.. code-block:: python

    double_cuda.threads(N)

where `N` is the square root of the number of threads per block to be used. By default, `N` is set
to 16, which implies 256 threads per block. Due to the physical structure of current NVIDIA cards,
`N` must be equal to 8, 16 or 32.

This is an advanced feature: in most cases the default value should be sufficient. More details are
given in the documentation of :func:`threads()`.

.. _cookbook_meshgrid:

Computing the (R.A., Dec.) coordinate mesh grid for the image creation
------------------------------------------------------------------------
In this recipe we adopt the definitions described in the :ref:`image specifications`.

To compute a model image defined in the real :math:`(R.A., Dec.)` space, it is necessary to map the
pixel indices to the celestial coordinates. This can be easily done with the
:func:`get_coords_meshgrid()` function (see an example below). In general, the
:math:`(R.A., Dec.)` coordinates of the pixel centres can be computed as:

.. code:: python

    import numpy as np

    # axes coordinates
    x = np.linspace(0.5, -0.5 + 1./float(ncol), ncol) * ncol * dxy             # R.A.
    y = np.linspace(0.5, -0.5 + 1./float(nrow), nrow) * nrow * dxy * v_origin  # Dec.

where `nrow` and `ncol` are the number of rows and columns respectively, `dxy` is the pixel size
:math:`\Delta_{xy}` (in radians), and `v_origin=1` for `origin='upper'`, `v_origin=-1` for
`origin='lower'`. `x` and `y` contain the :math:`(R.A., Dec.)` coordinates reported in the Figure
of the :ref:`image specifications` page.

A coordinate mesh grid can be created with:

.. code:: python

    import numpy as np

    x_m, y_m = np.meshgrid(x, y)   # (x, y) mesh grid
    R_m = np.hypot(x_m, y_m)       # radial mesh grid

`x_m` and `y_m` contain the :math:`(R.A., Dec.)` mesh grid, while `R_m` is the radial mesh grid
containing the radial distance from the centre (which is always located in the
`[i, j]=[Nxy/2, Nxy/2]` pixel), useful for axisymmetric brightness functions.
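As a concrete illustration of this recipe, the sketch below builds the mesh grid by hand for a
hypothetical square image and evaluates an axisymmetric brightness on the radial grid; the values
of `nxy` and `dxy` and the Gaussian ring profile are purely illustrative placeholders, not taken
from the |galario| documentation or real data:

.. code-block:: python

    import numpy as np

    arcsec = np.pi / 180. / 3600.    # radians per arcsecond

    nxy = 1024                       # nrow = ncol = nxy (illustrative)
    dxy = 5e-3 * arcsec              # pixel size in radians (illustrative)
    v_origin = -1.                   # origin='lower'

    # pixel-centre coordinates, following the recipe above
    x = np.linspace(0.5, -0.5 + 1./float(nxy), nxy) * nxy * dxy             # R.A.
    y = np.linspace(0.5, -0.5 + 1./float(nxy), nxy) * nxy * dxy * v_origin  # Dec.

    x_m, y_m = np.meshgrid(x, y)     # (R.A., Dec.) mesh grid
    R_m = np.hypot(x_m, y_m)         # radial mesh grid

    # the image centre sits in the [nxy/2, nxy/2] pixel
    assert np.isclose(R_m[nxy//2, nxy//2], 0.)

    # purely illustrative axisymmetric brightness: a Gaussian ring
    R0, sigma = 0.3 * arcsec, 0.05 * arcsec
    image = np.exp(-0.5 * ((R_m - R0) / sigma)**2)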
An example
~~~~~~~~~~
All the above calculations can be done with just one call to the :func:`get_coords_meshgrid()`
function, e.g.:

.. code-block:: python

    from galario.double import get_coords_meshgrid, arcsec, deg

    nrow, ncol = nxy, nxy   # number of rows and columns (here, a square matrix)
    dxy = 1e-3*arcsec       # pixel size (rad)
    inc = 30.*deg           # inclination (rad)

    x, y, x_m, y_m, R_m = get_coords_meshgrid(nrow, ncol, dxy=dxy, inc=inc, origin='lower')

The returned arrays are in the same units as `dxy`, i.e. radians in this example. To obtain
:math:`(R.A., Dec.)` in pixel units, leave `dxy=1` (the default value).

In order to place the coordinate centre `(0, 0)` at an offset `(Dx, Dy)` w.r.t. the image centre,
provide the additional offset parameters, defined along the same directions as R.A. and Dec.
(North to the top, East to the left):

.. code-block:: python

    Dx = -1.*arcsec   # R.A. offset (negative, therefore moves the origin to the West)
    Dy = 0.5*arcsec   # Dec. offset (positive, therefore moves the origin to the North)

    x, y, x_m, y_m, R_m = get_coords_meshgrid(nrow, ncol, dxy=dxy, inc=inc, Dx=Dx, Dy=Dy, origin='lower')

It follows that applying the brightness function of a source to this last coordinate mesh grid will
place the source centre at an offset `(Dx, Dy)` from the image centre.

For an axisymmetric brightness :math:`f(R)`, once the mesh grid is computed, the image and its
visibilities can be computed as easily as:

.. code-block:: python

    from galario.double import sampleImage, chi2Image

    image = f(R_m)

    vis = sampleImage(image, ...)
    # or
    # chi2 = chi2Image(image, ...)
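Putting the pieces together, the following sketch is a minimal end-to-end example: it builds the
mesh grid with :func:`get_coords_meshgrid()`, evaluates a purely illustrative Gaussian brightness
on it, and samples the resulting image with `sampleImage`. The image size, pixel size and the
randomly generated `(u, v)` points are placeholders for real data, and the call assumes the leading
arguments of `sampleImage` are the image, the pixel size `dxy` and the `(u, v)` coordinates in
units of the observing wavelength; check the API reference for the full signature:

.. code-block:: python

    import numpy as np
    from galario.double import get_coords_meshgrid, sampleImage, arcsec

    # hypothetical (u, v) coordinates in units of the observing wavelength;
    # in a real application these come from the observed visibilities
    rng = np.random.default_rng(42)
    u = rng.uniform(-5e5, 5e5, size=1000)
    v = rng.uniform(-5e5, 5e5, size=1000)

    nxy = 1024           # number of pixels per side (illustrative)
    dxy = 2e-3 * arcsec  # pixel size in radians (illustrative)

    # (R.A., Dec.) mesh grid and radial mesh grid
    x, y, x_m, y_m, R_m = get_coords_meshgrid(nxy, nxy, dxy=dxy, origin='lower')

    # purely illustrative axisymmetric brightness: a Gaussian disc
    sigma = 0.2 * arcsec
    image = np.exp(-0.5 * (R_m / sigma)**2)

    # synthetic visibilities at the (u, v) points
    vis = sampleImage(image, dxy, u, v)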