# 5. **galario** Cookbook¶

## 5.1. Using the GPU and CPU version¶

The CPU version of **galario** is always compiled, even on a system without a CUDA-enabled GPU. In this case you can import
the double and single precision CPU versions of the library with:

```
from galario import double
from galario import single
```

If built on a machine with a CUDA-enabled GPU, **galario** is compiled also for the GPU. You can still import
the CPU version as above, and the GPU version as follows:

```
from galario import double_cuda
from galario import single_cuda
```

To check programmatically whether the GPU version is available, you can read the global variable `galario.HAVE_CUDA`

.

The following snippet imports the GPU version of galario if it is available, otherwise it imports the CPU version:

```
if galario.HAVE_CUDA:
from galario import double_cuda as g_double
from galario import single_cuda as g_single
else:
from galario import double as g_double
from galario import single as g_single
```

This snippet simplifies the development of portable code. Since the functions in `double`

, `double_cuda`

, `single`

and `single_cuda`

have the same interfaces, by adding this snippet to the imports of the module you can develop and test your code on a machine even
without a GPU, then move to a machine with a GPU and run it on the GPU without any change.

## 5.2. Selecting the GPU¶

**galario** can be used on machines with one or more CUDA-capable GPUs. The number of GPUs available on the machine can be
obtained with the `ngpus()`

function:

```
double_cuda.ngpus() # or single_cuda.ngpus()
```

which returns an integer number.

It is possible to tell **galario** to use a particular GPU for the computation the `use_gpu()`

function:

```
double_cuda.use_gpu(ID)
```

where `ID`

is an integer number representing the GPU ID. By default, **galario** uses the GPU with `ID=0`

. This means that on machines
with only one CUDA-capable GPU it is not necessary to call `double_cuda.use_GPU(0)`

as this is the default behaviour.

## 5.3. Parallelization: setting the threads¶

### 5.3.1. On the CPU¶

The CPU version of **galario** uses OpenMP to parallelize its operations by distributing the workload to different threads.

Once you imported the CPU version of **galario** (either double or single precision), you can set the number of threads with
the `threads()`

function:

```
double.threads(N_OMP)
```

where `N_OMP`

is the number of threads to be used. Calling `double.threads(1)`

ensures a serial execution of **galario**.

By default, if `double.threads(N_OMP)`

is not called by the user, **galario** does not set the number of OpenMP threads to be used.
This means that, at runtime, OpenMP is free to set the number of threads dynamically. Some OpenMP implementations default to one thread, others to as many threads as there are physical cores.

It is possible to retrieve the number of threads used by galario by calling `double.threads()`

without argument.

Note

Setting

`N_OMP`

larger than the number ofphysicalcores, one can use the HyperThreading technology, which can give anything from a moderate boost to a significant performance penalty. This depends on the image size, the memory latency and bandwidth, and other parameters. Experiment around what works best.In most cases, a considerable speedup can be obtained by setting

`N_OMP`

equal to the number ofphysicalcores, which for matrix sizes up to 4k x 4k yields an almost linear scaling in our benchmark results.

Suggestion: ifgalariois to be used in a code that already usesMPIto parallelize the tasks over multiple processes, setting`double.threads(1)`

might turn out to give a better overall performance.

### 5.3.2. On the GPU¶

It is possible to change the number of threads per block used to launch 1D and 2D kernels on the GPU with:

```
double_cuda.threads(N)
```

where `N`

is the square root of the number of threads for block to be used. By default, `N`

is set to 16, which implies
256 threads per block. Due to the physical structure of the current NVIDIA cards, `N`

must be equal to 8, 16 or 32.

This is an advanced feature, for most cases the default value should be sufficient. More details are given in the
documentation of `threads()`

.

## 5.4. Computing the (R.A., Dec.) coordinate mesh grid for the image creation¶

In this recipe we will adopt the definitions described in the image specifications.

To compute a model image defined in the real \((R.A., Dec.)\) space, it is necessary to map the pixel indices to the
celestial coordinates. This can be easily done with the `get_coords_meshgrid()`

function (see an example below).

In general, the \((R.A., Dec.)\) coordinates of the pixel centers can be computed as:

```
import numpy as np
# axes indices
x = (np.linspace(0.5, -0.5 + 1./float(ncol), ncol)) * ncol * dxy # R.A.
y = (np.linspace(0.5, -0.5 + 1./float(nrow), nrow)) * nrow * dxy * v_origin # Dec.
```

where `nrow`

and `ncol`

are the number of rows and columns respectively, `dxy`

is the pixel size \(\Delta_{xy}\) (in radians)
and `v_origin=1`

for `origin=upper`

, `v_origin=-1`

for `origin=lower`

. `x`

and `y`

contain the \((R.A., Dec.)\) coordinates
reported in the Figure of the image specifications page.

A coordinate mesh grid can be created with:

```
import numpy as np
x_m, y_m = np.meshgrid(x, y) # (x, y) mesh grid
R_m = np.hypot(x_m, y_m) # radial mesh grid
```

`x_m`

and `y_m`

contain the \((R.A., Dec.)\) meshgrid, while `R_m`

is the radial meshgrid containing radial distance
from the centre (which is always located in the `[i, j]=[Nxy/2, Nxy/2]`

pixel), useful for axisymmetric brightness functions.

### 5.4.1. An example¶

All the above calculations can be easily done with just one call to the `get_coords_meshgrid()`

function, e.g.:

```
from galario.double import get_coords_meshgrid, arcsec
nrow, ncol = nxy, nxy # number of rows and columns (here for a square matrix)
dxy = 1e-3*arcsec # pixel size (rad)
inc = 30.*deg # inclination (rad)
x, y, x_m, y_m, R_m = get_coords_meshgrid(nrow, ncol, dxy=dxy, inc=inc, origin='lower')
```

The returned arrays are in radians, the same units of `dxy`

. To obtain \((R.A., Dec.)\) in pixel units leave `dxy=1`

(the default value).

In order to put the coordinates center `(0, 0)`

offset by `(Dx, Dy)`

w.r.t. the image center, provide the
additional offset parameters defined along the same direction of R.A. and Dec. (North to the top, East to the left):

```
Dx = -1.*arcsec # R.A. offset (negative, therefore moves the origin to the West)
Dy = 0.5*arcsec # Dec. offset (positive, therefore moves the origin to the North)
x, y, x_m, y_m, R_m = get_coords_meshgrid(nrow, ncol, dxy=dxy, inc=inc, Dx=Dx, Dy=Dy, origin='lower')
```

It follows that, applying the brightness function of a source on this last coordinate meshgrid, will put the source center offset by `(Dx, Dy)`

from the image center.

For an axisymmetric brightness \(f(R)\), once the meshgrid is computed, the image and its visibilities can be computed as easily as :

```
from galario.double import sampleImage, chi2Image
image = f(R_m)
vis = sampleImage(image, ...) or # chi2 = chi2Image(image, ...)
```