8. Using galario from C++¶
The core of galario is written in C++/Cuda, all functions are in the header galario.h
.
8.1. A small example¶
Here is a small test program that performs the FFT in 2D on an image with random values. It is part of the galario test suite.
/******************************************************************************
* This file is part of GALARIO: *
* Gpu Accelerated Library for Analysing Radio Interferometer Observations *
* *
* Copyright (C) 2017-2020, Marco Tazzari, Frederik Beaujean, Leonardo Testi. *
* *
* This program is free software: you can redistribute it and/or modify *
* it under the terms of the Lesser GNU General Public License as published by *
* the Free Software Foundation, either version 3 of the License, or *
* (at your option) any later version. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. *
* *
* For more details see the LICENSE file. *
* For documentation see https://mtazzari.github.io/galario/ *
******************************************************************************/
/*
* A simple test to make sure we can include, link, and run galario from pure C++.
*/
#include "galario.h"
#include <cassert>
#include <complex>
#include <vector>
using namespace galario;
int main()
{
init();
constexpr int nx = 128;
constexpr int ny = nx;
// vector initializes to 0
std::vector<dreal> realdata(nx*ny);
dcomplex* res = copy_input(nx, ny, &realdata[0]);
// These checks compute garbage but we want to check if they compile and link.
// Link errors could come from incompatible const modifiers.
int n = 4;
dreal* rp = &realdata[0];
dcomplex* cp = res;
dreal r = realdata[0];
dreal dxy = 0.2;
sweep(nx, rp, dxy/100., dxy/10.5, nx, dxy, 0.5, cp);
uv_rotate(r, r, r, rp, rp, n, rp, rp ,rp, rp);
fft2d(nx, ny, res);
fftshift(n, n, cp);
fftshift_axis0(n, n, cp);
// interpolate(n, n, cp, n, rp, rp, r, cp);
apply_phase_sampled(r, r, n, rp, rp, cp);
auto ncomplex = nx*(ny/2+1);
std::vector<dcomplex> vis_int(res, res + ncomplex);
auto chi2 = reduce_chi2(300, &realdata[0], &realdata[0], res, &realdata[0]);
(void)chi2; // avoid unused variable warning
// copy input before reduce_chi2 to see if it is modified inadvertently
for (auto i = 0; i < ncomplex; ++i) {
assert(vis_int[i] == res[i]);
}
galario_free(res);
cleanup();
return 0;
}
After successfully installing galario to /path/to/galario
, a simple
test program running galario on the CPU with openMP
and double
precision can be built with:
g++ -I/path/to/galario/include -L/path/to/galario/lib -lgalario -DDOUBLE_PRECISION galario_test.cpp -o galario_test
To use single precision, simply do not define the preprocessor symbol -DDOUBLE_PRECISION
and link in the appropriate library with -lgalario_single
. A mismatch between the library and the preprocessor symbol causes undefined behavior but usually a segmentation fault causes the program to abort at runtime.
If galario was installed with cuda
support, you can link in -lgalario_cuda
or -lgalario_single_cuda
instead.
8.2. Example walk-through¶
A small walk through the example’s main
function line by line:
#include "galario.h"
...
using namespace galario;
All galario functions are declared in the header galario.h
and inside the namespace galario
. If you want to use particular functions without making all names in the namespace
available, use for example galario::sweep
:
init();
...
cleanup();
Before any computation is done inside galario, the library has to be initialized. Similarly, any data created during initialization should be cleaned at the end of main()
.
To create the input image, define a std::vector
of the appropriate size. Not that this initializes all values to zero:
std::vector<dreal> realdata(nx*ny);
The data type dreal
can refer to either float
or double
, depending on the preprocessor symbol DOUBLE_PRECISION
. galario assumes the input is a real image but the output of the FFT is complex. galario provides a helper function to allocate an array of the proper size and to copy over the input image:
dcomplex* res = copy_input(nx, ny, realdata);
The actual FFT is done in-place, and the result is stored in res
. The data layout is described in the FFTW manual and applied to the CPU and GPU versions of galario:
fft2d(nx, ny, res);
To deallocate res
, we use:
galario_free(res);
In general, any array created by galario and handed back to the user must be deallocated using galario_free
. The reason is that in the CPU version, we use the FFTW to allocate memory that is aligned to allow FFTW to be most efficient (think SIMD vectorization). Correspondingly, we should not use the standard C library’s free
function.