

5.2 Usage of Multi-threaded FFTW

This section assumes that you are already familiar with the uniprocessor FFTW routines described elsewhere in this manual. It describes only what you have to change in order to use the multi-threaded routines.

First, programs using the parallel complex transforms should be linked with -lfftw3_threads -lfftw3 -lm on Unix, or -lfftw3_omp -lfftw3 -lm if you compiled with OpenMP. You will also need to link with whatever library is responsible for threads on your system (e.g. -lpthread on GNU/Linux) or include whatever compiler flag enables OpenMP (e.g. -fopenmp with gcc).

Second, before calling any FFTW routines, you should call the function:

int fftw_init_threads(void);

This function, which need only be called once, performs any one-time initialization required to use threads on your system. It returns zero if there was some error (which should not happen under normal circumstances) and a non-zero value otherwise.

Third, before creating a plan that you want to parallelize, you should call:

void fftw_plan_with_nthreads(int nthreads);

The nthreads argument indicates the maximum number of threads you want FFTW to use; it may use fewer. All plans subsequently created with any planner routine will use up to that many threads. You can call fftw_plan_with_nthreads, create some plans, call fftw_plan_with_nthreads again with a different argument, and create some more plans for a new number of threads. Plans already created before a call to fftw_plan_with_nthreads are unaffected. If you pass an nthreads argument of 1 (the default), threads are disabled for subsequent plans.

You can determine the current number of threads that the planner can use by calling:

int fftw_planner_nthreads(void);

With OpenMP, to configure FFTW to use all of the currently running OpenMP threads (set by omp_set_num_threads(nthreads) or by the OMP_NUM_THREADS environment variable), you can do: fftw_plan_with_nthreads(omp_get_max_threads()). (The ‘omp_’ OpenMP functions are declared via #include <omp.h>.)

Given a plan, you then execute it as usual with fftw_execute(plan), and the execution will use the number of threads specified when the plan was created. When done, you destroy it as usual with fftw_destroy_plan. As described in Thread safety, plan execution is thread-safe, but plan creation and destruction are not: you should create/destroy plans only from a single thread, but can safely execute multiple plans in parallel.

There is one additional routine: if you want to get rid of all memory and other resources allocated internally by FFTW, you can call:

void fftw_cleanup_threads(void);

which is much like the fftw_cleanup() function except that it also gets rid of threads-related data. You must not execute any previously created plans after calling this function.

We should also mention one other restriction: if you save wisdom from a program using the multi-threaded FFTW, that wisdom cannot be used by a program using only the single-threaded FFTW (i.e. not calling fftw_init_threads). See Words of Wisdom-Saving Plans.

Finally, FFTW provides an optional callback interface that allows you to replace its parallel threading backend at runtime:

void fftw_threads_set_callback(
    void (*parallel_loop)(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data),
    void *data);

This routine (which is not thread-safe and should generally be called before creating any FFTW plans) allows you to provide a function parallel_loop that executes parallel work for FFTW: it should call the function work(jobdata + elsize*i) for i from 0 to njobs-1, possibly in parallel. (The data pointer supplied to fftw_threads_set_callback is passed through to your parallel_loop function.) For example, if you link to an FFTW threads library built to use POSIX threads, but you want it to use OpenMP instead (because you are using OpenMP elsewhere in your program and want to avoid competing threads), you can call fftw_threads_set_callback with the callback function:

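/* Runs the njobs work items in an OpenMP parallel loop. */
void parallel_loop(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data)
{
    (void)data; /* the user pointer is not needed in this example */
#pragma omp parallel for
    for (int i = 0; i < njobs; ++i)
        work(jobdata + elsize * i);
}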

The same mechanism could be used in order to make FFTW use a threading backend implemented via Intel TBB, Apple GCD, or Cilk, for example.
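To make the callback contract concrete without depending on any particular framework, here is a minimal sketch of a parallel_loop built directly on POSIX threads. Everything in it apart from the callback signature is an assumption for illustration: the names pthread_parallel_loop, job_slot, run_slot, and square_job are hypothetical, and spawning one thread per job is chosen for brevity where a real backend would reuse a thread pool.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Bookkeeping for one job: what to run, and the thread running it. */
struct job_slot {
    void *(*work)(char *);
    char *job;
    pthread_t tid;
    int started;
};

/* Thread entry point: runs the job stored in one slot. */
static void *run_slot(void *arg)
{
    struct job_slot *slot = arg;
    slot->work(slot->job);
    return NULL;
}

/* Hypothetical callback: calls work(jobdata + elsize*i) for i = 0..njobs-1,
   each on its own thread, and returns once all of them have finished. */
void pthread_parallel_loop(void *(*work)(char *), char *jobdata,
                           size_t elsize, int njobs, void *data)
{
    (void)data; /* the user pointer is not needed in this sketch */
    assert(njobs >= 0);

    struct job_slot *slots = malloc((size_t)njobs * sizeof *slots);
    for (int i = 0; i < njobs; ++i) {
        char *job = jobdata + elsize * (size_t)i;
        if (!slots) {           /* no memory for bookkeeping: run serially */
            work(job);
            continue;
        }
        slots[i].work = work;
        slots[i].job = job;
        slots[i].started =
            pthread_create(&slots[i].tid, NULL, run_slot, &slots[i]) == 0;
        if (!slots[i].started)  /* thread creation failed: run it here */
            work(job);
    }

    /* Wait for every thread that was actually started. */
    for (int i = 0; slots && i < njobs; ++i)
        if (slots[i].started)
            pthread_join(slots[i].tid, NULL);
    free(slots);
}

/* Stand-in for FFTW's work function, used only to exercise the loop:
   squares the int stored at the start of the job element. */
void *square_job(char *job)
{
    int *value = (int *)job;
    *value = *value * *value;
    return NULL;
}
```

Such a function would be installed with fftw_threads_set_callback(pthread_parallel_loop, NULL) before any plans are created. The square_job function exists only so that the loop can be tried on an ordinary array without FFTW.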

