There is a fair amount of overhead involved in synchronizing threads, so the optimal number of threads to use depends upon the size of the transform as well as on the number of processors you have.
As a general rule, you don’t want to use more threads than you have processors. (Using more threads will work, but there will be extra overhead with no benefit.) In fact, if the problem size is too small, you may want to use fewer threads than you have processors.
You will have to experiment with your system to see what level of
parallelization is best for your problem size. Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial. If you plan with
FFTW_PATIENT, it will
automatically disable threads for sizes that don’t benefit from