
Adding a New FFT to the Benchmark

The benchmark was designed to make it as easy as possible to add new FFTs. Unfortunately, since it also had to be general enough to encompass all of the possible FFT implementations, it is still not as simple as we would like. So, read the directions carefully, and have fun comparing your own FFT to the pack! If you do add a new FFT, especially one that is fairly competitive with the others, we would like to hear from you.

In addition to reading this manual, we encourage you to look at the other FFTs in the benchmark for examples.

The process of adding a new FFT to the benchmark consists of a few steps, of which the last is the most complicated:

  1. For C code, add #include <fftw.h> at the top of your source code file(s) and use FFTW_REAL as your floating point type. Also, use int rather than long. For Fortran, we suggest that you use double precision for floating point variables and integer for integer variables.
  2. It may be necessary (and is a good idea) to change the names of your subroutines (and global variables) to prevent conflicts with other code in the benchmark. If the author of the code were T. Eliot, for example, you might prepend "eliot_" to all of the routine names.
  3. Modify the Makefile (or its equivalent on your system) to compile your code. (We assume that you know how to use make. When in doubt, follow the example of the other FFT programs.)
  4. Add the prototype for your code to bench_1d_protos.h for 1D transforms, or to bench_3d_protos.h for 3D transforms.
  5. Insert a subroutine of the following form into bench_ffts.c (and its prototype into bench_ffts.h):
    void do_foo_fft(int rank, int *n, int *n_rev, int N, short is_power_of_two,
                    FFTW_COMPLEX *arr, FFTW_COMPLEX *work,
                    int size_arr, int size_work,
                    short compute_accuracy, factor_type allowed_factors)
    {
    ...described below...
    }
    
    Here, foo is a unique name you have assigned to your code.
  6. Insert a call to do_foo_fft into bench_1d.c (for 1D routines) or bench_3d.c (for 3D routines). The call will look like this:
    do_foo_fft(1,&n,&n,n,is_power_of_two,
               arr,work,size_arr,size_work,
               compute_accuracy,allowed_factors);
    
    or this:
    do_foo_fft(3,n,n_rev,N,is_power_of_two,
               arr,work,size_arr,size_work,
               compute_accuracy,allowed_factors);
    
    respectively. You should insert the call among the other such calls in alphabetical order according to the name that you have given your routine (commercial and platform-specific libraries should go at the end, however).
  7. Finally, fill in the body of the do_foo_fft function. This process is described in detail below.

Parameters of do_foo_fft

First, we'll go over the parameters of do_foo_fft. These describe the size of the array to benchmark, provide computation space, and give other necessary information. (The usage of these parameters will be described in more detail in the next sub-section. You will not use many of them directly, but will simply pass them to the DO_BENCHMARK_ND macro.)

int rank
The dimensionality of the array that the benchmark should be performed on. You should check to make sure that this is a value that your code handles.

int *n
An array of length rank containing the dimensions of the array to be transformed.

int *n_rev
The array n in reverse order. Useful when calling Fortran-derived code that expects data in column-major order.

int N
The product of the dimensions of the array. (This will be zero the first time do_foo_fft is called, when the name of your routine will be printed out and other benchmark initializations will be performed...more on this below.)

short is_power_of_two
1 if N is a power of two (or zero). 0 otherwise. Useful for codes that only work for powers of two.

FFTW_COMPLEX *arr
A pointer to the storage space that you should use for the input array of your FFT. It has length size_arr >= N.

FFTW_COMPLEX *work
A work array that may be used for temporary storage by your FFT. It contains size_work elements, and is almost certainly large enough to hold any temporary data that you might need. (size_work is at least three times the largest element of n[].) (work is also useful if you have an out-of-place 1D FFT.)

short compute_accuracy
Whether to compute the accuracy (1) or speed (0) of the FFT.

factor_type allowed_factors
Describes the kind of factors that will be allowed in the array sizes on this run of the benchmark program. Either ALL_FACTORS, POWERS_OF_TWO_ONLY, or NON_POWERS_OF_TWO_ONLY.
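
The relationships among n, n_rev, N, and is_power_of_two described above can be sketched in a few lines of C. This is only an illustration of what the values mean, not the benchmark's own code: the helper names product_of_dims, reverse_dims, and power_of_two_or_zero are hypothetical, and the benchmark computes these values for you before calling do_foo_fft.

```c
#include <assert.h>

/* Hypothetical helper: the product of the dimensions (the N parameter). */
int product_of_dims(int rank, const int *n)
{
    int N = 1;
    assert(rank > 0);
    for (int i = 0; i < rank; ++i)
        N *= n[i];
    return N;
}

/* Hypothetical helper: fill n_rev with the entries of n in reverse order. */
void reverse_dims(int rank, const int *n, int *n_rev)
{
    for (int i = 0; i < rank; ++i)
        n_rev[i] = n[rank - 1 - i];
}

/* Hypothetical helper: 1 if N is a power of two (or zero), 0 otherwise.
   A power of two has a single bit set, so N & (N - 1) clears it to zero. */
short power_of_two_or_zero(int N)
{
    return (short)(N >= 0 && (N & (N - 1)) == 0);
}
```

For a 4 x 6 x 8 array, for instance, these helpers give n_rev = {8, 6, 4} and N = 192, which is not a power of two.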

The Body of do_foo_fft

The body of do_foo_fft should look something like this:
{
     if (rank is not okay) return;
     if (N != 0) {
          check if N and n are okay for foo...
          if they are, init foo fft...
     }
     if (N == 0 || (N and n are okay for foo...)) {
          DO_BENCHMARK_ND(name,
                          rank, n, N, arr, out,
                          fft,
                          scale_1, sign, reim_alt,
                          ifft,
                          scale,
                          compute_accuracy)
     }
     else
          SKIP_BENCHMARK("could not handle this N because...")
}
Here, the pseudocode lines and the DO_BENCHMARK_ND arguments name, out, fft, scale_1, sign, reim_alt, ifft, and scale are placeholders that you will have to fill in. DO_BENCHMARK_ND is actually a big macro that performs the timing for the benchmark, handles the output, and so on; those arguments will be described in more detail below.

(Note that there are no semicolons after DO_BENCHMARK_ND and SKIP_BENCHMARK because those macros are actually {...} blocks.)
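
To see why the missing semicolons matter, consider this small sketch. RUN_BLOCK and SKIP_BLOCK are hypothetical stand-ins (they are not the benchmark's macros) that, like the macros described above, expand to complete {...} blocks. Writing a semicolon after the first one would leave an empty statement between the block and the else, and the compiler would reject the else as having no matching if.

```c
/* Hypothetical stand-ins: each expands to a complete {...} block. */
#define RUN_BLOCK(counter)  { (counter) += 1; }
#define SKIP_BLOCK(counter) { (counter) -= 1; }

/* Returns 1 when the "run" branch was taken and -1 when the "skip" branch was. */
int choose(int ok)
{
    int counter = 0;
    if (ok)
        RUN_BLOCK(counter)   /* no semicolon: "{...};" followed by else does not compile */
    else
        SKIP_BLOCK(counter)
    return counter;
}
```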

In the common case that your code only works for powers of two, the body of do_foo_fft should look like this:

{
     if (rank is not okay) return;
     if (is_power_of_two) {
          if (N != 0)
               init foo fft...
          DO_BENCHMARK_ND(name,
                          rank, n, N, arr, out,
                          fft,
                          scale_1, sign, reim_alt,
                          ifft,
                          scale,
                          compute_accuracy)
     }
     else if (allowed_factors != NON_POWERS_OF_TWO_ONLY)
          SKIP_BENCHMARK("not a power of two")
}
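
The control flow of this power-of-two skeleton can be summarized by the following sketch, which only classifies what the skeleton does for each combination of inputs. It makes several assumptions: factor_type is modeled as an enum with the three values named in this manual (the real definition and ordering may differ), and the action type and the function power_of_two_dispatch are hypothetical names introduced for illustration.

```c
/* ASSUMPTION: factor_type is modeled as an enum; the benchmark's real
   definition may differ. Only the three names come from the manual. */
typedef enum { ALL_FACTORS, POWERS_OF_TWO_ONLY, NON_POWERS_OF_TWO_ONLY } factor_type;

/* Hypothetical result type describing which branch of the skeleton is taken. */
typedef enum { ACTION_RUN, ACTION_SKIP_WITH_MESSAGE, ACTION_SILENT } action;

action power_of_two_dispatch(short is_power_of_two, factor_type allowed_factors)
{
    if (is_power_of_two)
        return ACTION_RUN;                /* DO_BENCHMARK_ND(...) */
    else if (allowed_factors != NON_POWERS_OF_TWO_ONLY)
        return ACTION_SKIP_WITH_MESSAGE;  /* SKIP_BENCHMARK("not a power of two") */
    return ACTION_SILENT;                 /* falls off the end: nothing is done */
}
```

In other words, in the skeleton as written, a power-of-two-only code takes neither branch on a run restricted to non-powers of two, so it does nothing for those sizes.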

DO_BENCHMARK_ND

The benchmark is designed so that most of the information about your FFT needs to be entered in only one place, revolving around a call to a macro called DO_BENCHMARK_ND. This macro handles benchmarking your code, checking it for correctness and accuracy, printing out messages to the output files, and many other tasks. It takes a large number of parameters, which tell it everything it needs to know about your FFT. Here, we describe the parameters left as placeholders in the code skeletons above: name, out, fft, scale_1, sign, reim_alt, ifft, and scale.

name
A string that is a short, descriptive name for the FFT; usually the author's last name. In the benchmark output, this will be the name under which the results for your code will be listed.

out
Pointer to the output array of your FFT. If your FFT is in-place, this should be arr. In 1D, out-of-place FFTs can write their output to work (in 3D, you will have to make the work array bigger to allow out-of-place transforms).

fft
A call to your FFT, transforming arr and storing the output in out. Remember, this is a macro, so you can put whatever is necessary here to call your code. This is what will be benchmarked, so you should NOT put one-time initializations here; do those outside the call to DO_BENCHMARK_ND. Also, note that your FFT will never be called when N is zero, so you don't have to worry about that.

scale_1
Factor by which to multiply the output of your FFT in order to get an unnormalized FFT. (Usually, this will be 1.0.)

sign
The sign of the exponent in the transform that your FFT computes (should be +1 or -1).

reim_alt
1 if the input and output of your FFT are complex values stored in alternating real/imaginary order. 0 if the input and output of your FFT are arrays of all the real parts followed by all the imaginary parts. (These are the only two supported possibilities.) Note that if your FFT expects separate arrays for the real and imaginary parts of the data, you should use reim_alt=0 and pass (FFTW_REAL*)arr and (FFTW_REAL*)arr+N to your routine.

ifft
Just like fft, except this should be the inverse transform, and should transform out to arr. (If your code does not include an inverse transform, just put 0 for this parameter. In this case, also pass -compute_accuracy instead of compute_accuracy as the last parameter to DO_BENCHMARK_ND.)

scale
Scale factor by which to multiply ifft(fft(x)) in order to get x. For example, it should be 1.0/N for unnormalized transforms and 1.0 for normalized transforms.
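
The meaning of sign, reim_alt, and scale can be illustrated with a tiny 4-point transform. The sketch below is not part of the benchmark: dft4_split and round_trip_error are hypothetical names, and double stands in for FFTW_REAL. The data is stored as all real parts followed by all imaginary parts (the reim_alt=0 layout), the transforms are unnormalized, and the round trip therefore needs scale = 1.0/N = 0.25 to recover the input.

```c
/* 4-point DFT on split real/imaginary arrays (the reim_alt = 0 layout).
   sign is the sign of the exponent, +1 or -1. The output is unnormalized. */
void dft4_split(int sign, const double *in_re, const double *in_im,
                double *out_re, double *out_im)
{
    /* exp(sign * 2*pi*i * m / 4) for m = 0..3 is 1, sign*i, -1, -sign*i. */
    const double tw_re[4] = { 1.0, 0.0, -1.0, 0.0 };
    const double tw_im[4] = { 0.0, (double)sign, 0.0, -(double)sign };
    for (int k = 0; k < 4; ++k) {
        double sum_re = 0.0, sum_im = 0.0;
        for (int j = 0; j < 4; ++j) {
            int m = (j * k) % 4;
            /* complex multiply: (in_re + i*in_im) * (tw_re + i*tw_im) */
            sum_re += in_re[j] * tw_re[m] - in_im[j] * tw_im[m];
            sum_im += in_re[j] * tw_im[m] + in_im[j] * tw_re[m];
        }
        out_re[k] = sum_re;
        out_im[k] = sum_im;
    }
}

/* Largest absolute difference between scale * ifft(fft(x)) and x. */
double round_trip_error(double scale)
{
    /* One buffer of 2N reals: N real parts, then N imaginary parts,
       mirroring the (FFTW_REAL*)arr and (FFTW_REAL*)arr+N convention. */
    double x[8] = { 1.0, 2.0, 3.0, 4.0, -1.0, -0.5, 0.0, 0.5 };
    double y[8], z[8], worst = 0.0;
    dft4_split(-1, x, x + 4, y, y + 4);   /* forward transform, sign = -1 */
    dft4_split(+1, y, y + 4, z, z + 4);   /* inverse transform, sign = +1 */
    for (int j = 0; j < 8; ++j) {
        double err = scale * z[j] - x[j];
        if (err < 0.0) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

With scale = 0.25 the error is negligible; with scale = 1.0 the round trip returns 4x rather than x, which is the kind of mismatch a wrong scale argument would produce.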

Calling Fortran Transforms

Calling Fortran transforms in the benchmark is almost the same as calling C transforms, with a few small differences.

First of all, you must surround the body of your do_foo_fft function with #ifdef USE_FORTRAN and #endif directives. This allows your code to be disabled on systems lacking Fortran compilers.

Second, you have to deal with the painful ways in which Fortran compilers can munge names. Some Fortran compilers convert routine names to lower case, some convert them to upper case, some add underscores, and others do similar sorts of nonsense. To help you deal with this, you should wrap your subroutine names in the FORTRANIZE macro. Since macros cannot convert names to upper case, you have to pass two names to FORTRANIZE, one lower case and one upper case. For example, if you had a Fortran subroutine foo_fft(n,in,out,isign), you would call it like this:

int isign = -1;
...
FORTRANIZE(foo_fft,FOO_FFT)(n_rev,arr,work,&isign)
(Note that all parameters in Fortran routines are passed by reference. Ugh.)

Do not use underscores in your Fortran routine names. This can cause problems with some compilers (e.g. g77 will munge names differently and thwart the FORTRANIZE macro).
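
To make the name-munging idea concrete, here is one way such a macro could work for a compiler that lower-cases names and appends a single underscore. This definition is an assumption made purely for illustration; the benchmark supplies its own FORTRANIZE, which may be defined differently on your system. The C function foofft_ is a stand-in for a compiled Fortran subroutine, named without an internal underscore as advised above, and call_through_macro is a hypothetical caller.

```c
/* ASSUMPTION: illustrative definition only. It pretends the Fortran compiler
   lower-cases routine names and appends one underscore. */
#define FORTRANIZE(lower, upper) lower##_

/* Stand-in for a compiled Fortran subroutine FOOFFT(n, in, out, isign).
   Every argument arrives by reference. It simply writes isign * in to out. */
void foofft_(int *n, double *in, double *out, int *isign)
{
    for (int i = 0; i < *n; ++i)
        out[i] = (*isign) * in[i];
}

/* Hypothetical caller: scalars are passed as pointers, arrays as themselves. */
int call_through_macro(void)
{
    int n = 3, isign = -1;
    double in[3] = { 1.0, 2.0, 3.0 }, out[3];
    FORTRANIZE(foofft, FOOFFT)(&n, in, out, &isign);
    return (int)out[2];
}
```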

For 3D code, you should note that most Fortran routines will expect arrays in column-major order. To deal with this, you simply pass the array dimensions in reverse order; the parameter n_rev is conveniently provided for this purpose.
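
The reason that reversing the dimensions is enough can be checked directly: element (i0,i1,i2) of a row-major array with dimensions n sits at the same memory offset as element (i2,i1,i0) of a column-major array with dimensions n_rev. The sketch below verifies this for every index; the function names are hypothetical and not part of the benchmark.

```c
/* Row-major (C) offset of element (i0, i1, i2) in an array of dimensions n[0..2]. */
int row_major_offset(const int *n, int i0, int i1, int i2)
{
    return (i0 * n[1] + i1) * n[2] + i2;
}

/* Column-major (Fortran) offset of element (j0, j1, j2) with dimensions m[0..2]. */
int col_major_offset(const int *m, int j0, int j1, int j2)
{
    return j0 + m[0] * (j1 + m[1] * j2);
}

/* Returns 1 if, for every index, C element (i0,i1,i2) with dimensions n has the
   same offset as Fortran element (i2,i1,i0) with dimensions m; 0 otherwise. */
int layouts_agree(const int *n, const int *m)
{
    for (int i0 = 0; i0 < n[0]; ++i0)
        for (int i1 = 0; i1 < n[1]; ++i1)
            for (int i2 = 0; i2 < n[2]; ++i2)
                if (row_major_offset(n, i0, i1, i2) !=
                    col_major_offset(m, i2, i1, i0))
                    return 0;
    return 1;
}
```

Passing m = n_rev makes the layouts agree, whereas passing the unreversed n does not (unless the dimensions happen to be symmetric).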

Finally, since your Fortran code cannot use the FFTW_REAL type, its floating point type is fixed (probably to double precision). Therefore, you must surround the body of your do_foo_fft function with an if (sizeof(FFTW_REAL) == sizeof(double)) {...} statement.
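
Putting the two guards together, the body of a Fortran-backed do_foo_fft might be wrapped roughly as follows. This is a sketch under stated assumptions: in the benchmark, FFTW_REAL comes from fftw.h and USE_FORTRAN from the build configuration, and both are defined here only so that the fragment compiles on its own. The function fortran_body_runs is hypothetical and stands in for do_foo_fft.

```c
/* ASSUMPTIONS: defined here only to make the sketch self-contained. */
typedef double FFTW_REAL;
#define USE_FORTRAN 1

/* Returns 1 if the Fortran-backed body would run, 0 if it is compiled out
   (no Fortran compiler) or skipped (precision mismatch). */
int fortran_body_runs(void)
{
    int ran = 0;
#ifdef USE_FORTRAN
    if (sizeof(FFTW_REAL) == sizeof(double)) {
        /* ... check sizes, initialize, and invoke DO_BENCHMARK_ND here ... */
        ran = 1;
    }
#endif
    return ran;
}
```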

