The following test triggers the bug (SSE2, double precision):
./tests/bench -oexhaustive r4*2:5:3
This test computes a pair of length-4 real->complex transforms where the second input is 5 real numbers away from the first input. That is, there is a gap of one real number between the first and second input array. The -oexhaustive level allows FFTW to attempt to compute this transform by reducing it to a pair of complex transforms of length 2, but now the second input is not aligned to a complex-number boundary. The fact that 5 is odd is the problem.
The bug cannot occur in complex->complex transforms because the complex interface accepts strides in units of complex numbers, so strides are aligned by construction.
This bug has been around at least since fftw-3.1.2 (July 2006), and probably since fftw-3.0 (2003).
fftw_planner_nthreads() returns the number of threads currently being used by the planner.
The avx512 alignment requirement was set to 64 bytes, but this is wrong. Alignment requirements are a property of the platform (e.g., x86) and not of the instruction set (e.g., AVX). Among other things, this broke wisdom with avx512.
Note that avx512 support is still experimental because the FFTW authors have no avx512 hardware available for testing.
fftw_threads_set_callback function to change the threading backend at runtime.
By default, FFTW 3.3.7 was broken with gcc-8. AVX and AVX2 code assumed that the compiler honors the distinction between +0 and -0, but gcc-8 -ffast-math does not. The default CFLAGS included -ffast-math. This release ensures that FFTW works with gcc-8 -ffast-math, and removes -ffast-math from the default CFLAGS for good measure.
The primary build mechanism for FFTW remains GNU autoconf/automake. CMake support is meant to offer an easy way to compile FFTW on Windows, and as such it does not cover all the features of the automake build system, such as exotic cycle counters, cross-compiling, or building binaries for a mixture of ISAs (e.g., amd64 vs amd64+avx vs amd64+avx2). Patches are welcome.
tests/bench: use 64-bit precision to compute mflops.
libfftw3.so.2.6.6 instead of libfftw3.so.3.*.
fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't work, and this 3.3.6 fixes it. Sorry about that.
(_MSC_VER > 1500)
--enable-vsx to configure.
--enable-avx2 to configure.
--enable-avx512
, --enable-kcvi
)
This code is expected to work but the FFTW maintainers do not have
hardware to test it.
--enable-avx128-fma
)
fftw_make_planner_thread_safe()
API.
fftwq_alloc_real
.
fftw_alignment_of (to check whether two arrays are equally aligned for the purposes of applying a plan) and fftw_sprint_plan (to output a description of a plan to a string).
fftw-wisdom-to-conf; thanks to Florian Oppermann for the bug report.
fftw3l.f03 interface file for the long double interface, which is not supported by some Fortran compilers. Provided new fftw3q.f03 interface file to access the quadruple-precision FFTW routines with recent versions of gcc/gfortran.
make check
now runs MPI tests
__float128 in gcc 4.6 or later (on x86, x86-64, and Itanium). The new routines use the fftwq_ prefix.
fftw_alloc_real and fftw_alloc_complex to use fftw_malloc for real and complex arrays without typecasts or sizeof.
fftw_export_wisdom_to_filename and fftw_import_wisdom_from_filename that export/import wisdom to a file, which don't require you to open/close the file yourself.
fftw_cost to return FFTW's internal cost metric for a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the suggestion.
--enable-sse2 configure flag now works in both double and single precision (and is equivalent to --enable-sse in the latter case).
--enable-portable-binary flag: we now produce portable binaries by default.
-mtune=native
.
Remove the --with-gcc-arch flag; if you want to specify a particular arch to configure, use ./configure CC="gcc -mtune=...".
--with-our-malloc16 configure flag is now renamed --with-our-malloc.
srand48
declaration is missing;
thanks to Ralf Wildenhues for the bug report.
fftw_set_timelimit: ensure that a negative timelimit is equivalent to no timelimit in all cases. Thanks to William Andrew Burnson for the bug report.
alloca
with
too large a buffer.
snprintf
is defined as a macro;
thanks to Marcus Mae for the bug report.
dfftw_execute, because of reports of problems with various Fortran compilers; it is better to use dfftw_execute_dft etcetera.
--enable-openmp and --enable-threads are mutually exclusive (thanks to Long To), and document slightly odd behavior of plan_guru_r2r in Fortran (thanks to Alexander Pozdneev).
FFTW_WISDOM_ONLY flag, at the suggestion of Mario Emmenlauer and Phil Dumont.
make check for MPI code (which still fails in a couple of corner cases, but should be much better than in alpha2).
README.Cell
and the Cell section
of the manual.
plan_guru" function there is a new "plan_guru64" function with the same semantics, but which takes fftw_iodim64 instead of fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes ptrdiff_t integer types as parameters, which is a 64-bit type on 64-bit machines. This is only useful for specifying very large transforms on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere regardless of what API you choose.)
FFTW_WISDOM_ONLY planner flag, to create a plan only if wisdom is available and return NULL otherwise.
--enable-sse
instead.
--with-g77-wrappers configure option to force inclusion of g77 wrappers, in addition to whatever is needed for the detected Fortran compilers. This is mainly intended for GNU/Linux distros switching to gfortran, but wishing to include both gfortran and g77 support in FFTW.
__declspec
attribute to threads API functions when compiling
for Windows (thanks to Robert O. Morris for the bug report)
dfftw_init_threads in Fortran; thanks to Markus Wetzstein for the bug report.
configure script: --enable-portable-binary option was ignored! Thanks to Andrew Salamon for the bug report.
configure
script now detects Core/Duo arch.
-maltivec when checking for altivec.h. Fixes Gentoo bug #129304, thanks to Markus Dittrich.
fftw-wisdom tool, replaced obsolete --impatient with --measure.
__declspec(dllexport)
).
--without-cycle-counter option is removed. If no cycle counter is found, then the estimator is always used. A --with-slow-timer option is provided to force the use of lower-resolution timers.
static
keyword that prevented simultaneous linkage
of different-precision versions; thanks to Rasmus Larsen for the bug report.
f77_wisdom.f
file; thanks to Alan Watson.
-xopenmp
flag for SunOS; thanks to John Lou for the bug report.
-Wp,-H128000
flag to increase
preprocessor limits; thanks to Peter Vouras for the bug report.
tempfile
in fftw-wisdom-to-conf
script;
thanks to Nicolas Decoster for the patch.
make smallcheck
target in tests/
directory, at the request of
James Treacy.
fftw_flops now returns double arguments, not int, to avoid overflows for large sizes.
README
file for test program.
fftw_threads_init function, which some people were calling accidentally instead of the fftw_init_threads API function.
-openmp
flag (Intel C compiler) when --enable-openmp
is used.
FFTW_PATIENT
transforms.
make check' should now only take a few minutes; for more strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where the latter uses stdout.
alloca
under MinGW, AIX.
PTHREAD_SCOPE_SYSTEM
there.
--with-openmp
or --with-sgi-mp
in addition to --enable-threads
.
CFLAGS
environment
variable if it is defined. (Thanks to Diab Jerius.)
mpi/README.f77
for more information.
fftw_f77_threads_init
function to the Fortran wrappers
for the multi-threaded transforms. Thanks to V. Sundararajan for
the bug report.
TODO
list.
--enable-type-prefix option to configure makes it easy to install both single- and double-precision versions of FFTW on the same (Unix) system. (See the installation section of the manual.)
fftw_mpi
documentation in the FFTW
manual.)
rfftwnd_mpi
documentation in the
FFTW manual.)
doc
directory). On Unix systems, they are also
automatically configured, compiled, and installed along with the main
FFTW library when you include --enable-mpi
in the flags to the
configure
script. (See the FFTW manual.)
MPI_Alltoall
primitive). Beware that
the interfaces have changed slightly, however.
rfftw_threads
documentation in the FFTW manual.)
doc
directory). On Unix systems, they are also
automatically configured, compiled, and installed along with the main
FFTW library when you include --enable-threads
in the flags to the
configure
script. (See the FFTW manual.)
--disable-fortran
option to configure
) and
are documented in the main FFTW manual.
rfftwnd
tutorial
section of the manual, in the hope of preventing future confusion
on this subject.
128x54x81
) for the -c
and -s
correctness and speed test options.
FFTW_OUT_OF_PLACE to fftw.h. The flag is mentioned several times in the documentation, but its definition was accidentally omitted since FFTW_OUT_OF_PLACE is the default behavior.
fftwnd_create_plan_specific
is used).
Thanks to Geert van Kempen for his suggestions.
fftw_one, fftwnd_one, rfftw_one, etcetera, to simplify and clarify the use of fftw for single, unit-stride transforms.
FFTW_COMPLEX, FFTW_REAL to fftw_complex, fftw_real (for greater consistency in capitalization). The all-caps names will continue to be supported indefinitely, but are deprecated. (Also, support for the COMPLEX and REAL types from FFTW 1.0 is now disabled by default.)
FFTW_THREADSAFE
flag (described therein).
configure --enable-shared
to
produce a shared library instead of a static library (the default).
_op_count
) routines
introduced in v1.3, as these were little-used and were a pain to
keep up-to-date as FFTW changed internally.
float
and double
(e.g. long double
). (See the file fftw-int.h
.)
howmany > 1 and stride > dist.
*_create_plan_specific
functions.)
*_count_plan_ops
functions.)
gettimeofday function if available. (This function typically has much higher accuracy than clock(), permitting plans to be created much more quickly than before on many machines.)
matlab/
directory).
fortran/
directory). (These were available
separately before.)
rfftwnd
routines where a block was accidentally
allocated to be too small, causing random memory to be
overwritten (yikes!). (Amazingly, this bug only caused the
test program to fail on one system that we could find. Our
test suite can now catch this sort of bug.)
fftw_time_diff
macro/function) to allow more general timer data structures.
FFTW_TIME_MIN
) is reached.)
fftwnd_destroy_plan (reported by Richard Sullivan). Our test programs now all check for leaks.
COMPLEX to FFTW_COMPLEX to avoid clashes with existing packages. COMPLEX is still supported for compatibility with 1.0.