This chapter describes the installation and customization of FFTW, the latest version of which may be downloaded from the FFTW home page.
As distributed, FFTW makes very few assumptions about your system.  All
you need is an ANSI C compiler (gcc is fine, although
vendor-provided compilers often produce faster code).  However,
installation of FFTW is somewhat simpler if you have a Unix or a GNU
system, such as Linux.  In this chapter, we first describe the
installation of FFTW on Unix and non-Unix systems.  We then describe how
you can customize FFTW to achieve better performance.  Specifically, you
can I) enable gcc/x86-specific hacks that improve performance on
Pentium and PentiumPro processors; II) adapt FFTW to use the high-resolution clock
of your machine, if any; III) produce code (codelets) to support
fast transforms of sizes that are not supported efficiently by the
standard FFTW distribution.
FFTW comes with a configure program in the GNU style.
Installation can be as simple as:
./configure
make
make install
This will build the complex and real transform libraries along with the
test programs.  We recommend that you use GNU make if it is
available; on some systems it is called gmake.  The "make
install" command installs the fftw and rfftw libraries in standard
places, and typically requires root privileges (unless you specify a
different install directory with the --prefix flag to
configure).  You can also type "make check" to put the
FFTW test programs through their paces.
The configure script knows good CFLAGS (C compiler flags)
for a few systems.  If your system is not known, the configure
script will print out a warning.  In this case, you can compile
FFTW with the command
make CFLAGS="<write your CFLAGS here>"
If you do find an optimal set of CFLAGS for your system, please
let us know what they are (along with the output of config.guess)
so that we can include them in future releases.
The configure program supports all the standard flags defined by
the GNU Coding Standards; see the INSTALL file in FFTW or
the GNU web page.
Note especially --help to list all flags and
--enable-shared to create shared, rather than static, libraries.
configure also accepts a few FFTW-specific flags, particularly:
--with-gcc Enables the use of gcc.  By default, FFTW uses
the vendor-supplied cc compiler if present.  Unfortunately,
gcc produces slower code than cc on many systems.
--enable-float Produces a single-precision version of FFTW
(float) instead of the default double-precision (double).
--enable-i386-hacks  See below.
--enable-pentium-timer  See below.
It is quite straightforward to install FFTW even on non-Unix systems
lacking the niceties of the configure script.  The FFTW Home Page
may include some FFTW packages preconfigured for particular
systems/compilers, and also contains installation notes sent in by
users.  All you really need to do, though, is to compile all of the
.c files in the appropriate directories of the FFTW package.
For the complex transforms, compile all of the .c files in the
fftw directory and link them into a library.  Similarly, for the
real transforms, compile all of the .c files in the rfftw
directory into a library.  Note that these sources #include
various files in the fftw and rfftw directories, so you
may need to set up the #include paths for your compiler
appropriately.  Be sure to enable the highest-possible level of
optimization in your compiler.
By default, FFTW is compiled for double-precision transforms.  To work
in single precision rather than double precision, #define the
symbol FFTW_ENABLE_FLOAT in fftw.h (in the fftw
directory).
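The effect of such a precision switch can be pictured with a small, self-contained sketch.  It does not reproduce fftw.h: the names DEMO_ENABLE_FLOAT, demo_real, demo_is_single_precision and demo_check_precision are hypothetical stand-ins chosen for illustration, and the sketch only assumes that a single preprocessor symbol selects the floating-point type used throughout the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for a symbol such as FFTW_ENABLE_FLOAT.
   Uncomment to select single precision. */
/* #define DEMO_ENABLE_FLOAT */

#ifdef DEMO_ENABLE_FLOAT
typedef float demo_real;    /* single precision */
#else
typedef double demo_real;   /* double precision (the default) */
#endif

/* Returns nonzero if this translation unit was built in single precision. */
int demo_is_single_precision(void)
{
    return sizeof(demo_real) == sizeof(float);
}

/* Aborts if the caller's idea of the real type differs in size from the
   one selected above; a mismatch usually means that the application and
   the library were compiled with different settings. */
void demo_check_precision(size_t caller_real_size)
{
    assert(caller_real_size == sizeof(demo_real));
}
```

An application might call demo_check_precision(sizeof(demo_real)) once at startup; the point is only that the library and every program linked against it must agree on the same setting.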
These libraries should be linked with any program that uses the
corresponding transforms.  The required header files, fftw.h and
rfftw.h, are located in the fftw and rfftw
directories respectively; you may want to put them with the libraries,
or wherever header files normally go on your system.
FFTW includes test programs, fftw_test and rfftw_test, in
the tests directory.  These are compiled and linked like any
program using FFTW, except that they use additional header files located
in the fftw and rfftw directories, so you will need to set
your compiler #include paths appropriately.  fftw_test is
compiled from fftw_test.c and test_main.c, while
rfftw_test is compiled from rfftw_test.c and
test_main.c.  When you run these programs, you will be prompted
interactively for various possible tests to perform; see also
tests/README for more information.
gcc and Pentium/PentiumPro hacks
The configure option --enable-i386-hacks enables specific
optimizations for gcc and Pentium/PentiumPro, which can
significantly improve performance of double-precision transforms.
Specifically, we have tested these hacks on Linux, with gcc
2.[78] and egcs 1.0.3.  These optimizations only affect the
performance, not the correctness of FFTW (i.e. it is always safe to try
them out).
These hacks provide a workaround to the incorrect alignment of local
double variables in gcc.  The compiler aligns these
variables to multiples of 4 bytes, but execution is much faster (on
Pentium and PentiumPro) if doubles are aligned to a multiple of 8
bytes.  By carefully counting the number of variables allocated by the
compiler in performance-critical regions of the code, we have been able
to introduce dummy allocations (using alloca) that align the
stack properly.  The hack depends crucially on the compiler flags that
are used.  For example, it won't work without
-fomit-frame-pointer.
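To see whether alignment is an issue with your compiler and flags, a check along the following lines may help.  This is only a sketch and is not part of FFTW: the function names is_aligned and local_double_is_8_aligned are made up for this example, and the code assumes a C99 compiler that provides uintptr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `alignment` bytes, else 0.
   `alignment` must be a nonzero power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Reports whether a local double in this stack frame happened to land
   on an 8-byte boundary.  The answer depends on the compiler, its
   flags, and the caller's stack position. */
int local_double_is_8_aligned(void)
{
    volatile double x = 0.0;   /* volatile keeps x in memory */
    return is_aligned((const void *)&x, 8);
}
```

Calling local_double_is_8_aligned() from a few different call depths gives a rough idea of how the compiler lays out local doubles; it says nothing about FFTW's own routines, for which the speed measurements remain the real test.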
The fftw_test program outputs speed measurements that you can use
to see if these hacks are beneficial.
The configure option --enable-pentium-timer enables the
use of the Pentium and PentiumPro cycle counter for timing purposes.  In
order to get correct results, you must define FFTW_CYCLES_PER_SEC
in fftw/config.h to be the clock speed of your processor; the
resulting FFTW library will be nonportable.  The use of this option is
deprecated.  On serious operating systems (such as Linux), FFTW uses
gettimeofday(), which has enough resolution and is portable.
(Note that Win32 has its own high-resolution timing routines as well.
FFTW contains unsupported code to use these routines.)
FFTW needs a reasonably precise clock in order to find the optimal way
to compute a transform.  On Unix systems, configure looks for
gettimeofday and other system-specific timers.  If it does not
find any high resolution clock, it defaults to using the clock()
function, which is very portable, but forces FFTW to run for a long time
in order to get reliable measurements.
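The reason a coarse clock forces long runs can be illustrated with a generic timing loop.  The sketch below is not FFTW's internal timing code; time_per_call and demo_work are hypothetical names, and the strategy (doubling the repetition count until the measured interval reaches a minimum) is simply one common way to time short operations with clock().

```c
#include <assert.h>
#include <time.h>

/* Times fn(arg) with clock(), doubling the number of repetitions until
   the measured interval is at least time_min seconds.  Returns the
   estimated time per call, in seconds.  time_min should be modest
   (well under a minute) so that the repetition count stays small. */
double time_per_call(void (*fn)(void *), void *arg, double time_min)
{
    unsigned long iters = 1;
    assert(fn != 0 && time_min > 0.0);
    for (;;) {
        unsigned long i;
        clock_t start, stop;
        double elapsed;

        start = clock();
        for (i = 0; i < iters; ++i)
            fn(arg);
        stop = clock();

        elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
        if (elapsed >= time_min)
            return elapsed / (double)iters;
        iters *= 2;   /* interval too short to trust: repeat more */
    }
}

/* A small sample workload: adds 1000 terms of the harmonic series into
   the double pointed to by arg. */
void demo_work(void *arg)
{
    volatile double *acc = (volatile double *)arg;
    int k;
    for (k = 1; k <= 1000; ++k)
        *acc += 1.0 / k;
}
```

With a coarse clock, time_min has to be large for the result to be meaningful, so every measurement takes correspondingly longer; a finer clock allows a smaller time_min and a shorter run.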
If your machine supports a high-resolution clock not recognized by FFTW,
it is therefore advisable to use it.  To do so, you must edit
fftw/fftw-int.h and redefine a few macros.  The
code is documented and should be self-explanatory.  (By the way,
fftw-int stands for fftw-internal, but for some
inexplicable reason people are still using primitive systems with 8.3
filenames.)
Even if you don't install high-resolution timing code, we still
recommend that you look at the FFTW_TIME_MIN constant in
fftw/fftw-int.h. This constant holds the minimum time interval (in
seconds) required to get accurate timing measurements, and should be (at
least) several hundred times the resolution of your clock.  The default
constants are on the conservative side, and may cause FFTW to take
longer than necessary when you create a plan. Set FFTW_TIME_MIN
to whatever is appropriate on your system (be sure to set the
right FFTW_TIME_MIN: there are several definitions in
fftw-int.h, corresponding to different platforms and timers).
As an aid in checking the resolution of your clock, you can use the
tests/fftw_test program with the -t option
(see tests/README). Remember, the mere fact that your clock
reports times in, say, picoseconds, does not mean that it is actually
accurate to that resolution.
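Independently of the FFTW test program, the effective resolution of a clock can be estimated by watching how large its smallest observable step is.  The following sketch does this for the standard clock() function; clock_resolution and suggested_time_min are hypothetical helpers written for this example, not FFTW functions, and the multiplier passed to suggested_time_min is an assumption to be chosen by you.

```c
#include <assert.h>
#include <time.h>

/* Estimates the effective resolution of clock(), in seconds, as the
   smallest nonzero step observed over `samples` consecutive ticks. */
double clock_resolution(int samples)
{
    double best = 0.0;
    int s;
    assert(samples > 0);
    for (s = 0; s < samples; ++s) {
        clock_t t0 = clock();
        clock_t t1;
        double step;
        do {
            t1 = clock();   /* spin until the reported value changes */
        } while (t1 == t0);
        step = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best == 0.0 || step < best)
            best = step;
    }
    return best;
}

/* Suggests a minimum timing interval as `factor` times the measured
   resolution; a factor of several hundred or more is the guideline. */
double suggested_time_min(double resolution, double factor)
{
    assert(resolution > 0.0 && factor > 0.0);
    return resolution * factor;
}
```

For example, suggested_time_min(clock_resolution(10), 500.0) gives a candidate minimum interval for clock(); the same idea applies to any other timer by replacing the clock() calls.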
If you know that you will only use transforms of a certain size (say,
powers of 2) and want to reduce the size of the library, you can
reconfigure FFTW to support only those sizes you are interested in.  You
may even generate code to enable efficient transforms of a size not
supported by the default distribution.  The default distribution
supports transforms of any size, but not all sizes are equally fast.
The default installation of FFTW is best at handling sizes of the form
2^a 3^b 5^c 7^d 11^e 13^f,
where e+f is either 0 or
1, and the other exponents are arbitrary.  Other sizes are
computed by means of a slow, general-purpose routine.  However, if you
have an application that requires fast transforms of size, say,
17, there is a way to generate specialized code to handle that.
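To decide whether a given size falls into the fast category, the rule can be checked mechanically.  The sketch below tests whether n has the form 2^a 3^b 5^c 7^d 11^e 13^f with e+f equal to 0 or 1; is_fast_default_size and strip_factor are hypothetical helper names, not part of the FFTW API, and the check reflects only the rule stated in this chapter.

```c
#include <assert.h>

/* Divides *n by p as often as possible and returns the exponent of p. */
static int strip_factor(unsigned long *n, unsigned long p)
{
    int count = 0;
    assert(p > 1);
    while (*n % p == 0) {
        *n /= p;
        ++count;
    }
    return count;
}

/* Returns 1 if n = 2^a 3^b 5^c 7^d 11^e 13^f with e + f <= 1, else 0. */
int is_fast_default_size(unsigned long n)
{
    int e, f;
    if (n == 0)
        return 0;
    strip_factor(&n, 2);
    strip_factor(&n, 3);
    strip_factor(&n, 5);
    strip_factor(&n, 7);
    e = strip_factor(&n, 11);
    f = strip_factor(&n, 13);
    /* Anything left over is a factor outside the supported set. */
    return n == 1 && e + f <= 1;
}
```

Under this rule 1024 and 2730 (2 * 3 * 5 * 7 * 13) qualify, whereas 17, 121 (11 * 11) and 143 (11 * 13) do not and would be handled by the general-purpose routine.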
The directory gensrc contains all the programs and scripts that
were used to generate FFTW.  In particular, the program
gensrc/genfft.ml was used to generate the code that FFTW uses to
compute the transforms.  We do not expect casual users to use it.
genfft is a rather sophisticated program that generates directed
acyclic graphs of FFT algorithms and performs algebraic simplifications
on them.  genfft is written in Objective Caml, a dialect of ML.
Objective Caml is described at http://pauillac.inria.fr/ocaml/
and can be downloaded from ftp://ftp.inria.fr/lang/caml-light.
If you have Objective Caml installed, you can type sh
bootstrap.sh in the top-level directory to re-generate the files.  If
you change the gensrc/config file, you can optimize FFTW for
sizes that are not currently supported efficiently (say, 17 or 19).
We do not provide more details about the code-generation process, since we do not expect that users will need to generate their own code. However, feel free to contact us at fftw@theory.lcs.mit.edu if you are interested in the subject.
You might find it interesting to learn Caml and/or some modern programming techniques that we used in the generator (including monadic programming), especially if you heard the rumor that Java and object-oriented programming are the latest advancement in the field.