# Essays/FFT

## FFT

How fast can we multiply polynomials? The J idiom is +//.@:(*/) but for big polynomials we can take advantage of the fact that multiplication in the frequency domain corresponds to convolution in the time domain.

```NB. Start with the standard J FFT library:

cube      =: (\$~ q:@#) :. ,
roots     =: +@]^:(0>[)  (_1^2%]) ^ i.@-:   NB. The elegant version
roots     =: +@]^:(0>[) [: (, j.) [: (, (j.~%:0.5) , |."1&.:+.@|.@}.) [: ^@o.@:j. i.@(%&8) % -:   NB. Better accuracy for N>=8
rou       =:            [: (, j.) [: (, (j.~%:0.5) , |."1&.:+.@|.@}.) [: ^@o.@:j. i.@(%&8) % -:   NB. roots of unity
floop     =: 4 : 'for_r. i.#\$x do. (y=.{."1 y) ] x=.(+/x) ,&,:"r (-/x)*y end.'
fft       =: (     ] floop&.cube   rou@#) f. :. ifft
ifft      =: (# %~ ] floop&.cube +@rou@#) f. :. fft

NB. Add a verb to join polynomials and extend with zeros

extend    =: >.&.(2&^.)@<:@+&# {."_1 ,:

NB. Define convolution for reference:

convolve  =: +//.@:(*/)

NB. convolution using FFT would be:

rconvolve =: *&.fft/@extend

1 2 3 4 convolve 2 3 4 5
2 7 16 30 34 31 20
1 2 3 4 rconvolve 2 3 4 5
2 7 16 30 34 31 20 _1.77636e_15
```

The real part looks right, but there is some numerical inaccuracy. That might make this approach unsuitable for polynomials needing high precision in the coefficients, but if we are dealing with integers, we can take the real part and round off:

```roundreal =: [: <. 0.5 + 9&o.
iconvolve =: roundreal@(*&.fft/@extend)

1 2 3 4 iconvolve 2 3 4 5
2 7 16 30 34 31 20 0
```

It works! There are those extra zeros to contend with. Try it on longer operands:

```    'a b' =. 2 1024 ?@\$ 1000
a (iconvolve -:/@,: convolve) b
1
```

It matches! How's the speed?

```    'a b' =. 2 8192 ?@\$ 1000
ts 'a convolve b'
2.61487 5.37003e8
ts 'a iconvolve b'
0.183339 4.72448e6
```

Pretty good. What about for bigger operands?

```    'a b' =. 2 16384 ?@\$ 1000
ts 'a convolve b'
|limit error: convolve
|   a     convolve b
ts 'a iconvolve b'
0.405789 9.44307e6
```

As another tweak, we can use the conjugate symmetry of the FFT to take the two forward FFTs at the same time. The idea is that the FFT of a N-long real sequence (which has all imaginary parts zero) exhibits conjugate symmetry: element i is the conjugate of element N-i. Another way to say that is to say that the real parts are symmetric and the imaginary parts are antisymmetric. In J terms, F=+ _1 |. |. F . Similarly, the FFT of an N-long imaginary sequence has symmetric imaginary parts and antisymmetric real parts. This means that if we have two real sequences, we make one of them the real part of the sequence to be transformed and the other the imaginary part, and then after the transform we can find the symmetric and antisymmetric parts of the real and imaginary components of the transform result and piece them together to create the transforms of the two starting sequences.

Since we don't need the sequences, but only their product, we can save a little work:

```roundimag =: [: <. 0.5 + 11&o.
squash0   =: (- _1&|.@|.@:+)@:*:@:-:
iconvolve =: roundimag@(squash0&.fft)@(j./)@extend

'a b' =. 2 1e6 ?@\$ 10
ts 'a iconvolve b'
26.9542 6.71095e8
```

The final pass, which produces an imaginary-only result, can be replaced by a step to halve the size of the final FFT, which saves 25% overall. The first pass of the IFFT splits its input into sections that produce the even and odd result values. We perform that step by itself, and then multiply one of the values by 0j_1 to reroute those results to the real parts of the output. In this way both the real and imaginary parts of the result are meaningful.

The final step of adding 0.5 prior to conversion to integer can be done elegantly in the frequency domain by adding the frequency-domain representation of constant 0.5. This is 0.5j0.5 in the first item of the transformed value, and 0 elsewhere.

```squash1   =: -:@# ({. ((- * +@rou@+&#) j. -@:+) }.) -:
biasfreq  =: 0&((j.~@-:@#@] + {)`[`]})
jconvolve =: <.!.0@,@:+.@(biasfreq@squash1@squash0&.fft)@(j./)@extend

'a b' =. 2 1e6 ?@\$ 10
ts 'a jconvolve b'
22.5309 6.04018e8
```

That's multiplication of two million-digit polynomials in 23 seconds.

If you want to treat the polynomials as numbers you can get the digits by a carry-collection pass like

```polytodigits =: }. @ (((0,~[)+0,])/) @ |: @ (0 10&#:)^: _ @ (0 , |.)

polytodigits 3 2 4 6 7 jconvolve 8 3 4 6 2 6 7
0 0 0 0 0 5 8 2 8 3 5 2 7 1 2 7 4
76423*7626438x
582835271274
```

This code is for understanding, not for performance measurement. Several improvements are possible:

1. The code recomputes roots of _1 (3 times) every time it runs; a serious implementation would reuse the roots, and would save roots between convolutions.
2. You would just use the fftw library anyway, which is optimized already and is a few times faster.

The important point is that this in an O(*^.) algorithm. It can feasibly multiply polynomials of a million terms, or numbers with millions of digits.

## Collected Definitions

```convolve     =: +//.@:(*/)                              NB. for reference

cube         =: (\$~ q:@#) :. ,
roots        =: +@]^:(0>[)  (_1^2%]) ^ i.@-:            NB. The elegant version, but less accurate
roots        =: +@]^:(0>[) [: (, j.) [: (, (j.~%:0.5) , |."1&.:+.@|.@}.) [: ^@o.@:j. i.@(%&8) % -:
rou          =:            [: (, j.) [: (, (j.~%:0.5) , |."1&.:+.@|.@}.) [: ^@o.@:j. i.@(%&8) % -:   NB. roots of unity
floop        =: 4 : 'for_r. i.#\$x do. (y=.{."1 y) ] x=.(+/x) ,&,:"r (-/x)*y end.'
fft          =: (     ] floop&.cube   rou@#) f. :. ifft
ifft         =: (# %~ ] floop&.cube +@rou@#) f. :. fft
extend       =: >.&.(2&^.)@<:@+&# {."_1 ,:              NB. join polynomials and extend with zeros
rconvolve    =: *&.fft/@extend                          NB. convolution using FFT, produces complex result

roundimag    =: [: <. 0.5 + 11&o.
squash0      =: (- _1&|.@|.@:+)@:*:@:-:
iconvolve    =: roundimag@(squash0&.fft)@(j./)@extend   NB. convolution using FFT, integer result

squash1      =: -:@# ({. ((- * +@rou@+&#) j. -@:+) }.) -:
biasfreq     =: 0&((j.~@-:@#@] + {)`[`]})
jconvolve    =: <.!.0@,@:+.@(biasfreq@squash1@squash0&.fft)@(j./)@extend

polytodigits =: }. @ (((0,~[)+0,])/) @ |: @ (0 10&#:)^: _ @ (0 , |.)
```