Replies: 8 comments 31 replies
-
Hi Martin, yes, in general having more wrapped libraries is a good thing. If this is just for comparison with NFFT.jl, then we can host it within NFFT.jl itself, see https://github.com/JuliaMath/NFFT.jl/tree/master/Wrappers If you want to provide real Julia bindings to ducc0, I would recommend that you create a dedicated Julia package and implement the bindings there.
Having said that, I can support you in creating the wrapper. The most important thing (in which I unfortunately don't have experience) is to create a "Binary Julia" package like this one: https://github.com/JuliaBinaryWrappers/finufft_jll.jl. You can see the build script of finufft here: https://github.com/JuliaPackaging/Yggdrasil/blob/master/F/finufft/build_tarballs.jl Yggdrasil is the repository that holds the build scripts; the binaries themselves are then created using CI services.
Regarding C++, the best would be to provide a C wrapper, since we can call that directly without any problems. When you are at that point, I can help you.
Independent of this, I would be interested in what you do differently than other NFFT implementations:
Best,
Tobi
-
Hi Tobi, many thanks for the quick reply! I will have a look at the necessary steps on the C side to come up with a wrapper. It should not be too hard, probably similar to the FFTW way of describing multi-D arrays. The only slightly tedious thing is that all template specializations might have to be written down explicitly.
It's the classical approach, using a slightly tweaked version of the finufft ES kernel. Kernels are evaluated via precomputed polynomials, so in principle any kernel could be used. The special feature of the library is that, given the grid size, the number of nonuniform points and the desired accuracy, it will automatically determine a combination of kernel support, oversampling factor and concrete kernel function that results in near-optimal performance. (More details at https://arxiv.org/abs/2010.10122; this describes a specialized application for radio interferometry, but most of the technical details also apply to the NFFT implementation.)
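The kernel evaluation via precomputed polynomials mentioned above can be sketched with a generic Horner scheme. This is an illustrative sketch only, not ducc0's actual code; the function name and coefficient layout are assumptions.

```c
#include <stddef.h>

/* Illustrative sketch (not ducc0 code): evaluate a kernel approximated
   by a precomputed polynomial, using Horner's scheme.
   coef holds the coefficients from highest to lowest degree:
   coef[0]*x^degree + coef[1]*x^(degree-1) + ... + coef[degree]. */
double eval_poly_horner(const double *coef, size_t degree, double x)
{
    double r = coef[0];
    for (size_t i = 1; i <= degree; ++i)
        r = r * x + coef[i];
    return r;
}
```

In the real library the polynomial would be fitted to the chosen kernel over each support interval; here only the evaluation mechanism is shown.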
Not quite. I subdivide the grid into linear/quadratic/cubic patches that fit into L1 cache (except for large kernel supports in 3D, where this is no longer possible). While processing a patch, I make a local copy of the grid data (even for the uniform-to-nonuniform case), which seems to help with parallelization.
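The patch-based strategy described above can be sketched roughly as follows. This is a toy 1-D version with an invented patch size and a trivial "operation" (doubling each value); the real code sizes patches to the L1 cache and performs the actual spreading/interpolation on the local copy.

```c
#include <stddef.h>
#include <string.h>

#define PATCH 8  /* invented patch size; real code sizes this to fit L1 cache */

/* Toy sketch (not ducc0 code): process a grid in small patches,
   copying each patch into a local buffer, working on the copy,
   and writing the result back. */
void process_grid(double *grid, size_t n)
{
    double local[PATCH];
    for (size_t start = 0; start < n; start += PATCH) {
        size_t len = (n - start < PATCH) ? n - start : PATCH;
        memcpy(local, grid + start, len * sizeof(double));  /* local copy */
        for (size_t i = 0; i < len; ++i)
            local[i] *= 2.0;                                /* work on the copy */
        memcpy(grid + start, local, len * sizeof(double));  /* write back */
    }
}
```

Working on a compact local copy keeps memory traffic within a patch cache-resident, which is presumably where the parallelization benefit comes from.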
I'm not using FFTW because I'd like to avoid external dependencies as much as possible.
I have seen the paper, but unfortunately didn't have enough time yet for a closer look; I certainly plan to read it carefully soon.
It certainly is. But it means keeping global state in a library, which can cause all sorts of headaches. For example, you have to guarantee the thread safety of the planning functions. And everyone has slightly different ideas about the guarantees made by the library, as you have discovered, for example, with FINUFFT, if I interpret the comments in FINUFFT.jl correctly :-)
Cheers,
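The thread-safety concern about planner state can be illustrated with a minimal sketch. This is hypothetical code, not FFTW or ducc0: a mutex serializes access to a global planner structure, which is the kind of guarantee a library with global state has to provide.

```c
#include <pthread.h>

/* Hypothetical sketch: a library with global planner state must
   serialize access to it, e.g. with a mutex. plan_counter stands in
   for whatever shared data the planner mutates. */
static pthread_mutex_t planner_lock = PTHREAD_MUTEX_INITIALIZER;
static int plan_counter = 0;

int make_plan(void)
{
    pthread_mutex_lock(&planner_lock);
    int id = ++plan_counter;   /* mutate global state only under the lock */
    pthread_mutex_unlock(&planner_lock);
    return id;
}
```

Without the lock, two threads planning concurrently could corrupt the shared state; with it, planning calls are safe but serialized, which is exactly the trade-off being discussed.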
-
Some initial benchmarks on the same computer indeed indicate that
-
Oh wow, I just realized that you change the oversampling factor adaptively. This is very important information. Do you use some heuristic there? I would be very much interested in such a thing. For the benchmarks shown on our homepage it would be great to have versions with and without adaptive parameter selection. This would help us understand where the performance gain comes from.
-
What is quite interesting is that NFFT is faster in the "spreading" routine (in 3D) but slower in "interpolation", and this at the same oversampling factor and kernel size. So there is definitely something we can learn from analyzing both libraries. And your FFT is faster than FFTW, which is quite a statement. It would be great if there were a Julia package wrapping the FFT part of ducc0.
-
OK, I fear that a nice Julia interface will take a while. For the moment, how about the following minimalistic C interface functions?
/*
ndim: number of dimensions (1/2/3)
npoints: number of non-uniform points
shape: points to a dense Julia array of shape (ndim,) containing the grid
dimensions in Julia order
grid: points to a dense Julia array of shape (2,shape)
the leading dimension is for real and imaginary parts
coord: points to a dense Julia array of shape (ndim,npoints)
forward: 0 ==> FFT exponent is 1
1 ==> FFT exponent is -1
out: points to a mutable dense Julia array of shape (2,npoints)
the leading dimension is for real and imaginary parts
*/
void nufft_u2nu_julia_double(size_t ndim,
                             size_t npoints,
                             const size_t *shape,
                             const double *grid,
                             const double *coord,
                             int forward,
                             double epsilon,
                             size_t nthreads,
                             double *out,
                             size_t verbosity,
                             double sigma_min,
                             double sigma_max);
/*
ndim: number of dimensions (1/2/3)
npoints: number of non-uniform points
shape: points to a dense Julia array of shape (ndim,) containing the grid
dimensions in Julia order
points: points to a dense Julia array of shape (2,npoints)
the leading dimension is for real and imaginary parts
coord: points to a dense Julia array of shape (ndim,npoints)
forward: 0 ==> FFT exponent is 1
1 ==> FFT exponent is -1
out: points to a dense mutable Julia array of shape (2,shape)
the leading dimension is for real and imaginary parts
*/
void nufft_nu2u_julia_double(size_t ndim,
                             size_t npoints,
                             const size_t *shape,
                             const double *points,
                             const double *coord,
                             int forward,
                             double epsilon,
                             size_t nthreads,
                             double *out,
                             size_t verbosity,
                             double sigma_min,
                             double sigma_max);
Sorry that I cannot use C99 complex data types, since they do not play well with the C++ standard; this is one of the very few situations where legal C99 is not legal C++ (as far as I understand). From the Julia standpoint it shouldn't make a difference, though; it should be fine to just pass the pointers to the complex data. I should be able to prepare everything necessary on the compiled side, including a simple Makefile which generates a shared library exporting these two functions. (Equivalents for single-precision data should be easy once we have these two functions working.)
-
I have now released a new version which allows planned NFFTs. Looking only at the execution times of the pre-planned transforms, this provides significant speedups in 1D and 2D. In 3D the planning cost is dominated by the actual execution cost, so the effect is small there. I'm attaching a few benchmark plots similar to the NFFT.jl ones (sorry, I still cannot compare directly with the Julia results, so FINUFFT will have to do for the moment).
-
I tried to hack something together myself; if you are interested, please have a look at #107!
-
Hi,
do you think it might be worthwhile to provide a wrapper for
ducc0.nufft
(https://gitlab.mpcdf.mpg.de/mtr/ducc)? This is currently packaged for Python only, but the backend is C++ and consists of only two global functions for carrying out type 1 and 2 non-uniform FFTs, so I think making this accessible from Julia should not be too difficult.
This implementation
Unfortunately I don't have any experience with Julia myself (yet), so I was not able to create direct benchmarks against NFFT.jl.
If you are interested, I have created a small benchmark script which tries to imitate the calculations you are showing in the "performance" section of your docs.