LOGLO-FNO: Efficient Learning of Local and Global Features in Fourier Neural Operators

🎯 Towards tackling the Spectral Bias of Neural Operators!

Marimuthu Kalimuthu¹,²,³, David Holzmüller⁴, Mathias Niepert¹,²,⁵

1Universität Stuttgart, 2Stuttgart Center for Simulation Science - SimTech,
3International Max Planck Research School for Intelligent Systems (IMPRS-IS), 4INRIA Paris, École Normale Supérieure, PSL University, 5NEC Labs Europe

Accepted for Oral Presentation at the ICLR 2025 Workshop on Machine Learning Multiscale Processes (MLMP)

Paper (arXiv) · OpenReview · ICLR.CC · Slides · Poster (Soon) · Code (Soon) · Citation

LOGLO-FNO: Local-Global Fourier Neural Operator

Modeling high-frequency information is a critical challenge in Scientific Machine Learning. For instance, fully turbulent flow simulations of the Navier-Stokes equations at Reynolds numbers 3500 and above generate high-frequency signals due to the swirling fluid motions caused by eddies and vortices. Faithfully modeling such signals with neural networks depends on the accurate reconstruction of moderate to high frequencies. However, it is well known that deep neural networks exhibit the so-called spectral bias toward learning low-frequency components. Meanwhile, Fourier Neural Operators (FNOs) have emerged in recent years as a popular class of data-driven models for solving Partial Differential Equations (PDEs) and for surrogate modeling in general. Although impressive results have been achieved on several PDE benchmark problems, FNOs often perform poorly in learning non-dominant frequencies characterized by local features. This limitation stems from the spectral bias inherent in neural networks and from the explicit exclusion of high-frequency modes in FNO and its variants. To mitigate these issues and improve FNO's spectral learning capability to represent a broad range of frequency components, we propose two key architectural enhancements: (i) a parallel branch performing local spectral convolutions and (ii) a high-frequency propagation module. Moreover, we propose a novel frequency-sensitive loss term based on radially binned spectral errors. The parallel branch for local convolutions reduces the number of trainable parameters by up to 50% while matching the accuracy of the baseline FNO, which relies solely on global convolutions. Experiments on three challenging PDE problems in fluid mechanics (Kolmogorov Flow 2D and Turbulent Radiative Layer 3D) and biological pattern formation (Diffusion-Reaction 2D), together with qualitative and spectral analyses of the predictions, demonstrate the effectiveness of our method over state-of-the-art neural operator baselines.

Core Building Blocks of Base FNO vs. LOGLO-FNO

The core idea of LOGLO-FNO for achieving local spectral convolution in FNO is to first partition the domain $D$ into $M$ non-overlapping hypercubes (patches) $P_m$ such that $\bigcup_{m=1}^{M} P_m = D$, and then perform (learnable) spectral convolutions on these sub-domains without any truncation of Fourier modes.
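As an illustration, here is a minimal PyTorch sketch (our own; `partition_into_patches` and all shapes are illustrative, not the released code) that folds non-overlapping patches of a 2D feature map into the batch dimension, so that subsequent spectral convolutions act per patch:

    import torch

    def partition_into_patches(z: torch.Tensor, ps: int) -> torch.Tensor:
        # Split (Nb, Nc, Nx, Ny) into non-overlapping ps x ps patches and
        # fold them into the batch dimension: (Nb * M, Nc, ps, ps).
        nb, nc, nx, ny = z.shape
        assert nx % ps == 0 and ny % ps == 0, "resolution must be divisible by ps"
        z = z.reshape(nb, nc, nx // ps, ps, ny // ps, ps)
        z = z.permute(0, 2, 4, 1, 3, 5)   # (Nb, Nx/ps, Ny/ps, Nc, ps, ps)
        return z.reshape(-1, nc, ps, ps)  # each patch gets a full, untruncated FFT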

Fourier layer of FNO:

Let $Z \in \mathbb{R}^{N_b \times N_c \times N_x \times N_y [\times N_z]}$ be the output of the (initial) lifting layer. Then,

$$\Upsilon := \sigma\left[\mathcal{K}(Z) + W_f Z + b_f\right]$$

$$\mathcal{L}(Z) := W_{c_2}\left(\sigma\left[W_{c_1} \Upsilon + b_{c_1}\right]\right) + b_{c_2} + \left[W_c Z + b_c\right]$$

where the $W$'s and $b$'s are learnable parameters and $\mathcal{K}(\cdot)$ is the global kernel integral operator.
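For reference, a minimal sketch of a global spectral convolution in the style of FNO's $\mathcal{K}(\cdot)$ (our own simplification: it keeps only the non-negative modes along the first frequency axis, whereas the reference FNO also retains the corresponding conjugate block):

    import torch
    import torch.nn as nn

    class SpectralConv2d(nn.Module):
        # FFT -> learnable per-mode linear map on the lowest (m1, m2) modes -> IFFT
        def __init__(self, c_in, c_out, m1, m2):
            super().__init__()
            self.m1, self.m2 = m1, m2
            scale = 1.0 / (c_in * c_out)
            self.w = nn.Parameter(scale * torch.randn(c_in, c_out, m1, m2, dtype=torch.cfloat))

        def forward(self, z):              # z: (Nb, Nc, Nx, Ny)
            z_ft = torch.fft.rfft2(z)      # (Nb, Nc, Nx, Ny//2 + 1)
            out = torch.zeros(z.size(0), self.w.size(1), z.size(2), z_ft.size(3),
                              dtype=torch.cfloat, device=z.device)
            out[:, :, :self.m1, :self.m2] = torch.einsum(
                "bixy,ioxy->boxy", z_ft[:, :, :self.m1, :self.m2], self.w)
            return torch.fft.irfft2(out, s=z.shape[-2:])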

Fourier layer of LOGLO-FNO:

$$\Upsilon_g := \sigma\left[\mathcal{K}_g(Z) + W_g Z + b_g\right], \qquad \Upsilon_l := \sigma\left[\mathcal{K}_l(\hat{Z}) + W_l \hat{Z} + b_l\right]$$

$$\begin{aligned} \mathcal{L}(Z, \hat{Z}, \tilde{Z}) :=\; & W_{gc_2}\left(\sigma\left[W_{gc_1} \Upsilon_g + b_{gc_1}\right]\right) + b_{gc_2} + \left[W_{gc} Z + b_{gc}\right] \\ & + \left[W_{lc} \hat{Z} + b_{lc}\right] + W_{lc_2}\left(\sigma\left[W_{lc_1} \Upsilon_l + b_{lc_1}\right]\right) + b_{lc_2} \\ & + W_{hfc_2}\left(\sigma\left[W_{hfc_1} \tilde{Z} + b_{hfc_1}\right]\right) + b_{hfc_2} \end{aligned}$$

where $\hat{Z}$ denotes the patched features of the local branch and $\tilde{Z}$ the propagated high-frequency features.
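To make the composition concrete, here is a schematic of how the three branches combine in one layer (our own sketch, not the released implementation; the module names are placeholders, biases are folded into the linear maps and MLPs, and $\sigma$ is taken to be GELU):

    import torch.nn.functional as F

    def loglo_fno_layer(z, z_hat, z_hf, K_g, K_l, W, mlp, unpatchify):
        # z: full-resolution features, z_hat: patched features, z_hf: HF features;
        # W is a dict of linear (1x1-conv) skips, mlp a dict of channel MLPs.
        ups_g = F.gelu(K_g(z) + W["g"](z))                    # Upsilon_g
        ups_l = F.gelu(K_l(z_hat) + W["l"](z_hat))            # Upsilon_l, per patch
        local = unpatchify(mlp["l"](ups_l) + W["lc"](z_hat))  # back to the full grid
        return mlp["g"](ups_g) + W["gc"](z) + local + mlp["hf"](z_hf)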

Complexity Analysis of Local and Global Spectral Convolutions in Baseline FNO and LOGLO-FNO:

(i) Global Branch - 2D Spectral Convolutions on 2D Spatial Data

Let $N_b$ be the batch size, $N_x$ and $N_y$ the resolutions of the spatial dimensions, and $N_c$ the width, i.e., the number of hidden channels; $C_{in}$ and $C_{out}$ are typically set to the same value (e.g., 128).

$$\mathrm{FLOPs}_{\mathrm{FFT}} = 5\, N_b\, C_{in}\, N_x N_y \log_2(N_x N_y)$$

$$\mathrm{FLOPs}_{\mathrm{IFFT}} = 5\, N_b\, C_{out}\, N_x N_y \log_2(N_x N_y)$$

Therefore, the FFT computation on the global branch at the full 2D spatial resolution has a per-channel complexity of $O(N_x N_y \log_2(N_x N_y))$, making it expensive for large spatial resolutions such as $2048 \times 2048$ or higher.

(ii) Local Branch - 2D Spectral Convolutions on 2D Spatial Data

Let $N_p = \frac{N_x N_y}{P_s^2}$ be the number of patches, $N_x$ and $N_y$ the resolutions of the spatial dimensions, $P_s \times P_s$ the patch size (e.g., $16 \times 16$), and $N_c$ the width, i.e., the number of hidden channels; $C_{in}$ and $C_{out}$ are typically set to the same value (e.g., 128).

$$\mathrm{FLOPs}_{\mathrm{FFT}} = 5\, N_b\, C_{in}\, N_p\, P_s^2 \log_2(P_s^2) = 5\, N_b\, C_{in}\, N_x N_y \log_2(P_s^2)$$

$$\mathrm{FLOPs}_{\mathrm{IFFT}} = 5\, N_b\, C_{out}\, N_p\, P_s^2 \log_2(P_s^2) = 5\, N_b\, C_{out}\, N_x N_y \log_2(P_s^2)$$

Thus, the FFT and IFFT computations on the local branch, which operate on patches, have a per-channel complexity of $O(N_x N_y \log_2(P_s^2))$. Since, in practice, $N_x N_y \log_2(P_s^2) \ll N_x N_y \log_2(N_x N_y)$, the FFT computations are significantly cheaper on the local branch. Moreover, they are highly parallelizable, since each patch can be processed independently on modern accelerators such as GPUs and TPUs.
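The saving is easy to quantify with the formulas above; a small helper (our own) compares the global and local per-(I)FFT FLOP counts:

    import math

    def fft_flops_2d(nb, c, nx, ny, ps=None):
        # Global branch (ps=None): 5 * Nb * C * NxNy * log2(NxNy)
        # Local branch:            5 * Nb * C * NxNy * log2(Ps^2)
        log_factor = math.log2(nx * ny) if ps is None else math.log2(ps * ps)
        return 5 * nb * c * nx * ny * log_factor

    # e.g., a 2048 x 2048 grid, 128 channels, 16 x 16 patches:
    flops_global = fft_flops_2d(1, 128, 2048, 2048)        # log2 factor = 22
    flops_local = fft_flops_2d(1, 128, 2048, 2048, ps=16)  # log2 factor = 8
    print(flops_global / flops_local)  # 2.75, i.e., 22/8x fewer FLOPs locally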

(iii) Global Branch - 3D Spectral Convolutions on 3D Spatial Data

Let $N_b$ be the batch size, $N_x$, $N_y$, and $N_z$ the resolutions of the spatial dimensions, and $N_c$ the width, i.e., the number of hidden channels; $C_{in}$ and $C_{out}$ are typically set to the same value (e.g., 128).

$$\mathrm{FLOPs}_{\mathrm{FFT}} = 5\, N_b\, C_{in}\, N_x N_y N_z \log_2(N_x N_y N_z)$$

$$\mathrm{FLOPs}_{\mathrm{IFFT}} = 5\, N_b\, C_{out}\, N_x N_y N_z \log_2(N_x N_y N_z)$$

Therefore, the FFT computation on the global branch at the full 3D spatial resolution has a per-channel complexity of $O(N_x N_y N_z \log_2(N_x N_y N_z))$, making it highly expensive for large $N_x$, $N_y$, and $N_z$, such as $512 \times 512 \times 512$ or higher spatial resolutions.

(iv) Local Branch - 3D Spectral Convolutions on 3D Spatial Data

Let $N_p = \frac{N_x N_y N_z}{P_s^3}$ be the number of patches, $N_x$, $N_y$, and $N_z$ the resolutions of the spatial dimensions, $P_s \times P_s \times P_s$ the patch size (e.g., $16 \times 16 \times 16$ or $32 \times 32 \times 32$), and $N_c$ the width, i.e., the number of hidden channels; $C_{in}$ and $C_{out}$ are typically set to the same value (e.g., 128).

$$\mathrm{FLOPs}_{\mathrm{FFT}} = 5\, N_b\, C_{in}\, N_p\, P_s^3 \log_2(P_s^3) = 5\, N_b\, C_{in}\, N_x N_y N_z \log_2(P_s^3)$$

$$\mathrm{FLOPs}_{\mathrm{IFFT}} = 5\, N_b\, C_{out}\, N_p\, P_s^3 \log_2(P_s^3) = 5\, N_b\, C_{out}\, N_x N_y N_z \log_2(P_s^3)$$

Thus, the FFT and IFFT computations on the local branch operating on 3D patches have a per-channel complexity of $O(N_x N_y N_z \log_2(P_s^3))$. Since, in practice, $N_x N_y N_z \log_2(P_s^3) \ll N_x N_y N_z \log_2(N_x N_y N_z)$, the FFT computations are significantly cheaper on the local branch. Furthermore, they are highly parallelizable, since each patch can be processed independently on modern accelerators such as GPUs and TPUs, as the sketch below illustrates.
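Because the per-patch transforms are independent, they can be issued as a single batched FFT call after folding the patches into the batch dimension. A quick PyTorch sketch (shapes are illustrative) for the 3D case:

    import torch

    nb, nc, ps = 2, 8, 16
    nx = ny = nz = 64
    z = torch.randn(nb, nc, nx, ny, nz)

    # (Nb, Nc, Nx, Ny, Nz) -> (Nb * Np, Nc, Ps, Ps, Ps) with Np = NxNyNz / Ps^3
    zp = (z.reshape(nb, nc, nx // ps, ps, ny // ps, ps, nz // ps, ps)
           .permute(0, 2, 4, 6, 1, 3, 5, 7)
           .reshape(-1, nc, ps, ps, ps))

    # One batched call computes the FFTs of all patches in parallel
    zp_ft = torch.fft.fftn(zp, dim=(-3, -2, -1))
    print(zp_ft.shape)  # torch.Size([128, 8, 16, 16, 16]); Np = 64 per sample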

🌈 Radially Binned Spectral Energy of Errors

In addition, we propose a spectral loss term based on radially binning the spectral energy of errors, computed as follows:

[Figure: algorithm for computing the radially binned spectral energy of errors]

🖥 Pseudocode - Radially Binned Spectral Energy of Errors

            
import torch

def RadialBinnedSpectralLoss(preds, target, reduction="mean"):
    # input data shape and params
    nb, nc, nx, ny, nt = target.size()
    iLow, iHigh = 4, 12   # bin edges separating low / mid / high frequencies
    Lx, Ly = 1.0, 1.0     # physical domain lengths

    # Compute the error in Fourier space
    err_phys = preds - target
    err_fft = torch.fft.fftn(err_phys, dim=[2, 3])
    err_fft_sq = torch.abs(err_fft) ** 2
    err_fft_sq_h = err_fft_sq[..., :nx // 2, :ny // 2, :]  # keep one quadrant

    # Create radial indices (float dtype, since torch.sqrt requires it)
    x = torch.arange(nx // 2, dtype=torch.float32, device=preds.device)
    y = torch.arange(ny // 2, dtype=torch.float32, device=preds.device)
    X, Y = torch.meshgrid(x, y, indexing="ij")
    radii = torch.sqrt(X**2 + Y**2).floor().to(torch.long)  # radial distance
    max_radius = int(torch.max(radii))

    # Flatten radii for the bin-membership mask; (nx//2 * ny//2,)
    radii_flat = radii.flatten()

    # Spatially flatten the Fourier-space error; (nb, nc, nx//2 * ny//2, nt)
    err_fft_sq_flat = err_fft_sq_h.contiguous().reshape(nb, nc, -1, nt)

    # Output tensor holding the accumulated Fourier error
    # for each radial bin at distance r from the origin
    err_F_vect_full = torch.zeros(nb, nc, max_radius + 1, nt, device=preds.device)

    # Binary mask of valid radii (all of them here; kept for clarity)
    valid_r = radii_flat <= max_radius

    # Accumulate the errors over all valid radial indices via `index_add_`
    err_F_vect_full.index_add_(2,
                               radii_flat[valid_r],
                               err_fft_sq_flat[:, :, valid_r])

    # Normalize & average over the batch; (nc, max_radius + 1, nt)
    nrm = (nx * ny) * Lx * Ly
    _err_F = torch.sqrt(torch.mean(err_F_vect_full, dim=0)) / nrm

    # Classify the Fourier-space error into three bands
    err_F = torch.zeros(nc, 3, nt, device=preds.device)
    err_F[:, 0] = torch.mean(_err_F[:, :iLow], dim=1)       # low freqs
    err_F[:, 1] = torch.mean(_err_F[:, iLow:iHigh], dim=1)  # mid freqs
    err_F[:, 2] = torch.mean(_err_F[:, iHigh:], dim=1)      # high freqs

    # Mean or sum over the channel and time dimensions
    if reduction == "mean":
        freq_loss = torch.mean(err_F, dim=[0, -1])
    elif reduction == "sum":
        freq_loss = torch.sum(err_F, dim=[0, -1])
    else:
        raise ValueError(f"unknown reduction: {reduction}")

    return freq_loss
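A quick usage sketch (random tensors stand in for real predictions and targets), showing the returned shape of one value per frequency band:

    import torch

    preds = torch.randn(4, 2, 64, 64, 10)   # (Nb, Nc, Nx, Ny, Nt)
    target = torch.randn(4, 2, 64, 64, 10)
    loss = RadialBinnedSpectralLoss(preds, target, reduction="mean")
    print(loss.shape)  # torch.Size([3]): low, mid, and high frequency bands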
            
        

🏹 Training Objective

The 1-step training loss for N trajectories, each comprising T timesteps, is given by,

$$\theta^* = \arg\min_\theta \sum_{n=1}^{N} \sum_{t=1}^{T-1} \mathcal{C}\left(\mathcal{N}_\theta(u_t), u_{t+1}\right), \qquad \mathcal{C} = \mathcal{C}_{\mathrm{MSE}} + \lambda\, \mathcal{C}_{\mathrm{freq}}, \quad 0 \le \lambda \le 1$$
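To make the objective concrete, here is a minimal training-loop sketch; `model`, `optimizer`, and `train_loader` (yielding $(u_t, u_{t+1})$ pairs) are assumed placeholders, and `lambda_freq` plays the role of $\lambda$ above:

    import torch

    lambda_freq = 0.5                # lambda in the objective above (placeholder value)
    mse = torch.nn.MSELoss()         # C_MSE

    for u_t, u_tp1 in train_loader:  # (u_t, u_{t+1}) pairs from N trajectories
        preds = model(u_t)           # N_theta(u_t)
        loss = mse(preds, u_tp1) + lambda_freq * RadialBinnedSpectralLoss(preds, u_tp1).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()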

High-Frequency Adaptive Gaussian Noise

Let $X^{hf} \in \mathbb{R}^{N_b \times N_x \times N_y \times (N_t \cdot N_c)}$ denote the input tensor of HF features, where $N_b$ is the batch size, $N_x$ and $N_y$ are the resolutions of the spatial dimensions, and $N_t \cdot N_c$ represents the combined temporal and channel dimensions. The high-frequency-feature-adaptive Gaussian noise $N_{\mathrm{dynamic}}$ is then computed as follows:




1. Compute the per-sample mean $\mu_b$ and standard deviation $\sigma_b$ of the high-frequency features:
   $$\mu_b = \frac{1}{N_x N_y N_t N_c} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \sum_{k=1}^{N_t N_c} X^{hf}_{b,i,j,k}$$
   $$\sigma_b = \sqrt{\frac{1}{N_x N_y N_t N_c} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \sum_{k=1}^{N_t N_c} \left(X^{hf}_{b,i,j,k} - \mu_b\right)^2 + \epsilon}$$
   where $\epsilon$ is a small constant added for numerical stability, and $\mu$ and $\sigma$ are obtained by stacking the per-sample statistics along the batch dimension.

2. Generate standard Gaussian noise: $\mathcal{N} \sim \mathcal{N}(0, 1)$.

3. Scale the noise dynamically: $N_{\mathrm{dynamic}} = \mu + \alpha \cdot \sigma \cdot \mathcal{N}$, where $\alpha$ is a small value such as 0.025 and $N_{\mathrm{dynamic}}$ has the same shape as the input $X^{hf}$.

$N_{\mathrm{dynamic}}$ can then be added to the batch of inputs to the global and local branches during the training phase of LOGLO-FNO, as sketched below.
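A compact PyTorch sketch of steps 1-3 (our own; `hf_adaptive_noise` is an illustrative name):

    import torch

    def hf_adaptive_noise(x_hf: torch.Tensor, alpha: float = 0.025,
                          eps: float = 1e-8) -> torch.Tensor:
        # x_hf: (Nb, Nx, Ny, Nt * Nc); statistics are taken per sample,
        # i.e., over all dimensions except the batch dimension.
        dims = (1, 2, 3)
        mu = x_hf.mean(dim=dims, keepdim=True)                                      # mu_b
        sigma = torch.sqrt(x_hf.var(dim=dims, unbiased=False, keepdim=True) + eps)  # sigma_b
        noise = torch.randn_like(x_hf)                                              # N ~ N(0, 1)
        return mu + alpha * sigma * noise  # N_dynamic, same shape as x_hf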
        

🌀 Visualizing the Radially Binned Spectral Loss

In the slideshow below, we visualize the radially binned spectral energy of errors of the predictions of the considered neural operators and LOGLO-FNO on the Kolmogorov Flow 2D PDE. Note that we show only alternate radial bins, prioritizing an uncluttered presentation over completeness.

Radial Spectral Loss of Base FNO model predictions on the turbulent Kolmogorov Flow 2D


🌪 Quantitative Results on Benchmark PDE Problems

(1) Kolmogorov Flow 2D PDE   💥

We evaluate LOGLO-FNO on the challenging turbulent version of the Kolmogorov Flow 2D benchmark:

$$\frac{\partial u}{\partial t} + u \cdot \nabla u - \frac{1}{\mathrm{Re}} \Delta u = -\nabla p + \sin(ny)\,\hat{x}, \qquad \nabla \cdot u = 0, \qquad \text{on } [0, 2\pi]^2 \times (0, \infty)$$

The setup comprises training with the 1-step loss and evaluating 1-step and 5-step autoregressive rollouts. The results are compared against a diverse set of competitive neural operator baselines such as Modern UNet, FNO, F-FNO, U-FNO, LSM, and NO-LIDK.

[Table: 1-step and 5-step autoregressive rollout results on Kolmogorov Flow 2D]

(2) Turbulent Radiative Layer 3D PDE   🔥

We evaluate LOGLO-FNO on the challenging setup of the Turbulent Radiative Mixing Layer 3D benchmark. The mass, momentum, and energy conservation equations read as

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0$$

$$\frac{\partial (\rho \mathbf{v})}{\partial t} + \nabla \cdot (\rho \mathbf{v} \otimes \mathbf{v} + P) = 0$$

$$\frac{\partial E}{\partial t} + \nabla \cdot \big((E + P)\,\mathbf{v}\big) = -\frac{E}{t_{\mathrm{cool}}}$$

$$E = P / (\gamma - 1), \quad \text{where } \gamma = \tfrac{5}{3}$$

[Figure: density field snapshots of the Turbulent Radiative Mixing Layer]

The setup comprises training with 1-step loss and evaluating 1-step results on a host of metrics. The results are compared against a diverse set of competitive neural operator baselines such as Modern UNet, ConvNext U-Net, and FNO.

[Table: 1-step evaluation results on Turbulent Radiative Layer 3D]

(3) Diffusion Reaction 2D PDE   🐝

We evaluate LOGLO-FNO on the challenging coupled Diffusion-Reaction 2D PDE benchmark:

$$\frac{\partial u}{\partial t} = D_u \partial_{xx} u + D_u \partial_{yy} u + R_u(u, v), \qquad \frac{\partial v}{\partial t} = D_v \partial_{xx} v + D_v \partial_{yy} v + R_v(u, v), \qquad \text{on } (-1, 1)^2 \times (0, 5]$$

$$R_u(u, v) = u - u^3 - k - v, \qquad R_v(u, v) = u - v$$

The setup comprises training with full autoregression and evaluating the full AR rollout on a whole host of spectral, physics-based, and data-view metrics. The results are compared against a diverse set of competitive neural operator baselines such as Modern UNet, F-FNO, U-FNO, LSM, FNO, and NO-LIDK.

[Table: autoregressive rollout results on Diffusion-Reaction 2D]

Qualitative Results

We visualize the predictions of FNO and LOGLO-FNO on the time-dependent Turbulent Radiative Mixing Layer 3D PDE.

Turbulent Radiative Layer 3D (Density)

[Figures (density, 64×64×128, t_cool = 0.03): Ground Truth · LOGLO-FNO · Base FNO]

Turbulent Radiative Layer 3D (Velocity)

[Figures (z-velocity, 64×64×128, t_cool = 0.03): Ground Truth · LOGLO-FNO · Base FNO]

BibTeX Citation

@inproceedings{loglo-fno-kalimuthu:2025,
    title={{LOGLO}-{FNO}: Efficient Learning of Local and Global Features in Fourier Neural Operators},
    author={Marimuthu Kalimuthu and David Holzm{\"u}ller and Mathias Niepert},
    booktitle={ICLR 2025 Workshop on Machine Learning Multiscale Processes},
    year={2025},
    url={https://openreview.net/forum?id=OCM7OkVg9C}
}