FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers

1SKLCCSE, School of Computer Science and Engineering, Beihang University, China; 2School of Software, Beihang University, China; 3SKLMIP, School of Computer Science, Peking University, China; 4School of Reliability and Systems Engineering, Beihang University, China
✉ Corresponding Author
{tianyuc, haoyi, lijx}@buaa.edu.cn

Abstract

Fourier Neural Operators (FNO) have emerged as promising solutions for efficiently solving partial differential equations (PDEs) by learning infinite-dimensional function mappings through frequency domain transformations. However, the sparsity of high-frequency signals limits computational efficiency for high-dimensional inputs, and fixed-pattern truncation often causes high-frequency signal loss, reducing performance in scenarios such as high-resolution inputs or long-term predictions. To address these challenges, we propose FreqMoE, an efficient and progressive training framework that exploits the dependency of high-frequency signals on low-frequency components. The model first learns low-frequency weights and then applies a sparse upward-cycling strategy to construct a mixture of experts (MoE) in the frequency domain, effectively extending the learned weights to high-frequency regions. Experiments on both regular and irregular grid PDEs demonstrate that FreqMoE achieves up to 16.6% accuracy improvement while using merely 2.1% of the parameters (a 47.32x reduction) compared to dense FNO. Furthermore, the approach demonstrates remarkable stability in long-term predictions and generalizes seamlessly to various FNO variants and grid structures, establishing a new "Low-frequency Pretraining, High-frequency Fine-tuning" paradigm for solving PDEs.

Motivation

Despite the popularity and usefulness of FNO, it still has several drawbacks. Fixed high-frequency truncation limits computational efficiency and degrades accuracy in high-resolution or long-term prediction tasks. Post-training refinement approaches (e.g., PDE-Refiner) incur heavy computational costs and lack generalizability. Leveraging frequency dependencies in physics, where high-frequency signals often depend on low-frequency components, we propose FreqMoE, a novel framework that first learns low-frequency weights and then applies an upward-cycling strategy to construct a mixture of experts (MoE) in the frequency domain for efficient high-frequency processing.

Method Overview

Method overview of FreqMoE. (a) The standard Fourier Neural Operator (FNO) architecture, consisting of input lifting (P), a sequence of Fourier layers, and output projection (Q). (b) Our modified Fourier layer design with a mixture-of-experts mechanism, where the gating network dynamically assigns frequency components to specialized experts after FFT decomposition. High-frequency components (lighter shades) are processed by high-frequency experts, while low-frequency components are handled by the base expert. (c) Our expert initialization strategy, where pre-trained weights $R$ are used as a shared base component $R_{base}$ and expert-specific delta weights $\Delta R$ are initialized with the LoRA trick, enabling efficient parameter sharing and specialized frequency processing.
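The gated Fourier layer in (b) can be sketched as follows. This is a minimal 1-D numpy illustration, not the paper's implementation: the function names are ours, the gating here is a dense softmax over frequency-index features for readability (the paper routes sparsely), and each "expert" is reduced to a complex spectral multiplier per retained mode.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_fourier_layer(u, experts, gate_w, modes=8):
    """One Fourier layer with a frequency-domain mixture of experts (sketch).

    u       : (n,) real signal on a 1-D grid
    experts : list of (modes,) complex spectral multipliers, one per expert
    gate_w  : (modes, n_experts) gating weights over frequency-index features
    """
    u_hat = np.fft.rfft(u)                 # FFT decomposition
    out_hat = np.zeros_like(u_hat)
    feats = np.eye(modes)                  # one-hot frequency-index features
    gates = softmax(feats @ gate_w)        # (modes, n_experts) mixture weights
    for k in range(min(modes, len(u_hat))):
        # Blend expert multipliers per mode, then apply in the spectral domain.
        mix = sum(g * e[k] for g, e in zip(gates[k], experts))
        out_hat[k] = mix * u_hat[k]
    return np.fft.irfft(out_hat, n=len(u))  # back to physical space
```

In this toy form the gate sees only the frequency index; conceptually, low modes would concentrate their gate mass on the base expert and high modes on the high-frequency experts.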

Highlights

  • We propose FreqMoE, a lightweight post-training framework that dynamically enhances high-frequency processing capabilities in neural PDE solvers. Our approach generalizes seamlessly across the FNO family on both structured and unstructured grids, establishing an efficient "low-frequency pretraining, high-frequency fine-tuning" paradigm.
  • Inspired by physical principles of frequency dependencies in PDEs, we develop a LoRA-based expert initialization scheme that efficiently reuses low-frequency weights. This design achieves remarkable parameter efficiency (47.32$\times$ reduction) while maintaining competitive performance through sparse dynamic computation.
  • Through comprehensive evaluation on diverse PDE systems, we demonstrate that FreqMoE significantly outperforms conventional FNO variants, achieving up to 16.6% accuracy improvement in high-resolution tasks (512×512) and superior stability in long-term predictions, all while maintaining minimal computational overhead.
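The LoRA-based expert initialization in the second highlight can be illustrated with a short numpy sketch. The helper names and shapes here are our own assumptions; the point is the structure $R = R_{base} + \Delta R$ with a low-rank $\Delta R = A B$, where $B$ starts at zero so every new high-frequency expert initially reproduces the pretrained low-frequency weights exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_expert(R_base, rank=4):
    """LoRA-style delta for one expert: R = R_base + A @ B (hypothetical helper).

    A is small random, B is zero, so the expert starts identical to the
    shared pretrained base; only the low-rank factors A, B are trained.
    """
    d_out, d_in = R_base.shape
    A = rng.normal(0.0, 0.02, size=(d_out, rank))
    B = np.zeros((rank, d_in))
    return A, B

def expert_weight(R_base, A, B):
    """Effective expert weight: shared base plus low-rank delta."""
    return R_base + A @ B
```

With rank $r \ll \min(d_{out}, d_{in})$, each expert adds only $r(d_{out}+d_{in})$ trainable parameters on top of the frozen base, which is the source of the parameter savings claimed above.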

Results

Regular Results

Tab.1 Performance on Regular-Grid PDEs. Comparison of models with varying frequency modes, where # Params indicates the number of parameters activated during inference. Underlined values represent the best performance achieved by FNO baselines. Results with blue background show our FreqMoE, where superscript ∗ and † denote models trained from scratch and upcycled from dense FNO, respectively. The bold values highlight our best performance.

Irregular Results

Tab.2 Performance on Irregular-Grid PDEs. Comparison of models on two representative irregular-grid tasks: AirFoil and Elasticity, where # Params indicates the number of parameters activated during inference. Underlined values represent the best performance achieved by Geo-FNO baselines. Results with blue background show our FreqMoE approach, where superscript ∗ and † denote models trained from scratch and upcycled from dense Geo-FNO, respectively. The bold values highlight our best performance.

Visualization Results

Fig.1 Visualization of prediction errors. Left Column: Irregular Grid Results from AirFoil. Right Column: Regular Grid Results from CFD-Turb 512. Red circles highlight regions with high-frequency components, where our FreqMoE demonstrates better capability in capturing fine-grained spatial details compared to FNO.