RecFM

Abstract

Generative models have emerged as a powerful paradigm for solving physics systems and modeling complex spatiotemporal dynamics. However, achieving high physical accuracy without incurring high computational cost remains a fundamental challenge, as existing approaches face a critical speed-fidelity trade-off. In this work, we introduce Recursive Flow Matching (RecFM), a generative framework for forecasting complex spatiotemporal dynamics. RecFM enforces self-consistency to align trajectories across discretization scales, reducing discretization errors and improving performance across metrics for physics-based tasks. To our knowledge, this is the first method to achieve high-fidelity one- and few-step (2-4 step) dynamic generation for scientific systems with performance comparable to state-of-the-art multi-step solvers. Across challenging scientific benchmarks, RecFM achieves up to a 20× speedup over leading diffusion-based emulators while improving predictive accuracy. Furthermore, RecFM reduces mean squared error by over 15% compared to vanilla flow matching, offering a scalable and efficient solution for real-time scientific emulation.

Scientific Forecasting Results

Aggregate radar plot comparing forecasting methods across normalized CRPS, MSE, SSR, and speed scores.

Four-axis summary of the dynamic forecasting tasks. Metrics are averaged across datasets and normalized between [0, 1]; closer to 1 is better.

RecFM achieves strong forecasting performance across diverse scientific benchmarks, including Sea Surface Temperature (SST) forecasting, Navier-Stokes fluid dynamics, and the Helmholtz Staircase equation. Compared to diffusion-based emulators such as VideoPDE (Li et al., 2025), RecFM substantially reduces rollout cost while improving predictive accuracy and physical consistency. In particular, RecFM achieves up to 20× faster rollout generation than VideoPDE and over 15% lower MSE than vanilla Flow Matching while operating in the one- and few-step generation regime.

Quantitative forecasting results for Sea Surface Temperature, Navier-Stokes Flow, and Helmholtz Staircase Equation. Lower values are better for MSE and CRPS, while the optimal SSR is 1. Best results in bold, second best underlined, third best in gray.

Method	SST				Navier-Stokes			Helmholtz Staircase
Method	CRPS	MSE	SSR	Time [s]	CRPS	MSE	SSR	CRPS	MSE	SSR
Perturbation^*	0.281	0.180	0.411	0.4241	0.090	0.028	0.448	0.218	0.111	0.004
Dropout^*	0.267	0.164	0.406	0.4241	0.078	0.027	0.715	0.099	0.049	0.631
DDPM^*	0.246	0.177	0.674	0.3054	0.180	0.105	0.573	0.156	0.153	0.563
MCVD^*	0.216	0.161	0.926	79.167	0.154	0.070	0.524	0.137	0.128	0.867
DYffusion^*	0.224	0.173	1.033	4.6722	0.067	0.022	0.877	0.144	0.106	1.121
VideoPDE (Li et al., 2025)	0.216	0.162	0.746	19.753	0.033	0.0068	0.205	0.026	5.6e-4	4.334
Vanilla FM	0.260	0.232	0.914	1.5202	0.036	0.0076	0.911	0.030	6.5e-4	1.485
RecFM (1-step)	0.217	0.162	0.984	0.4310	0.031	0.0064	0.959	0.0034	4.2e-5	1.090
RecFM (2-step)	0.216	0.161	1.004	0.7353	0.032	0.0068	0.932	0.0027	2.7e-5	1.440

^*Results for SST and Navier-Stokes are reproduced from DYffusion (Cachay et al., 2023).

Gallery

Navier-Stokes roll-out comparisons against FNO (Li et al., 2020) across velocity \(u\), velocity \(v\), and pressure \(p\):

Ground Truth

Navier-Stokes RecFM prediction roll-out, velocity u channel

RecFM (1-step)

FNO

Velocity \(u\)

Ground Truth

Navier-Stokes RecFM prediction roll-out, velocity v channel

RecFM (1-step)

FNO

Velocity \(v\)

Ground Truth

Navier-Stokes RecFM prediction roll-out, pressure p channel

RecFM (1-step)

FNO

Pressure \(p\)

Training Stability

We compare training convergence on the Navier-Stokes benchmark using cumulative function evaluations (NFE, i.e., number of forward passes) during optimization. RecFM converges faster than the diffusion-based baseline VideoPDE and consistently achieves lower validation error throughout training.

Image Generation Results

Shortcut denoising trajectory at t=1.00 — Denoising trajectories of Shortcut Model, SiT, and RecFM (8 inference steps).

SiT denoising trajectory at t=1.00 — Denoising trajectories of Shortcut Model, SiT, and RecFM (8 inference steps).

We report results on ImageNet-1k dataset. RecFM is competitive in image generation as a multi-step flow matching method, while requiring fewer training epochs and inference steps.

Comparison of generative models under different sampling regimes.

Model	FID ↓	Sampling Steps	Param Count	Epochs Trained
DiT-XL (Peebles & Xie, 2023)	2.27	500	675M	640
SiT-XL (Ma et al., 2024)	2.06	250	675M	640
ADM-G (Dhariwal & Nichol, 2021)	4.59	250	—	426
LDM-4-G (Rombach et al., 2022)	3.6	500	400M	106
Shortcut Model (XL) (Frans et al., 2024)	3.8	128	676M	250
RecFM-XL	2.53	128	675M	160
RecFM-XL	2.49	16	675M	160
RecFM-XL	3.22	8	675M	160

Below we visualize some image generation results for RecFM:

Composite of selected image generation samples from RecFM-XL at 256×256 resolution. — Selected samples from our 256 × 256 resolution RecFM-XL model.

Gallery

Sulphur-crested cockatoo, ImageNet class 89 — Sulphur-crested cockatoo (89)

Arctic wolf, ImageNet class 270 — Arctic wolf (270)

Red panda, ImageNet class 387 — Red panda (387)

Balloon, ImageNet class 417 — Balloon (417)

Cliff drop-off, ImageNet class 972 — Cliff drop-off (972)

Coral reef, ImageNet class 973 — Coral reef (973)

Volcano, ImageNet class 980 — Volcano (980)

Method Overview

Standard Flow Matching learns a single trajectory between the data \(x_0\) and noise \(x_1\) distributions. RecFM instead models a family of recursively scaled trajectories that intersect at shared spatial states \(x_t\), enabling cross-scale consistency training and stable few-step generation.

Physics Intuition

Recursive Attenuation

After each collision, the pendulum loses energy and follows a shorter trajectory with reduced velocity.

Trajectory Hierarchy

The recursive motion naturally produces a family of progressively scaled trajectories that pass through the same point.

Connection to RecFM

RecFM enforces consistency by matching velocity predictions from recursively scaled trajectories at the same spatial point.

Details

Recursive Flow Matching Formulation

Given a data sample \(x_0 \sim p_0\) and a noise sample \(x_1 \sim p_1\), RecFM defines the standard linear interpolation

\[x_t = (1-t)x_0 + tx_1, \qquad \bm{v}^* = x_1 - x_0.\]

RecFM recursively constructs a family of trajectories across different discretization scales. For recursion depth \(D\), trajectories are parameterized by aligned time-scale pairs \(\{(\tau^{(i)}, \alpha^{(i)})\}_{i=1}^{D}\), where

\[\tau^{(i)} = t/\alpha^{(i)}, \qquad \alpha^{(1)} = 1.\]

Under this alignment, all trajectories pass through the same spatial state \(x_t\), leading to the recursive velocity relation

\[\hat{v}^{(i+1)} = \alpha \hat{v}^{(i)}.\]

RecFM trains a shared velocity network \(v_\theta(x,\tau,\alpha)\) using multi-scale trajectory supervision and cross-scale consistency constraints inspired by the wall-bouncing pendulum dynamics.

Algorithm 1 Recursive Trajectory Training with Consistency Alignment

1	Require Data distribution \(p_0\), Noise distribution \(p_1\)
2	Require Velocity network \(v_\theta(x,t,\alpha)\), recursion depth \(D\)
3	Require Consistency weight \(\lambda\), total training iterations \(N\)
4	for iteration \(n = 1\) to \(N\) do
5	Sample \(x_0 \sim p_0\) and \(x_1 \sim p_1\) ▷ Data and noise samples
6	Sample \(t \sim \mathcal{U}(0,1)\) and \(\alpha \sim \mathcal{U}(t,1)\) ▷ Primary trajectory time and base recursion scale
7	\(\bm{v}^* \gets x_1 - x_0\) ▷ Ground-truth primary velocity
8	\(x_t \gets (1-t)x_0 + t x_1\) ▷ Shared spatial point
9	for \(i = 1\) to \(D\) do
10	\(\alpha^{(i)} \gets \alpha^{i-1}\) ▷ Recursive trajectory scale
11	\(\tau^{(i)} \gets t / \alpha^{(i)}\) ▷ Aligned trajectory time
12	\(\hat{v}^{(i)} \gets v_\theta(x_t, \tau^{(i)}, \alpha^{(i)})\) ▷ Predicted trajectory velocity
13	\(\mathcal{L}_{\text{traj}}^{(i)} \gets \\|\hat{v}^{(i)} - \alpha^{(i)} \bm{v}^*\\|_2^2\) ▷ Trajectory supervision
14	end for
15	for \(i = 2\) to \(D\) do
16	\(\mathcal{L}_{\text{cons}}^{(i)} \gets \\|\hat{v}^{(i)} - \alpha^{(i)} \hat{v}^{(1)}\\|_2^2\) ▷ Cross-scale consistency
17	end for
18	\(\mathcal{L}_{\text{total}} \gets \sum_{i=1}^{D} \mathcal{L}_{\text{traj}}^{(i)} + \lambda \sum_{i=2}^{D} \mathcal{L}_{\text{cons}}^{(i)}\)
19	Update \(\theta\) using \(\nabla_\theta \mathcal{L}_{\text{total}}\)
20	end for

BibTeX

@misc{huang2026recursiveflowmatching, title={Recursive Flow Matching}, author={Jiahe Huang and Sihan Xu and Sharvaree Vadgama and Rose Yu}, year={2026}, eprint={2605.26535}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2605.26535}, }

References

Li, E., Wang, Z., Huang, J., & Park, J. J. (2025). VideoPDE: Unified generative PDE solving via video inpainting diffusion models. arXiv preprint arXiv:2506.13754.

Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., & Anandkumar, A. (2020). Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895.

Frans, K., Hafner, D., Levine, S., & Abbeel, P. (2024). One step diffusion via shortcut models. arXiv preprint arXiv:2410.12557.

Rühling Cachay, S., Zhao, B., Joren, H., & Yu, R. (2023). Dyffusion: A dynamics-informed diffusion model for spatiotemporal forecasting. Advances in Neural Information Processing Systems, 36, 45259-45287.

Peebles, W., & Xie, S. (2023). Scalable diffusion models with transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, 4195-4205.

Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 8780-8794.

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684-10695.

Ma, N., Goldstein, M., Albergo, M. S., Boffi, N. M., Vanden-Eijnden, E., & Xie, S. (2024). SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. European Conference on Computer Vision, 23-40.

Recursive Flow Matching

Channel 0

Channel 1

Abstract

Scientific Forecasting Results

Velocity \(u\)

Velocity \(v\)

Pressure \(p\)

Training Stability

Image Generation Results

Shortcut Model

SiT

RecFM

Method Overview

Physics Intuition

Recursive Attenuation

Trajectory Hierarchy

Connection to RecFM

Recursive Flow Matching Formulation

BibTeX

References