Recursive Flow Matching

1University of California, San Diego 
2University of Michigan 

TL;DR: RecFM is a new framework that aligns trajectories across scales for accurate generation, achieving one- and two-step generation for physics datasets comparable to state-of-the-art multi-step methods.

[Figure: Helmholtz Staircase roll-outs for channel 0 and channel 1. Each channel shows three panels: Ground Truth, RecFM (1-step), and VideoPDE.]

RecFM tracks the Helmholtz Staircase dynamics in one step, staying close to the ground-truth evolution across channels.

Abstract

Generative models have emerged as a powerful paradigm for solving physical systems and modeling complex spatiotemporal dynamics. However, achieving high physical accuracy without incurring high computational cost remains a fundamental challenge, as existing approaches face a critical speed-fidelity trade-off. In this work, we introduce Recursive Flow Matching (RecFM), a generative framework for forecasting complex spatiotemporal dynamics. RecFM enforces self-consistency to align trajectories across discretization scales, reducing discretization errors and improving performance across metrics for physics-based tasks. To our knowledge, this is the first method to achieve high-fidelity one- and few-step (2-4 step) dynamic generation for scientific systems with performance comparable to state-of-the-art multi-step solvers. Across challenging scientific benchmarks, RecFM achieves up to a 20× speedup over leading diffusion-based emulators while improving predictive accuracy. Furthermore, RecFM reduces mean squared error by over 15% compared to vanilla flow matching, offering a scalable and efficient solution for real-time scientific emulation.


Scientific Forecasting Results

[Figure: aggregate radar plot comparing forecasting methods on four axes: CRPS, MSE, SSR, and speed.]

Four-axis summary of the dynamic forecasting tasks. Metrics are averaged across datasets and normalized to [0, 1]; closer to 1 is better.

RecFM achieves strong forecasting performance across diverse scientific benchmarks, including Sea Surface Temperature (SST) forecasting, Navier-Stokes fluid dynamics, and the Helmholtz Staircase equation. Compared to diffusion-based emulators such as VideoPDE (Li et al., 2025), RecFM substantially reduces rollout cost while improving predictive accuracy and physical consistency. In particular, RecFM achieves up to 20× faster rollout generation than VideoPDE and over 15% lower MSE than vanilla Flow Matching while operating in the one- and few-step generation regime.

Quantitative forecasting results for Sea Surface Temperature (SST), Navier-Stokes flow (NS), and the Helmholtz Staircase equation (HS). Lower values are better for MSE and CRPS, while the optimal SSR is 1. Time [s] is listed for SST only.

| Method | SST CRPS | SST MSE | SST SSR | SST Time [s] | NS CRPS | NS MSE | NS SSR | HS CRPS | HS MSE | HS SSR |
|---|---|---|---|---|---|---|---|---|---|---|
| Perturbation* | 0.281 | 0.180 | 0.411 | 0.4241 | 0.090 | 0.028 | 0.448 | 0.218 | 0.111 | 0.004 |
| Dropout* | 0.267 | 0.164 | 0.406 | 0.4241 | 0.078 | 0.027 | 0.715 | 0.099 | 0.049 | 0.631 |
| DDPM* | 0.246 | 0.177 | 0.674 | 0.3054 | 0.180 | 0.105 | 0.573 | 0.156 | 0.153 | 0.563 |
| MCVD* | 0.216 | 0.161 | 0.926 | 79.167 | 0.154 | 0.070 | 0.524 | 0.137 | 0.128 | 0.867 |
| DYffusion* | 0.224 | 0.173 | 1.033 | 4.6722 | 0.067 | 0.022 | 0.877 | 0.144 | 0.106 | 1.121 |
| VideoPDE (Li et al., 2025) | 0.216 | 0.162 | 0.746 | 19.753 | 0.033 | 0.0068 | 0.205 | 0.026 | 5.6e-4 | 4.334 |
| Vanilla FM | 0.260 | 0.232 | 0.914 | 1.5202 | 0.036 | 0.0076 | 0.911 | 0.030 | 6.5e-4 | 1.485 |
| RecFM (1-step) | 0.217 | 0.162 | 0.984 | 0.4310 | 0.031 | 0.0064 | 0.959 | 0.0034 | 4.2e-5 | 1.090 |
| RecFM (2-step) | 0.216 | 0.161 | 1.004 | 0.7353 | 0.032 | 0.0068 | 0.932 | 0.0027 | 2.7e-5 | 1.440 |

*Results for SST and Navier-Stokes are reproduced from DYffusion (Cachay et al., 2023).
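
For readers unfamiliar with the ensemble metrics in the table, here is a minimal NumPy sketch of one common way to compute them. It is an assumption that these match the exact estimators behind the reported numbers: `crps_ensemble` uses the standard ensemble estimator of CRPS, `mse` scores the ensemble mean, and `spread_skill_ratio` divides the ensemble spread by the RMSE of the ensemble mean, so a value near 1 suggests a well-calibrated ensemble. The function names and the array layout (ensemble members on the first axis) are illustrative, not taken from the RecFM code.

```python
import numpy as np

def mse(ens, obs):
    """MSE of the ensemble-mean prediction (assumed convention)."""
    return float(np.mean((ens.mean(axis=0) - obs) ** 2))

def crps_ensemble(ens, obs):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|, averaged over points."""
    skill = np.abs(ens - obs).mean(axis=0)
    # Pairwise member differences, shape (M, M, ...), averaged over both member axes.
    spread = np.abs(ens[:, None] - ens[None, :]).mean(axis=(0, 1))
    return float(np.mean(skill - 0.5 * spread))

def spread_skill_ratio(ens, obs):
    """Ensemble spread divided by RMSE of the ensemble mean (assumed convention)."""
    spread = np.sqrt(ens.var(axis=0, ddof=1).mean())
    rmse = np.sqrt(mse(ens, obs))
    return float(spread / rmse)

# Toy data: observations and ensemble members drawn from the same distribution.
rng = np.random.default_rng(0)
obs = rng.normal(size=5000)
calibrated = rng.normal(size=(20, 5000))   # 20 members with the right spread
overconfident = 0.1 * calibrated           # same members with far too little spread

ssr_calibrated = spread_skill_ratio(calibrated, obs)
ssr_overconfident = spread_skill_ratio(overconfident, obs)
crps_perfect = crps_ensemble(np.tile(obs, (20, 1)), obs)  # every member equals the truth
```

In this toy setup the calibrated ensemble gives an SSR close to 1, the overconfident one gives an SSR well below 1, and an ensemble whose members all equal the truth gives a CRPS of exactly 0.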

Gallery

Navier-Stokes roll-out comparisons against FNO (Li et al., 2020) across velocity \(u\), velocity \(v\), and pressure \(p\):

[Figure: Navier-Stokes roll-outs for velocity \(u\), velocity \(v\), and pressure \(p\). Each channel shows three panels: Ground Truth, RecFM (1-step), and FNO.]

Training Stability

We compare training convergence on the Navier-Stokes benchmark as a function of the cumulative number of function evaluations (NFE, i.e., forward passes) during optimization. RecFM converges faster than the diffusion-based baseline VideoPDE and consistently achieves lower validation error throughout training.

Image Generation Results

[Figure: denoising trajectories shown at t = 1.00 in three panels: Shortcut Model, SiT, and RecFM.]

Denoising trajectories of Shortcut Model, SiT, and RecFM (8 inference steps).

We report results on the ImageNet-1k dataset. As a multi-step flow matching method, RecFM is competitive in image generation while requiring fewer training epochs and inference steps.

Comparison of generative models under different sampling regimes.

| Model | FID ↓ | Sampling Steps | Param Count | Epochs Trained |
|---|---|---|---|---|
| DiT-XL (Peebles & Xie, 2023) | 2.27 | 500 | 675M | 640 |
| SiT-XL (Ma et al., 2024) | 2.06 | 250 | 675M | 640 |
| ADM-G (Dhariwal & Nichol, 2021) | 4.59 | 250 | - | 426 |
| LDM-4-G (Rombach et al., 2022) | 3.6 | 500 | 400M | 106 |
| Shortcut Model (XL) (Frans et al., 2024) | 3.8 | 128 | 676M | 250 |
| RecFM-XL | 2.53 | 128 | 675M | 160 |
| RecFM-XL | 2.49 | 16 | 675M | 160 |
| RecFM-XL | 3.22 | 8 | 675M | 160 |

Below we visualize some image generation results for RecFM:

[Figure: composite of selected image generation samples.]

Selected samples from our 256 × 256 resolution RecFM-XL model.

Gallery (ImageNet class index in parentheses): Macaw (88), Sulphur-crested cockatoo (89), Husky (250), Arctic wolf (270), Lion (291), Otter (360), Red panda (387), Panda (388), Balloon (417), Cliff drop-off (972), Coral reef (973), Volcano (980).

Method Overview

Standard Flow Matching learns a single trajectory between the data distribution (samples \(x_0\)) and the noise distribution (samples \(x_1\)). RecFM instead models a family of recursively scaled trajectories that intersect at shared spatial states \(x_t\), enabling cross-scale consistency training and stable few-step generation.


Physics Intuition

[Figure: illustration of a wall-bouncing pendulum and recursive attenuation across scales.]

Recursive Attenuation

After each collision, the pendulum loses energy and follows a shorter trajectory with reduced velocity.

Trajectory Hierarchy

The recursive motion naturally produces a family of progressively scaled trajectories that pass through the same point.

Connection to RecFM

RecFM enforces consistency by matching velocity predictions from recursively scaled trajectories at the same spatial point.

Details

Recursive Flow Matching Formulation

Given a data sample \(x_0 \sim p_0\) and a noise sample \(x_1 \sim p_1\), RecFM defines the standard linear interpolation

\[x_t = (1-t)x_0 + tx_1, \qquad \bm{v}^* = x_1 - x_0.\]

RecFM recursively constructs a family of trajectories across different discretization scales. For recursion depth \(D\) and a base scale \(\alpha\), trajectories are parameterized by aligned time-scale pairs \(\{(\tau^{(i)}, \alpha^{(i)})\}_{i=1}^{D}\), where

\[\tau^{(i)} = t/\alpha^{(i)}, \qquad \alpha^{(i)} = \alpha^{i-1}, \qquad \alpha^{(1)} = 1.\]

Under this alignment, all trajectories pass through the same spatial state \(x_t\). Because the target velocity at scale \(i\) is \(\alpha^{(i)} \bm{v}^*\), consecutive scales satisfy the recursive velocity relation

\[\hat{v}^{(i+1)} = \alpha \hat{v}^{(i)}.\]
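
The alignment can be checked numerically. The sketch below assumes (an interpretation, not stated explicitly above) that the scale-\(i\) trajectory starts at \(x_0\) and moves with constant velocity \(\alpha^{(i)} \bm{v}^*\), with \(\alpha^{(i)} = \alpha^{i-1}\) as in Algorithm 1. Under that assumption its position at time \(\tau^{(i)} = t/\alpha^{(i)}\) is \(x_0 + \tau^{(i)} \alpha^{(i)} \bm{v}^* = x_0 + t\bm{v}^* = x_t\) for every \(i\). The variable names and the values of \(t\), \(\alpha\), and \(D\) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=3)          # data sample
x1 = rng.normal(size=3)          # noise sample
t, alpha, depth = 0.3, 0.8, 4    # illustrative time, base scale, recursion depth

v_star = x1 - x0                 # primary (scale-1) velocity
x_t = (1.0 - t) * x0 + t * x1    # shared spatial point

positions, velocities = [], []
for i in range(1, depth + 1):
    alpha_i = alpha ** (i - 1)   # recursive trajectory scale
    tau_i = t / alpha_i          # aligned trajectory time
    v_i = alpha_i * v_star       # target velocity at scale i
    positions.append(x0 + tau_i * v_i)  # assumed position of the scaled trajectory
    velocities.append(v_i)

# Every scaled trajectory should sit at x_t, and each velocity should be
# alpha times the velocity of the previous scale.
all_aligned = all(np.allclose(p, x_t) for p in positions)
recursion_holds = all(
    np.allclose(velocities[i + 1], alpha * velocities[i]) for i in range(depth - 1)
)
print(all_aligned, recursion_holds)  # True True
```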

RecFM trains a shared velocity network \(v_\theta(x,\tau,\alpha)\) using multi-scale trajectory supervision and cross-scale consistency constraints inspired by the wall-bouncing pendulum dynamics.

Algorithm 1 Recursive Trajectory Training with Consistency Alignment

Require: Data distribution \(p_0\), noise distribution \(p_1\)
Require: Velocity network \(v_\theta(x,\tau,\alpha)\), recursion depth \(D\)
Require: Consistency weight \(\lambda\), total training iterations \(N\)
for iteration \(n = 1\) to \(N\) do
    Sample \(x_0 \sim p_0\) and \(x_1 \sim p_1\) ▷ Data and noise samples
    Sample \(t \sim \mathcal{U}(0,1)\) and \(\alpha \sim \mathcal{U}(t,1)\) ▷ Primary trajectory time and base recursion scale
    \(\bm{v}^* \gets x_1 - x_0\) ▷ Ground-truth primary velocity
    \(x_t \gets (1-t)x_0 + t x_1\) ▷ Shared spatial point
    for \(i = 1\) to \(D\) do
        \(\alpha^{(i)} \gets \alpha^{i-1}\) ▷ Recursive trajectory scale
        \(\tau^{(i)} \gets t / \alpha^{(i)}\) ▷ Aligned trajectory time
        \(\hat{v}^{(i)} \gets v_\theta(x_t, \tau^{(i)}, \alpha^{(i)})\) ▷ Predicted trajectory velocity
        \(\mathcal{L}_{\text{traj}}^{(i)} \gets \|\hat{v}^{(i)} - \alpha^{(i)} \bm{v}^*\|_2^2\) ▷ Trajectory supervision
    end for
    for \(i = 2\) to \(D\) do
        \(\mathcal{L}_{\text{cons}}^{(i)} \gets \|\hat{v}^{(i)} - \alpha^{(i)} \hat{v}^{(1)}\|_2^2\) ▷ Cross-scale consistency
    end for
    \(\mathcal{L}_{\text{total}} \gets \sum_{i=1}^{D} \mathcal{L}_{\text{traj}}^{(i)} + \lambda \sum_{i=2}^{D} \mathcal{L}_{\text{cons}}^{(i)}\)
    Update \(\theta\) using \(\nabla_\theta \mathcal{L}_{\text{total}}\)
end for
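
The loss computed inside one iteration of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative, gradient-free sketch rather than the authors' implementation: the function name `recfm_loss`, the batch averaging, and the toy callables standing in for \(v_\theta\) are assumptions. In practice \(v_\theta\) would be a neural network and the loss would be differentiated with an autodiff framework.

```python
import numpy as np

def recfm_loss(v_theta, x0, x1, t, alpha, depth=3, lam=1.0):
    """Trajectory + consistency loss of Algorithm 1, averaged over a batch.

    v_theta(x, tau, alpha) -> predicted velocity with the same shape as x.
    x0, x1: arrays of shape (B, d); t, alpha: arrays of shape (B,).
    """
    v_star = x1 - x0                                   # ground-truth primary velocity
    x_t = (1.0 - t[:, None]) * x0 + t[:, None] * x1    # shared spatial point
    preds, scales = [], []
    for i in range(1, depth + 1):
        alpha_i = alpha ** (i - 1)                     # recursive trajectory scale
        tau_i = t / alpha_i                            # aligned trajectory time
        preds.append(v_theta(x_t, tau_i, alpha_i))
        scales.append(alpha_i[:, None])
    sq_norm = lambda a: np.mean(np.sum(a ** 2, axis=-1))
    # Trajectory supervision for i = 1..D.
    loss_traj = sum(sq_norm(v - a * v_star) for v, a in zip(preds, scales))
    # Cross-scale consistency for i = 2..D, measured against the scale-1 prediction.
    loss_cons = sum(sq_norm(v - a * preds[0]) for v, a in zip(preds[1:], scales[1:]))
    return float(loss_traj + lam * loss_cons)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 2))           # toy "data" batch
x1 = rng.normal(size=(8, 2))           # toy "noise" batch
t = rng.uniform(0.0, 1.0, size=8)      # t ~ U(0, 1)
alpha = rng.uniform(t, 1.0)            # alpha ~ U(t, 1), sampled per example

# Toy stand-ins for the network. The oracle sees v* directly, which a real
# model cannot do; it is only here to show that the loss vanishes at the target.
v_star = x1 - x0
oracle = lambda x, tau, a: a[:, None] * v_star
zeros = lambda x, tau, a: np.zeros_like(x)

loss_oracle = recfm_loss(oracle, x0, x1, t, alpha)
loss_zeros = recfm_loss(zeros, x0, x1, t, alpha)
```

The oracle reaches zero loss because it satisfies both the trajectory targets and the cross-scale relation. The all-zeros predictor is perfectly self-consistent (its consistency term is zero) but still pays the trajectory term, which shows why both terms appear in the objective.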

BibTeX


@misc{huang2026recursiveflowmatching,
    title={Recursive Flow Matching}, 
    author={Jiahe Huang and Sihan Xu and Sharvaree Vadgama and Rose Yu},
    year={2026},
    eprint={2605.26535},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2605.26535}, 
}
            

References

  1. Li, E., Wang, Z., Huang, J., & Park, J. J. (2025). VideoPDE: Unified generative PDE solving via video inpainting diffusion models. arXiv preprint arXiv:2506.13754.
  2. Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., & Anandkumar, A. (2020). Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895.
  3. Frans, K., Hafner, D., Levine, S., & Abbeel, P. (2024). One step diffusion via shortcut models. arXiv preprint arXiv:2410.12557.
  4. Rühling Cachay, S., Zhao, B., Joren, H., & Yu, R. (2023). DYffusion: A dynamics-informed diffusion model for spatiotemporal forecasting. Advances in Neural Information Processing Systems, 36, 45259-45287.
  5. Peebles, W., & Xie, S. (2023). Scalable diffusion models with transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, 4195-4205.
  6. Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 8780-8794.
  7. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684-10695.
  8. Ma, N., Goldstein, M., Albergo, M. S., Boffi, N. M., Vanden-Eijnden, E., & Xie, S. (2024). SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. European Conference on Computer Vision, 23-40.