Flow Matching Policy for Behavioral Cloning

Official implementation of the paper "Flow Matching Policy for Behavioral Cloning", presented at IcETRAN 2026.

See the paper (coming soon) for more details.

Mihailo Radović¹, Filip Marčić¹.
Flow Matching Policy for Behavioral Cloning.
IcETRAN, 2026.

¹University of Belgrade, School of Electrical Engineering

Citation

Our paper has been accepted to IcETRAN 2026. The citation below is in preprint format; the official IEEE Xplore DOI and publication details will be added here once they are available.

If you find our work useful, please consider citing:

@inproceedings{radovic2026flow,
  title={Flow Matching Policy for Behavioral Cloning},
  author={Radovi{\'c}, Mihailo and Mar{\v{c}}i{\'c}, Filip},
  booktitle={Proceedings of the 13th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN)},
  year={2026},
  organization={IEEE},
  note={To appear}
}

Abstract

Behavioral cloning (BC) is a foundational imitation learning paradigm, but many standard continuous-control BC baselines rely on unimodal Gaussian policies or other relatively low-expressivity action parameterizations. Consequently, they struggle to capture the complex, multi-modal strategies present in diverse offline datasets, such as those containing human, medium-quality, or mixed trajectories, leading to a significant performance gap. To address this limitation, we introduce the Flow Matching Policy (FMP), a highly expressive representation for continuous control BC. Our approach models the conditional action distribution as a continuous-time normalizing flow, learning an observation-conditioned velocity field to transport a simple base noise distribution into the empirical action distribution. Evaluations against strong Gaussian and diffusion policy baselines across standard continuous control benchmarks demonstrate that the FMP consistently achieves competitive or superior performance. These results suggest that continuous-time flow models are a promising alternative for capturing highly complex and varied behaviors from noisy data.
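
The idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the code in this repository: it assumes a straight-line interpolation path between noise and action (a common choice in flow matching, not confirmed here), and all function names (`interpolate`, `target_velocity`, `flow_matching_loss`, `sample_action`, `point_mass_field`) are hypothetical. Training regresses a velocity field onto the path's velocity; sampling integrates that field from noise to an action with a fixed number of Euler steps.

```python
import random


def interpolate(x0, x1, t):
    """Point at time t on the straight line from noise x0 to action x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; this is the regression target."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, obs, action, rng):
    """One-sample estimate of the squared error between field and target."""
    x0 = [rng.gauss(0.0, 1.0) for _ in action]  # base noise sample
    t = rng.random()                            # time drawn from [0, 1)
    pred = velocity_fn(interpolate(x0, action, t), t, obs)
    target = target_velocity(x0, action)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(action)


def sample_action(velocity_fn, obs, action_dim, num_steps, rng):
    """Euler integration of dx/dt = v(x, t, obs) from t = 0 to t = 1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt, obs)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def point_mass_field(expert_action):
    """Toy field for a dataset with a single action: v = (a - x) / (1 - t)."""
    def velocity_fn(x, t, obs):
        return [(a - xi) / (1.0 - t) for a, xi in zip(expert_action, x)]
    return velocity_fn


rng = random.Random(0)
field = point_mass_field([0.5, -1.0])
print(sample_action(field, obs=None, action_dim=2, num_steps=12, rng=rng))
```

In a real policy the velocity field would be a neural network conditioned on the observation, and the loss would be averaged over a batch of dataset transitions. The toy field here only shows that the sampler transports noise to the target action.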

Usage

Prerequisites

1. Install `uv`

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Install dependencies

uv sync

3. Download a dataset

minari download <DATASET_NAME>

To view all available datasets:

minari list remote

4. (Optional) Configure Weights & Biases

If you want to use Weights & Biases logging, generate a W&B API key from https://wandb.ai/ and create a .env file in the project root:

WANDB_API_KEY=<YOUR_API_KEY>
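
How the training script reads this file is not stated here (it may well rely on a library such as python-dotenv). As a rough standard-library sketch of what loading a `KEY=VALUE` file of this shape involves, with `load_env_file` being a hypothetical helper and not part of this repository:

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a file and export them to os.environ."""
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    # Variables already set in the shell take precedence over the file.
    for key, value in loaded.items():
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` from the project root before logging is initialized would make `WANDB_API_KEY` visible to the process without exporting it in the shell.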

Training

Examples for HalfCheetah:

Train with Weights & Biases logging

uv run python train.py --config config/halfcheetah/flow_matching.yaml

Train without Weights & Biases

uv run python train.py --config config/halfcheetah/flow_matching.yaml --disable-wandb

Run a Weights & Biases sweep

wandb sweep config/halfcheetah/flow_matching.yaml
wandb agent <SWEEP_ID>

Evaluation

Example for HalfCheetah:

uv run python test.py \
    --config config/halfcheetah/flow_matching.yaml \
    --video-dir videos/halfcheetah \
    --num-episodes 5

Results: Average return over 100 episodes ± Std

MuJoCo

Env	Dataset	Gaussian	DDIM	FMP (ours)
Half Cheetah	medium-v0	14871.47 ± 3046.86	15081.97 ± 1440.60	15499.46 ± 51.01
Hopper	medium-v0	3577.98 ± 29.51	3243.76 ± 656.69	3593.45 ± 31.59
Humanoid	medium-v0	7581.16 ± 1838.37	7775.93 ± 1533.97	8213.03 ± 36.86
Swimmer	medium-v0	274.17 ± 19.95	220.11 ± 3.67	227.50 ± 12.08
Walker2d	medium-v0	6235.86 ± 29.79	6150.82 ± 106.45	6204.74 ± 79.81
Ant	medium-v0	5769.32 ± 1602.93	5931.90 ± 1272.18	6027.36 ± 1050.83

D4RL

Env	Dataset	Gaussian	DDIM	FMP (ours)
Door	human-v2	158.00 ± 203.86	314.26 ± 323.47	279.99 ± 327.75
Pen	human-v2	2636.42 ± 4007.15	5143.41 ± 4304.77	5410.04 ± 4336.66
Kitchen	mixed-v2	603.38 ± 234.78	500.84 ± 194.93	740.16 ± 151.83
Ant Maze*	medium-play-v1	0.00 ± 0.00	17.65 ± 75.39	61.21 ± 134.58

* Note: The AntMaze results reported here are slightly higher than those in the official IcETRAN 2026 camera-ready paper. During post-submission evaluation, we found that reducing the number of ODE integration and denoising steps from 15 to 12 improved performance for both the Diffusion and Flow Matching policies.
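
Each entry in the tables above is a mean ± standard deviation over evaluation episodes. A minimal sketch of that summary follows; the helper names are hypothetical, and the use of the population standard deviation is an assumption, since the convention used for the reported numbers is not stated here.

```python
import statistics


def summarize_returns(episode_returns):
    """Mean and population standard deviation of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # assumption: population std
    return mean, std


def format_result(episode_returns):
    """Render a summary in the same 'mean ± std' style as the tables."""
    mean, std = summarize_returns(episode_returns)
    return f"{mean:.2f} ± {std:.2f}"


# Placeholder returns for illustration only; these are not measured results.
print(format_result([1.0, 2.0, 3.0, 4.0]))
```

With only a handful of episodes, the sample standard deviation (`statistics.stdev`) would give a noticeably larger value; over 100 episodes the two differ by well under one percent.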

Ablation study on Integration Steps

Humanoid-medium-v0

ODE steps	Avg Return ± Std	Latency (ms)
1	7031.33 ± 2263.08	0.11
2	7642.93 ± 1824.08	0.16
4	7957.36 ± 1214.06	0.29
8	7836.65 ± 1445.65	0.53
10	8172.81 ± 574.24	0.65
12	8213.03 ± 36.86	0.78
16	7871.77 ± 1317.44	1.03
20	7979.39 ± 1233.45	1.26
24	7973.92 ± 1186.77	1.50
36	8061.47 ± 967.44	2.21
50	8034.26 ± 1178.72	3.09

Kitchen-mixed-v2

ODE steps	Avg Return ± Std	Latency (ms)
1	17.35 ± 61.87	0.13
2	17.78 ± 78.61	0.20
4	594.98 ± 236.30	0.34
8	674.19 ± 161.76	0.62
10	611.45 ± 286.67	0.75
12	539.10 ± 295.77	0.91
16	740.16 ± 151.83	1.18
20	681.41 ± 213.52	1.45
24	707.02 ± 206.12	1.71
36	722.46 ± 172.70	2.58
50	730.37 ± 166.26	3.51
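
The latency column grows roughly linearly with the number of ODE steps, as expected when each step costs one evaluation of the velocity field. The measurement procedure behind these numbers is not described here; the sketch below shows one common way to time such a loop, using a toy 1-D ODE as a stand-in for policy inference. Both `mean_latency_ms` and `euler_rollout` are hypothetical names.

```python
import time


def mean_latency_ms(fn, repeats=100, warmup=10):
    """Average wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats


def euler_rollout(num_steps):
    """Stand-in for inference: num_steps Euler updates of dx/dt = -x."""
    x, dt = 1.0, 1.0 / num_steps
    for _ in range(num_steps):
        x += dt * (-x)
    return x


for steps in (1, 4, 16):
    latency = mean_latency_ms(lambda: euler_rollout(steps))
    print(f"{steps} steps: {latency:.4f} ms")
```

Absolute timings from this toy loop are not comparable to the table; only the scaling with the step count carries over.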

AntMaze Trajectory Comparison

Qualitative comparison of policy behaviors in the antmaze-medium-play-v1 environment.

Panels: Gaussian Policy, DDIM, FMP (Ours).
