Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra (GAF)
Deressa Wodajo, Hannes Mareen, Peter Lambert, Glenn Van Wallendael
This repository contains the implementation code for the paper Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra (GAF). Find the full paper on arXiv.
GAF introduces algebraic compositionality to generative models. Instead of learning implicit class relationships, GAF provides explicit, mathematically guaranteed control over class composition through independent K-heads anchored to a shared origin J.
The GAF model consists of a Trunk and two twin networks, J (Noise Anchor) and K (Data Anchor).
Each K-head is an independent linear projection. Blending K-heads = blending endpoints in latent space.
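As a rough illustration of this layout, here is a minimal sketch: a shared trunk produces features, `J` is a single noise-anchor head, and each class gets its own independent linear `K` head, so blending classes is a convex combination of per-class endpoints. Module names, shapes, and the exact velocity parameterization are assumptions here, not the repository's actual code.

```python
# Conceptual sketch only -- names, shapes, and the velocity parameterization
# are assumptions; see the paper and repository for the real definitions.
import torch.nn as nn

class AnchoredHeads(nn.Module):
    def __init__(self, trunk: nn.Module, hidden_dim: int, out_dim: int, num_classes: int):
        super().__init__()
        self.trunk = trunk                           # shared feature extractor
        self.J = nn.Linear(hidden_dim, out_dim)      # shared noise anchor (origin)
        self.K = nn.ModuleList(                      # one independent linear head per class
            [nn.Linear(hidden_dim, out_dim) for _ in range(num_classes)]
        )

    def endpoints(self, x, t, class_weights):
        """Return the shared J anchor and a blended K endpoint.

        class_weights maps class index -> weight; blending K-heads is a
        convex combination of the per-class endpoints in latent space.
        """
        h = self.trunk(x, t)
        k = sum(w * self.K[c](h) for c, w in class_weights.items())
        return self.J(h), k
```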
The code in this repository enables training and testing of the GAF model for image generation.
* Python 3.10
* PyTorch
* CUDA >=11.8
* tqdm, imageio
* diffusers, timm, LPIPS
- Clone this repository:

```
git clone https://github.com/IDLabMedia/GAF.git
```

- Install the required dependencies:

```
pip install -r requirements.txt
```

To train the GAF model, follow these steps:
- Prepare the training dataset. CIFAR-10 is trained in pixel space; CelebA-256, AFHQ-512, and ImageNet-256 are trained on precomputed latent datasets. Set `precompute=true` in the config file to precompute your dataset latents (config files are in the `configs` folder; each dataset has its own config file, edit the params as needed). A minimal sketch of this latent precomputation is shown after these steps.

- Run the training script. Set `precompute=false` to train GAF on the precomputed dataset. Examples:
- To train GAF from scratch:

```
python train.py --data imagenet --image_size 256
```
- Train using original DiT weights (Retrunking). You can load the original DiT model weights into the GAF trunk: download the weights from the DiT GitHub page, specify the weight path in the config file by setting `retrunk: downloaded_dit_weight_file_path`, and run:

```
python train.py --data imagenet --image_size 256 --retrunk
```

Note: Depending on your batch size, setting `iters` between 20,000 and 100,000 is sufficient to generate high-quality images (we used 100,000 iterations).
- To resume training:

```
python train.py --data imagenet --image_size 256 --mode resume
```
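For reference, here is a minimal sketch of what latent precomputation for the 256px datasets typically looks like, assuming the Stable Diffusion VAE from diffusers commonly used with DiT-style trunks; the repository's actual `precompute=true` path, file layout, and normalization may differ.

```python
# Hedged sketch of latent precomputation; model name, paths, and scaling
# (0.18215, the SD VAE factor used by DiT) are assumptions, not GAF's code.
import torch
from diffusers.models import AutoencoderKL
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

device = "cuda"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()

tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),        # map images to [-1, 1]
])
loader = DataLoader(ImageFolder("data/imagenet/train", tf), batch_size=32, num_workers=4)

all_latents, all_labels = [], []
with torch.no_grad():
    for x, y in loader:
        # 3x256x256 images -> 4x32x32 latents, scaled as in DiT / Stable Diffusion
        z = vae.encode(x.to(device)).latent_dist.sample() * 0.18215
        all_latents.append(z.cpu())
        all_labels.append(y)

torch.save({"latents": torch.cat(all_latents), "labels": torch.cat(all_labels)},
           "data/imagenet256_latents.pt")
```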
Image Generation with GAF
To generate images using the trained GAF model, follow these steps:
- Download the pretrained models from Huggingface and save them in the `weights` folder.
| Dataset | Image Resolution | FID-50K | Model Weight |
|---|---|---|---|
| CIFAR-10 | 32x32 | 9.53 | GAF-CIFAR-B/2 |
| AFHQ | 512x512 | -- | GAF-AFHQ-XL/2 |
| CelebA-HQ | 256x256 | 7.27 | GAF-CelebA-XL/2 |
| ImageNet | 256x256 | 7.51 | GAF-ImageNet-XL/2 |
Generate a single pure class:

```
python sampler.py --data imagenet --classes 979
```

or

```
python sampler.py --data imagenet --classes cliff
```

or generate multiple pure classes:

```
python sampler.py --data imagenet --classes 979 984 321
```
Example: v = 0.3 * v979 + 0.7 * v984

```
python sampler.py --data imagenet --classes 979 984 --weight 0.3 0.7
```

Example: v = 0.5 * v979 + 0.3 * v984 + 0.2 * v321

```
python sampler.py --data imagenet --classes 979 984 321 --weight 0.5 0.3 0.2
```

Use the fixed weights [0.5, 0.3, 0.2], but rotate which class is assigned each weight:
```
python sampler.py --data imagenet --classes 979 984 321 --weight 0.5 0.3 0.2 --permute
```

2-class spatial composition using a horizontal mask (the `_2` suffix represents the number of classes). Top row region -> class 510, bottom row region -> class 984:

```
python sampler.py --data imagenet --classes 510 984 --mask_type horizontal_2
```

3-class spatial composition. Top row region -> class 510, middle row region -> class 984, bottom row region -> class 979:

```
python sampler.py --data imagenet --classes 510 984 979 --mask_type horizontal
```

3-class spatial composition with class permutation (mask region rotation). Top row region -> class 510, middle row region -> class 984, bottom row region -> class 979:

```
python sampler.py --data imagenet --classes 510 984 979 --mask_type horizontal --permute
```

2-class spatial composition with a user-provided mask. Top row region -> class 510, bottom row region -> class 984:

```
python sampler.py --data imagenet --classes 510 984 --mask_img masks/contrainer.png --seed 42
```

Spatial regions with per-class weights:
```
python sampler.py --data imagenet --classes 510 984 --mask_img masks/ship.png --weight 0.8 0.5
```

Scalar blend (two classes):

K = alpha * K1 + (1-alpha) * K2

or

v = alpha * v1 + (1-alpha) * v2

```
python sampler.py --data afhq --classes 1 2 --alpha 0.6 --permute
```

The images shown are generated by the IER sampler.
| Argument | Type | Default | Description |
|---|---|---|---|
| `--data` | str | imagenet | Dataset: imagenet, celeb, afhq, cifar |
| `--classes` | int[] | required | Class indices |
| `--weight` | float[] | None | Weights per class (should sum to 1) |
| `--alpha` | float | None | Scalar blend ratio (2 classes only) |
| `--mask_type` | str | None | Preset mask layout |
| `--mask_img` | str | None | Path to custom mask image |
| `--steps` | int | 20 | Integration steps |
| `--solver` | str | euler | ODE solver: endpoint, euler, heun, rk4 |
| `--seed` | int | 42 | Random seed |
| `--permute` | flag | False | Generate all class permutations |
| `--giffer` | flag | False | Save trajectory as GIF |
| `--skip` | flag | False | Skip pure class generation |
| `--h, --w` | int | 512 | Output dimensions |
| `--b` | int | 1 | Batch size |
| `--sigma` | float | 1.25 | Mask blur strength |
List the available mask presets:

```
python sampler.py --list_masks
```

horizontal_2, vertical_2, radial_2, diagonal_2, horizontal, vertical, radial, diagonal, quadrant, horizontal_4, vertical_4, radial_4, custom

- suffix `_2` -> 2 classes; suffix `_4` and `quadrant` -> 4 classes; no suffix -> 3 classes
- `custom` -> custom mask image from disk
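As an illustration of the preset layouts, here is a hedged sketch of what a `horizontal_2`-style mask could look like: the frame is split into a top and a bottom band, one soft mask per class, with `--sigma` controlling how softly the two regions blend at the seam. The repository's actual mask construction may differ.

```python
# Illustrative only -- not the repository's preset implementation.
import torch

def horizontal_2_masks(h: int, w: int, sigma: float = 1.25):
    """Top/bottom soft masks that sum to 1 at every pixel."""
    rows = torch.arange(h, dtype=torch.float32).view(1, 1, h, 1)
    # smooth 0 -> 1 transition around the middle row; larger sigma = softer seam
    bottom = torch.sigmoid((rows - h / 2) / max(sigma, 1e-6))
    top = 1.0 - bottom
    return top.expand(1, 1, h, w), bottom.expand(1, 1, h, w)
```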
Smooth cyclic interpolation through K-heads and a shared J-head, returning to start:
```
python metrics.py --data imagenet --classes 979 984 321 --mode interpolation --steps 250 --n_interp 10
```

Tests algebraic closure: K1 -> J -> K2 -> J -> K3 -> J -> K1

```
python metrics.py --data imagenet --classes 979 984 321 --mode cycle --steps 250
```

Generates a grid of 3-way blends across the simplex:

```
python metrics.py --data imagenet --classes 979 984 321 --mode barycentric --grid_size 7
```

| Argument | Type | Default | Description |
|---|---|---|---|
| `--ckpt` | str | required | Path to checkpoint |
| `--outdir` | str | required | Output directory |
| `--classes` | int[] | required | Classes for cycle/interpolation |
| `--mode` | str | None | interpolation, barycentric, cycle |
| `--steps` | int | 20 | Integration steps (250+ for high fidelity) |
| `--n_interp` | int | 10 | Interpolation frames per transition |
| `--grid_size` | int | 7 | Barycentric grid size (grid_size²) |
| `--h, --w` | int | 512 | Output dimensions |
| `--data` | str | imagenet | Dataset: imagenet, afhq |
GAF's K-heads enable exact velocity composition:
Weighted blend:

```python
v = 0.5 * gaf.velocity(x, t, y=class_1) + 0.3 * gaf.velocity(x, t, y=class_2) + 0.2 * gaf.velocity(x, t, y=class_3)
```

Spatial composition:

```python
v = mask_1 * gaf.velocity(x, t, y=class_1) + mask_2 * gaf.velocity(x, t, y=class_2)
```

Weighted spatial composition:

```python
v = w_1 * mask_1 * gaf.velocity(x, t, y=class_1) + w_2 * mask_2 * gaf.velocity(x, t, y=class_2)
```
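Putting this together, here is a minimal sketch of weighted composition inside an Euler integration loop, assuming the `gaf.velocity(x, t, y=...)` call shown above; the sampler's actual time grid, sign conventions, and latent shape may differ.

```python
# Hedged sketch: composes per-class velocities and integrates with Euler steps.
import torch

def sample_weighted_blend(gaf, classes, weights, shape, steps=20, device="cuda"):
    x = torch.randn(shape, device=device)                 # start from pure noise
    ts = torch.linspace(0.0, 1.0, steps + 1, device=device)
    for i in range(steps):
        t = ts[i].expand(shape[0])
        # compose the velocity field as a convex combination of per-class velocities
        v = sum(w * gaf.velocity(x, t, y=c) for c, w in zip(classes, weights))
        x = x + (ts[i + 1] - ts[i]) * v                    # Euler update
    return x

# e.g. 0.3 * class 979 + 0.7 * class 984, matching the CLI example above:
# images = sample_weighted_blend(gaf, classes=[979, 984], weights=[0.3, 0.7],
#                                shape=(1, 4, 32, 32))
```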
The transport algebra guarantees:
K₁ -> J -> K₂ -> J -> K₃ -> J -> K₁ = Identity
```
python sampler.py --data imagenet --classes 979 984 321 --weight 0.5 0.3 0.2 --permute --seed 42
```

Output: 6 rows showing the same weights applied to different class orderings.

```
python sampler.py --data imagenet --classes 979 984 --weight 0.3 0.7 --giffer
```

Output: GIF showing the noise -> image trajectory.
| Mode | Condition | Formula | Use Case |
|---|---|---|---|
| Pure | K=1 or no args | `v = v_k` | Single class generation |
| Scalar Blend | K=2 + `--alpha` | `v = alpha*v1 + (1-alpha)*v2` | Simple interpolation |
| Weighted | `--weight` | `v = Σ w_i*v_i` | Multi-class blend |
| Spatial | `--mask_img` | `v = Σ mask_i*v_i` | Region-based composition |
| Spatial+Weighted | both | `v = Σ w_i*mask_i*v_i` | Full control |
We perform multi-class spatial editing by composing the velocity fields for each class.
Example: $v = v_{\mathrm{c}}|_{E} + v_{\mathrm{d}}|_{I} + v_{\mathrm{w}}|_{R}$
Where:

- $v_{\mathrm{c}}|_{E}$: Velocity towards the Cat (restricted to the Ear mask).
- $v_{\mathrm{d}}|_{I}$: Velocity towards the Dog (restricted to the Eye mask).
- $v_{\mathrm{w}}|_{R}$: Velocity towards the Wild image (restricted to the Rest of the image, i.e., $R = 1 - (E \cup I)$).
Note: This composition is applied during the generation process (at the velocity level), not by blending the finished images.
The "Base Images" shown on the grid are provided only as a reference, showing the outcome when the velocity is directed exclusively towards the cat, dog, or wild class.
Generate a dog with an eye of a cat (use `--permute` to generate a cat with an eye of a dog):

```
python sampler.py --data afhq --classes 0 1 --mask_type afhq --image_size 512 --regions eye
```

Generate a wild image with a mouth of a cat and an ear of a dog:

```
python sampler.py --data afhq --classes 0 1 2 --mask_type afhq --image_size 512 --regions mouth ears
```

Skip generating the base classes:

```
python sampler.py --data afhq --classes 1 0 2 --mask_type afhq --image_size 512 --regions eyes ears --skip
```

Explore all combinations with the current regions (nose, eyes):

```
python sampler.py --data afhq --classes 2 0 1 --mask_type afhq --image_size 512 --regions nose eyes --permute --giffer
```

Generate only specific regions. Currently implemented for the AFHQ model; requires exactly three regions to work:

```
python sampler.py --data afhq --classes 2 0 1 --mask_type afhq --image_size 512 --regions eyes nose mouth --permute --giffer
```

This command generates wild eyes, a cat nose, and a dog mouth. The rest remains noise since the velocity field only flows toward the masked regions.
Example: $v = v_{512}|_{\mathrm{mask1}} + v_{985}|_{\mathrm{mask2}}$

where

- $v_{512}|_{\mathrm{mask1}}$: Velocity towards class 512 (restricted to the mask1 region).
- $v_{985}|_{\mathrm{mask2}}$: Velocity towards class 985 (restricted to the mask2 region).

Example, editing with a custom mask: $v = v_{985}|_{\mathrm{mask1}} + v_{512}|_{\mathrm{mask2}}$
```
@article{deressa2025generativeanchoredfieldscontrolled,
  title={Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra},
  author={Deressa Wodajo Deressa and Hannes Mareen and Peter Lambert and Glenn Van Wallendael},
  year={2025},
  journal={arXiv preprint arXiv:2511.22693}
}
```

This work was funded in part by the Research Foundation Flanders (FWO) under Grant G0A2523N, IDLab (Ghent University-imec), Flanders Innovation and Entrepreneurship (VLAIO), and the European Union.
MIT License







