
[Feat] Add Ovis-Image-7B text-to-image pipeline #1117

Open

HenryDzy wants to merge 4 commits into hao-ai-lab:main from HenryDzy:feat-contribution

Conversation

@HenryDzy

Adds native FastVideo support for Ovis-Image-7B

New files

Models & configs

  • fastvideo/models/dits/ovisimage.py — Native OvisImageTransformer2DModel:
    6 double blocks + 27 single blocks, SwiGLU activations, RoPE, DistributedAttention
  • fastvideo/models/encoders/qwen3.py — Qwen3Model text encoder
    (wraps Ovis2.5-2B for conditioning)
  • fastvideo/configs/pipelines/ovis_image.py — OvisImageT2IConfig
    (flow_shift=3.0, embedded_cfg_scale=5.0, Qwen3 pre/postprocess hooks)
  • fastvideo/pipelines/basic/ovis_image/ — OvisImagePipeline
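
The SwiGLU activation used in the transformer blocks can be sketched as follows. This is a minimal illustration, not the actual module from `ovisimage.py`; dimensions and names are assumptions for the example.

```python
import torch
import torch.nn as nn

class SwiGLUMLP(nn.Module):
    """Minimal SwiGLU feed-forward block: SiLU(x @ W_gate) * (x @ W_up) -> W_down.

    Dimensions are illustrative and do not come from the Ovis-Image checkpoint.
    """

    def __init__(self, dim: int, hidden_dim: int) -> None:
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: gate path through SiLU, elementwise product with the up path.
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

mlp = SwiGLUMLP(dim=64, hidden_dim=256)
out = mlp(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```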

Pipeline

  • fastvideo/pipelines/basic/ovis_image/__init__.py
  • fastvideo/pipelines/basic/ovis_image/ovis_image_pipeline.py
  • fastvideo/training/ovis_image_training_pipeline.py

Tests

  • fastvideo/tests/transformers/test_ovisimage.py — transformer forward pass
  • fastvideo/tests/encoders/test_qwen3_encoder.py — HF vs FastVideo Qwen3 parity
  • fastvideo/tests/ssim/test_ovis_image_similarity.py — MS-SSIM regression test
  • tests/local_tests/pipelines/test_ovis_image_pipeline_smoke.py — end-to-end VideoGenerator smoke test
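
For context on the MS-SSIM regression test: the repo's test presumably uses a library metric, but the single-scale SSIM formula it builds on can be sketched directly. This toy version uses whole-image statistics instead of local Gaussian windows and multiple scales, purely to show the formula.

```python
import numpy as np

def ssim(x: np.ndarray, y: np.ndarray, c1: float = 0.01**2, c2: float = 0.03**2) -> float:
    """Global (single-window) SSIM between two images scaled to [0, 1].

    Real MS-SSIM averages locally windowed SSIM over several scales; this
    collapses everything to whole-image statistics for illustration only.
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

img = np.random.rand(64, 64)
print(round(ssim(img, img), 6))                 # identical images score 1.0
print(ssim(img, np.random.rand(64, 64)) < 0.9)  # unrelated noise scores low
```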

Example

  • examples/inference/basic/basic_ovis_image.py — runnable example

Files modified

  • fastvideo/registry.py — registered AIDC-AI/Ovis-Image-7B
  • fastvideo/configs/models/dits/__init__.py — exported OvisImageTransformer2DModelConfig
  • fastvideo/configs/models/encoders/__init__.py — exported Qwen3Config
  • fastvideo/configs/models/vaes/base.py — added load_encoder/load_decoder fields
  • fastvideo/models/registry.py — registered OvisImageTransformer2DModel, Qwen3Model
  • fastvideo/pipelines/pipeline_registry.py — registered OvisImagePipeline
  • fastvideo/pipelines/stages/denoising.py — except (ImportError, RuntimeError) for Triton guards
  • fastvideo/pipelines/stages/causal_denoising.py — same fix
  • fastvideo/pipelines/stages/matrixgame_denoising.py — same fix
  • fastvideo/training/__init__.py — exported OvisImageTrainingPipeline
  • docs/inference/support_matrix.md — added Ovis-Image-7B row
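
The broadened import guard in the denoising stages can be sketched in isolation. The module name below is hypothetical, standing in for an optional attention backend whose import may raise ImportError when absent or RuntimeError when Triton/CUDA initialization fails.

```python
# Sketch of the broadened optional-import guard. The backend module name is
# hypothetical; FastVideo's actual imports live in
# fastvideo/pipelines/stages/denoising.py.
try:
    from nonexistent_sta_backend import SlidingTileAttentionBackend
except (ImportError, RuntimeError):
    # RuntimeError covers backends that import but fail during Triton/CUDA
    # initialization; previously only ImportError was caught.
    SlidingTileAttentionBackend = None

print(SlidingTileAttentionBackend is None)  # True when the backend is unavailable
```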

@gemini-code-assist
Contributor

Summary of Changes

Hello @HenryDzy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the FastVideo framework by integrating the Ovis-Image-7B text-to-image model. It introduces new model architectures for the diffusion transformer and text encoder, along with their respective configurations and pipeline implementations. The changes enable users to perform high-quality text-to-image generation and fine-tune the Ovis-Image model within the FastVideo ecosystem. Additionally, the PR includes important refactorings to the model and pipeline registration systems, improving modularity and maintainability, and adds comprehensive test coverage to ensure the stability and correctness of the new features.

Highlights

  • Ovis-Image-7B Integration: Added comprehensive native support for the Ovis-Image-7B text-to-image pipeline, including its custom 2D diffusion transformer, Qwen3 text encoder, and associated configurations.
  • New Model Implementations: Introduced native FastVideo implementations for OvisImageTransformer2DModel (a FLUX-like MM-DiT with double and single stream blocks, 3D RoPE, and DistributedAttention) and Qwen3Model (a text encoder featuring GQA attention with QK-Norm and Tensor Parallelism).
  • Pipeline and Training Support: Implemented a dedicated OvisImagePipeline for text-to-image generation and an OvisImageTrainingPipeline for fine-tuning, both leveraging the new model components and a FlowMatchEulerDiscreteScheduler.
  • Refactored Model and Pipeline Registries: Refactored the model and pipeline registries to improve organization and support architecture-based grouping, making it easier to manage different model types and their configurations.
  • Robustness and Testing: Enhanced robustness in denoising stages by updating error handling for attention backend imports and added extensive testing, including encoder parity, transformer forward pass, SSIM regression, and end-to-end smoke tests for the Ovis-Image pipeline.
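
The "GQA attention with QK-Norm" highlighted above can be sketched as follows. Head counts and dimensions are illustrative and do not reflect Qwen3's real configuration, and the real model also applies RoPE, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm, applied per attention head for QK-Norm."""
    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class GQAttention(nn.Module):
    """Sketch of grouped-query attention with QK-Norm (RoPE omitted)."""
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int) -> None:
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)
        self.q_norm = RMSNorm(self.head_dim)  # QK-Norm: normalize per-head q/k
        self.k_norm = RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_norm(self.q_proj(x).view(b, t, self.n_heads, self.head_dim)).transpose(1, 2)
        k = self.k_norm(self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim)).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Replicate each KV head to serve n_heads // n_kv_heads query heads.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

attn = GQAttention(dim=64, n_heads=8, n_kv_heads=2)
y = attn(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

GQA keeps fewer KV heads than query heads, shrinking the KV projection (and cache) while preserving full query capacity; QK-Norm stabilizes attention logits at scale.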


Changelog
  • docs/inference/support_matrix.md
    • Added a new row for Ovis-Image 7B to the inference support matrix.
  • examples/inference/basic/basic_ovis_image.py
    • Added a new runnable example demonstrating Ovis-Image text-to-image generation with various text rendering prompts.
  • fastvideo/configs/models/dits/__init__.py
    • Imported and exported OvisImageTransformer2DModelConfig.
    • Removed HunyuanGameCraftConfig import and export.
  • fastvideo/configs/models/dits/ovisimage.py
    • Added a new configuration file for OvisImageTransformer2DModel, defining its architecture and FSDP/compile sharding conditions.
  • fastvideo/configs/models/encoders/__init__.py
    • Imported and exported Qwen3ArchConfig and Qwen3Config.
  • fastvideo/configs/models/encoders/qwen3.py
    • Added a new configuration file for the Qwen3 text encoder, including its architecture, tokenizer kwargs, and stacked parameter mapping for weight loading.
  • fastvideo/configs/models/vaes/base.py
    • Added several new fields to VAEArchConfig to align with diffusers.AutoencoderKL for more comprehensive VAE configuration.
  • fastvideo/configs/ovis_image_7b_t2i_pipeline.json
    • Added a new JSON configuration file for the Ovis-Image 7B text-to-image pipeline, specifying parameters like embedded_cfg_scale, flow_shift, and component precisions.
  • fastvideo/configs/pipelines/ovis_image.py
    • Added a new pipeline configuration for Ovis-Image T2I, defining its DiT and text encoder configurations, and custom text pre/post-processing functions.
  • fastvideo/models/dits/ovisimage.py
    • Added a native FastVideo implementation of OvisImageTransformer2DModel, featuring double and single stream blocks, FLUX-style 3D RoPE, and DistributedAttention.
  • fastvideo/models/encoders/qwen3.py
    • Added a native FastVideo implementation of Qwen3Model, including RoPE, SwiGLU MLP, GQA attention with QK-Norm, and Tensor Parallelism support.
  • fastvideo/models/registry.py
    • Removed the ast import.
    • Removed HunyuanGameCraftTransformer3DModel from _TEXT_TO_VIDEO_DIT_MODELS.
    • Added OvisImageTransformer2DModel to the text-to-image models list.
    • Updated _TEXT_ENCODER_MODELS to include Qwen3Model and removed CLIPTextModelWithProjection.
    • Refactored _VAE_MODELS to include AutoencoderKL and removed AutoencoderKLCausal3D.
    • Simplified the model discovery and registration logic by removing _discover_and_register_models and _LEGACY_FAST_VIDEO_MODELS.
  • fastvideo/pipelines/basic/ovis_image/__init__.py
    • Added an __init__.py file to export OvisImagePipeline.
  • fastvideo/pipelines/basic/ovis_image/ovis_image_pipeline.py
    • Added a new OvisImagePipeline implementation, defining the stages for Ovis-Image text-to-image generation.
  • fastvideo/pipelines/pipeline_registry.py
    • Added OvisImagePipeline to the _PIPELINE_NAME_TO_ARCHITECTURE_NAME mapping.
    • Refactored _PipelineRegistry to support a three-level hierarchy (pipeline_type -> architecture -> pipeline_name) for better organization.
    • Updated pipeline loading logic to use the new architecture-based grouping.
  • fastvideo/pipelines/stages/causal_denoising.py
    • Updated try-except blocks for SlidingTileAttentionBackend and VideoSparseAttentionBackend to catch RuntimeError in addition to ImportError.
    • Changed global_end_index and local_end_index in KV cache initialization to torch.tensor objects instead of integers.
  • fastvideo/pipelines/stages/denoising.py
    • Updated try-except blocks for attention backend imports to catch RuntimeError.
    • Removed c2ws_plucker_emb and camera_states from extra function arguments passed to the transformer.
    • Removed timestep_r logic and related use_meanflow checks.
    • Simplified Cosmos2.5 denoising stage logic, removing conditional frame handling and explicit is_conditioned checks.
  • fastvideo/pipelines/stages/matrixgame_denoising.py
    • Updated try-except blocks for attention backend imports to catch RuntimeError.
    • Changed global_end_index and local_end_index in KV cache initialization to torch.tensor objects.
  • fastvideo/registry.py
    • Registered OvisImageT2IConfig with its corresponding HuggingFace model path and model detectors.
  • fastvideo/tests/encoders/test_qwen3_encoder.py
    • Added a new parity test to compare FastVideo's Qwen3Model against the HuggingFace baseline, including weight spot-checks and forward-pass output comparison.
  • fastvideo/tests/ssim/test_ovis_image_similarity.py
    • Added a new SSIM regression test for the Ovis-Image-7B text-to-image pipeline, generating an image and comparing it against a reference.
  • fastvideo/tests/transformers/test_ovisimage.py
    • Added a new distributed forward-pass test for OvisImageTransformer2DModel, verifying output shape, finiteness, and numerical consistency.
  • fastvideo/training/__init__.py
    • Exported OvisImageTrainingPipeline.
    • Removed LTX2TrainingPipeline from exports.
  • fastvideo/training/ovis_image_training_pipeline.py
    • Added a new OvisImageTrainingPipeline for fine-tuning the Ovis-Image model, including scheduler initialization and validation pipeline setup.
  • tests/local_tests/pipelines/test_ovis_image_pipeline_smoke.py
    • Added a new end-to-end smoke test for the Ovis-Image pipeline, verifying output tensor shape, finiteness, and file saving.
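
The three-level registry hierarchy described in the changelog (pipeline_type -> architecture -> pipeline_name) can be illustrated with a toy nested-dict registry. The function names here are hypothetical and do not match FastVideo's actual API.

```python
# Hypothetical miniature of a three-level pipeline registry
# (pipeline_type -> architecture -> pipeline_name -> class). Names mirror
# the PR's description; the real fastvideo/pipelines/pipeline_registry.py differs.

class OvisImagePipeline:  # stand-in class, for illustration only
    pass

_REGISTRY: dict[str, dict[str, dict[str, type]]] = {}

def register(pipeline_type: str, architecture: str, name: str, cls: type) -> None:
    # setdefault builds each missing level of the hierarchy on first use.
    _REGISTRY.setdefault(pipeline_type, {}).setdefault(architecture, {})[name] = cls

def resolve(pipeline_type: str, architecture: str, name: str) -> type:
    return _REGISTRY[pipeline_type][architecture][name]

register("t2i", "ovis_image", "OvisImagePipeline", OvisImagePipeline)
print(resolve("t2i", "ovis_image", "OvisImagePipeline").__name__)  # OvisImagePipeline
```

Grouping by architecture lets multiple concrete pipelines (e.g. inference vs. training variants) hang off one architecture entry without name collisions across pipeline types.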


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a great pull request that adds comprehensive support for the Ovis-Image-7B text-to-image model. The changes include native model implementations, configurations, a new pipeline, and a full suite of tests, which is excellent. The refactoring in the model and pipeline registries also helps to simplify the codebase. I've identified a few critical issues related to type correctness in configurations and hardcoded paths in tests and examples that need to be addressed. Once these are resolved, this will be a very solid contribution.

Comment on lines 143 to +145

    def import_pipeline_classes(
        pipeline_types: list[PipelineType] | PipelineType | None = None
    ) -> dict[str, dict[str, type[ComposedPipelineBase] | None]]:
        pipeline_types_key: tuple[PipelineType, ...] | PipelineType | None
        if isinstance(pipeline_types, list):
            pipeline_types_key = tuple(pipeline_types)
        else:
            pipeline_types_key = pipeline_types
        return _import_pipeline_classes_cached(pipeline_types_key)

    @lru_cache
    def _import_pipeline_classes_cached(
        pipeline_types: tuple[PipelineType, ...] | PipelineType | None = None
    -) -> dict[str, dict[str, type[ComposedPipelineBase] | None]]:
    +) -> dict[str, dict[str, dict[str, type[ComposedPipelineBase] | None]]]:
critical

The @lru_cache decorator requires all arguments to be hashable. The pipeline_types argument is typed as a list, which is not hashable and will raise a TypeError at runtime if a list is passed. To fix this, the function signature should be changed to accept a tuple instead of a list.

Suggested change

    -def import_pipeline_classes(
    -    pipeline_types: list[PipelineType] | PipelineType | None = None
    -) -> dict[str, dict[str, type[ComposedPipelineBase] | None]]:
    -    pipeline_types_key: tuple[PipelineType, ...] | PipelineType | None
    -    if isinstance(pipeline_types, list):
    -        pipeline_types_key = tuple(pipeline_types)
    -    else:
    -        pipeline_types_key = pipeline_types
    -    return _import_pipeline_classes_cached(pipeline_types_key)
    -
    -@lru_cache
    -def _import_pipeline_classes_cached(
    -    pipeline_types: tuple[PipelineType, ...] | PipelineType | None = None
    -) -> dict[str, dict[str, dict[str, type[ComposedPipelineBase] | None]]]:
    +def import_pipeline_classes(
    +    pipeline_types: tuple[PipelineType, ...] | PipelineType | None = None
    +) -> dict[str, dict[str, dict[str, type[ComposedPipelineBase] | None]]]:
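
The hashability issue the reviewer flags is easy to reproduce in isolation with a toy function (not the actual FastVideo one): functools.lru_cache stores arguments in a dict keyed by their hash, so a list argument raises TypeError while an equivalent tuple works.

```python
from functools import lru_cache

@lru_cache
def count_types(pipeline_types):
    # Toy stand-in for _import_pipeline_classes_cached.
    return len(pipeline_types)

try:
    count_types(["t2v", "t2i"])      # lists are unhashable -> TypeError
except TypeError as e:
    print("list argument fails:", e)

print(count_types(("t2v", "t2i")))  # tuples are hashable -> 2
```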

Comment on lines +166 to 169

    if isinstance(pipeline_types, list):
        pipeline_types_to_scan = [
            pipeline_type.value for pipeline_type in pipeline_types
        ]

critical

Following the change to the function signature to accept a tuple for caching purposes, this check should be updated to look for a tuple instead of a list.

Suggested change

    -if isinstance(pipeline_types, list):
    +if isinstance(pipeline_types, tuple):
         pipeline_types_to_scan = [
             pipeline_type.value for pipeline_type in pipeline_types
         ]

HenryDzy and others added 2 commits February 20, 2026 14:45
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>