feat: multi-view image input UI and full-stack plumbing#27

Open
mikejcarrier wants to merge 1 commit into lightningpixel:dev from mikejcarrier:multi-view-input
Conversation

@mikejcarrier

Adds support for uploading multiple images (front, left, back, right views) for 3D model generation. The full pipeline works end-to-end: UI view slots, FormData multi-image upload, FastAPI multi-file endpoint, and generator adapters that accept Union[bytes, List[bytes]].

What's included

Frontend:

  • 2x2 view slot grid (Front*, Left, Back, Right) with per-slot upload/remove
  • Drag-and-drop and file browser support per slot
  • Front view is required, others are optional
  • View labels sent to backend as comma-separated form field
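
The client-side payload described above can be sketched as follows. This is a hypothetical helper, not code from the PR: the multipart field names ("images", "view_labels") are assumptions for illustration; only the comma-separated label field and the required front view are stated in the PR.

```python
# Hypothetical sketch of the multi-view upload payload. Field names are
# illustrative assumptions; the actual names live in the PR's diff.
from typing import Dict, List, Tuple

VIEW_ORDER = ("front", "left", "back", "right")

def build_upload_payload(views: Dict[str, bytes]) -> Tuple[List[Tuple[str, bytes]], str]:
    """Return (files, view_labels) for a multipart upload.

    'front' is required; the other views are optional and may be omitted.
    """
    if "front" not in views:
        raise ValueError("front view is required")
    # Keep a stable order so the label string lines up with the file list.
    labels = [v for v in VIEW_ORDER if v in views]
    files = [("images", views[v]) for v in labels]
    return files, ",".join(labels)
```

The returned pair maps directly onto a FormData request: one repeated file field per image, plus a single comma-separated `view_labels` form field.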

Backend:

  • FastAPI endpoint accepts List[UploadFile] for multiple images
  • view_labels parsed and passed through to generators
  • BaseGenerator.generate() signature updated to Union[bytes, List[bytes]]
  • SF3D generator gracefully falls back to first image (single-view only)
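
The `Union[bytes, List[bytes]]` handling and the SF3D-style fallback can be sketched in a few lines. The function name here is illustrative; the real fallback lives inside the SF3D generator adapter.

```python
# Illustrative adapter-side handling for Union[bytes, List[bytes]].
# Single-view generators such as SF3D use only the first image.
from typing import List, Union

def first_image(image: Union[bytes, List[bytes]]) -> bytes:
    """Normalize generator input: pass bytes through, take the first of a list."""
    if isinstance(image, bytes):
        return image
    if not image:
        raise ValueError("no images provided")
    return image[0]  # graceful single-view fallback
```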

Hunyuan3D generators (Mini + 2.1):

  • Multi-view preprocessing with rembg on each image
  • View label dict construction for pipeline input
  • Multi-view code paths are wired but currently fall back to single-view
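
The view-label dict construction above can be sketched as a small pure function. This assumes the pipeline takes a `{label: image}` mapping; in the PR each image is first passed through rembg's `remove()` for background removal, which is omitted here to keep the sketch dependency-free.

```python
# Minimal sketch of pairing uploaded images with their view labels for the
# pipeline input. The dict shape is an assumption about the pipeline's API.
from typing import Dict, List

def build_view_dict(images: List[bytes], labels: List[str]) -> Dict[str, bytes]:
    """Key each (background-removed) image by its view label."""
    if len(images) != len(labels):
        raise ValueError("one label per image is required")
    if "front" not in labels:
        raise ValueError("front view is required")
    return dict(zip(labels, images))
```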

Extension trust gate removed for local development (ExtensionCard.tsx)

Important: multi-view model limitation

No currently available Hunyuan3D pretrained model supports multi-view conditioning. We tested extensively with both Mini and 2.1:

  • Both codebases contain MVImageProcessorV2 with front/left/back/right view handling, suggesting multi-view was planned or is in development
  • However, the pretrained conditioner weights (vision encoder) in both models only accept single-view tensors [B, 3, H, W]
  • Passing multi-view tensors causes: ValueError: Input and output must have the same number of spatial dimensions [3, 512, 512] vs [518, 518]
  • The conditioner's DINOv2-based image encoder cannot process concatenated multi-view channels - it was trained on single images only
  • MVImageProcessorV2 is scaffolding for future model weights or fine-tuning, not usable with current pretrained checkpoints
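
The single-view constraint above can be expressed as a shape guard. This is a hypothetical check, not code from either codebase; it uses plain tuples instead of torch tensors to keep the sketch self-contained.

```python
# Illustrative guard for the limitation described above: the pretrained
# conditioner only accepts single-view tensors shaped [B, 3, H, W].
from typing import Sequence, Tuple

def assert_single_view(shape: Sequence[int]) -> Tuple[int, ...]:
    """Reject stacked multi-view inputs such as [B, V, 3, H, W] or [B, 3*V, H, W]."""
    if len(shape) != 4 or shape[1] != 3:
        raise ValueError(
            f"conditioner expects [B, 3, H, W], got {list(shape)}; "
            "current pretrained weights are single-view only"
        )
    return tuple(shape)
```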

What would need to change for true multi-view:

  • A model release where the conditioner is trained on multi-view data
  • Or a different architecture (e.g. separate encoder per view with cross-attention fusion) with matching pretrained weights
  • The UI and API plumbing in this PR is ready - only the model weights are the bottleneck

Hunyuan3D 2.1 extension notes

A working extension was created and tested at APPDATA/Modly/extensions/hunyuan3d-21/. Additional pip dependencies required: omegaconf, timm. The 2.1 model uses .ckpt files (not .safetensors) and the hy3dshape package (different from Mini's hy3dgen). It successfully loads on a 6GB RTX 3050.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lightningpixel lightningpixel changed the base branch from main to dev March 20, 2026 08:26
@iammojogo-sudo

I tried actually rewriting an extension using the mini turbo MV version and got the app to recognize it, but the app UI didn't understand the dictionaries for multiple images in the images field of manifest.json. So I think an internal UI is also necessary. +1 on this pull request :) No rush! I just play around and I'm loving it haha

@iammojogo-sudo

If I start making a profit on models that I have fun with, I'm going to include you for a cut. This program should not be free! But you are awesome, so if that happens, keep this reply for the record.

@iammojogo-sudo

iammojogo-sudo commented Apr 10, 2026

I have something kind of working so far. I remodded generation.py and the workflow JS, repacked the asar, and then modified the JSON file for the node to accept multiple inputs, since index.js already supports that via the 'inputs' field (not 'input'). So I have this so far:
[screenshot]
I also created my own generator.py, setup.py, etc. that works via GitHub download, because the setup won't work locally since GitHub naturally runs it through git. Still a work in progress, but it does run; I just don't know if it uses all four images that the MV version allows 'yet'. But I will make it work soon.
[screenshot]
