feat: multi-view image input UI and full-stack plumbing#27

Open
mikejcarrier wants to merge 1 commit into lightningpixel:dev from mikejcarrier:multi-view-input
Conversation

@mikejcarrier

Adds support for uploading multiple images (front, left, back, right views) for 3D model generation. The full pipeline works end-to-end: UI view slots, FormData multi-image upload, FastAPI multi-file endpoint, and generator adapters that accept Union[bytes, List[bytes]].

What's included

Frontend:

  • 2x2 view slot grid (Front*, Left, Back, Right) with per-slot upload/remove
  • Drag-and-drop and file browser support per slot
  • Front view is required, others are optional
  • View labels sent to backend as comma-separated form field
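
The client-side payload described above can be sketched as follows. This is a hypothetical helper, not code from the PR: the multipart field names ("images", "view_labels") are assumptions for illustration; only the comma-separated label field and the required front view are stated in the PR.

```python
# Hypothetical sketch of the multi-view upload payload. Field names are
# illustrative assumptions; the actual names live in the PR's diff.
from typing import Dict, List, Tuple

VIEW_ORDER = ("front", "left", "back", "right")

def build_upload_payload(views: Dict[str, bytes]) -> Tuple[List[Tuple[str, bytes]], str]:
    """Return (files, view_labels) for a multipart upload.

    'front' is required; the other views are optional and may be omitted.
    """
    if "front" not in views:
        raise ValueError("front view is required")
    # Keep a stable order so the label string lines up with the file list.
    labels = [v for v in VIEW_ORDER if v in views]
    files = [("images", views[v]) for v in labels]
    return files, ",".join(labels)
```

The returned pair maps directly onto a FormData request: one repeated file field per image, plus a single comma-separated `view_labels` form field.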

Backend:

  • FastAPI endpoint accepts List[UploadFile] for multiple images
  • view_labels parsed and passed through to generators
  • BaseGenerator.generate() signature updated to Union[bytes, List[bytes]]
  • SF3D generator gracefully falls back to first image (single-view only)
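
The `Union[bytes, List[bytes]]` handling and the SF3D-style fallback can be sketched in a few lines. The function name here is illustrative; the real fallback lives inside the SF3D generator adapter.

```python
# Illustrative adapter-side handling for Union[bytes, List[bytes]].
# Single-view generators such as SF3D use only the first image.
from typing import List, Union

def first_image(image: Union[bytes, List[bytes]]) -> bytes:
    """Normalize generator input: pass bytes through, take the first of a list."""
    if isinstance(image, bytes):
        return image
    if not image:
        raise ValueError("no images provided")
    return image[0]  # graceful single-view fallback
```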

Hunyuan3D generators (Mini + 2.1):

  • Multi-view preprocessing with rembg on each image
  • View label dict construction for pipeline input
  • Multi-view code paths are wired but currently fall back to single-view
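
The view-label dict construction above can be sketched as a small pure function. This assumes the pipeline takes a `{label: image}` mapping; in the PR each image is first passed through rembg's `remove()` for background removal, which is omitted here to keep the sketch dependency-free.

```python
# Minimal sketch of pairing uploaded images with their view labels for the
# pipeline input. The dict shape is an assumption about the pipeline's API.
from typing import Dict, List

def build_view_dict(images: List[bytes], labels: List[str]) -> Dict[str, bytes]:
    """Key each (background-removed) image by its view label."""
    if len(images) != len(labels):
        raise ValueError("one label per image is required")
    if "front" not in labels:
        raise ValueError("front view is required")
    return dict(zip(labels, images))
```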

Extension trust gate removed for local development (ExtensionCard.tsx)

Important: multi-view model limitation

No currently available Hunyuan3D pretrained model supports multi-view conditioning. We tested extensively with both Mini and 2.1:

  • Both codebases contain MVImageProcessorV2 with front/left/back/right view handling, suggesting multi-view was planned or is in development
  • However, the pretrained conditioner weights (vision encoder) in both models only accept single-view tensors [B, 3, H, W]
  • Passing multi-view tensors causes: ValueError: Input and output must have the same number of spatial dimensions [3, 512, 512] vs [518, 518]
  • The conditioner's DINOv2-based image encoder cannot process concatenated multi-view channels - it was trained on single images only
  • MVImageProcessorV2 is scaffolding for future model weights or fine-tuning, not usable with current pretrained checkpoints
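
The single-view constraint above can be expressed as a shape guard. This is a hypothetical check, not code from either codebase; it uses plain tuples instead of torch tensors to keep the sketch self-contained.

```python
# Illustrative guard for the limitation described above: the pretrained
# conditioner only accepts single-view tensors shaped [B, 3, H, W].
from typing import Sequence, Tuple

def assert_single_view(shape: Sequence[int]) -> Tuple[int, ...]:
    """Reject stacked multi-view inputs such as [B, V, 3, H, W] or [B, 3*V, H, W]."""
    if len(shape) != 4 or shape[1] != 3:
        raise ValueError(
            f"conditioner expects [B, 3, H, W], got {list(shape)}; "
            "current pretrained weights are single-view only"
        )
    return tuple(shape)
```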

What would need to change for true multi-view:

  • A model release where the conditioner is trained on multi-view data
  • Or a different architecture (e.g. separate encoder per view with cross-attention fusion) with matching pretrained weights
  • The UI and API plumbing in this PR is ready - only the model weights are the bottleneck

Hunyuan3D 2.1 extension notes

A working extension was created and tested at APPDATA/Modly/extensions/hunyuan3d-21/. Additional pip dependencies required: omegaconf, timm. The 2.1 model uses .ckpt files (not .safetensors) and the hy3dshape package (different from Mini's hy3dgen). It successfully loads on a 6GB RTX 3050.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lightningpixel lightningpixel changed the base branch from main to dev March 20, 2026 08:26
@iammojogo-sudo

I tried actually rewriting an extension using the mini turbo MV version and got the app to recognize it, but the app UI didn't understand the dictionaries for multiple images in the images field of manifest.json. So I think an internal UI is also necessary. +1 on this pull request :) No rush! I just play around and I'm loving it haha

@iammojogo-sudo

If I start making a profit on models that I have fun with, I'm going to include you for a cut. This program should not be free! But you are awesome, so if that happens, keep this reply for the record.

@iammojogo-sudo

iammojogo-sudo commented Apr 10, 2026

I have something kind of working so far. I remodded generation.py and the workflow JS, repacked the asar, and then modified the JSON file for the node to accept multiple inputs, since index.js already supports that via the 'inputs' field (not 'input'). So I have this so far:
[screenshot]
I also created my own generator.py, setup.py, etc. that works via GitHub download, because the setup won't work locally since GitHub naturally runs it through git. Still a work in progress, but it does run; I just don't know if it uses all four images that the MV version allows 'yet'. But I will make it work soon.
[screenshot]
