feat: multi-view image input UI and full-stack plumbing #27
Open
mikejcarrier wants to merge 1 commit into lightningpixel:dev from
Conversation
Adds support for uploading multiple images (front, left, back, right views) for 3D model generation. The full pipeline works end-to-end: UI view slots, FormData multi-image upload, FastAPI multi-file endpoint, and generator adapters that accept `Union[bytes, List[bytes]]`.

## What's included

**Frontend:**
- 2x2 view slot grid (Front*, Left, Back, Right) with per-slot upload/remove
- Drag-and-drop and file browser support per slot
- Front view is required, others are optional
- View labels sent to backend as a comma-separated form field

**Backend:**
- FastAPI endpoint accepts `List[UploadFile]` for multiple images
- `view_labels` parsed and passed through to generators
- `BaseGenerator.generate()` signature updated to `Union[bytes, List[bytes]]`
- SF3D generator gracefully falls back to the first image (single-view only)

**Hunyuan3D generators (Mini + 2.1):**
- Multi-view preprocessing with rembg on each image
- View label dict construction for pipeline input
- Multi-view code paths are wired but currently fall back to single-view

**Extension trust gate removed** for local development (ExtensionCard.tsx)

## Important: multi-view model limitation

**No currently available Hunyuan3D pretrained model supports multi-view conditioning.** We tested extensively with both Mini and 2.1:

- Both codebases contain `MVImageProcessorV2` with front/left/back/right view handling, suggesting multi-view was planned or is in development
- However, the pretrained conditioner weights (vision encoder) in both models only accept single-view tensors `[B, 3, H, W]`
- Passing multi-view tensors causes: `ValueError: Input and output must have the same number of spatial dimensions [3, 512, 512] vs [518, 518]`
- The conditioner's DINOv2-based image encoder cannot process concatenated multi-view channels - it was trained on single images only
- `MVImageProcessorV2` is scaffolding for future model weights or fine-tuning, not usable with current pretrained checkpoints

**What would need to change for true multi-view:**
- A model release where the conditioner is trained on multi-view data
- Or a different architecture (e.g. separate encoder per view with cross-attention fusion) with matching pretrained weights
- The UI and API plumbing in this PR is ready - only the model weights are the bottleneck

## Hunyuan3D 2.1 extension notes

A working extension was created and tested at `APPDATA/Modly/extensions/hunyuan3d-21/`. Additional pip dependencies required: omegaconf, timm. The 2.1 model uses `.ckpt` files (not `.safetensors`) and the `hy3dshape` package (different from Mini's `hy3dgen`). It successfully loads on a 6GB RTX 3050.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
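The label-parsing and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `parse_view_labels`, `VIEW_ORDER`, and the body of `SF3DGenerator` are hypothetical stand-ins; only the `Union[bytes, List[bytes]]` signature and the first-image fallback come from the PR description.

```python
from typing import List, Optional, Union

# Illustrative names; the real endpoint/generator code in this PR may differ.
VIEW_ORDER = ["front", "left", "back", "right"]

def parse_view_labels(raw: str, n_images: int) -> List[str]:
    """Parse the comma-separated view_labels form field sent by the UI."""
    labels = [s.strip().lower() for s in raw.split(",") if s.strip()]
    if len(labels) != n_images:
        raise ValueError(f"{n_images} images but {len(labels)} labels")
    unknown = [label for label in labels if label not in VIEW_ORDER]
    if unknown:
        raise ValueError(f"unknown view labels: {unknown}")
    if "front" not in labels:
        raise ValueError("the front view is required")
    return labels

class BaseGenerator:
    # Updated signature: a single image or a list of per-view images.
    def generate(self,
                 image: Union[bytes, List[bytes]],
                 view_labels: Optional[List[str]] = None) -> bytes:
        raise NotImplementedError

class SF3DGenerator(BaseGenerator):
    """Single-view model: gracefully falls back to the first image."""
    def generate(self, image, view_labels=None):
        if isinstance(image, list):
            image = image[0]  # the UI guarantees the required front view
        return image  # placeholder for the real mesh-generation call

labels = parse_view_labels("front, left, back", 3)
mesh = SF3DGenerator().generate([b"front-img", b"left-img", b"back-img"], labels)
```

A multi-view-capable generator would instead zip `view_labels` with the image list to build the per-view dict before preprocessing each image with rembg.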
I tried actually rewriting an extension using the Mini turbo MV version and got the app to recognize it, but the app UI didn't understand the dictionaries for multiple images in the images field of manifest.json. So I think an internal UI is necessary as well. +1 on this pull request :) No rush! I just play around and I'm loving it haha
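The manifest issue described in this comment suggests the images field can appear in two shapes. The sketch below shows one way a loader might normalize both; the field names and the schema are entirely hypothetical, since the actual Modly manifest format is not documented here.

```python
import json

def image_slots(manifest: dict) -> dict:
    """Normalize a hypothetical manifest 'images' field to a {view: path} dict."""
    images = manifest.get("images")
    if isinstance(images, str):
        return {"front": images}   # legacy single-image form
    if isinstance(images, dict):
        return dict(images)        # per-view dict form, as in the comment above
    raise TypeError("images must be a string or a dict of views")

legacy = json.loads('{"images": "preview.png"}')
multi = json.loads('{"images": {"front": "f.png", "left": "l.png"}}')
```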
If I start making a profit on models that I have fun with, I'm going to include you for a cut. This program should not be free! But you are awesome, so if that happens, keep this reply for the record.

