The current implementation in inference_basic.py has
video_frames = pipeline(
image=validation_image,
image_pose=validation_control_images,
...
which processes all pose images in a single pipeline call. This requires a significant amount of GPU memory and limits the length of the video. I tried splitting validation_control_images into smaller chunks (e.g. 40 images per sublist) and running the pipeline in a for loop. This works well for me, and the final video quality looks good. I'm not sure whether this affects ID consistency, but at least in my experiments the results are good.
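A rough sketch of the chunked approach described above. This is hypothetical code, not the actual inference_basic.py: `run_pipeline` stands in for the real pipeline call, and the chunk size of 40 is just the value that worked for me.

```python
def chunked(items, size):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def generate_in_chunks(run_pipeline, image, pose_images, chunk_size=40):
    """Run the pipeline on the pose images chunk by chunk and join the frames.

    `run_pipeline` is a placeholder for the pipeline(...) call in
    inference_basic.py; only image_pose changes between calls.
    """
    all_frames = []
    for pose_chunk in chunked(pose_images, chunk_size):
        # Each call only holds `chunk_size` pose images on the GPU at once.
        all_frames.extend(run_pipeline(image=image, image_pose=pose_chunk))
    return all_frames
```

One thing to watch for: each chunk is conditioned only on the same reference image, not on the previous chunk's last frame, which is presumably why ID consistency across chunk boundaries is the main concern.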