I am planning to fine-tune Wan2.1-1.3B-Camera-Control, so I need to preprocess my training dataset to match the data the model was pretrained on.
Which training dataset did you use for the camera-controlled video generation model?
How did you process the camera trajectories in that dataset to deal with the scale ambiguity?
Did you use metric scale?