Scale Issue in the Camera-Controlled Video Model

I am planning to fine-tune Wan2.1-1.3B-Camera-Control, so I need to process my training dataset to match the data you trained on.
Which training dataset did you use for the camera-controlled video generation model?
How did you process the camera trajectories in that dataset to handle the scale issue?
Did you use metric scale?
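
To make the scale question concrete, here is a minimal sketch of one possible convention. It is only my assumption, not something I know your pipeline does: shift the camera centers so the first frame sits at the origin, then rescale the translations so the farthest camera is at unit distance. The helper name `normalize_translation_scale` and the sample values are hypothetical, and rotations are left untouched for brevity.

```python
import math

def normalize_translation_scale(centers, eps=1e-8):
    """Hypothetical helper: make camera centers relative to the first frame
    and rescale so the farthest camera lies at distance 1.

    centers: list of (x, y, z) camera positions in world coordinates.
    Returns the normalized centers and the scale factor that was applied.
    """
    x0, y0, z0 = centers[0]
    # Express every camera center relative to the first frame.
    shifted = [(x - x0, y - y0, z - z0) for x, y, z in centers]
    # Largest distance from the first camera defines the trajectory extent.
    max_dist = max(math.sqrt(x * x + y * y + z * z) for x, y, z in shifted)
    if max_dist < eps:
        # Static camera: nothing to rescale.
        return shifted, 1.0
    scale = 1.0 / max_dist
    return [(x * scale, y * scale, z * scale) for x, y, z in shifted], scale

# Illustrative trajectory in arbitrary (non-metric) units.
centers = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
normalized, scale = normalize_translation_scale(centers)
```

If your training data instead keeps metric scale, or normalizes per scene rather than per clip, I would need to adapt this step accordingly.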