Hi @lhmd,
Follow-up question regarding the sampling trajectory:
In the current implementation, training uses 24 SDE steps, with the remaining steps run as deterministic ODE steps. If we aggressively reduce this to only 2 SDE steps (running all remaining steps as ODE under the ODE-to-SDE hybrid sampling framework), can the policy still converge?
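To make the two setups concrete, here is a rough sketch of how the denoising steps would be labeled in each case. The 50-step total, the single contiguous SDE window starting at step 0, and the helper name `hybrid_step_schedule` are my own assumptions for illustration, not taken from the repo.

```python
from typing import List


def hybrid_step_schedule(num_steps: int, num_sde: int, start: int = 0) -> List[str]:
    """Label each denoising step as "sde" or "ode".

    Hypothetical helper: assumes the SDE steps form one contiguous window
    beginning at index `start`; every other step is a deterministic ODE step.
    """
    if num_sde < 0 or not 0 <= start <= num_steps - num_sde:
        raise ValueError("SDE window must fit inside the trajectory")
    return ["sde" if start <= i < start + num_sde else "ode" for i in range(num_steps)]


# Hypothetical 50-step trajectory: the current setup vs. the proposed one.
current = hybrid_step_schedule(num_steps=50, num_sde=24)
proposed = hybrid_step_schedule(num_steps=50, num_sde=2)
print(current.count("sde"), proposed.count("sde"))  # 24 2
```

In the proposed setup, only 2 of the (assumed) 50 steps would be stochastic, so only those steps would contribute exploration and log-probabilities to the policy update.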
I am curious whether you have ever tried this "2 SDE + remaining ODE" setup on video tasks, and whether it would suffer from severe reward sparsity. Thanks!