Convergence feasibility of compressing training timesteps to 2 steps in video RL #9

Hi @lhmd,
A follow-up question regarding the sampling trajectory:
The current implementation uses 24 SDE steps for training, with the remaining steps run as deterministic ODE steps. If we aggressively reduce this to only 2 SDE steps, and run all the remaining steps as ODE via the ODE-to-SDE hybrid sampling framework, can the policy still converge?
Have you ever tried this "2 SDE + remaining ODE" setup on video tasks, and would it suffer from severe reward sparsity? Thanks!
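
To make the two configurations concrete, here is a minimal sketch of the step layout I have in mind. It is only an illustration: the helper name `build_step_schedule`, the total of 50 sampling steps, and the placement of the SDE steps as one contiguous window at the start of the trajectory are all my assumptions, not taken from the repository.

```python
from typing import List


def build_step_schedule(total_steps: int, num_sde_steps: int, sde_start: int = 0) -> List[str]:
    """Label each sampling step as "sde" (stochastic, used for training) or "ode" (deterministic).

    Hypothetical helper: the SDE steps are assumed to form one contiguous
    window beginning at index `sde_start`.
    """
    if not 0 <= num_sde_steps <= total_steps:
        raise ValueError("num_sde_steps must be between 0 and total_steps")
    if not 0 <= sde_start <= total_steps - num_sde_steps:
        raise ValueError("the SDE window must fit inside the trajectory")
    sde_end = sde_start + num_sde_steps
    return ["sde" if sde_start <= i < sde_end else "ode" for i in range(total_steps)]


# total_steps=50 is an assumed value for illustration only.
current = build_step_schedule(total_steps=50, num_sde_steps=24)
proposed = build_step_schedule(total_steps=50, num_sde_steps=2)

# Number of steps per rollout that would contribute a training signal.
print(current.count("sde"), proposed.count("sde"))
```

Under these assumptions, the proposed setup leaves only 2 steps per rollout that can contribute to the policy update, which is the source of my concern about convergence.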
