Use snapshot to prevent warmup from affecting training and refactor warmup #68
85caa27 to 5582afb
PatrickRMiles
left a comment
The snapshot is a cool approach, but I wonder if we can do this more simply by creating a second copy of the model and using that copy for warmup. We'd create two separate models the same way in worker.py, warm up on one, then train on the other. This would achieve what we want (preventing warmup from affecting the model state before training) without needing the snapshot logic. It's worth at least testing this before we merge.
I did test that copying the model using |
PatrickRMiles
left a comment
We've decided to keep the snapshot approach. This looks good!
With this configuration, the runtime of the first epoch should now be similar to that of subsequent epochs.
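For readers comparing the two options discussed in this thread, here is a minimal, framework-agnostic sketch. It is not the code from this PR: `ToyModel`, `warmup_with_snapshot`, and `warmup_on_copy` are hypothetical names, and the toy model stands in for whatever worker.py actually builds. The first helper saves the model state, runs warmup, and restores the state in place. The second warms up a throwaway copy instead.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for the real model: one weight and a step counter."""

    def __init__(self, weight=0.0):
        self.weight = weight
        self.steps = 0

    def train_step(self, grad, lr=0.1):
        # One gradient-descent update; mutates the model in place.
        self.weight -= lr * grad
        self.steps += 1


def warmup_with_snapshot(model, warmup_grads):
    """Snapshot approach: run warmup steps, then restore the pre-warmup state."""
    snapshot = copy.deepcopy(model.__dict__)  # save state before warmup
    try:
        for grad in warmup_grads:
            model.train_step(grad)
    finally:
        # Restore in place so existing references to `model` stay valid.
        model.__dict__.clear()
        model.__dict__.update(snapshot)


def warmup_on_copy(model, warmup_grads):
    """Alternative: warm up a throwaway copy and leave the original untouched."""
    scratch = copy.deepcopy(model)
    for grad in warmup_grads:
        scratch.train_step(grad)
    return scratch


model = ToyModel(weight=1.0)
warmup_with_snapshot(model, [0.5, 0.5])
print(model.weight, model.steps)  # prints: 1.0 0
```

In this sketch both helpers leave the training model unchanged. The practical difference is that the copy approach keeps two full models alive during warmup, while the snapshot approach holds one model plus a saved copy of its state.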