🤩 Warmup -> stable -> decay learning rate scheduler:
😎 Use the stable-phase checkpoints to continue training the model on any new dataset without training spikes!
JingzeShi/Doge-20M-checkpoint
JingzeShi/Doge-60M-checkpoint
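
For anyone unfamiliar with the schedule shape, here is a minimal sketch of a warmup -> stable -> decay learning rate function. It is only an illustration: the linear warmup, the linear decay, and every name and number in it (`wsd_lr`, `warmup_steps`, `stable_steps`, `decay_steps`, `min_lr`) are assumptions, not the settings used for the Doge checkpoints.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a warmup -> stable -> decay schedule.

    Assumed shape (hypothetical, not taken from the Doge training setup):
    linear warmup to max_lr, a constant plateau, then linear decay to min_lr.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly so the last warmup step reaches max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate constant.
        return max_lr
    # Decay: interpolate linearly from max_lr to min_lr, then stay at min_lr.
    decay_step = min(step - warmup_steps - stable_steps, decay_steps)
    return max_lr + (min_lr - max_lr) * decay_step / decay_steps


if __name__ == "__main__":
    # Illustrative numbers only: 10 warmup, 20 stable, 10 decay steps.
    for s in (0, 9, 15, 29, 35, 40):
        print(s, wsd_lr(s, max_lr=1.0, warmup_steps=10, stable_steps=20, decay_steps=10))
```

In this sketch, a checkpoint saved anywhere in the stable phase is at the constant peak learning rate, so continued training on a new dataset could plausibly pick up at that same rate and apply the decay phase only at the end of the new run.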