yukiontheiceberg committed on
Commit
d51728b
1 Parent(s): f65e752

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -4,8 +4,8 @@ license: apache-2.0
 
  # LLM360 Research Suite: K2 Loss Spike 2
  We encountered two major loss spikes while [training K2](https://huggingface.co/LLM360/K2).
- * The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occured after X checkpoints and lasted over ~34 checkpoints. We restarted training at checkpoint 160 and training returned to normal.
- * The second loss spike occured after restarting training to fix the first loss spike at checkpoint 160 and lasted from ~8 checkpoints.
+ * The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occured after 160 checkpoints and lasted over ~34 checkpoints. We restarted training at checkpoint 160 and training returned to normal.
+ * The second loss spike occured after restarting training to fix the first loss spike at checkpoint 186 and lasted from ~8 checkpoints.
 
  We are releasing these checkpoints so others can study this interesting phenomena in large model training.
  <img src="loss_spike.png" alt="k2 loss spikes"/>