---
license: apache-2.0
---

# LLM360 Research Suite: K2 Loss Spike 2
We encountered two major loss spikes while [training K2](https://huggingface.co/LLM360/K2). 
* The [first loss spike](https://huggingface.co/LLM360/K2-Spike-1/) occurred after X checkpoints and lasted ~34 checkpoints. We restarted training at checkpoint X, and training returned to normal.
* The second loss spike occurred after we restarted training at checkpoint X to recover from the first spike, and lasted ~8 checkpoints.

We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
<img src="k2_spike_1.png" alt="k2 spike 1"/>

# Purpose
Loss spikes are still a relatively poorly understood phenomenon. By making these spikes and the associated training details available, we hope others will use these artifacts to advance the world's knowledge of this topic.

## All Checkpoints
| Checkpoints 186-192 | Checkpoints 194-200 |
| ----------- | ----------- |
| [Checkpoint 186](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_186)     | [Checkpoint 194](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_194)       |
| [Checkpoint 188](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_188)   | [Checkpoint 196](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_196)        |
| [Checkpoint 190](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_190)   | [Checkpoint 198](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_198)        |
| [Checkpoint 192](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_192)   | [Checkpoint 200](https://huggingface.co/LLM360/K2-Spike-2/tree/spike_ckpt_200)        |


To list all checkpoint branches in a local clone of this repository, run `git branch -a`.
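Each checkpoint in the table lives on its own branch, so it can be selected by name. The sketch below is a minimal illustration, not an official loader: it assumes the branch names follow the `spike_ckpt_<N>` pattern shown in the table (even `N` from 186 to 200), and the helper names (`spike_branches`, `tree_url`, `load_checkpoint`) are hypothetical. Loading passes the branch name as the `revision` argument of `transformers`' `from_pretrained`; it assumes `transformers` is installed and that each branch holds weights `AutoModelForCausalLM` can read. K2 is a 65B model, so actually loading a checkpoint needs substantial memory and disk.

```python
# Hypothetical helpers; branch naming is assumed from the checkpoint table
# (spike_ckpt_<N>, even N from 186 to 200).
REPO_ID = "LLM360/K2-Spike-2"


def spike_branches(start: int = 186, stop: int = 200, step: int = 2) -> list[str]:
    """Return the branch (revision) names for the released spike checkpoints."""
    return [f"spike_ckpt_{n}" for n in range(start, stop + 1, step)]


def tree_url(branch: str, repo_id: str = REPO_ID) -> str:
    """Build the Hub web URL for browsing one checkpoint branch."""
    return f"https://huggingface.co/{repo_id}/tree/{branch}"


def load_checkpoint(branch: str, repo_id: str = REPO_ID):
    """Load one checkpoint by passing its branch name as `revision`.

    Assumption: `transformers` is installed and the branch contains weights
    that AutoModelForCausalLM can load. Imported lazily so the helpers above
    work without it.
    """
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(repo_id, revision=branch)


if __name__ == "__main__":
    # Print every assumed branch name with its browsable URL.
    for b in spike_branches():
        print(b, tree_url(b))
```

Keeping the branch list in a function makes it easy to sweep over all checkpoints (for example, evaluating each one in turn) without hard-coding names in several places.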

## Loss Spikes on the LLM360 Evaluation Suite

View all evaluations on our [Weights & Biases dashboard](https://wandb.ai/llm360/K2?nw=inng96ujjmr).

## About the LLM360 Research Suite
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.


## Citation

**BibTeX:**

```bibtex
@misc{llm360_k2_2024,
      title={LLM360-K2-65B: Scaling Up Open and Transparent Language Models},
      author={{The LLM360 Team}},
      year={2024},
}
```