tpeng726 committed on
Commit
f993ebb
1 Parent(s): 4cf2015

Update eval plot

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -17,7 +17,7 @@ For more info see our [End-to-end development service for custom LLMs and AI sys
 
  This model extends LLama-3 70B's context length from 8k to > 1048K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 34M tokens for this stage, and ~430M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.
 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/644fac0ce1d7a97f3b653ab1/GLrUQLqji6qFu_pxpBW7O.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/gb6Z7nMQR5WoLUsW35CIh.png)
 
  **Approach:**
 