michaelfeil committed
Commit ea2f5de
1 Parent(s): 126be6a
Update README.md
README.md
CHANGED
@@ -17,7 +17,8 @@ For more info see our [End-to-end development service for custom LLMs and AI sys
 
 This model extends LLama-3 70B's context length from 8k to > 1048K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 34M tokens for this stage, and ~430M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.
 
-
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/644fac0ce1d7a97f3b653ab1/GLrUQLqji6qFu_pxpBW7O.png)
 
 **Approach:**
 
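The README paragraph in this diff attributes the context extension to adjusting RoPE theta. As a rough illustration of why the base matters, the sketch below computes the wavelength of each rotary frequency pair under the standard RoPE formula for two bases. The values are assumptions, not taken from this commit: `BASE_THETA` and `HEAD_DIM` are what I understand Llama-3 70B's published config to use, and `SCALED_THETA` is a purely hypothetical larger base, not the value Gradient trained with.

```python
import math


def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Wavelength, in tokens, of each rotary frequency pair for a RoPE base."""
    # Standard RoPE: pair i rotates by the angle pos * theta ** (-2 * i / head_dim),
    # so one full rotation takes 2 * pi * theta ** (2 * i / head_dim) tokens.
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


BASE_THETA = 500_000.0         # assumed: rope_theta in Llama-3's published config
SCALED_THETA = 50_000_000.0    # hypothetical larger base, for illustration only
HEAD_DIM = 128                 # assumed attention head dimension

base = rope_wavelengths(BASE_THETA, HEAD_DIM)
scaled = rope_wavelengths(SCALED_THETA, HEAD_DIM)

# The fastest pair (i = 0) does not depend on theta; the slowest pair stretches most.
print(f"fastest pair: {base[0]:.2f} -> {scaled[0]:.2f} tokens")
print(f"slowest pair: {base[-1]:.0f} -> {scaled[-1]:.0f} tokens")
```

Raising the base leaves the high-frequency pairs untouched and lengthens the low-frequency ones, which is the general mechanism that lets positions far beyond the original window map to rotation angles in a range the model has already seen.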