Commit 5cfd414
Committed by: leo-pekelis-gradient
Parent(s): 9411de7
Update README.md
README.md CHANGED
@@ -7,10 +7,9 @@ tags:
 - llama-3
 ---
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/hiHWva3CbsrnPvZTp5-lu.png)
 
-
-
-This model extends LLama-3 8B's context length from 8k to > 130K, developed by Gradient, sponsored by compute from Crusoe Energy. It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
+This model extends LLama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from Crusoe Energy. It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
 
 **Approach:**
 
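Note (not part of the commit): the paragraph changed above attributes the context extension to raising RoPE theta. Below is a minimal sketch of that mechanism, assuming PyTorch and the standard rotary-embedding frequency formulation; the function name and the larger theta value are illustrative assumptions, not numbers taken from this README.

```python
import torch

def rope_inv_freq(head_dim: int, theta: float) -> torch.Tensor:
    # Standard rotary-embedding frequencies: theta^(-2i/d) for i in [0, d/2).
    return theta ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)

# Llama-3 8B ships with theta = 500_000; long-context fine-tunes raise it so
# that rotary wavelengths cover positions far beyond the original 8k window.
# The larger value below is an illustrative assumption, not the commit's number.
base_freqs = rope_inv_freq(128, 500_000.0)
long_freqs = rope_inv_freq(128, 4_000_000.0)

# Larger theta -> smaller frequencies -> longer wavelengths, so relative
# positions out past 160K tokens still map to distinct rotation angles
# instead of wrapping around within the original context window.
print(base_freqs.min().item(), long_freqs.min().item())
```

In the Hugging Face `transformers` Llama implementation this base is exposed as the `rope_theta` field of the model config, which is the knob a long-context fine-tune like this one adjusts before continued training.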