michaelfeil committed
Commit 5e409cf
1 Parent(s): 42919cd
Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -9,17 +9,17 @@ tags:
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/hiHWva3CbsrnPvZTp5-lu.png)
 
- This model extends LLama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from Crusoe Energy. It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
+ This model extends LLama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
 
 **Approach:**
 
 - [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the base
 - NTK-aware interpolation [1] to initialize an optimal schedule for RoPE theta, followed by a new data-driven RoPE theta optimization technique
- - progressive training on increasing context lengths similar to the [Large World Model](https://huggingface.co/LargeWorldModel) [2]
+ - Progressive training on increasing context lengths similar to the [Large World Model](https://huggingface.co/LargeWorldModel) [2] (see details below)
 
 **Infra:**
 
- We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 256k in length on Crusoe Energy's high performance L40S cluster.
+ We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 262144 tokens on [Crusoe Energy](https://huggingface.co/crusoeai)'s high-performance L40S cluster.
 
 **Data:**
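
For context on the RoPE theta adjustment referenced in the diff above, the sketch below shows the commonly cited NTK-aware scaling rule [1] applied to Llama-3 8B's config: the RoPE base is raised roughly by the context-extension factor to the power d/(d-2), where d is the per-head dimension. This is only an illustration under assumed values; the helper `ntk_aware_rope_theta` is hypothetical, and the actual data-driven schedule and final theta used for this checkpoint are not stated in this commit.

```python
# Minimal sketch of NTK-aware RoPE theta scaling (illustrative only;
# not Gradient's data-driven schedule or the final theta of this checkpoint).
from transformers import AutoConfig

def ntk_aware_rope_theta(theta: float, orig_ctx: int, target_ctx: int, head_dim: int) -> float:
    """Scale the RoPE base so rotary frequencies cover a longer context.

    Commonly cited NTK-aware rule: theta' = theta * s ** (d / (d - 2)),
    with s the context-extension factor and d the per-head dimension.
    """
    s = target_ctx / orig_ctx
    return theta * s ** (head_dim / (head_dim - 2))

# Gated repo: requires accepting the Meta Llama 3 license and an auth token.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
head_dim = config.hidden_size // config.num_attention_heads  # 128 for Llama-3 8B

# Llama-3 8B ships with rope_theta = 500000 and an 8192-token training context.
config.rope_theta = ntk_aware_rope_theta(config.rope_theta, 8192, 262144, head_dim)
config.max_position_embeddings = 262144  # longest length trained per the diff
print(f"scaled rope_theta: {config.rope_theta:.3e}")
```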