michaelfeil committed
Commit 9687487
1 Parent(s): 9834c46

Update readme: improvements (#18)


- Update readme: improvements (78ffd748ac052c1791af0152ade1bf040d1156f5)

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -13,7 +13,10 @@ license: llama3
 Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. To learn more or collaborate on a custom model, drop us a message at contact@gradient.ai.

 This model extends LLama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/hiHWva3CbsrnPvZTp5-lu.png)
+
+**Update (5/3): We further fine-tuned our model to strengthen its assistant-like chat ability as well. The NIAH result is updated.**
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/644fac0ce1d7a97f3b653ab1/s9T8L-6Jh5fYH6Q_88r3g.png)

 **Approach:**

@@ -38,7 +41,7 @@ Exl2 is available on Bullerwins's huggingface account. Check it out here:

 **Data:**

-For training data, we generate long contexts by augmenting [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
+For training data, we generate long contexts by augmenting [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). We also fine-tune on a chat dataset based on UltraChat [4], following a similar recipe for data augmentation to [2].

 **Progressive Training Details:**
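
As context for the RoPE theta adjustment mentioned in the README lines above, here is a minimal sketch of how that setting surfaces when loading a Llama-style checkpoint with `transformers`. The repo id below is a placeholder, not taken from this commit, and this is not the authors' code.

```python
# Minimal sketch (placeholder repo id, not from this commit): Llama-style configs
# expose the rotary base as `rope_theta`, the knob the README calls "RoPE theta".
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "gradientai/some-long-context-llama-3-8b"  # hypothetical repo id

config = AutoConfig.from_pretrained(model_id)
print(config.rope_theta)               # adjusted rotary base for long context
print(config.max_position_embeddings)  # extended context window (> 160K per the README)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```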
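And a rough illustration of the data line in the diff: in the simplest case, "generating long contexts" from SlimPajama amounts to packing tokenized documents into fixed-length sequences. This is a generic packing sketch under that assumption, not the authors' actual augmentation recipe; the 64K block size and the tokenizer repo id are arbitrary examples.

```python
# Generic long-context packing sketch (assumption: not the authors' exact recipe).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gradientai/some-long-context-llama-3-8b")  # hypothetical
block_size = 65536  # example target context length, chosen arbitrarily

stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

buffer, blocks = [], []
for example in stream:
    buffer.extend(tokenizer(example["text"])["input_ids"])
    while len(buffer) >= block_size:
        blocks.append(buffer[:block_size])  # one long-context training sample
        buffer = buffer[block_size:]
    if len(blocks) >= 4:  # stop early in this demo
        break
```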