gradientai/Llama-3-8B-Instruct-262k

Text Generation · text-generation-inference · Inference Endpoints


michaelfeil committed on May 4

Commit 78ffd74 · 1 parent: 9834c46

Update readme: improvements

@leo-pekelis-gradient

Files changed (1)

README.md +5 -2

README.md CHANGED

@@ -13,7 +13,10 @@ license: llama3
 Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. To learn more or collaborate on a custom model, drop us a message at contact@gradient.ai.
 This model extends LLama-3 8B's context length from 8k to > 160K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/hiHWva3CbsrnPvZTp5-lu.png)
+**Update (5/3): We further fine-tuned our model to strengthen its assistant-like chat ability as well. The NIAH result is updated.**
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/644fac0ce1d7a97f3b653ab1/s9T8L-6Jh5fYH6Q_88r3g.png)
 **Approach:**
@@ -38,7 +41,7 @@ Exl2 is available on Bullerwins's huggingface account. Check it out here:
 **Data:**
-For training data, we generate long contexts by augmenting [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
+ For training data, we generate long contexts by augmenting [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). We also fine-tune on a chat dataset based on UltraChat [4], following a similar recipe for data augmentation to [2].
 **Progressive Training Details:**
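The README context lines in this diff attribute the 8k-to-long-context extension to "appropriately adjusting RoPE theta". As a rough illustration only, and not Gradient's training code, the Python sketch below computes standard rotary position embedding (RoPE) frequencies. In RoPE, dimension pair i rotates by theta^(-2i/d) radians per token position. The sketch assumes Llama-3 8B's published defaults (rope_theta = 500000, head dimension 128). The larger theta of 50,000,000 is a hypothetical comparison value, not the one used for this model. Raising theta stretches each pair's wavelength, so fewer pairs complete a full rotation inside a long context window.

```python
import math

# Llama-3 8B defaults. HYPOTHETICAL_THETA is an illustrative value only,
# not the theta used for Llama-3-8B-Instruct-262k.
LLAMA3_THETA = 500_000.0
HEAD_DIM = 128
HYPOTHETICAL_THETA = 50_000_000.0


def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Rotation rate (radians per position) for each of the head_dim // 2 dimension pairs."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Positions needed for each dimension pair to complete one full 2*pi rotation."""
    return [2 * math.pi / f for f in rope_inv_freq(theta, head_dim)]


def pairs_longer_than(context_len: int, theta: float, head_dim: int) -> int:
    """Count dimension pairs whose wavelength exceeds the context window."""
    return sum(1 for w in rope_wavelengths(theta, head_dim) if w > context_len)


def rotate_pair(x0: float, x1: float, position: int, inv_freq: float) -> tuple[float, float]:
    """Apply the RoPE rotation for one dimension pair at a given token position."""
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


if __name__ == "__main__":
    for theta in (LLAMA3_THETA, HYPOTHETICAL_THETA):
        print(
            f"theta={theta:,.0f}: "
            f"{pairs_longer_than(8_192, theta, HEAD_DIM)} pairs exceed 8k, "
            f"{pairs_longer_than(262_144, theta, HEAD_DIM)} pairs exceed 262k"
        )
```

The highest-frequency pair (i = 0) always has a wavelength of 2π regardless of theta. Changing theta only reshapes how slowly the remaining pairs rotate. The rotation itself is norm-preserving, so theta changes where positions land on each circle, not the vector magnitudes.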