markpreemo committed on
Commit
e2d6333
1 Parent(s): 5f13efc

Update README.md

Files changed (1): README.md +2 -0
README.md CHANGED
@@ -17,6 +17,8 @@ Gradient incorporates your data to deploy autonomous assistants that power criti
 
 For more info see our [End-to-end development service for custom LLMs and AI systems](https://gradient.ai/development-lab)
 
+[Join our Discord](https://discord.com/invite/2QVy2qt2mf)
+
 This model extends LLama-3 70B's context length from 8k to > 1048K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 34M tokens for this stage, and ~430M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/gb6Z7nMQR5WoLUsW35CIh.png)
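
The README text above describes extending the context window by adjusting RoPE theta; the trained value ships in the checkpoint's config. Below is a minimal sketch of loading the model and inspecting those settings with Hugging Face `transformers`. The repo id `gradientai/Llama-3-70B-Instruct-Gradient-1048k` is an assumption for illustration and is not named in this commit.

```python
# Sketch only: inspect and load a long-context Llama-3 checkpoint.
# The repo id below is an assumption, not taken from this commit.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "gradientai/Llama-3-70B-Instruct-Gradient-1048k"  # assumed repo id

# The context extension works by raising RoPE theta; the trained value is
# stored in the checkpoint config, so it only needs to be read, not set.
config = AutoConfig.from_pretrained(model_id)
print(config.rope_theta)               # RoPE base frequency used for long context
print(config.max_position_embeddings)  # extended context window size

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load weights in the checkpoint's native precision
    device_map="auto",    # shard the 70B model across available GPUs
)
```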