mbrack committed on
Commit
8cceabf
1 Parent(s): 73d589c

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,14 +8,14 @@ library_name: transformers
 
 ## Thanks and Accreditation
 
-[DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
+[DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
 is the result of a joint effort between [DiscoResearch](https://huggingface.co/DiscoResearch) and [Occiglot](https://huggingface.co/occiglot)
 with support from the [DFKI](https://www.dfki.de/web/) (German Research Center for Artificial Intelligence) and [hessian.Ai](https://hessian.ai).
 Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest [dataset release](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5), as well as sharing their compute allocation at hessian.Ai's 42 Supercomputer.
 
 ## Model Overview
 
-DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1 is an instruction tuned version of our long-context [Llama3_German_8B_32k](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k).
+DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1 is an instruction tuned version of our long-context [Llama3-German-8B-32k](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k).
 The base model was derived from [Meta's Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) through continuous pretraining on 65 billion high-quality German tokens, similar to previous [LeoLM](https://huggingface.co/LeoLM) or [Occiglot](https://huggingface.co/collections/occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01) models.
 For the long-context version we trained on an additional 100 million tokens at 32k context length, using a rope_theta value of 1.5e6 and a learning rate of 1.5e-5 with a batch size of 256*8192 and otherwise equal hyperparameters to the base model.
 We finetuned this checkpoint on the German Instruction dataset from DiscoResearch created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)).
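
For context, a minimal usage sketch (not part of the committed README): loading the instruct model with the `transformers` library named in the hunk header. The repo id is assumed from the hyphenated name introduced in this commit, and the prompt formatting assumes the model ships a Llama-3 style chat template.

```python
# Minimal sketch, assuming the hyphenated repo id from this commit and a
# Llama-3 style chat template; not part of the committed README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 8B model on a single modern GPU
    device_map="auto",
)

# Build a single-turn German prompt via the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Erkläre kurz, was ein Kontextfenster von 32k Token bedeutet."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```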