mbrack committed on
Commit
8cceabf
1 Parent(s): 73d589c

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,14 +8,14 @@ library_name: transformers
 
 ## Thanks and Accreditation
 
-[DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
+[DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
 is the result of a joint effort between [DiscoResearch](https://huggingface.co/DiscoResearch) and [Occiglot](https://huggingface.co/occiglot)
 with support from the [DFKI](https://www.dfki.de/web/) (German Research Center for Artificial Intelligence) and [hessian.Ai](https://hessian.ai).
 Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest [dataset release](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5), as well as sharing their compute allocation at hessian.Ai's 42 Supercomputer.
 
 ## Model Overview
 
-DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1 is an instruction tuned version of our long-context [Llama3_German_8B_32k](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k).
+DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1 is an instruction tuned version of our long-context [Llama3-German-8B-32k](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k).
 The base model was derived from [Meta's Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) through continuous pretraining on 65 billion high-quality German tokens, similar to previous [LeoLM](https://huggingface.co/LeoLM) or [Occiglot](https://huggingface.co/collections/occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01) models.
 For the long-context version we trained on an additional 100 million tokens at 32k context length, using a rope_theta value of 1.5e6 and a learning rate of 1.5e-5 with a batch size of 256*8192 and otherwise equal hyperparameters to the base model.
 We finetuned this checkpoint on the German Instruction dataset from DiscoResearch created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)).
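
For context, a minimal usage sketch (not part of the committed README): loading the instruct model with the `transformers` library named in the hunk header. The repo id is assumed from the hyphenated name introduced in this commit, and the prompt formatting assumes the model ships a Llama-3 style chat template.

```python
# Minimal sketch, assuming the hyphenated repo id from this commit and a
# Llama-3 style chat template; not part of the committed README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 8B model on a single modern GPU
    device_map="auto",
)

# Build a single-turn German prompt via the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Erkläre kurz, was ein Kontextfenster von 32k Token bedeutet."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```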