mbrack committed on
Commit
cefadc7
1 Parent(s): bef1a6f

Update README.md

  1. README.md +3 -3
README.md CHANGED
@@ -20,7 +20,7 @@ Note that the model was not safety aligned and might generate problematic output
  This is the first release of an ongoing open research project for multilingual language models.
  If you want to train a model for your own language or are working on evaluations, please contact us or join our [Discord server](https://discord.gg/wUpvYs4XvM). **We are open for collaborations!**
 
- *Special thanks go to **Disco Research** and **Björn Plüster** for sharing the German dataset with us*
+ *Special thanks go to **[Disco Research](https://huggingface.co/DiscoResearch)**, **[Jan Philipp Harries](https://huggingface.co/jphme)**, and **[Björn Plüster](https://huggingface.co/bjoernp)** for sharing the German dataset with us*
 
  ### Model details
 
@@ -57,13 +57,13 @@ set a seed for reproducibility:
 
  ## Dataset
 
- The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank Disco Research and Björn Plüster for making their dataset available to us.
+ The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank [Disco Research](https://huggingface.co/DiscoResearch), [Jan Philipp Harries](https://huggingface.co/jphme), and [Björn Plüster](https://huggingface.co/bjoernp) for making their dataset available to us.
 
  **English and Code**
  - [Open-Hermes-2B](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
 
  **German**
- - [DiscoLM German Dataset](https://huggingface.co/DiscoResearch)
+ - [DiscoLM German Dataset](https://huggingface.co/DiscoResearch) includes the publicly available [germanrag](https://huggingface.co/datasets/DiscoResearch/germanrag) dataset
  - [OASST-2](https://huggingface.co/datasets/OpenAssistant/oasst2) (German subset)
  - [Aya-Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) (German subset)
 