mbrack committed on
Commit f517ec0
1 Parent(s): e5c34a6

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -59,13 +59,14 @@ set a seed for reproducibility:
 
 ## Dataset
 
-The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank Disco Research and Björn Plüster for making their dataset available to us.
+The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank [Disco Research](https://huggingface.co/DiscoResearch), [Jan Philipp Harries](https://huggingface.co/jphme), and [Björn Plüster](https://huggingface.co/bjoernp) for making their dataset available to us.
+
 
 **English and Code**
 - [Open-Hermes-2B](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
 
 **German**
-- [DiscoLM German Dataset](https://huggingface.co/DiscoResearch)
+- [DiscoLM German Dataset](https://huggingface.co/DiscoResearch) includes the publicly available [germanrag](https://huggingface.co/datasets/DiscoResearch/germanrag) dataset
 - [OASST-2](https://huggingface.co/datasets/OpenAssistant/oasst2) (German subset)
 - [Aya-Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) (German subset)
 
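The README says the training data was split evenly across the 5 languages by total token count but does not describe the procedure. One plausible reading is sketched here in Python, assuming each language's corpus is already tokenized into lists of tokens and that "evenly" means every language receives the token budget of its smallest corpus. The helper `balance_by_tokens` and its greedy, order-preserving truncation are hypothetical illustrations, not the authors' actual pipeline.

```python
from typing import Dict, List

# Hypothetical helper: not taken from the repository, just one way to
# equalize token counts across languages.
def balance_by_tokens(corpora: Dict[str, List[List[str]]]) -> Dict[str, List[List[str]]]:
    """Give every language the same token budget.

    Assumption: the budget is the token count of the smallest corpus, and
    each language keeps whole examples, in order, until the next example
    would push it past the budget.
    """
    if not corpora:
        return {}
    # Total tokens per language.
    totals = {lang: sum(len(ex) for ex in exs) for lang, exs in corpora.items()}
    budget = min(totals.values())

    balanced: Dict[str, List[List[str]]] = {}
    for lang, examples in corpora.items():
        kept: List[List[str]] = []
        used = 0
        for ex in examples:
            if used + len(ex) > budget:
                break  # stop before exceeding the shared budget
            kept.append(ex)
            used += len(ex)
        balanced[lang] = kept
    return balanced


if __name__ == "__main__":
    # Toy corpora: 12 English tokens vs. 8 German tokens.
    toy = {
        "en": [["a"] * 4, ["b"] * 4, ["c"] * 4],
        "de": [["x"] * 5, ["y"] * 3],
    }
    out = balance_by_tokens(toy)
    print({lang: sum(len(ex) for ex in exs) for lang, exs in out.items()})
    # prints {'en': 8, 'de': 8}
```

Whole examples are kept rather than cut mid-sequence, so a language can end slightly under the budget; a real pipeline might instead sample randomly or upsample smaller languages to hit the budget exactly.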