occiglot/occiglot-7b-de-en-instruct

mbrack committed on Mar 8, 2024

Commit cefadc7 · verified · 1 Parent(s): bef1a6f

Update README.md

Files changed (1)

README.md +3 -3


@@ -20,7 +20,7 @@ Note that the model was not safety aligned and might generate problematic output
 This is the first release of an ongoing open research project for multilingual language models.
 If you want to train a model for your own language or are working on evaluations, please contact us or join our [Discord server](https://discord.gg/wUpvYs4XvM). **We are open for collaborations!**
-*Special thanks go to **Disco Research** and **Björn Plüster** for sharing the German dataset with us*
+*Special thanks go to **[Disco Research](https://huggingface.co/DiscoResearch)**, **[Jan Philipp Harries](https://huggingface.co/jphme)**, and **[Björn Plüster](https://huggingface.co/bjoernp)** for sharing the German dataset with us*
 ### Model details
@@ -57,13 +57,13 @@ set a seed for reproducibility:
 ## Dataset
-The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank Disco Research and Björn Plüster for making their dataset available to us.
+The training data was split evenly amongst the 5 languages based on the total number of tokens. We would like to thank [Disco Research](https://huggingface.co/DiscoResearch), [Jan Philipp Harries](https://huggingface.co/jphme), and [Björn Plüster](https://huggingface.co/bjoernp) for making their dataset available to us.
 **English and Code**
  - [Open-Hermes-2B](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
 **German**
- - [DiscoLM German Dataset](https://huggingface.co/DiscoResearch)
+ - [DiscoLM German Dataset](https://huggingface.co/DiscoResearch) includes the publicly available [germanrag](https://huggingface.co/datasets/DiscoResearch/germanrag) dataset
  - [OASST-2](https://huggingface.co/datasets/OpenAssistant/oasst2) (German subset)
  - [Aya-Dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) (German subset)