---
license: openrail++
language:
- de
pipeline_tag: text-generation
---
# GPT2 model for German Leichte Sprache (Easy language)
A German Leichte Sprache (Easy language) model based on [german-gpt2](https://huggingface.co/dbmdz/german-gpt2).
See our code here: [https://github.com/MiriUll/Simple-German-language-model](https://github.com/MiriUll/Simple-German-language-model)
See our paper here: [Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training](https://aclanthology.org/2023.findings-acl.74/)
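## Usage
Because the model is based on german-gpt2, it can presumably be loaded like any other GPT-2 checkpoint with the `transformers` text-generation pipeline. The following is a minimal sketch under that assumption, not the authors' documented usage. This card does not state the model's own Hub ID, so `MODEL_ID` is set to the base checkpoint linked above as a stand-in. The helper name `generate_text` and the sampling settings are illustrative choices as well.

```python
# Assumption: this card does not state the model's own Hub ID, so the base
# checkpoint it links to is used as a stand-in. Replace it with this model's ID.
MODEL_ID = "dbmdz/german-gpt2"


def generate_text(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 40) -> str:
    """Continue `prompt` with a causal language model from the Hugging Face Hub.

    Hypothetical helper for illustration; the sampling settings are not
    taken from the paper.
    """
    # Imported lazily so that defining the helper does not load the library.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
    )
    # The pipeline returns a list of dicts; the continuation, including the
    # prompt, is stored under the "generated_text" key.
    return outputs[0]["generated_text"]
```

Calling `generate_text("Das ist")` downloads the checkpoint on first use and returns the prompt followed by a sampled continuation, so the output differs between runs.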
## Dataset
This model was fine-tuned on a collection of monolingual Leichte Sprache data. The corpus can be recreated with the scrapers in [this repository](https://github.com/brjezierski/scrapers).
## Citation
If you use this model, please cite our paper:
@inproceedings{anschutz-etal-2023-language,
  title = "Language Models for {G}erman Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training",
  author = {Ansch{\"u}tz, Miriam and Oehms, Joshua and Wimmer, Thomas and Jezierski, Bart{\l}omiej and Groh, Georg},
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
  month = jul,
  year = "2023",
  address = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.findings-acl.74",
  pages = "1147--1158",
}