---
license: openrail++
language:
- de
pipeline_tag: text-generation
---
# GPT2 model for German Leichte Sprache (Easy language)
A German Leichte Sprache (Easy language) model based on [mGPT](https://huggingface.co/sberbank-ai/mGPT).


See our code here: [https://github.com/MiriUll/Language-Models-German-Simplification](https://github.com/MiriUll/Language-Models-German-Simplification)  
See our paper here: [Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training](https://aclanthology.org/2023.findings-acl.74/)
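
Since the card declares `pipeline_tag: text-generation`, the model can presumably be loaded as a causal language model with the `transformers` library. The following is a minimal sketch under that assumption, not the authors' own usage code. `MODEL_ID` is a hypothetical placeholder for this model's Hub repository ID, the helper names are made up for illustration, and the sampling values are illustrative defaults rather than settings from the paper.

```python
# Hypothetical placeholder: replace with this model's actual Hub repository ID.
MODEL_ID = "<hub-user>/<model-name>"


def build_generation_kwargs(max_new_tokens=50, temperature=0.7, top_p=0.95):
    """Collect sampling settings (illustrative values, not taken from the paper)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def generate_easy_german(prompt, model_id=MODEL_ID, **overrides):
    """Continue a German prompt with the fine-tuned causal language model."""
    # Imported lazily so build_generation_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt and sample a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **build_generation_kwargs(**overrides))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Once `MODEL_ID` points at the real repository, calling `generate_easy_german("Das ist ein Text.", max_new_tokens=30)` should return the prompt followed by a sampled continuation.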

## Dataset
This model was fine-tuned on a collection of monolingual Leichte Sprache data. The corpus can be recreated with the scrapers in [this repository](https://github.com/brjezierski/scrapers).

## Citation
If you use this model, please cite our paper:  
@inproceedings{anschutz-etal-2023-language,  
   title = "Language Models for {G}erman Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training",  
   author = {Ansch{\"u}tz, Miriam  and Oehms, Joshua  and Wimmer, Thomas  and Jezierski, Bart{\l}omiej  and Groh, Georg},  
   booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",  
   month = jul,  
   year = "2023",  
   address = "Toronto, Canada",  
   publisher = "Association for Computational Linguistics",  
   url = "https://aclanthology.org/2023.findings-acl.74",  
   pages = "1147--1158",  
}