---
license: apache-2.0
language:
- de
datasets:
- DEplain/DEplain-APA-doc
metrics:
- sari
- bleu
- bertscore
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text simplification
- plain language
- easy-to-read language
- document simplification
---
|
# DEplain German Text Simplification
|
This model is part of the experiments in Stodden, Momen, and Kallmeyer (2023), ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics.
Detailed documentation can be found in the GitHub repository: [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain).
### Model Description
The model is a finetuned checkpoint of the pre-trained LongmBART model, which is based on `mbart-large-cc25` with its vocabulary trimmed to the 30k most frequent German words.
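The vocabulary-trimming step can be sketched as follows. This is a minimal, hypothetical illustration of reducing an embedding table to the most frequent tokens; the function and variable names are not taken from the authors' code:

```python
def trim_vocab(embeddings, token_freq, keep=30000):
    """Keep only the embedding rows for the `keep` most frequent token ids.

    embeddings: list of embedding vectors, indexed by token id.
    token_freq: dict mapping token id -> corpus frequency.
    Returns the trimmed embedding table and an old-id -> new-id mapping.
    """
    # Token ids sorted by corpus frequency, most frequent first.
    top_ids = sorted(token_freq, key=token_freq.get, reverse=True)[:keep]
    old_to_new = {old: new for new, old in enumerate(top_ids)}
    # Keep only the rows of the embedding table for the retained tokens.
    trimmed = [embeddings[i] for i in top_ids]
    return trimmed, old_to_new

# Toy example: 6-token vocabulary, keep the 3 most frequent tokens.
emb = [[float(i)] for i in range(6)]
freq = {0: 50, 1: 5, 2: 40, 3: 1, 4: 30, 5: 2}
trimmed, mapping = trim_vocab(emb, freq, keep=3)
print(len(trimmed), mapping)  # 3 {0: 0, 2: 1, 4: 2}
```

In practice the same row selection is applied to the tokenizer's vocabulary and to the model's input/output embedding matrices so that token ids stay consistent.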
The model was finetuned for the task of German document simplification.
The finetuning dataset consisted of manually aligned documents from `DEplain-APA-doc` only.
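A minimal usage sketch with the `transformers` library is shown below. The model identifier is a placeholder for this checkpoint's repository id, and the generation parameters are illustrative, not the settings used in the paper:

```python
MODEL_ID = "your-org/your-checkpoint"  # placeholder: replace with this model's repo id

def simplify(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Generate a simplified German version of the input document."""
    # Imported lazily so the sketch can be read without downloading model weights.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example call (downloads the checkpoint on first use):
# print(simplify("Hier steht ein komplexer deutscher Beispieltext."))
```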