---
datasets:
- DEplain/DEplain-APA-sent
- DEplain/DEplain-web-sent
language:
- de
metrics:
- sari
- bleu
- bertscore
library_name: transformers
pipeline_tag: text2text-generation
tags:
  - text simplification
  - plain language
  - easy-to-read language
  - sentence simplification
---

# DEplain German Text Simplification

This model was trained as part of the experiments reported in Stodden, Momen, and Kallmeyer (2023), ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics.
Detailed documentation is available in the GitHub repository [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain).

### Model Description

The model is a fine-tuned checkpoint of the pre-trained mBART model `mbart-large-cc25`, with its vocabulary trimmed to the 30k most frequent words in German.
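
To illustrate the idea of frequency-based vocabulary trimming, here is a minimal sketch. It is not the authors' procedure: it counts whitespace-separated words in a toy corpus, whereas trimming a real mBART checkpoint would operate on subword pieces and also shrink the embedding matrix. The function name `top_k_vocabulary` and the sample sentences are made up for this example.

```python
from collections import Counter


def top_k_vocabulary(sentences, k=30_000):
    """Return the k most frequent whitespace-separated tokens.

    Simplifying assumption: tokens are split on whitespace. A real
    mBART vocabulary consists of SentencePiece subword units.
    """
    counts = Counter(token for s in sentences for token in s.split())
    return [token for token, _ in counts.most_common(k)]


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "das ist ein Haus",
    "das ist ein Baum",
    "das Haus ist alt",
]

# Keep only the two most frequent tokens.
kept = top_k_vocabulary(corpus, k=2)
print(kept)
```

Everything outside the kept list would be dropped from the vocabulary, which reduces the size of the embedding layer and therefore the memory footprint of the model.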

The model was fine-tuned for sentence-level German text simplification.

The fine-tuning dataset included manually aligned sentences from the datasets `DEplain-APA-sent` and `DEplain-web-sent-manual-open`.
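
Since the card lists `transformers` as the library and `text2text-generation` as the pipeline tag, inference might look like the sketch below. Several things here are assumptions: the model identifier passed to `load_model` is a placeholder (this card does not state the repository ID), the helper names `load_model` and `simplify` are made up, and the generation settings (`max_length=128`, `num_beams=4`) are arbitrary defaults rather than the values used in the paper. mBART checkpoints may also require language codes to be set on the tokenizer; see the GitHub repository for the exact setup.

```python
def load_model(model_id):
    """Load tokenizer and model from the Hub.

    `model_id` is a placeholder: substitute the actual repository ID.
    """
    # Imported lazily so that `simplify` can be used with any
    # tokenizer/model pair exposing the same interface.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def simplify(sentences, tokenizer, model, max_length=128, num_beams=4):
    """Simplify a batch of German sentences (generation settings are assumed)."""
    inputs = tokenizer(
        sentences,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
    output_ids = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

A call would then take the form `tokenizer, model = load_model("<model-id>")` followed by `simplify(["Ein deutscher Satz."], tokenizer, model)`, where `<model-id>` stands for the actual checkpoint name.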