---
inference: false
license: apache-2.0
language:
- de
datasets:
- DEplain/DEplain-APA-doc
metrics:
- sari
- bleu
- bertscore
library_name: transformers
pipeline_tag: text2text-generation
tags:
  - text simplification
  - plain language
  - easy-to-read language
  - document simplification
---

# DEplain German Text Simplification

This model is part of the experiments from Stodden, Momen, and Kallmeyer (2023), ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics.
Detailed documentation can be found in the GitHub repository: [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain)

We reused the code from [https://github.com/a-rios/ats-models](https://github.com/a-rios/ats-models) for our experiments.

### Model Description

The model is a finetuned checkpoint of the pre-trained LongmBART model, which is based on `mbart-large-cc25` with its vocabulary trimmed to the 30k most frequent German tokens.

The model was finetuned on the task of document-level German text simplification.

The finetuning data consisted of manually aligned documents from the `DEplain-APA-doc` dataset only.
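
To give an intuition for the vocabulary trimming, the sketch below shows one way such trimming can be done with plain `transformers`/PyTorch: count subword frequencies on a German corpus and keep only the 30k most frequent entries (plus special tokens) in the embedding matrix. This is an illustrative approximation, not the actual procedure; the real conversion scripts (including the required tokenizer remapping) are part of [ats-models](https://github.com/a-rios/ats-models), and the corpus file name below is a placeholder.

```
# Hypothetical sketch of vocabulary trimming (illustrative only; the real
# scripts in https://github.com/a-rios/ats-models also remap the
# SentencePiece tokenizer, which is omitted here).
from collections import Counter

from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# Count subword frequencies on a German corpus (file name is a placeholder).
counter = Counter()
with open("german_corpus.txt", encoding="utf-8") as f:
    for line in f:
        counter.update(tokenizer(line.strip())["input_ids"])

# Keep the 30k most frequent ids plus all special tokens (incl. language codes).
keep = sorted(set(tokenizer.all_special_ids)
              | {tid for tid, _ in counter.most_common(30_000)})

# Shrink the embedding matrix, then copy over the rows of the kept tokens.
old_embeddings = model.get_input_embeddings().weight.data.clone()
model.resize_token_embeddings(len(keep))
model.get_input_embeddings().weight.data.copy_(old_embeddings[keep])
model.tie_weights()  # re-tie the LM head to the smaller embedding matrix
```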

### Model Usage

Currently, this model cannot be used through the Hugging Face inference widget or loaded via the `.from_pretrained()` method.

To test this model checkpoint, you need to clone the checkpoint repository as follows:

```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa

# If you want to clone without large files (just their pointers),
# prefix the clone command with the following environment variable:
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa
```

Then set up the conda environment via:
```
conda env create -f environment.yaml
```

Then follow the procedure in the notebook `generation.ipynb`.
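
Inside the notebook, the generation step conceptually follows the standard Hugging Face seq2seq pattern sketched below. Everything in this snippet is illustrative: it loads the plain `mbart-large-cc25` checkpoint only to show the pattern (this checkpoint itself must be loaded via the ats-models code, as noted above), and the input sentence is a made-up example.

```
# Illustrative only: this checkpoint cannot be loaded with .from_pretrained();
# generation.ipynb loads it via the code from https://github.com/a-rios/ats-models.
# The snippet shows the general seq2seq generation pattern with an mBART model.
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25", src_lang="de_DE")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

complex_doc = "Die Novellierung des Gesetzes tritt am 1. Januar in Kraft."
inputs = tokenizer(complex_doc, return_tensors="pt")

# German is both source and target language (intralingual translation),
# so decoding starts with the German language code.
outputs = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.lang_code_to_id["de_DE"],
    num_beams=4,
    max_length=1024,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```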