omarmomen committed
Commit ce740be
Parent: 340c8dd

Update README.md

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -22,10 +22,35 @@ tags:
This model belongs to the experiments carried out in the work of Stodden, Momen, and Kallmeyer (2023). ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics.
Detailed documentation can be found in this GitHub repository: [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain)

+ We reused the code from [https://github.com/a-rios/ats-models](https://github.com/a-rios/ats-models) for our experiments.
+
### Model Description

The model is a finetuned checkpoint of the pre-trained LongmBART model based on `mbart-large-cc25`, with the vocabulary trimmed to the 30k most frequent words in the German language.
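
As a rough sketch of what this vocabulary trimming involves, the snippet below trims plain `mbart-large-cc25` with the `transformers` library. This is not the authors' script (the actual trimming and the LongmBART attention conversion are done with the ats-models code linked above), and `german_corpus.txt` is a hypothetical corpus file:

```
# Illustrative sketch only; the real scripts live in the ats-models repository.
# `german_corpus.txt` is a hypothetical plain-text German corpus.
from collections import Counter

import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# 1. Count subword ids over the German corpus.
counts = Counter()
with open("german_corpus.txt", encoding="utf-8") as corpus:
    for line in corpus:
        counts.update(tokenizer(line, add_special_tokens=False)["input_ids"])

# 2. Keep the special tokens (incl. language codes) plus the 30k most frequent ids.
keep_ids = sorted(set(tokenizer.all_special_ids)
                  | {i for i, _ in counts.most_common(30_000)})

# 3. Slice the shared embedding matrix down to the kept rows and re-tie the LM head.
old_emb = model.get_input_embeddings().weight.data
new_emb = torch.nn.Embedding(len(keep_ids), old_emb.size(1))
new_emb.weight.data.copy_(old_emb[keep_ids])
model.set_input_embeddings(new_emb)
model.final_logits_bias = model.final_logits_bias[:, keep_ids]
model.config.vocab_size = len(keep_ids)
model.tie_weights()

# 4. Ids shift after trimming, so tokenized inputs must be remapped
#    (ats-models rebuilds the tokenizer instead).
id_map = {old: new for new, old in enumerate(keep_ids)}
```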
 
The model was finetuned for the task of document-level German text simplification.

- The finetuning dataset included manually aligned sentences from the datasets `DEplain-APA-doc` only.
+ The finetuning dataset included manually aligned sentences from the dataset `DEplain-APA-doc` only.
+
+ ### Model Usage
+
+ Currently, this model can't be used in the HuggingFace inference interface or via the `.from_pretrained` method.
+
+ To test this model checkpoint, you need to clone the checkpoint repository as follows:
+
+ ```
+ # Make sure you have git-lfs installed (https://git-lfs.com)
+ git lfs install
+ git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa
+
+ # If you want to clone without downloading the large files (just their pointers),
+ # prefix the clone command with the GIT_LFS_SKIP_SMUDGE environment variable:
+ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa
+ ```
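+
+ If you clone with `GIT_LFS_SKIP_SMUDGE=1`, the checkpoint files are downloaded as pointers only; the actual weights can be fetched later with the standard git-lfs command:
+
+ ```
+ cd trimmed_longmbart_docs_apa
+ git lfs pull
+ ```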
+
+ Then set up and activate the conda environment (the environment name is defined in `environment.yaml`; `<env-name>` below is a placeholder):
+
+ ```
+ conda env create -f environment.yaml
+ conda activate <env-name>
+ ```
+
+ Then follow the procedure in the notebook `generation.ipynb`.
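+
+ For example, assuming Jupyter is available in that environment, the notebook can be opened with:
+
+ ```
+ jupyter notebook generation.ipynb
+ ```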