This model is part of the experiments conducted in the work of Stodden, Momen, and Kallmeyer (2023), ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics.

Detailed documentation can be found in the GitHub repository: [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain)

We reused the code from [https://github.com/a-rios/ats-models](https://github.com/a-rios/ats-models) for our experiments.
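
For reference, that code can be fetched with a plain `git clone` of the repository linked above:

```
git clone https://github.com/a-rios/ats-models.git
```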
### Model Description

The model is a fine-tuned checkpoint of the pre-trained LongmBART model based on `mbart-large-cc25`, with the vocabulary trimmed to the 30k most frequent words in the German language.

The model was fine-tuned for the task of document-level German text simplification.

The fine-tuning dataset included manually aligned sentences from the `DEplain-APA-doc` dataset only.
### Model Usage

Currently, this model can't be used in the Hugging Face interface or loaded via the `.from_pretrained` method.

To test this model checkpoint, you need to clone the checkpoint repository as follows:

```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa

# If you want to clone without the large files (just their pointers),
# prepend your git clone with the following environment variable:
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa
```

Then set up the conda environment via:

```
conda env create -f environment.yaml
```
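
After creation, activate the environment with `conda activate`, using the environment name defined in `environment.yaml`.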

Then follow the procedure in the notebook `generation.ipynb`.
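
For example, assuming Jupyter is available in the activated environment and that `generation.ipynb` ships with the cloned checkpoint repository, you can open the notebook with:

```
# Launch the notebook from the cloned checkpoint repository
# (assumes Jupyter is installed in the active conda environment)
cd trimmed_longmbart_docs_apa
jupyter notebook generation.ipynb
```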