---
license: apache-2.0
language:
- de
datasets:
- DEplain/DEplain-APA-doc
metrics:
- sari
- bleu
- bertscore
library_name: transformers
pipeline_tag: text2text-generation
tags:
- text simplification
- plain language
- easy-to-read language
- document simplification
---
|
# DEplain German Text Simplification
|
This model is part of the experiments in Stodden, Momen, and Kallmeyer (2023), ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics.
Detailed documentation can be found in the GitHub repository: [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain).
### Model Description
The model is a finetuned checkpoint of the pre-trained LongmBART model, which is based on `mbart-large-cc25` with its vocabulary trimmed to the 30k most frequent German words.
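The vocabulary-trimming step can be sketched as follows. This is a minimal, hypothetical illustration of reducing an embedding table to the most frequent tokens; the function and variable names are not taken from the authors' code:

```python
def trim_vocab(embeddings, token_freq, keep=30000):
    """Keep only the embedding rows for the `keep` most frequent token ids.

    embeddings: list of embedding vectors, indexed by token id.
    token_freq: dict mapping token id -> corpus frequency.
    Returns the trimmed embedding table and an old-id -> new-id mapping.
    """
    # Token ids sorted by corpus frequency, most frequent first.
    top_ids = sorted(token_freq, key=token_freq.get, reverse=True)[:keep]
    old_to_new = {old: new for new, old in enumerate(top_ids)}
    # Keep only the rows of the embedding table for the retained tokens.
    trimmed = [embeddings[i] for i in top_ids]
    return trimmed, old_to_new

# Toy example: 6-token vocabulary, keep the 3 most frequent tokens.
emb = [[float(i)] for i in range(6)]
freq = {0: 50, 1: 5, 2: 40, 3: 1, 4: 30, 5: 2}
trimmed, mapping = trim_vocab(emb, freq, keep=3)
print(len(trimmed), mapping)  # 3 {0: 0, 2: 1, 4: 2}
```

In practice the same row selection is applied to the tokenizer's vocabulary and to the model's input/output embedding matrices so that token ids stay consistent.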
The model was finetuned for the task of German document simplification.
The finetuning dataset consisted of manually aligned documents from `DEplain-APA-doc` only.
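A minimal usage sketch with the `transformers` library is shown below. The model identifier is a placeholder for this checkpoint's repository id, and the generation parameters are illustrative, not the settings used in the paper:

```python
MODEL_ID = "your-org/your-checkpoint"  # placeholder: replace with this model's repo id

def simplify(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Generate a simplified German version of the input document."""
    # Imported lazily so the sketch can be read without downloading model weights.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example call (downloads the checkpoint on first use):
# print(simplify("Hier steht ein komplexer deutscher Beispieltext."))
```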