This is a German summarization model based on the multilingual T5 model google/mt5-small. Unlike many other summarization models, it is released under a permissive open source license (MIT), which, among other things, allows commercial use.
The training was conducted with the following hyperparameters:
- base model: google/mt5-small
- batch size: 3 (effective batch size 6 with gradient accumulation)
- max_source_length: 800
- max_target_length: 96
- warmup_ratio: 0.3
- number of train epochs: 10
- gradient accumulation steps: 2
- learning rate: 5e-5
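As a sketch, the hyperparameters above might map onto Hugging Face `Seq2SeqTrainingArguments` as follows; the output directory name is an assumption, and the actual training script is not part of this model card.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical mapping of the listed hyperparameters. The batch size
# "3 (6)" is read as a per-device batch size of 3 with an effective
# batch size of 6 via gradient accumulation.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-sum-de",   # assumed name, not from the card
    per_device_train_batch_size=3,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    num_train_epochs=10,
    warmup_ratio=0.3,
)
```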
The datasets were preprocessed as follows:
Each summary was tokenized with the google/mt5-small tokenizer; only records whose summary contained no more than 94 tokens were kept.
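The filtering step can be sketched as below. The record field name (`summary`) and the whitespace tokenizer are placeholders for illustration; the actual preprocessing used the google/mt5-small tokenizer.

```python
def filter_by_summary_length(records, tokenize, max_tokens=94):
    """Keep only records whose tokenized summary has at most max_tokens tokens."""
    return [r for r in records if len(tokenize(r["summary"])) <= max_tokens]

records = [
    {"text": "...", "summary": "ein kurzer Satz"},      # 3 tokens -> kept
    {"text": "...", "summary": "wort " * 200},          # 200 tokens -> dropped
]
# Whitespace tokenization stands in for the real subword tokenizer here.
kept = filter_by_summary_length(records, tokenize=str.split, max_tokens=94)
```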
This model was trained on the following dataset:

| Dataset | Language | Size | License |
|---|---|---|---|
| SwissText 2019 - Train | de | 84,564 | The exact license is unclear. The data was published as part of the German Text Summarization Challenge. |
We have permission to use the Swisstext dataset and release the resulting summarization model under MIT license (see permission-declaration-swisstext.pdf).
Copyright (c) 2021 Philip May, Deutsche Telekom AG
Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.