Commit f14e901 by Philip May (parent 8bdbca6): Update README.md
# mT5-small-sum-de-en-v2

This is a bilingual summarization model for English and German. It is based on the multilingual T5 model [google/mt5-small](https://huggingface.co/google/mt5-small).
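As a usage sketch (not part of the original card): the hub id below is an assumption based on the model name, and inference otherwise follows the standard transformers seq2seq API, using the `"summarize: "` source prefix the model was trained with.

```python
# Hedged usage sketch; the hub id is an assumption based on the model name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "deutsche-telekom/mt5-small-sum-de-en-v2"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Your German or English article text ..."
# Prefix the input as done during training, truncate to the training length.
inputs = tokenizer("summarize: " + text, max_length=800,
                   truncation=True, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], max_length=96)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```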
## Training

The training was conducted with the following hyperparameters:

- base model: [google/mt5-small](https://huggingface.co/google/mt5-small)
- source_prefix: `"summarize: "`
- batch size: 3
- max_source_length: 800
- max_target_length: 96
- warmup_ratio: 0.3
- number of train epochs: 10
- gradient accumulation steps: 2
- learning rate: 5e-5
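The card does not publish the training command. As a hypothetical reconstruction, the hyperparameters above map onto the Hugging Face `run_summarization.py` example script roughly like this; the train file and output directory are placeholders:

```shell
# Hypothetical reconstruction, not the authors' published command.
python run_summarization.py \
  --model_name_or_path google/mt5-small \
  --source_prefix "summarize: " \
  --per_device_train_batch_size 3 \
  --max_source_length 800 \
  --max_target_length 96 \
  --warmup_ratio 0.3 \
  --num_train_epochs 10 \
  --gradient_accumulation_steps 2 \
  --learning_rate 5e-5 \
  --do_train \
  --train_file train.json \
  --output_dir ./mt5-small-sum-de-en-v2
```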
## Datasets and Preprocessing

The datasets were preprocessed as follows:

The summaries were tokenized with the [google/mt5-small](https://huggingface.co/google/mt5-small) tokenizer, and only records whose summary was no longer than 94 tokens were selected.
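An illustrative sketch of this length filter (hypothetical helper names; the real pipeline would count tokens with the google/mt5-small tokenizer, e.g. `len(tokenizer(summary)["input_ids"])` — a whitespace stub stands in here to keep the example self-contained):

```python
def count_tokens(text: str) -> int:
    # Stand-in for the mT5 tokenizer's token count.
    return len(text.split())

def keep_record(record: dict, max_summary_tokens: int = 94) -> bool:
    # Keep only records whose summary fits in the token budget.
    return count_tokens(record["summary"]) <= max_summary_tokens

records = [
    {"summary": "a short summary"},
    {"summary": " ".join(["token"] * 120)},  # too long, dropped
]
filtered = [r for r in records if keep_record(r)]
print(len(filtered))  # 1
```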
The MLSUM dataset has a special characteristic: the summary is often contained verbatim in the text as one or more sentences. These sentences were removed from the texts, because we do not want to train a model that merely extracts sentences as its summary.
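A rough sketch of this cleaning step (not the authors' actual script): drop sentences from the article text that also appear verbatim in the summary.

```python
import re

def remove_summary_sentences(text: str, summary: str) -> str:
    # Naive sentence split on ., !, ? followed by whitespace; the real
    # preprocessing may have used a proper sentence segmenter.
    split = lambda s: [p.strip() for p in re.split(r"(?<=[.!?])\s+", s) if p.strip()]
    summary_sentences = set(split(summary))
    # Keep only article sentences that are not copied into the summary.
    kept = [s for s in split(text) if s not in summary_sentences]
    return " ".join(kept)

article = "The cat sat. It rained all day. The end came soon."
summary = "It rained all day."
print(remove_summary_sentences(article, summary))  # The cat sat. The end came soon.
```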
This model is trained on the following datasets:

| Name | Language | Size | License
|------|----------|------|--------
| [CNN Daily - Train](https://github.com/abisee/cnn-dailymail) | en | 218,223 | The license is unclear. The data comes from CNN and Daily Mail. We assume that it may only be used for research purposes and not commercially.
| [Extreme Summarization (XSum) - Train](https://github.com/EdinburghNLP/XSum) | en | 204,005 | The license is unclear. The data comes from the BBC. We assume that it may only be used for research purposes and not commercially.
| [MLSUM German - Train](https://github.com/ThomasScialom/MLSUM) | de | 218,043 | Usage of the dataset is restricted to non-commercial research purposes only. Copyright belongs to the original copyright holders (see [here](https://github.com/ThomasScialom/MLSUM#mlsum)).
| [SwissText 2019 - Train](https://www.swisstext.org/2019/shared-task/german-text-summarization-challenge.html) | de | 84,564 | The license is unclear. The data was published in the [German Text Summarization Challenge](https://www.swisstext.org/2019/shared-task/german-text-summarization-challenge.html). We assume that it may only be used for research purposes and not commercially.
Combined training set size per language:

| Language | Size
|------|------
| German | xxx
| English | xxx
| Total | xxx
## Evaluation on MLSUM German Test Set (no beams)

| Model | rouge1 | rouge2 | rougeL | rougeLsum
|-------|--------|--------|--------|----------
| [ml6team/mt5-small-german-finetune-mlsum](https://huggingface.co/ml6team/mt5-small-german-finetune-mlsum) | 18.3607 | 5.3604 | 14.5456 | 16.1946
| deutsche-telekom/mT5-small-sum-de-en-01 | 21.7336 | 7.2614 | 17.1323 | 19.3977

## Evaluation on CNN Daily English Test Set (no beams)

| Model | rouge1 | rouge2 | rougeL | rougeLsum
|-------|--------|--------|--------|----------
| [sshleifer/distilbart-xsum-12-6](https://huggingface.co/sshleifer/distilbart-xsum-12-6) | 26.7664 | 8.8243 | 18.3703 | 23.2614
| [facebook/bart-large-xsum](https://huggingface.co/facebook/bart-large-xsum) | 28.5374 | 9.8565 | 19.4829 | 24.7364
| [mrm8488/t5-base-finetuned-summarize-news](https://huggingface.co/mrm8488/t5-base-finetuned-summarize-news) | 37.576 | 14.7389 | 24.0254 | 34.4634
| deutsche-telekom/mT5-small-sum-de-en-01 | 37.6339 | 16.5317 | 27.1418 | 34.9951

## Evaluation on Extreme Summarization (XSum) English Test Set (no beams)

| Model | rouge1 | rouge2 | rougeL | rougeLsum
|-------|--------|--------|--------|----------
| [mrm8488/t5-base-finetuned-summarize-news](https://huggingface.co/mrm8488/t5-base-finetuned-summarize-news) | 18.6204 | 3.535 | 12.3997 | 15.2111
| [facebook/bart-large-xsum](https://huggingface.co/facebook/bart-large-xsum) | 28.5374 | 9.8565 | 19.4829 | 24.7364
| deutsche-telekom/mT5-small-sum-de-en-01 | 32.3416 | 10.6191 | 25.3799 | 25.3908
| [sshleifer/distilbart-xsum-12-6](https://huggingface.co/sshleifer/distilbart-xsum-12-6) | 44.2553 ♣ | 21.4289 ♣ | 36.2639 ♣ | 36.2696 ♣

♣: These values seem unusually high. It is possible that the test set leaked into that model's training data.
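The scores above are ROUGE F-measures (higher is better). As a rough illustration of what the metric measures (this is not the evaluation script used for the tables, which presumably relied on a standard ROUGE implementation), ROUGE-1 compares unigram overlap between a generated summary and a reference:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    # Unigram counts for candidate and reference summaries.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f("the cat sat on the mat", "the cat lay on the mat"), 4))  # 0.8333
```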
## License

Copyright (c) 2021 Philip May, T-Systems on site services GmbH