fixes in README
README.md CHANGED
@@ -101,12 +101,14 @@ As the model was trained on news articles from the time range 2015-2021, further
 
 The model was evaluated on a held-out test set consisting of 890 article-headline pairs.
 
+For each model, the headlines were generated using beam search with a beam width of 5.
+
 ### Quantitative
 
 | model | Rouge1 | Rouge2 | RougeL | RougeLsum |
 |-|-|-|-|-|
 | [T-Systems-onsite/mt5-small-sum-de-en-v2](https://huggingface.co/T-Systems-onsite/mt5-small-sum-de-en-v2) | 0.107 | 0.0297 | 0.098 | 0.098 |
-
+| aiautomationlab/german-news-title-gen-mt5 | 0.3131 | 0.0873 | 0.1997 | 0.1997 |
 
 For evaluating the factuality of the generated headlines with respect to the input text, we use 3 state-of-the-art metrics for summary evaluation (the parameters were chosen according to the recommendations in the respective papers and GitHub repositories):
 
@@ -135,7 +137,7 @@ Each metric is calculated for all article-headline pairs in the test set and the
 | model | SummacCZ | QAFactEval | DAE |
 |-|-|-|-|
 | [T-Systems-onsite/mt5-small-sum-de-en-v2](https://huggingface.co/T-Systems-onsite/mt5-small-sum-de-en-v2) | 0.6969 | 3.3023 | 0.8292 |
-
+| aiautomationlab/german-news-title-gen-mt5 | 0.4419 | 1.9265 | 0.7438 |
 
 It can be observed that our model scores consistently lower than the T-Systems one. Following human evaluation, it appears that, in order to match the structure and style specific to headlines, a headline-generation model has to be more abstractive than a summarization model, which leads to a higher frequency of hallucinations in the generated output.
 
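The diff adds that headlines were decoded with beam search at a beam width of 5. As a toy illustration of that decoding strategy, here is a minimal, self-contained sketch; the vocabulary and scoring function are hypothetical stand-ins, not the actual mT5 model:

```python
import math

def beam_search(score_fn, vocab, max_len, beam_width=5):
    """Generic beam search: keep the beam_width highest-scoring partial
    sequences at each step, extend each by every vocabulary token, and
    return the single best sequence after max_len steps."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok in vocab:
                candidates.append((seq + [tok], score + score_fn(seq, tok)))
        # Prune to the best beam_width hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# Hypothetical toy scorer: strongly prefers alternating "a"/"b" tokens.
def toy_score(seq, tok):
    prev = seq[-1] if seq else "b"
    return math.log(0.9) if tok != prev else math.log(0.1)

print(beam_search(toy_score, ["a", "b"], max_len=4))  # ['a', 'b', 'a', 'b']
```

With Hugging Face transformers, the same decoding choice would typically be expressed as `model.generate(**inputs, num_beams=5)`.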
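The Rouge1 scores in the first table are unigram-overlap F1 values. A minimal sketch of that computation, assuming plain whitespace tokenization (the official `rouge_score` package may additionally apply stemming and tokenization rules):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram overlap between reference and candidate,
    with clipped counts so each reference token is matched at most once."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # clipped match count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference/generated headline pair (illustrative only).
print(rouge1_f1("der zug ist entgleist", "zug in bayern entgleist"))  # 0.5
```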