lnetze committed
Commit 55ce960 (1 parent: 93ec81d)

fixes in README

Files changed (1): README.md (+4, -2)
README.md CHANGED
@@ -101,12 +101,14 @@ As the model was trained on news articles from the time range 2015-2021, further
 
 The model was evaluated on a held-out test set consisting of 890 article-headline pairs.
 
+For each model, the headlines were generated using beam search with a beam width of 5.
+
 ### Quantitative
 
 | model | Rouge1 | Rouge2 | RougeL | RougeLsum |
 |-|-|-|-|-|
 | [T-Systems-onsite/mt5-small-sum-de-en-v2](https://huggingface.co/T-Systems-onsite/mt5-small-sum-de-en-v2) | 0.107 | 0.0297 | 0.098 | 0.098 |
-| our-model | 0.3131 | 0.0873 | 0.1997 | 0.1997 |
+| aiautomationlab/german-news-title-gen-mt5 | 0.3131 | 0.0873 | 0.1997 | 0.1997 |
 
 To evaluate the factuality of the generated headlines with respect to the input text, we use three state-of-the-art metrics for summary evaluation (parameters were chosen according to the recommendations in the respective papers or GitHub repositories):
 
@@ -135,7 +137,7 @@ Each metric is calculated for all article-headline pairs in the test set and the
 | model | SummacCZ | QAFactEval | DAE |
 |-|-|-|-|
 | [T-Systems-onsite/mt5-small-sum-de-en-v2](https://huggingface.co/T-Systems-onsite/mt5-small-sum-de-en-v2) | 0.6969 | 3.3023 | 0.8292 |
-| our-model | 0.4419 | 1.9265 | 0.7438 |
+| aiautomationlab/german-news-title-gen-mt5 | 0.4419 | 1.9265 | 0.7438 |
 
 Our model scores consistently lower than the T-Systems one. Human evaluation suggests that, to match the structure and style specific to headlines, a headline generation model has to be more abstractive than a summarization model, which leads to a higher frequency of hallucinations in the generated output.
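
For reference, the beam-search decoding described in the change maps directly onto the Hugging Face transformers generation API. Below is a minimal sketch using the published `aiautomationlab/german-news-title-gen-mt5` checkpoint; only `num_beams=5` comes from the README, while the input truncation and headline-length settings are illustrative assumptions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "aiautomationlab/german-news-title-gen-mt5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "..."  # full German news article text

# Tokenize the article; max_length=512 is an assumed truncation limit.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

# Beam search with beam width 5, as used for the evaluation above.
output_ids = model.generate(
    **inputs,
    num_beams=5,
    max_length=32,       # assumed upper bound for headline length
    early_stopping=True,
)
headline = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(headline)
```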
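The ROUGE columns in the first table can be reproduced with the Hugging Face `evaluate` package, which wraps Google's `rouge_score` implementation and reports the same Rouge1/Rouge2/RougeL/RougeLsum variants. A minimal sketch follows; the commit does not state which ROUGE implementation or tokenization settings the authors actually used.

```python
import evaluate

rouge = evaluate.load("rouge")

# Hypothetical stand-ins for the 890 generated/reference headline pairs.
predictions = ["Neues Gesetz im Bundestag beschlossen"]
references = ["Bundestag beschliesst neues Gesetz"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```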
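Of the three factuality metrics, SummaC has the most compact interface. The sketch below assumes the `summac` package from the metric's GitHub repository with its README defaults; the commit does not say whether the SummacCZ column uses the zero-shot (SummaC-ZS) or convolutional (SummaC-Conv) variant, so the zero-shot scorer is shown. QAFactEval and DAE ship as research code in their own repositories and are applied to the same article-headline pairs in an analogous way.

```python
from summac.model_summac import SummaCZS

# Settings follow the summac README defaults; they are assumptions,
# not the authors' documented configuration.
scorer = SummaCZS(granularity="sentence", model_name="vitc", device="cpu")

articles = ["Full German news article text ..."]  # source documents
headlines = ["Generated headline ..."]            # candidate outputs

result = scorer.score(articles, headlines)
print(result["scores"])  # one consistency score per article-headline pair
```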