lnetze committed
Commit 55ce960 (1 parent: 93ec81d)

fixes in README

Files changed (1): README.md (+4, -2)
README.md CHANGED
@@ -101,12 +101,14 @@ As the model was trained on news articles from the time range 2015-2021, further
 
 The model was evaluated on a held-out test set consisting of 890 article-headline pairs.
 
+For each model, the headlines were generated using beam search with a beam width of 5.
+
 ### Quantitative
 
 | model | Rouge1 | Rouge2 | RougeL | RougeLsum |
 |-|-|-|-|-|
 | [T-Systems-onsite/mt5-small-sum-de-en-v2](https://huggingface.co/T-Systems-onsite/mt5-small-sum-de-en-v2) | 0.107 | 0.0297 | 0.098 | 0.098 |
-| our-model | 0.3131 | 0.0873 | 0.1997 | 0.1997 |
+| aiautomationlab/german-news-title-gen-mt5 | 0.3131 | 0.0873 | 0.1997 | 0.1997 |
 
 To evaluate the factuality of the generated headlines with respect to the input text, we use three state-of-the-art metrics for summary evaluation (parameters were chosen according to the recommendations in the respective papers or GitHub repositories):
 
@@ -135,7 +137,7 @@ Each metric is calculated for all article-headline pairs in the test set and the
 | model | SummacCZ | QAFactEval | DAE |
 |-|-|-|-|
 | [T-Systems-onsite/mt5-small-sum-de-en-v2](https://huggingface.co/T-Systems-onsite/mt5-small-sum-de-en-v2) | 0.6969 | 3.3023 | 0.8292 |
-| our-model | 0.4419 | 1.9265 | 0.7438 |
+| aiautomationlab/german-news-title-gen-mt5 | 0.4419 | 1.9265 | 0.7438 |
 
 Our model scores consistently lower than the T-Systems one. Human evaluation suggests that, to match the structure and style specific to headlines, a headline generation model has to be more abstractive than a summarization model, which leads to a higher frequency of hallucinations in the generated output.
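
For reference, the beam-search decoding described in the change maps directly onto the Hugging Face transformers generation API. Below is a minimal sketch using the published `aiautomationlab/german-news-title-gen-mt5` checkpoint; only `num_beams=5` comes from the README, while the input truncation and headline-length settings are illustrative assumptions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "aiautomationlab/german-news-title-gen-mt5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "..."  # full German news article text

# Tokenize the article; max_length=512 is an assumed truncation limit.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

# Beam search with beam width 5, as used for the evaluation above.
output_ids = model.generate(
    **inputs,
    num_beams=5,
    max_length=32,       # assumed upper bound for headline length
    early_stopping=True,
)
headline = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(headline)
```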
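The ROUGE columns in the first table can be reproduced with the Hugging Face `evaluate` package, which wraps Google's `rouge_score` implementation and reports the same Rouge1/Rouge2/RougeL/RougeLsum variants. A minimal sketch follows; the commit does not state which ROUGE implementation or tokenization settings the authors actually used.

```python
import evaluate

rouge = evaluate.load("rouge")

# Hypothetical stand-ins for the 890 generated/reference headline pairs.
predictions = ["Neues Gesetz im Bundestag beschlossen"]
references = ["Bundestag beschliesst neues Gesetz"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```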
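Of the three factuality metrics, SummaC has the most compact interface. The sketch below assumes the `summac` package from the metric's GitHub repository with its README defaults; the commit does not say whether the SummacCZ column uses the zero-shot (SummaC-ZS) or convolutional (SummaC-Conv) variant, so the zero-shot scorer is shown. QAFactEval and DAE ship as research code in their own repositories and are applied to the same article-headline pairs in an analogous way.

```python
from summac.model_summac import SummaCZS

# Settings follow the summac README defaults; they are assumptions,
# not the authors' documented configuration.
scorer = SummaCZS(granularity="sentence", model_name="vitc", device="cpu")

articles = ["Full German news article text ..."]  # source documents
headlines = ["Generated headline ..."]            # candidate outputs

result = scorer.score(articles, headlines)
print(result["scores"])  # one consistency score per article-headline pair
```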