Update README.md
README.md
---
license: cc-by-nc-4.0
---

# **TrueTeacher**

This is a **Factual Consistency Evaluation** model, introduced in the [TrueTeacher paper (Gekhman et al., 2023)](https://arxiv.org/pdf/2305.11171.pdf).

## Model Details

The model is optimized for evaluating factual consistency in **summarization**.

It is the main model from the paper (see "T5-11B w. ANLI + TrueTeacher full" in Table 1), which is based on a **T5-11B** [(Raffel …

To accommodate the input length of common summarization datasets, we recommend set…
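
The recommendation above is cut off in this excerpt, but it concerns how much of the input the tokenizer keeps. As a minimal sketch, assuming the checkpoint id, the `premise: ... hypothesis: ...` input format, and the 2048 value purely for illustration (none of them are confirmed by this excerpt), passing a larger `max_length` at tokenization time might look like this:

```python
from transformers import T5Tokenizer

# Assumed checkpoint id -- replace with the actual model repo id.
tokenizer = T5Tokenizer.from_pretrained('google/t5_11b_trueteacher_and_anli')

document = 'A long source document goes here ...'
summary = 'A model-generated summary of the document.'

# Tokenize the document/summary pair with a generous max_length so that
# long summarization inputs are not truncated prematurely (2048 is an
# illustrative value, not a confirmed recommendation).
input_ids = tokenizer(
    f'premise: {document} hypothesis: {summary}',
    return_tensors='pt',
    truncation=True,
    max_length=2048).input_ids
```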

The model predicts a binary label ('1' - Factually Consistent, '0' - Factually Inconsistent).

## Evaluation results

This model achieves the following ROC AUC results on the summarization subset of the [TRUE benchmark (Honovich et al., 2022)](https://arxiv.org/pdf/2204.04991.pdf):

| **MNBM** | **QAGS-X** | **FRANK** | **SummEval** | **QAGS-C** | **Average** |
|----------|------------|-----------|--------------|------------|-------------|
| 78.1     | 89.4       | 93.6      | 88.5         | 89.4       | 87.8        |

## Usage examples

#### classification

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# [model and tokenizer loading and input preparation are not shown in this diff]

for hypothesis, expected in [('the sun is out in the sky', '1'),
  # [remaining pairs and most of the loop body are not shown in this diff]
  print(f'result: {result} (expected: {expected})\n')
```
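
Only the edges of this example survive in the diff above; the sketch below fills in the middle under stated assumptions rather than reproducing the model card's exact code. The checkpoint id, the premise string, the second (hypothesis, label) pair, the `premise: ... hypothesis: ...` input format, and the `max_length` value are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# Assumed checkpoint id -- replace with the actual model repo id.
model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'  # illustrative source text
for hypothesis, expected in [('the sun is out in the sky', '1'),
                             ('the sun is not shining', '0')]:
    # Assumed NLI-style input format: "premise: ... hypothesis: ...".
    input_ids = tokenizer(
        f'premise: {premise} hypothesis: {hypothesis}',
        return_tensors='pt',
        truncation=True,
        max_length=2048).input_ids
    # The model generates '1' (factually consistent) or '0' (inconsistent).
    outputs = model.generate(input_ids)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f'result: {result} (expected: {expected})\n')
```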

#### scoring

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# [the rest of this example is not shown in this diff]
```
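
The scoring example is likewise cut off after the imports. A common way to turn a seq2seq classifier like this into a scorer is to read the probability assigned to the '1' token at the first decoding step; the sketch below assumes that approach (plus the same illustrative checkpoint id and input format as above) and is not necessarily the code from the model card.

```python
import torch
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# Assumed checkpoint id -- replace with the actual model repo id.
model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

def get_consistency_score(premise: str, hypothesis: str) -> float:
    """Returns P('1') at the first decoding step as a consistency score."""
    input_ids = tokenizer(
        f'premise: {premise} hypothesis: {hypothesis}',
        return_tensors='pt',
        truncation=True,
        max_length=2048).input_ids
    # Start decoding from the decoder start token and read the logits
    # of the first generated position.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    first_step_logits = outputs.logits[0, 0]  # vocabulary logits, first step
    probs = torch.softmax(first_step_logits, dim=-1)
    one_token_id = tokenizer('1').input_ids[0]
    return probs[one_token_id].item()

print(get_consistency_score('the sun is shining', 'the sun is out in the sky'))
```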