Update README.md
README.md
---
license: cc-by-nc-4.0
---

# **TrueTeacher**

This is a **Factual Consistency Evaluation** model, introduced in the [TrueTeacher paper (Gekhman et al., 2023)](https://arxiv.org/pdf/2305.11171.pdf).

## Model Details

The model is optimized for evaluating factual consistency in **summarization**.

It is the main model from the paper (see "T5-11B w. ANLI + TrueTeacher full" in Table 1), which is based on a **T5-11B** [(Raffel …

To accommodate the input length of common summarization datasets, we recommend set…
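
The recommendation above is cut off in this excerpt, but it concerns how much of the input the tokenizer keeps. As a minimal sketch, assuming the checkpoint id, the `premise: ... hypothesis: ...` input format, and the 2048 value purely for illustration (none of them are confirmed by this excerpt), passing a larger `max_length` at tokenization time might look like this:

```python
from transformers import T5Tokenizer

# Assumed checkpoint id -- replace with the actual model repo id.
tokenizer = T5Tokenizer.from_pretrained('google/t5_11b_trueteacher_and_anli')

document = 'A long source document goes here ...'
summary = 'A model-generated summary of the document.'

# Tokenize the document/summary pair with a generous max_length so that
# long summarization inputs are not truncated prematurely (2048 is an
# illustrative value, not a confirmed recommendation).
input_ids = tokenizer(
    f'premise: {document} hypothesis: {summary}',
    return_tensors='pt',
    truncation=True,
    max_length=2048).input_ids
```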

The model predicts a binary label ('1' - Factually Consistent, '0' - Factually Inconsistent).

## Evaluation results

This model achieves the following ROC AUC results on the summarization subset of the [TRUE benchmark (Honovich et al., 2022)](https://arxiv.org/pdf/2204.04991.pdf):

| **MNBM** | **QAGS-X** | **FRANK** | **SummEval** | **QAGS-C** | **Average** |
|----------|------------|-----------|--------------|------------|-------------|
| 78.1     | 89.4       | 93.6      | 88.5         | 89.4       | 87.8        |

## Usage examples

#### classification

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# [model and tokenizer loading and input preparation are not shown in this diff]

for hypothesis, expected in [('the sun is out in the sky', '1'),
  # [remaining pairs and most of the loop body are not shown in this diff]
  print(f'result: {result} (expected: {expected})\n')
```
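
Only the edges of this example survive in the diff above; the sketch below fills in the middle under stated assumptions rather than reproducing the model card's exact code. The checkpoint id, the premise string, the second (hypothesis, label) pair, the `premise: ... hypothesis: ...` input format, and the `max_length` value are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# Assumed checkpoint id -- replace with the actual model repo id.
model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

premise = 'the sun is shining'  # illustrative source text
for hypothesis, expected in [('the sun is out in the sky', '1'),
                             ('the sun is not shining', '0')]:
    # Assumed NLI-style input format: "premise: ... hypothesis: ...".
    input_ids = tokenizer(
        f'premise: {premise} hypothesis: {hypothesis}',
        return_tensors='pt',
        truncation=True,
        max_length=2048).input_ids
    # The model generates '1' (factually consistent) or '0' (inconsistent).
    outputs = model.generate(input_ids)
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f'result: {result} (expected: {expected})\n')
```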

#### scoring

```python
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# [the rest of this example is not shown in this diff]
```
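
The scoring example is likewise cut off after the imports. A common way to turn a seq2seq classifier like this into a scorer is to read the probability assigned to the '1' token at the first decoding step; the sketch below assumes that approach (plus the same illustrative checkpoint id and input format as above) and is not necessarily the code from the model card.

```python
import torch
from transformers import T5ForConditionalGeneration
from transformers import T5Tokenizer

# Assumed checkpoint id -- replace with the actual model repo id.
model_path = 'google/t5_11b_trueteacher_and_anli'
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

def get_consistency_score(premise: str, hypothesis: str) -> float:
    """Returns P('1') at the first decoding step as a consistency score."""
    input_ids = tokenizer(
        f'premise: {premise} hypothesis: {hypothesis}',
        return_tensors='pt',
        truncation=True,
        max_length=2048).input_ids
    # Start decoding from the decoder start token and read the logits
    # of the first generated position.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    first_step_logits = outputs.logits[0, 0]  # vocabulary logits, first step
    probs = torch.softmax(first_step_logits, dim=-1)
    one_token_id = tokenizer('1').input_ids[0]
    return probs[one_token_id].item()

print(get_consistency_score('the sun is shining', 'the sun is out in the sky'))
```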