zorik committed on
Commit
b24b964
·
1 Parent(s): c7e04e7

Update README.md

Files changed (1)
  1. README.md +18 -2
README.md CHANGED
@@ -2,8 +2,12 @@
2
  license: cc-by-nc-4.0
3
  ---
4
 
 
 
5
  This is a **Factual Consistency Evaluation** model, introduced in the [TrueTeacher paper (Gekhman et al., 2023)](https://arxiv.org/pdf/2305.11171.pdf).
6
 
 
 
7
  The model is optimized for evaluating factual consistency in **summarization**.
8
 
9
  It is the main model from the paper (see "T5-11B w. ANLI + TrueTeacher full" in Table 1) which is based on a **T5-11B** [(Raffel
@@ -18,8 +22,20 @@ To accommodate the input length of common summarization datasets we recommend set
18
 
19
  The model predicts a binary label ('1' - Factually Consistent, '0' - Factually Inconsistent).
20
 
 
 
 
 
 
 
 
 
 
 
 
 
21
 
22
- ## Usage example - classification:
23
  ```python
24
  from transformers import T5ForConditionalGeneration
25
  from transformers import T5Tokenizer
@@ -43,7 +59,7 @@ for hypothesis, expected in [('the sun is out in the sky', '1'),
43
  print(f'result: {result} (expected: {expected})\n')
44
  ```
45
 
46
- ## Usage example - scoring:
47
  ```python
48
  from transformers import T5ForConditionalGeneration
49
  from transformers import T5Tokenizer
 
2
  license: cc-by-nc-4.0
3
  ---
4
 
5
+ # **TrueTeacher**
6
+
7
  This is a **Factual Consistency Evaluation** model, introduced in the [TrueTeacher paper (Gekhman et al., 2023)](https://arxiv.org/pdf/2305.11171.pdf).
8
 
9
+ ## Model Details
10
+
11
  The model is optimized for evaluating factual consistency in **summarization**.
12
 
13
  It is the main model from the paper (see "T5-11B w. ANLI + TrueTeacher full" in Table 1) which is based on a **T5-11B** [(Raffel
 
22
 
23
  The model predicts a binary label ('1' - Factually Consistent, '0' - Factually Inconsistent).
24
 
25
+ ## Evaluation results
26
+
27
+ This model achieves the following ROC AUC results on the summarization subset of the [TRUE benchmark (Honovich et al., 2022)](https://arxiv.org/pdf/2204.04991.pdf):
28
+
29
+ | **MNBM** | **QAGS-X** | **FRANK** | **SummEval** | **QAGS-C** | **Average** |
30
+ |----------|-----------|-----------|--------------|-----------|-------------|
31
+ | 78.1 | 89.4 | 93.6 | 88.5 | 89.4 | 87.8 |
32
+
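For readers unfamiliar with the metric, the sketch below shows one way a ROC AUC value can be computed from binary labels and model scores, and checks that the "Average" column is the unweighted mean of the five per-dataset values. It is an illustration only, not the evaluation code used in the paper: the `roc_auc` helper and the toy `labels` and `scores` are hypothetical, and only the five table values come from the table above.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: the fraction of (consistent,
    inconsistent) pairs in which the consistent example gets the higher
    score. Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = factually consistent, 0 = inconsistent.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"toy ROC AUC: {auc:.3f}")

# Per-dataset values copied from the table above.
table = {"MNBM": 78.1, "QAGS-X": 89.4, "FRANK": 93.6,
         "SummEval": 88.5, "QAGS-C": 89.4}
average = sum(table.values()) / len(table)
print(f"average: {average:.1f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; for full datasets a rank-based implementation such as the one in scikit-learn would be the more practical choice.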
33
+
34
+
35
+
36
+ ## Usage examples
37
 
38
+ #### classification
39
  ```python
40
  from transformers import T5ForConditionalGeneration
41
  from transformers import T5Tokenizer
 
59
  print(f'result: {result} (expected: {expected})\n')
60
  ```
61
 
62
+ #### scoring
63
  ```python
64
  from transformers import T5ForConditionalGeneration
65
  from transformers import T5Tokenizer