gonglinyuan committed on
Commit
3d2a4e4
1 Parent(s): f18db2e

Create README.md

Files changed (1)
  1. README.md +169 -0
README.md ADDED
@@ -0,0 +1,169 @@
+ ---
+ license: mit
+ language:
+ - en
+ tags:
+ - t5
+ model-index:
+ - name: metro_t0p_basepp
+   results:
+   - task:
+       type: natural-language-inference
+     dataset:
+       type: super_glue
+       name: RTE
+       config: rte
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 71.44404332129963
+   - task:
+       type: natural-language-inference
+     dataset:
+       type: super_glue
+       name: CB
+       config: cb
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 60.714285714285715
+   - task:
+       type: natural-language-inference
+     dataset:
+       type: anli
+       name: ANLI R1
+       split: dev_r1
+     metrics:
+     - type: accuracy
+       value: 36.906666666666666
+   - task:
+       type: natural-language-inference
+     dataset:
+       type: anli
+       name: ANLI R2
+       split: dev_r2
+     metrics:
+     - type: accuracy
+       value: 35.24
+   - task:
+       type: natural-language-inference
+     dataset:
+       type: anli
+       name: ANLI R3
+       split: dev_r3
+     metrics:
+     - type: accuracy
+       value: 36.46666666666666
+   - task:
+       type: coreference-resolution
+     dataset:
+       type: super_glue
+       name: WSC
+       config: wsc.fixed
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 62.21153846153847
+   - task:
+       type: coreference-resolution
+     dataset:
+       type: winogrande
+       name: Winogrande XL
+       config: winogrande_xl
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 54.08050513022889
+   - task:
+       type: multiple-choice-qa
+     dataset:
+       type: super_glue
+       name: COPA
+       config: copa
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 78.875
+   - task:
+       type: multiple-choice-qa
+     dataset:
+       type: story_cloze
+       name: StoryCloze 2016
+       config: '2016'
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 90.29396044895778
+   - task:
+       type: multiple-choice-qa
+     dataset:
+       type: hellaswag
+       name: HellaSwag
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 67.56871141206932
+   - task:
+       type: word-sense-disambiguation
+     dataset:
+       type: super_glue
+       name: WiC
+       config: wic
+       split: validation
+     metrics:
+     - type: accuracy
+       value: 51.5987460815047
+ ---
+
+ Official repository: https://github.com/gonglinyuan/metro_t0
+
+ # METRO-T0
+
+ Paper: Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers (TODO) (ACL 2023)
+
+ METRO-T0 is a T5-style text-to-text Transformer pretrained using model-generated pretraining signals and then prompt-finetuned on a family of public NLP tasks proposed in [T0](https://arxiv.org/abs/2110.08207).
+ METRO-T0 is highly parameter-efficient. For example, METRO-T0-Large++ (775M parameters) outperforms GPT-3 (175B parameters) and T0-3B (3B parameters) on a wide range of NLP tasks.
+
+ ![The architecture of METRO-T0 during pretraining using BERT as the auxiliary model to generate signals](https://github.com/gonglinyuan/metro_t0/raw/main/assets/metro_t0_method.png)
+
+ ![Prompt learning results of METRO-T0 versus our T0 baseline and T0-3B by Sanh et al. (2022) on 4 tasks in the T0 Eval benchmark. Each point denotes the accuracy using one prompt template, except that the median accuracy over all templates of T0-3B is indicated by the blue point. The plots of other tasks are in our paper.](https://github.com/gonglinyuan/metro_t0/raw/main/assets/metro_t0_selected_results.png)
+
+ ## Use METRO-T0+-Base++
+
+ To use METRO-T0+-Base++ in PyTorch, refer to the code snippet below. It requires Python 3.7+, PyTorch 1.12+, and transformers 4.17+:
+
+ ```python
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ # trust_remote_code=True lets transformers run the custom code shipped in the model repository
+ model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0p_basepp", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0p_basepp", trust_remote_code=True)
+
+ # Tokenize a single prompt, truncating it to at most 512 tokens
+ input_text = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
+ inputs = tokenizer([input_text], max_length=512, truncation=True, add_special_tokens=True, return_tensors="pt").input_ids
+ # Generate without sampling, producing at most 256 new tokens
+ outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True)) # expected: positive
+ ```
+
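For classification-style prompts such as the sentiment example, the generated string usually has to be mapped back to one of a fixed set of labels. The sketch below shows one simple way to do that and is not part of the official repository: `build_prompt`, `match_choice`, and `classify` are hypothetical helper names, and `generate_fn` stands in for any callable that wraps `tokenizer` and `model.generate`. It uses plain string matching, which is an assumption made for brevity; T0-style evaluations typically rank answer choices by likelihood instead.

```python
from typing import Callable, Optional, Sequence


def build_prompt(review: str) -> str:
    """Format a review with the prompt template used in the README example."""
    return f"Is this review positive or negative? Review: {review}"


def match_choice(generated: str, choices: Sequence[str]) -> Optional[str]:
    """Return the first choice the generated text starts with, or None.

    The comparison ignores case and surrounding whitespace.
    """
    text = generated.strip().lower()
    for choice in choices:
        if text.startswith(choice.lower()):
            return choice
    return None


def classify(
    review: str,
    generate_fn: Callable[[str], str],
    choices: Sequence[str] = ("positive", "negative"),
) -> Optional[str]:
    """Build the prompt, call the generator, and map its output to a label."""
    return match_choice(generate_fn(build_prompt(review)), choices)


def fake_generate(prompt: str) -> str:
    """Stub generator (hypothetical) so the sketch runs without the model."""
    return " Positive"


label = classify("this is the best cast iron skillet you will ever buy", fake_generate)
print(label)
```

To run this against the real model, `fake_generate` could be replaced by a function that tokenizes the prompt, calls `model.generate`, and decodes the first output sequence.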
+ ## Other METRO-T0 Models
+
+ | Model | # Parameters | Pretraining Data | Prompt-Finetuning Data |
+ |--------------------|--------------|------------------|------------------------|
+ | [METRO-T0-Base](https://huggingface.co/gonglinyuan/metro_t0_base) | 226M | Wikibook (16G) | T0 Train |
+ | [METRO-T0+-Base](https://huggingface.co/gonglinyuan/metro_t0p_base) | 226M | Wikibook (16G) | T0+ Train |
+ | [METRO-T0++-Base](https://huggingface.co/gonglinyuan/metro_t0pp_base) | 226M | Wikibook (16G) | T0++ Train |
+ | [METRO-T0-Base++](https://huggingface.co/gonglinyuan/metro_t0_basepp) | 256M | 160G corpus | T0 Train |
+ | [METRO-T0+-Base++](https://huggingface.co/gonglinyuan/metro_t0p_basepp) | 256M | 160G corpus | T0+ Train |
+ | [METRO-T0++-Base++](https://huggingface.co/gonglinyuan/metro_t0pp_basepp) | 256M | 160G corpus | T0++ Train |
+ | [METRO-T0-Large++](https://huggingface.co/gonglinyuan/metro_t0_largepp) | 775M | 160G corpus | T0 Train |
+ | [METRO-T0+-Large++](https://huggingface.co/gonglinyuan/metro_t0p_largepp) | 775M | 160G corpus | T0+ Train |
+ | [METRO-T0++-Large++](https://huggingface.co/gonglinyuan/metro_t0pp_largepp) | 775M | 160G corpus | T0++ Train |
+
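The repository names in the table follow a regular pattern: the prompt-finetuning mixture (T0, T0+, T0++) maps to an empty, `p`, or `pp` suffix, and the model size maps to `base`, `basepp`, or `largepp`. As a small convenience sketch, assuming that pattern continues to hold, the hypothetical helper `repo_id` below rebuilds the Hugging Face identifier from those two choices; the dictionary names are likewise illustrative.

```python
# Suffix appended to "metro_t0" for each prompt-finetuning mixture (from the table)
FINETUNE_SUFFIX = {"T0": "", "T0+": "p", "T0++": "pp"}

# Suffix used for each model size (from the table)
SIZE_SUFFIX = {"Base": "base", "Base++": "basepp", "Large++": "largepp"}


def repo_id(finetune: str, size: str) -> str:
    """Return the Hugging Face repository ID for a finetuning mixture and size.

    Raises KeyError if either argument is not one of the listed options.
    """
    return f"gonglinyuan/metro_t0{FINETUNE_SUFFIX[finetune]}_{SIZE_SUFFIX[size]}"


# All nine identifiers, in the same order as the table rows
ALL_REPO_IDS = [repo_id(f, s) for s in SIZE_SUFFIX for f in FINETUNE_SUFFIX]

print(repo_id("T0+", "Base++"))
```

The returned string can be passed directly to `from_pretrained` in place of the hard-coded model name.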
+
+ ## Citation
+
+ If you find the code and models useful for your research, please cite the following paper:
+
+ ```
+ TODO
+ ```