Official repository: https://github.com/gonglinyuan/metro_t0
METRO-T0
Paper: Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers (ACL 2023)
METRO-T0 is a T5-style text-to-text Transformer that is pretrained using model-generated pretraining signals and then prompt-finetuned on a family of public NLP tasks proposed in T0. METRO-T0 is highly parameter-efficient. For example, METRO-T0-Large++ (775M parameters) outperforms GPT-3 (175B parameters) and T0-3B (3B parameters) on a wide range of NLP tasks.
Use METRO-T0+-Base
To use METRO-T0+-Base in PyTorch (Python 3.7+, PyTorch 1.12+ and transformers 4.17+ are prerequisites), refer to the code snippet below:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# trust_remote_code=True is required because the model repo ships custom code
model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0p_base", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0p_base", trust_remote_code=True)
# Natural-language prompt in the T0 style
input_text = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
# Tokenize, truncating to at most 512 input tokens
inputs = tokenizer([input_text], max_length=512, truncation=True, add_special_tokens=True, return_tensors="pt").input_ids
# Greedy decoding (no sampling)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) # expected: positive
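To run several prompts at once, the snippet above can be wrapped in a small helper. This is a minimal sketch, not part of the official repository: `generate_answers` is a hypothetical name, and it assumes the METRO-T0 tokenizer defines a pad token so that `padding=True` works.

```python
def generate_answers(model, tokenizer, prompts, max_new_tokens=256):
    """Greedy-decode one answer per prompt in a single batch (hypothetical helper)."""
    # Assumption: the tokenizer defines a pad token, so padding=True works.
    enc = tokenizer(
        list(prompts),
        max_length=512,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )
    # Pass the attention mask so that padded positions are ignored.
    outputs = model.generate(
        input_ids=enc.input_ids,
        attention_mask=enc.attention_mask,
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Usage, with `model` and `tokenizer` loaded as in the snippet above:
# answers = generate_answers(model, tokenizer, [
#     "Is this review positive or negative? Review: great skillet",
#     "Is this review positive or negative? Review: it broke after a day",
# ])
```

If padding turns out not to be supported by the tokenizer, calling the single-prompt snippet in a loop is the safer fallback.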
Other METRO-T0 Models
Model | # Parameters | Pretraining Data | Prompt-Finetuning Data |
---|---|---|---|
METRO-T0-Base | 226M | Wikibook (16G) | T0 Train |
METRO-T0+-Base | 226M | Wikibook (16G) | T0+ Train |
METRO-T0++-Base | 226M | Wikibook (16G) | T0++ Train |
METRO-T0-Base++ | 256M | 160G corpus | T0 Train |
METRO-T0+-Base++ | 256M | 160G corpus | T0+ Train |
METRO-T0++-Base++ | 256M | 160G corpus | T0++ Train |
METRO-T0-Large++ | 775M | 160G corpus | T0 Train |
METRO-T0+-Large++ | 775M | 160G corpus | T0+ Train |
METRO-T0++-Large++ | 775M | 160G corpus | T0++ Train |
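The only Hub repo ID confirmed in this document is `gonglinyuan/metro_t0p_base` for METRO-T0+-Base, where `+` is written as `p`. The sketch below extrapolates that pattern to the other names in the table. Both the helper `guess_repo_id` and the resulting IDs are assumptions, so verify that a repo exists before loading it.

```python
import re

def guess_repo_id(model_name: str, owner: str = "gonglinyuan") -> str:
    """Guess a Hub repo ID from a table name such as 'METRO-T0+-Base'.

    Assumption: every variant follows the naming seen in
    'gonglinyuan/metro_t0p_base' (each '+' becomes 'p').
    """
    m = re.fullmatch(r"METRO-T0(\+{0,2})-(Base|Large)(\+\+)?", model_name)
    if m is None:
        raise ValueError(f"unrecognized model name: {model_name!r}")
    finetune_plus, size, pretrain_plus = m.groups()
    # '+' / '++' after T0 marks the prompt-finetuning set (T0+ / T0++ Train).
    finetune_part = "p" * len(finetune_plus)
    # '++' after the size marks pretraining on the 160G corpus.
    size_part = size.lower() + ("pp" if pretrain_plus else "")
    return f"{owner}/metro_t0{finetune_part}_{size_part}"

print(guess_repo_id("METRO-T0+-Base"))
```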
Citation
If you find the code and models useful for your research, please cite the following paper:
@misc{gong2023modelgenerated,
title={Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers},
author={Linyuan Gong and Chenyan Xiong and Xiaodong Liu and Payal Bajaj and Yiqing Xie and Alvin Cheung and Jianfeng Gao and Xia Song},
year={2023},
eprint={2305.12567},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2305.12567}
}
Evaluation results
- accuracy on RTE (validation set, self-reported): 64.910
- accuracy on CB (validation set, self-reported): 44.643
- accuracy on ANLI R1 (self-reported): 32.353
- accuracy on ANLI R2 (self-reported): 32.200
- accuracy on ANLI R3 (self-reported): 32.900
- accuracy on WSC (validation set, self-reported): 61.346
- accuracy on Winogrande XL (validation set, self-reported): 50.860
- accuracy on COPA (validation set, self-reported): 61.500
- accuracy on StoryCloze 2016 (validation set, self-reported): 82.598
- accuracy on HellaSwag (validation set, self-reported): 43.221
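
For a single summary number, the self-reported accuracies above can be averaged. The sketch below computes a plain unweighted mean over the ten listed scores; this is an illustrative aggregation, and the paper may aggregate its results differently.

```python
from statistics import mean

# Self-reported accuracies copied from the list above.
scores = {
    "RTE": 64.910,
    "CB": 44.643,
    "ANLI R1": 32.353,
    "ANLI R2": 32.200,
    "ANLI R3": 32.900,
    "WSC": 61.346,
    "Winogrande XL": 50.860,
    "COPA": 61.500,
    "StoryCloze 2016": 82.598,
    "HellaSwag": 43.221,
}

# Unweighted mean: every task counts equally, regardless of dataset size.
average = mean(scores.values())
print(f"Unweighted mean accuracy over {len(scores)} tasks: {average:.2f}")
```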