metadata

license: mit
language:
  - en
tags:
  - t5
model-index:
  - name: metro_t0_base
    results:
      - task:
          type: natural-language-inference
        dataset:
          type: super_glue
          name: RTE
          config: rte
          split: validation
        metrics:
          - type: accuracy
            value: 61.6245487364621
      - task:
          type: natural-language-inference
        dataset:
          type: super_glue
          name: CB
          config: cb
          split: validation
        metrics:
          - type: accuracy
            value: 52.73809523809525
      - task:
          type: natural-language-inference
        dataset:
          type: anli
          name: ANLI R1
          split: dev_r1
        metrics:
          - type: accuracy
            value: 31.706666666666667
      - task:
          type: natural-language-inference
        dataset:
          type: anli
          name: ANLI R2
          split: dev_r2
        metrics:
          - type: accuracy
            value: 33.486666666666665
      - task:
          type: natural-language-inference
        dataset:
          type: anli
          name: ANLI R3
          split: dev_r3
        metrics:
          - type: accuracy
            value: 33.44444444444444
      - task:
          type: coreference-resolution
        dataset:
          type: super_glue
          name: WSC
          config: wsc.fixed
          split: validation
        metrics:
          - type: accuracy
            value: 58.75
      - task:
          type: coreference-resolution
        dataset:
          type: winogrande
          name: Winogrande XL
          config: winogrande_xl
          split: validation
        metrics:
          - type: accuracy
            value: 50.95501183898973
      - task:
          type: multiple-choice-qa
        dataset:
          type: super_glue
          name: COPA
          config: copa
          split: validation
        metrics:
          - type: accuracy
            value: 66.25
      - task:
          type: multiple-choice-qa
        dataset:
          type: story_cloze
          name: StoryCloze 2016
          config: '2016'
          split: validation
        metrics:
          - type: accuracy
            value: 82.40513094601816
      - task:
          type: multiple-choice-qa
        dataset:
          type: hellaswag
          name: HellaSwag
          split: validation
        metrics:
          - type: accuracy
            value: 25.647281418044216
      - task:
          type: word-sense-disambiguation
        dataset:
          type: super_glue
          name: WiC
          config: wic
          split: validation
        metrics:
          - type: accuracy
            value: 50.423197492163006

Official repository: https://github.com/gonglinyuan/metro_t0

METRO-T0

Paper: Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers (ACL 2023)

METRO-T0 is a T5-style text-to-text Transformer that is pretrained with model-generated pretraining signals and then prompt-finetuned on a family of public NLP tasks proposed in T0. METRO-T0 is highly parameter-efficient: for example, METRO-T0-Large++ (775M parameters) outperforms GPT-3 (175B parameters) and T0-3B (3B parameters) on a wide range of NLP tasks.
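To put the parameter counts quoted above in perspective, the short sketch below computes how much larger the two baselines are than METRO-T0-Large++. It uses only the rounded figures from the paragraph above; the dictionary and the `size_ratio` helper are illustrative names and are not part of the METRO-T0 code.

```python
# Rounded parameter counts quoted in the paragraph above.
PARAMS = {
    "METRO-T0-Large++": 775e6,
    "T0-3B": 3e9,
    "GPT-3": 175e9,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return PARAMS[larger] / PARAMS[smaller]


for baseline in ("T0-3B", "GPT-3"):
    ratio = size_ratio(baseline, "METRO-T0-Large++")
    print(f"{baseline} has about {ratio:.1f}x the parameters of METRO-T0-Large++")
# T0-3B has about 3.9x the parameters of METRO-T0-Large++
# GPT-3 has about 225.8x the parameters of METRO-T0-Large++
```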

Use METRO-T0-Base

To use METRO-T0-Base in PyTorch, make sure the prerequisites are installed (Python 3.7+, PyTorch 1.12+, and transformers 4.17+), then refer to the code snippet below:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# trust_remote_code=True allows transformers to run the custom model and
# tokenizer code that ships with the Hub repository.
model = AutoModelForSeq2SeqLM.from_pretrained("gonglinyuan/metro_t0_base", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("gonglinyuan/metro_t0_base", trust_remote_code=True)

# Phrase the task as a natural-language prompt and truncate it to 512 tokens.
input_text = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"
inputs = tokenizer([input_text], max_length=512, truncation=True, add_special_tokens=True, return_tensors="pt").input_ids

# Greedy decoding (do_sample=False) gives a deterministic answer.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected: positive
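When several inputs need to be answered, the snippet above can be wrapped in small helper functions. The following is a minimal sketch, not part of the official repository: `build_prompt`, `answer_all`, and `load_metro_t0_base` are hypothetical helper names, the prompt template simply reuses the wording of the example above, and the default `max_new_tokens=16` is an assumption that suits short answers. Prompts are processed one at a time with exactly the tokenizer and `generate` calls shown above, so the sketch does not rely on padding or batching behavior of the custom tokenizer.

```python
from typing import Any, List, Tuple

# Illustrative template that reuses the wording of the example prompt above.
PROMPT_TEMPLATE = "Is this review positive or negative? Review: {review}"


def build_prompt(review: str) -> str:
    """Fill the sentiment prompt with one review, trimming outer whitespace."""
    return PROMPT_TEMPLATE.format(review=review.strip())


def answer_all(prompts: List[str], model: Any, tokenizer: Any,
               max_new_tokens: int = 16) -> List[str]:
    """Greedily decode one answer per prompt, one prompt at a time."""
    answers = []
    for prompt in prompts:
        # Same tokenizer call as the snippet above: truncate to 512 tokens.
        input_ids = tokenizer(
            [prompt],
            max_length=512,
            truncation=True,
            add_special_tokens=True,
            return_tensors="pt",
        ).input_ids
        # do_sample=False selects greedy decoding, so results are repeatable.
        output_ids = model.generate(
            input_ids, max_new_tokens=max_new_tokens, do_sample=False
        )
        answers.append(
            tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
        )
    return answers


def load_metro_t0_base() -> Tuple[Any, Any]:
    """Load the model and tokenizer as in the snippet above (downloads weights)."""
    # Imported here so the helpers above can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "gonglinyuan/metro_t0_base"
    model = AutoModelForSeq2SeqLM.from_pretrained(name, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    return model, tokenizer
```

A typical call would be `model, tokenizer = load_metro_t0_base()` followed by `answer_all([build_prompt(r) for r in reviews], model, tokenizer)`, where `reviews` is a list of review strings.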

Other METRO-T0 Models

Model	# Parameters	Pretraining Data	Prompt-Finetuning Data
METRO-T0-Base	226M	Wikibook (16G)	T0 Train
METRO-T0+-Base	226M	Wikibook (16G)	T0+ Train
METRO-T0++-Base	226M	Wikibook (16G)	T0++ Train
METRO-T0-Base++	256M	160G corpus	T0 Train
METRO-T0+-Base++	256M	160G corpus	T0+ Train
METRO-T0++-Base++	256M	160G corpus	T0++ Train
METRO-T0-Large++	775M	160G corpus	T0 Train
METRO-T0+-Large++	775M	160G corpus	T0+ Train
METRO-T0++-Large++	775M	160G corpus	T0++ Train

Citation

If you find the code and models useful for your research, please cite the following paper:

@misc{gong2023modelgenerated,
      title={Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers}, 
      author={Linyuan Gong and Chenyan Xiong and Xiaodong Liu and Payal Bajaj and Yiqing Xie and Alvin Cheung and Jianfeng Gao and Xia Song},
      year={2023},
      eprint={2305.12567},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2305.12567}
}