---
license: mit
---

# MT-LLaMA Model Card

## Model details

**Model type:**

MT-LLaMA is an open-source multi-task model trained by fine-tuning LLaMA on the large collection of tasks in [P3](https://huggingface.co/datasets/bigscience/P3) (i.e., T0 Train). The datasets used during training, grouped by task type, are listed below:

* Multiple-Choice QA: CommonsenseQA, Cosmos QA, DREAM, QuAIL, QuaRTz, QASC, QuaRel, SciQ, Social IQA, Wiki Hop, WiQA
* Extractive QA: Adversarial QA, DuoRC, Quoref, ROPES
* Closed-Book QA: Hotpot QA, Wiki QA
* Sentiment Classification: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
* Topic Classification: AG News, DBPedia, TREC
* Structure-to-Text Generation: Common Gen, Wiki Bio
* Text Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
* Paraphrase Identification: MRPC, PAWS, QQP

**Organizations developing the model:**

The MT-LLaMA team, with members from Alibaba DAMO Academy and the Chinese University of Hong Kong.

## Intended use

You can try the code from our [GitHub repo](https://github.com/DAMO-NLP-SG/MT-LLaMA).

## Zero-shot Evaluation

We primarily follow the protocol of [BigScience T0](https://openreview.net/forum?id=9Vrb9D0WI4) to assess how well our multi-task LLaMA generalizes to (1) _**Unseen Datasets**_ (i.e., new datasets from tasks seen during training) and (2) _**Unseen Tasks**_.

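For tasks with a fixed set of answer options, the T0 protocol uses rank classification: each option is scored as a continuation of the prompt, and the highest-scoring option is taken as the prediction. The sketch below illustrates the idea only. Whether MT-LLaMA's evaluation scripts follow this exact procedure is an assumption, and `rank_classify` and `toy_score` are hypothetical names; in practice the scorer would return the model's log-likelihood of the option given the prompt.

```python
from typing import Callable, Sequence


def rank_classify(
    prompt: str,
    options: Sequence[str],
    score: Callable[[str, str], float],
) -> str:
    """Return the option that `score` rates highest for the given prompt.

    `score(prompt, option)` is assumed to return a higher value for a more
    likely option (e.g. a language model's log-likelihood of the option).
    """
    if not options:
        raise ValueError("options must not be empty")
    return max(options, key=lambda option: score(prompt, option))


def toy_score(prompt: str, option: str) -> float:
    """Stand-in scorer for illustration: counts option words found in the prompt.

    This is NOT a language-model score; replace it with a real log-likelihood.
    """
    prompt_words = set(prompt.lower().split())
    return float(sum(word in prompt_words for word in option.lower().split()))


if __name__ == "__main__":
    print(rank_classify("the sky is blue", ["green grass", "blue sky"], toy_score))
```

Passing the scorer as a function keeps the selection logic independent of any particular model or library.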
#### Prompt Format

Extractive QA:

1. XQuAD, TyDiQA, MLQA, SQuAD

```angular2html
Input: Answer the question according to the context. Question: ${question}. Context: ${context}. Answer:
Output: ${Answer}
```

Sentiment:

1. SST-2

```angular2html
Input: ${sentence} Based on this review, would the user recommend this product? No or Yes?
Output: Yes / No
```

Multiple-Choice QA:

1. OpenbookQA

```angular2html
Input: ${question} Which is the correct answer? - (A) ${choiceA} - (B) ${choiceB} - (C) ${choiceC} - (D) ${choiceD}
Output: ${choiceA} / ${choiceB} / ${choiceC} / ${choiceD}
```

Sentence Completion:

1. COPA

```angular2html
Input: ${premise} {% if question == "cause" %} This happened because... {% else %} As a consequence... {% endif %} Help me pick the more plausible option: - ${text1} - ${text2}
Output: ${text1} / ${text2}
```

Coreference Resolution:

1. Winogrande

```angular2html
Input: ${sentence} In the previous sentence, does _ refer to ${option1} or ${option2}?
Output: ${option1} / ${option2}
```

Word Sense Disambiguation:

1. WiC

```angular2html
Input: Does the word "${word}" have the same meaning in these two sentences? Yes, No? ${sentence1} ${sentence2}
Output: Yes / No
```

Natural Language Inference:

1. MNLI

```angular2html
Input: ${premise} Question: Does this imply that ${hypothesis}? Please response with 'Yes', 'No', or 'Maybe'.
Output: Yes / No / Maybe
```

2. RTE

```angular2html
Input: Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?
Output: Yes / no
```

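The `${...}` placeholders in the templates above happen to match the syntax of Python's standard `string.Template`, so a prompt can be filled in with a few lines of code. The sketch below is illustrative only and is not taken from the MT-LLaMA repository. The constant names and `build_copa_prompt` are hypothetical, and the exact whitespace around the COPA connector is an assumption.

```python
from string import Template

# Templates copied from the prompt formats listed above.
EXTRACTIVE_QA = Template(
    "Answer the question according to the context. "
    "Question: ${question}. Context: ${context}. Answer:"
)
SENTIMENT = Template(
    "${sentence} Based on this review, would the user recommend this product? "
    "No or Yes?"
)
RTE = Template('Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?')


def build_copa_prompt(premise: str, question: str, text1: str, text2: str) -> str:
    """Fill the COPA template; `question` is either "cause" or "effect"."""
    # The conditional mirrors the {% if question == "cause" %} branch.
    if question == "cause":
        connector = "This happened because..."
    else:
        connector = "As a consequence..."
    # Single spaces between parts are an assumption about the rendered text.
    return (
        f"{premise} {connector} "
        f"Help me pick the more plausible option: - {text1} - {text2}"
    )


if __name__ == "__main__":
    print(EXTRACTIVE_QA.substitute(
        question="Who wrote Hamlet",
        context="Hamlet is a play by William Shakespeare",
    ))
    print(build_copa_prompt("The ground is wet.", "cause", "It rained.", "The sun shone."))
```

`Template.substitute` raises `KeyError` when a placeholder is missing, which helps catch incomplete examples early.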
#### Results on _Unseen Datasets_

| Model       | XQuAD-en (F1/EM) | TyDiQA-en (F1/EM) | MLQA-en (F1/EM) | SQuAD (F1/EM) | SST-2 (Acc.) | OpenbookQA (Acc.) |
|:------------|------------------|-------------------|-----------------|---------------|--------------|-------------------|
| LLaMA-7b    | 9.5 / 2.0        | 14.3 / 2.6        | 13.4 / 3.3      | 29.4 / 11.5   | 50.5         | 32.4              |
| MT-LLaMA-7b | 42.3 / 31.1      | 38.9 / 26.9       | 45.4 / 31.5     | 85.9 / 77.6   | 92.6         | 38.2              |

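The F1 and EM (exact match) columns for the extractive QA datasets are commonly computed per example and then averaged. The sketch below follows the widely used SQuAD-style definition (lowercase, strip punctuation and English articles, then compare tokens). It is an assumption that the numbers above were produced with this exact normalization, and the function names are illustrative.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(f1_score("William Shakespeare", "Shakespeare"))
```

When an example has several reference answers, the usual convention is to take the maximum score over the references before averaging across the dataset.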
#### Results on _Unseen Tasks_

| Model       | COPA (Acc.) | Winogrande (Acc.) | WiC (Acc.) | MNLI (Acc.) | RTE (Acc.) |
|:------------|-------------|-------------------|------------|-------------|------------|
| LLaMA-7b    | 56.0        | 49.3              | 51.7       | 30.2        | 52.7       |
| MT-LLaMA-7b | 88.0        | 54.9              | 52.2       | 49.6        | 79.1       |

## Acknowledgement

* Our training code is largely borrowed from [FastChat](https://github.com/lm-sys/FastChat).
* We are also grateful for the efforts of [LLaMA](https://github.com/facebookresearch/llama) (from FAIR) and [T0](https://github.com/bigscience-workshop/t-zero) (from BigScience), which serve as the foundation of our work.

If you find this resource useful, please cite the repo as follows:

```
@software{damonlpsg2023mtllama,
  author = {Xu, Weiwen and Li, Xin and Bing, Lidong},
  title = {Multi-task Instruction-tuned LLaMA},
  year = 2023,
  url = {https://github.com/DAMO-NLP-SG/MT-LLaMA}
}
```