---
license: mit
---
# MT-LLaMA Model Card

## Model details

**Model type:**
MT-LLaMA is an open-source multi-task model obtained by fine-tuning LLaMA on the large collection of prompted tasks in [P3](https://huggingface.co/datasets/bigscience/P3) (i.e., the T0 training mixture). Concretely, the datasets used during training, grouped by task type, are listed below:
* Multiple-Choice QA: CommonsenseQA, Cosmos QA, DREAM, QuAIL, QuaRTz, QASC, QuaRel, SciQ, Social IQA, Wiki Hop, WiQA
* Extractive QA: Adversarial QA, DuoRC, Quoref, ROPES
* Closed-Book QA: Hotpot QA, Wiki QA
* Sentiment Classification: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
* Topic Classification: AG News, DBPedia, TREC
* Structure-to-Text Generation: Common Gen, Wiki Bio
* Text Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
* Paraphrase Identification: MRPC, PAWS, QQP
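To get a feel for what these prompted training examples look like, the sketch below loads one P3 subset with the Hugging Face `datasets` library. This is an illustration, not the project's training code; the subset name `ag_news_classify` and the field names follow the P3 dataset card, so verify them against the current hub version.

```python
# Minimal sketch: peek at one prompted P3 subset (illustrative subset name).
from datasets import load_dataset

p3_subset = load_dataset("bigscience/P3", "ag_news_classify", split="train")

example = p3_subset[0]
print(example["inputs_pretokenized"])   # the prompted input text
print(example["targets_pretokenized"])  # the expected output text
```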
**Organizations developing the model:**
The MT-LLaMA team with members from Alibaba DAMO Academy and the Chinese University of Hong Kong.

## Intended use

You can try the code from [this repo](https://github.com/DAMO-NLP-SG/MT-LLaMA). A minimal inference sketch is shown below.
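The sketch assumes you have merged MT-LLaMA weights available locally; the checkpoint path is a placeholder rather than an official model id, and the repo's own scripts are the reference. The prompt follows one of the zero-shot formats listed in the next section.

```python
# Minimal inference sketch; the checkpoint path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/mt-llama-7b"  # placeholder, see the repo for weights
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Extractive-QA prompt format (see "Prompt Format" below).
prompt = (
    "Answer the question according to the context. "
    "Question: Who developed MT-LLaMA. "
    "Context: MT-LLaMA was developed by Alibaba DAMO Academy and CUHK. Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
# Decode only the newly generated tokens.
print(tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```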
## Zero-shot Evaluation

We largely follow the protocol of [BigScience T0](https://openreview.net/forum?id=9Vrb9D0WI4) to assess the generalization capability of our Multi-task LLaMA to (1) _**Unseen Datasets**_ (i.e., held-out datasets from task types seen during training) and (2) _**Unseen Tasks**_.

#### Prompt Format

Extractive QA:

1. XQuAD, TyDiQA, MLQA, SQuAD
```
Input: Answer the question according to the context. Question: ${question}. Context: ${context}. Answer:
Output: ${answer}
```
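The `${...}` placeholders above happen to match the syntax of Python's standard-library `string.Template`, so a format can be instantiated as follows (a small illustration, not code from the repo):

```python
# Fill the extractive-QA prompt format with string.Template.
from string import Template

template = Template(
    "Answer the question according to the context. "
    "Question: ${question}. Context: ${context}. Answer:"
)
prompt = template.substitute(
    question="What does MT-LLaMA stand for",
    context="MT-LLaMA is a multi-task instruction-tuned LLaMA model",
)
print(prompt)
```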
Sentiment:

1. SST-2
```
Input: ${sentence} Based on this review, would the user recommend this product? No or Yes?
Output: Yes / No
```

Multiple-Choice QA:

1. OpenbookQA
```
Input: ${question} Which is the correct answer? - (A) ${choiceA} - (B) ${choiceB} - (C) ${choiceC} - (D) ${choiceD}
Output: ${choiceA} / ${choiceB} / ${choiceC} / ${choiceD}
```

Sentence Completion:

1. COPA
```
Input: ${premise} {% if question == "cause" %} This happened because... {% else %} As a consequence... {% endif %} Help me pick the more plausible option: - ${text1} - ${text2}
Output: ${text1} / ${text2}
```
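The COPA format contains a conditional, written in Jinja-style `{% if %}` syntax: the connective depends on whether the question asks for a cause or an effect. A plain-Python sketch of the same logic (illustrative only, with a made-up helper name):

```python
# Build the COPA prompt; the connective depends on the question type.
def build_copa_prompt(premise: str, question: str, text1: str, text2: str) -> str:
    connective = (
        "This happened because..." if question == "cause" else "As a consequence..."
    )
    return (
        f"{premise} {connective} "
        f"Help me pick the more plausible option: - {text1} - {text2}"
    )

print(build_copa_prompt(
    premise="The man broke his toe.",
    question="cause",
    text1="He got a hole in his sock.",
    text2="He dropped a hammer on his foot.",
))
```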
Coreference Resolution:

1. Winogrande
```
Input: ${sentence} In the previous sentence, does _ refer to ${option1} or ${option2}?
Output: ${option1} / ${option2}
```

Word Sense Disambiguation:

1. WiC
```
Input: Does the word "${word}" have the same meaning in these two sentences? Yes, No? ${sentence1} ${sentence2}
Output: Yes / No
```

Natural Language Inference:

1. MNLI
```
Input: ${premise} Question: Does this imply that ${hypothesis}? Please respond with 'Yes', 'No', or 'Maybe'.
Output: Yes / No / Maybe
```
2. RTE
```
Input: Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?
Output: Yes / No
```
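For the formats above whose output is a closed set of choices, the T0 protocol scores tasks by rank classification: each candidate answer is appended to the prompt, and the candidate with the highest likelihood under the model wins. The sketch below illustrates that idea under those assumptions; the repo's evaluation scripts are the authoritative implementation.

```python
# Rank-classification sketch: append each candidate answer to the prompt,
# score it by the model's log-likelihood, and pick the highest-scoring one.
# Assumes the prompt's tokenization is a prefix of the prompt+answer
# tokenization (a common simplification).
import torch

@torch.no_grad()
def pick_answer(model, tokenizer, prompt, choices):
    scores = []
    for choice in choices:
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        full_ids = tokenizer(
            prompt + " " + choice, return_tensors="pt"
        ).input_ids.to(model.device)
        logits = model(full_ids).logits
        log_probs = logits[:, :-1].log_softmax(dim=-1)  # predict tokens 1..T-1
        targets = full_ids[:, 1:]
        token_ll = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
        choice_len = full_ids.shape[1] - prompt_len
        scores.append(token_ll[0, -choice_len:].sum().item())  # choice tokens only
    return choices[scores.index(max(scores))]

# e.g. pick_answer(model, tokenizer, rte_prompt, ["Yes", "No"])
```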
#### Results on _Unseen Datasets_

| Model       | XQuAD-en (F1/EM) | TyDiQA-en (F1/EM) | MLQA-en (F1/EM) | SQuAD (F1/EM) | SST-2 (Acc.) | OpenbookQA (Acc.) |
|:------------|------------------|-------------------|-----------------|---------------|--------------|-------------------|
| LLaMA-7b    | 9.5 / 2.0        | 14.3 / 2.6        | 13.4 / 3.3      | 29.4 / 11.5   | 50.5         | 32.4              |
| MT-LLaMA-7b | 42.3 / 31.1      | 38.9 / 26.9       | 45.4 / 31.5     | 85.9 / 77.6   | 92.6         | 38.2              |

#### Results on _Unseen Tasks_

| Model       | COPA (Acc.) | Winogrande (Acc.) | WiC (Acc.) | MNLI (Acc.) | RTE (Acc.) |
|:------------|-------------|-------------------|------------|-------------|------------|
| LLaMA-7b    | 56.0        | 49.3              | 51.7       | 30.2        | 52.7       |
| MT-LLaMA-7b | 88.0        | 54.9              | 52.2       | 49.6        | 79.1       |
## Acknowledgement

* Our training code is largely borrowed from [FastChat](https://github.com/lm-sys/FastChat).
* We are also grateful for the efforts behind [LLaMA](https://github.com/facebookresearch/llama) (from FAIR) and [T0](https://github.com/bigscience-workshop/t-zero) (from BigScience), which serve as the foundation of our work.

If you find this resource useful, please cite the repo as follows:
```
@software{damonlpsg2023mtllama,
  author = {Xu, Weiwen and Li, Xin and Bing, Lidong},
  title = {Multi-task Instruction-tuned LLaMA},
  year = 2023,
  url = {https://github.com/DAMO-NLP-SG/MT-LLaMA}
}
```