xww033 committed
Commit 0151ca7
Parent(s): 4229d74

Create README.md

Files changed (1): README.md (+109, -0)
 
---
license: mit
---

# MT-LLaMA Model Card

## Model details

**Model type:**
MT-LLaMA is an open-source multi-task model trained by fine-tuning LLaMA on the massive collection of tasks in [P3](https://huggingface.co/datasets/bigscience/P3) (i.e., T0 Train). Concretely, the datasets used during training and the task taxonomy are listed below:
* Multiple-Choice QA: CommonsenseQA, Cosmos QA, DREAM, QuAIL, QuaRTz, QASC, QuaRel, SciQ, Social IQA, Wiki Hop, WiQA
* Extractive QA: Adversarial QA, DuoRC, Quoref, ROPES
* Closed-Book QA: Hotpot QA, Wiki QA
* Sentiment Classification: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
* Topic Classification: AG News, DBPedia, TREC
* Structure-to-Text Generation: Common Gen, Wiki Bio
* Text Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
* Paraphrase Identification: MRPC, PAWS, QQP
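All of these subsets are distributed through P3 as already-prompted input/target pairs. For illustration only, the sketch below peeks at one subset with the Hugging Face `datasets` library; the config name is a placeholder, and the full list of subset names is on the P3 dataset card.

```python
# Illustration only (not from the MT-LLaMA repo): peeking at one prompted
# subset of P3 with the Hugging Face `datasets` library.
from datasets import load_dataset

# Placeholder config name; the real subset names are listed on the P3 dataset card.
subset = load_dataset("bigscience/P3", "imdb_Movie_Expressed_Sentiment")
print(subset["train"][0])  # each record pairs a prompted input with its target text
```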

**Organizations developing the model:**
The MT-LLaMA team, with members from Alibaba DAMO Academy and the Chinese University of Hong Kong.

## Intended use

You can try the code from [this repo](https://github.com/DAMO-NLP-SG/MT-LLaMA).

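For a rough illustration, the snippet below sketches loading a converted checkpoint with Hugging Face `transformers` and running greedy decoding. The checkpoint path is a placeholder; the repo above is the authoritative reference for obtaining and running the model.

```python
# Hypothetical usage sketch; the checkpoint location below is a placeholder.
# See the MT-LLaMA repo for the officially supported way to obtain the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/mt-llama-7b"  # placeholder: local dir or hub id of the converted checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Quick smoke test with a generic question.
prompt = "Question: What is the capital of France? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the `accelerate` package; drop it (and `torch_dtype`) to run on CPU.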
28
+ ## Zero-shot Evaluation
29
+ We primarily follow the protocols of [Bigscience T0](https://openreview.net/forum?id=9Vrb9D0WI4) to assess the generalization capability of our Multi-task LLaMA to: (1) _**Unseen Datasets**_ (i.e., datasets from seen tasks); (2) _**Unseen Tasks**_.
30
+
31
+ #### Prompt Format
Extractive QA:

1. XQuAD, TyDiQA, MLQA, SQuAD
```text
Input: Answer the question according to the context. Question: ${question}. Context: ${context}. Answer:
Output: ${answer}
```
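The `${...}` placeholders map directly onto Python's `string.Template` syntax, so a format can be instantiated verbatim. Below is an illustrative sketch (not code from the repo) that fills the extractive-QA format above and decodes an answer greedily; it assumes `model` and `tokenizer` have been loaded as in the Intended use snippet.

```python
# Sketch: instantiating the extractive-QA format above and decoding greedily.
# Assumes `model` and `tokenizer` were loaded as in the Intended use snippet.
from string import Template

EXTRACTIVE_QA = Template(
    "Answer the question according to the context. "
    "Question: ${question}. Context: ${context}. Answer:"
)
prompt = EXTRACTIVE_QA.substitute(
    question="Who wrote Hamlet",
    context="Hamlet is a tragedy written by William Shakespeare around 1600",
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```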

Sentiment:

1. SST-2
```text
Input: ${sentence} Based on this review, would the user recommend this product? No or Yes?
Output: Yes / No
```

Multiple-Choice QA:

1. OpenbookQA
```text
Input: ${question} Which is the correct answer? - (A) ${choiceA} - (B) ${choiceB} - (C) ${choiceC} - (D) ${choiceD}
Output: ${choiceA} / ${choiceB} / ${choiceC} / ${choiceD}
```

Sentence Completion:

1. COPA
```text
Input: ${premise} {% if question == "cause" %} This happened because... {% else %} As a consequence... {% endif %} Help me pick the more plausible option: - ${text1} - ${text2}
Output: ${text1} / ${text2}
```

Coreference Resolution:

1. Winogrande
```text
Input: ${sentence} In the previous sentence, does _ refer to ${option1} or ${option2}?
Output: ${option1} / ${option2}
```

Word Sense Disambiguation:

1. WiC
```text
Input: Does the word "${word}" have the same meaning in these two sentences? Yes, No? ${sentence1} ${sentence2}
Output: Yes / No
```

Natural Language Inference:

1. MNLI
```text
Input: ${premise} Question: Does this imply that ${hypothesis}? Please response with 'Yes', 'No', or 'Maybe'.
Output: Yes / No / Maybe
```
2. RTE
```text
Input: Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?
Output: Yes / no
```
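For the classification-style formats above (multiple-choice QA, sentence completion, coreference, word sense disambiguation, NLI), T0-style evaluation ranks the verbalized answer options by model likelihood rather than generating free-form text. The sketch below is an illustration of that idea, not the repo's evaluation code; it again assumes `model` and `tokenizer` from the Intended use snippet.

```python
# Illustrative sketch of T0-style rank classification: score each candidate
# answer by its total log-likelihood given the prompt and pick the best one.
# Assumes `model` and `tokenizer` were loaded as in the Intended use snippet.
import torch

def score_option(prompt: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to `option` after `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([prompt_ids, option_ids], dim=-1)
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    start = prompt_ids.shape[1] - 1  # logits at position i predict token i + 1
    option_log_probs = log_probs[0, start : start + option_ids.shape[1]]
    token_scores = option_log_probs.gather(-1, option_ids[0].unsqueeze(-1)).squeeze(-1)
    return token_scores.sum().item()

# Example with the RTE format above.
prompt = 'Given A cat is sleeping on the mat. Is it guaranteed true that "An animal is resting"? Yes or no?'
prediction = max(["Yes", "no"], key=lambda opt: score_option(prompt, " " + opt))
print(prediction)
```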

#### Results on _Unseen Datasets_

| Model       | XQuAD-en (F1/EM) | TyDiQA-en (F1/EM) | MLQA-en (F1/EM) | SQuAD (F1/EM) | SST-2 (Acc.) | OpenbookQA (Acc.) |
|:------------|------------------|-------------------|-----------------|---------------|--------------|-------------------|
| LLaMA-7b    | 9.5 / 2.0        | 14.3 / 2.6        | 13.4 / 3.3      | 29.4 / 11.5   | 50.5         | 32.4              |
| MT-LLaMA-7b | 42.3 / 31.1      | 38.9 / 26.9       | 45.4 / 31.5     | 85.9 / 77.6   | 92.6         | 38.2              |

#### Results on _Unseen Tasks_

| Model       | COPA (Acc.) | Winogrande (Acc.) | WiC (Acc.) | MNLI (Acc.) | RTE (Acc.) |
|:------------|-------------|-------------------|------------|-------------|------------|
| LLaMA-7b    | 56.0        | 49.3              | 51.7       | 30.2        | 52.7       |
| MT-LLaMA-7b | 88.0        | 54.9              | 52.2       | 49.6        | 79.1       |

## Acknowledgement

* Our training code is largely borrowed from [FastChat](https://github.com/lm-sys/FastChat).
* We are also grateful for the efforts of [LLaMA](https://github.com/facebookresearch/llama) (from FAIR) and [T0](https://github.com/bigscience-workshop/t-zero) (from BigScience), which serve as the foundation of our work.

If you find this resource useful, please cite the repo as follows:

```
@software{damonlpsg2023mtllama,
  author = {Xu, Weiwen and Li, Xin and Bing, Lidong},
  title = {Multi-task Instruction-tuned LLaMA},
  year = 2023,
  url = {https://github.com/DAMO-NLP-SG/MT-LLaMA}
}
```