	---
	license: mit
	---

	# MT-LLaMA Model Card

	## Model details

	Model type:
	MT-LLaMA is an open-source multi-task model trained by fine-tuning LLaMA on the large collection of tasks in [P3](https://huggingface.co/datasets/bigscience/P3) (i.e., T0 Train). The datasets used during training, grouped by task type, are listed below:
	* Multi-choice QA: CommonsenseQA, Cosmos QA, DREAM, QuAIL, QuaRTz, QASC, QuaRel, SciQ, Social IQA, Wiki Hop, WiQA
	* Extractive QA: Adversarial QA, DuoRC, Quoref, ROPES
	* Closed-Book QA: Hotpot QA, Wiki QA
	* Sentiment Classification: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
	* Topic Classification: AG News, DBPedia, TREC
	* Structure-to-Text Generation: Common Gen, Wiki Bio
	* Text Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
	* Paraphrase Identification: MRPC, PAWS, QQP

	Organizations developing the model:
	The MT-LLaMA team with members from Alibaba Damo Academy and the Chinese University of Hong Kong.

	## Intended use

	You can try the code from our [GitHub repo](https://github.com/DAMO-NLP-SG/MT-LLaMA).


	## Zero-shot Evaluation

	We primarily follow the protocols of [Bigscience T0](https://openreview.net/forum?id=9Vrb9D0WI4) to assess the generalization capability of our Multi-task LLaMA to: (1) _Unseen Datasets_ (i.e., datasets from seen tasks); (2) _Unseen Tasks_.

	#### Prompt Format

	Extractive QA:

	1. XQuAD, TyDiQA, MLQA, SQuAD
	```angular2html
	Input: Answer the question according to the context. Question: ${question}. Context: ${context}. Answer:
	Output: ${Answer}
	```

	Sentiment:

	1. SST-2
	```angular2html
	Input: ${sentence} Based on this review, would the user recommend this product? No or Yes?
	Output: Yes / No
	```
	Multiple-Choice QA:

	1. OpenbookQA
	```angular2html
	Input: ${question} Which is the correct answer? - (A) ${choiceA} - (B) ${choiceB} - (C) ${choiceC} - (D) ${choiceD}
	Output: ${choiceA} / ${choiceB} / ${choiceC} / ${choiceD}
	```
	Sentence Completion:

	1. COPA
	```angular2html
	Input: ${premise} {% if question == "cause" %} This happened because... {% else %} As a consequence... {% endif %} Help me pick the more plausible option: - ${text1} - ${text2}
	Output: ${text1} / ${text2}
	```
	Coreference Resolution:
	1. Winogrande
	```angular2html
	Input: ${sentence} In the previous sentence, does _ refer to ${option1} or ${option2}?
	Output: ${option1} / ${option2}
	```
	Word Sense Disambiguation:
	1. WiC
	```angular2html
	Input: Does the word "${word}" have the same meaning in these two sentences? Yes, No? ${sentence1} ${sentence2}
	Output: Yes / No
	```
	Natural Language Inference:

	1. MNLI
	```angular2html
	Input: ${premise} Question: Does this imply that ${hypothesis}? Please response with 'Yes', 'No', or 'Maybe'.
	Output: Yes / No / Maybe
	```
	2. RTE
	```angular2html
	Input: Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?
	Output: Yes / No
	```
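
The `${...}` placeholders in the prompt formats above happen to match the syntax of Python's `string.Template`, so the prompts can be filled in with the standard library alone. The following is a minimal sketch under that assumption, not the evaluation code from the repository: the names `PROMPTS`, `build_prompt`, and `build_copa_prompt` are hypothetical, only three of the templates are shown, and the COPA conditional is written as a plain Python branch with single spaces between segments.

```python
from string import Template

# Templates copied from the prompt formats above. The dictionary keys are
# hypothetical labels chosen for this sketch.
PROMPTS = {
    "extractive_qa": Template(
        "Answer the question according to the context. "
        "Question: ${question}. Context: ${context}. Answer:"
    ),
    "sst2": Template(
        "${sentence} Based on this review, would the user recommend "
        "this product? No or Yes?"
    ),
    "rte": Template(
        'Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?'
    ),
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill the template for `task`; raises KeyError if a field is missing."""
    return PROMPTS[task].substitute(**fields)


def build_copa_prompt(premise: str, question: str, text1: str, text2: str) -> str:
    """COPA branches on the question type, so it is assembled by hand."""
    if question == "cause":
        connector = "This happened because..."
    else:
        connector = "As a consequence..."
    return (
        f"{premise} {connector} "
        f"Help me pick the more plausible option: - {text1} - {text2}"
    )


if __name__ == "__main__":
    print(build_prompt("rte", premise="A dog runs.", hypothesis="An animal moves"))
    print(build_copa_prompt("The glass broke.", "cause", "It fell.", "It was cleaned."))
```

Using `substitute` rather than `safe_substitute` makes a missing field fail loudly instead of leaving a literal `${...}` in the prompt.
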
	#### Results on _Unseen Datasets_

	| Model | XQuAD-en (F1/EM) | TyDiQA-en (F1/EM) | MLQA-en (F1/EM) | SQuAD (F1/EM) | SST-2 (Acc.) | OpenbookQA (Acc.) |
	|:------------|------------------|-------------------|-----------------|---------------|--------------|-------------------|
	| LLaMA-7b | 9.5 / 2.0 | 14.3 / 2.6 | 13.4 / 3.3 | 29.4 / 11.5 | 50.5 | 32.4 |
	| MT-LLaMA-7b | 42.3 / 31.1 | 38.9 / 26.9 | 45.4 / 31.5 | 85.9 / 77.6 | 92.6 | 38.2 |
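
The F1/EM columns are the usual extractive-QA metrics. This card does not say which scoring script was used, so the sketch below assumes the common SQuAD-style convention (lowercase, strip punctuation and English articles, then compare tokens). The function names are hypothetical and the numbers in the table were not produced with this code.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(f1_score("in Paris France", "Paris"))
```

When a question has several reference answers, the usual practice is to take the maximum score over the references and then average over questions.
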
	#### Results on _Unseen Tasks_

	| Model | COPA (Acc.) | Winogrande (Acc.) | WiC (Acc.) | MNLI (Acc.) | RTE (Acc.) |
	|:------------|-------------|--------------------|------------|-------------|------------|
	| LLaMA-7b | 56.0 | 49.3 | 51.7 | 30.2 | 52.7 |
	| MT-LLaMA-7b | 88.0 | 54.9 | 52.2 | 49.6 | 79.1 |

	## Acknowledgement

	* Our training code is largely borrowed from [FastChat](https://github.com/lm-sys/FastChat).
	* We are also grateful for the efforts of [LLaMA](https://github.com/facebookresearch/llama) (from FAIR) and [T0](https://github.com/bigscience-workshop/t-zero) (from BigScience), which serve as the foundation of our work.

	If you find this resource useful, please cite the repo as follows:
	```
	@software{damonlpsg2023mtllama,
	author = {Xu, Weiwen and Li, Xin and Bing, Lidong},
	title = {Multi-task Instruction-tuned LLaMA},
	year = 2023,
	url = {https://github.com/DAMO-NLP-SG/MT-LLaMA}
	}
	```