---
language:
- en
datasets:
- Reza8848/MUFFIN_68k
license: mit
---

This repository contains the model weights of **MUFFIN-T5-3B** (**Mu**lti-**F**aceted **In**structions). We fine-tuned the [T5-3B](https://huggingface.co/t5-3b) model on our [MUFFIN dataset](https://arxiv.org/abs/2312.02436).

We released both 3B and 11B models:

|Model|Number of parameters|
|-|-|
|[MUFFIN-T5-3B](https://huggingface.co/Reza8848/MUFFIN-T5-3B)|3 billion|
|[MUFFIN-T5-11B](https://huggingface.co/Reza8848/MUFFIN-T5-11B)|11 billion|

You can also find the Llama2-based model weights [here](https://huggingface.co/Reza8848/MUFFIN-Llama2-lora-13B).

## Prompt Template

Please use the following prompt template when using the models for inference (including the evaluations on **SuperNI-Test**, **T0-Eval**, and **BBH**):

```python
prompt = "### Input:\n{input}"
prompt += "\n\n"
prompt += "### Instruction:\n{instruction}"
prompt += "\n\n"
prompt += "### Output:\n"

print(prompt)
```

Please use the prompt below when testing the models on classification tasks (i.e., **MMLU**):

```python
prompt = "### Input:\n{input}"
prompt += "\n\n"
prompt += "### Instruction:\n{instruction}\n"
# Add one more sentence to the prompt to indicate the output space
prompt += "(A): {option1}\n(B): {option2}\n(C): {option3}\n(D): {option4}\nAvoid answers outside of (A, B, C, D)."
prompt += "\n\n"
prompt += "### Output:\n"

print(prompt)
```

## Model Usage

Load our model weights with Hugging Face Transformers 🤗:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

## Download
tokenizer = AutoTokenizer.from_pretrained("Reza8848/MUFFIN-T5-3B")
model = AutoModelForSeq2SeqLM.from_pretrained("Reza8848/MUFFIN-T5-3B")

## Inference
#### Please prepare your testing instance (as shown below)
value_dict = {
    "input": "Drink more wine when you feel thirsty.\nDrink more water when you feel thirsty",
    "instruction": "In this task, you are given two unconventional instructions for quenching thirst. Your goal is to identify which instruction is more likely to be followed by a person who wants to try something new or different. Answer \"Wine\" if the person is more likely to drink wine when thirsty, and \"Water\" if they are more likely to drink water."
}

#### Please use the prompt template mentioned before (repeated here so the snippet runs on its own)
prompt = "### Input:\n{input}\n\n### Instruction:\n{instruction}\n\n### Output:\n"
input_sequence = prompt.format_map(value_dict)
input_ids = tokenizer(input_sequence, return_tensors="pt").input_ids

# Set the generation arguments according to your needs (e.g., `do_sample`, `num_beams`)
raw_outputs = model.generate(input_ids)
outputs = tokenizer.decode(raw_outputs[0], skip_special_tokens=True)

print(outputs)
```

## Zero-Shot Evaluation Performances

Our training and inference code is based on [Tk-Instruct](https://github.com/yizhongw/Tk-Instruct/tree/main), including the [metric calculation scripts](https://github.com/yizhongw/Tk-Instruct/blob/main/src/compute_metrics.py) (i.e., `ROUGE-L` and `Exact-Match`).
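To make the two metrics concrete, here is a minimal, dependency-free sketch of Exact-Match and a longest-common-subsequence ROUGE-L F1. It is not the Tk-Instruct script: the function names and the normalization (lowercasing, stripping punctuation, collapsing whitespace, whitespace tokenization) are assumptions made for illustration, so the numbers it produces may differ from the official implementation.

```python
import string


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence, computed with a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """LCS-based F1 over whitespace tokens (a simplified ROUGE-L)."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Water.", "water"))
    print(rouge_l_f1("drink more water", "drink water"))
```

For reported numbers, prefer the linked `compute_metrics.py`, since tokenization and normalization details affect both scores.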
*Figure: `performances.png` (zero-shot evaluation results).*
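When reproducing these evaluations, the two prompt templates from the Prompt Template section can be produced by a single helper. The sketch below is one way to do that; `BASE_TEMPLATE` and `build_prompt` are hypothetical names that are not part of this repository, and the helper assumes exactly four answer options for the classification variant.

```python
from typing import Optional, Sequence

# Same text as the general prompt template, written as one string.
BASE_TEMPLATE = "### Input:\n{input}\n\n### Instruction:\n{instruction}\n\n### Output:\n"


def build_prompt(
    input_text: str,
    instruction: str,
    options: Optional[Sequence[str]] = None,
) -> str:
    """Fill the prompt template; pass four `options` for the classification variant."""
    if options is not None:
        if len(options) != 4:
            raise ValueError("the classification template expects exactly four options")
        option_lines = "\n".join(
            f"({letter}): {option}" for letter, option in zip("ABCD", options)
        )
        # Append the options and the sentence that restricts the output space.
        instruction = (
            f"{instruction}\n{option_lines}\n"
            "Avoid answers outside of (A, B, C, D)."
        )
    return BASE_TEMPLATE.format(input=input_text, instruction=instruction)


if __name__ == "__main__":
    print(build_prompt("2 + 2 = ?", "Choose the correct answer.", ["3", "4", "5", "6"]))
```

The returned string can be passed to the tokenizer in place of `input_sequence` in the Model Usage snippet.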
## 🥳 Citation

Please cite our paper if you use any resources in this repository:

```bibtex
@inproceedings{Lou2023MUFFIN,
  title={{MUFFIN}: Curating Multi-Faceted Instructions for Improving Instruction Following},
  author={Renze Lou and Kai Zhang and Jian Xie and Yuxuan Sun and Janice Ahn and Hanzi Xu and Yu Su and Wenpeng Yin},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=1vrS1zwekw}
}
```