MDCure-LLAMA3.1-70B-Instruct

📄 Paper | 🤗 HF Collection | ⚙️ GitHub Repo

Introduction

MDCure is an effective and scalable procedure for generating high-quality multi-document (MD) instruction tuning data to improve MD capabilities of LLMs. Using MDCure, we construct a suite of MD instruction datasets complementary to collections such as FLAN and fine-tune a variety of already instruction-tuned LLMs from the FlanT5, Qwen2, and LLAMA3.1 model families, up to 70B parameters in size. We additionally introduce MDCureRM, an evaluator model specifically designed for the MD setting to filter and select high-quality MD instruction data in a cost-effective, RM-as-a-judge fashion. Extensive evaluations on a wide range of MD and long-context benchmarks spanning various tasks show MDCure consistently improves performance over pre-trained baselines and over corresponding base models by up to 75.5%.

We release MDCure datasets of size 12k, 36k, and 72k. We also release MDCureRM and the best MDCure'd model for each architecture/size combination. To access all our models and datasets, please visit our HF Collection. For further details regarding dataset construction, please see our paper and GitHub repo. For additional details regarding how to use yale-nlp/MDCure-LLAMA3.1-70B-Instruct, please see below.

The MDCure pipeline generates diverse multi-document instructions, filters them via fine-grained scoring by MDCureRM, and tunes a base LLM to enhance its multi-document capabilities.
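The filtering stage can be pictured with a small sketch. It is not the MDCure implementation: `Candidate`, `filter_candidates`, `toy_score`, and the `keep_top` parameter are hypothetical names chosen for illustration, and a real pipeline would call MDCureRM where the stand-in scoring function appears.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A generated multi-document instruction (hypothetical structure)."""
    instruction: str
    answer: str


def filter_candidates(
    candidates: List[Candidate],
    score_fn: Callable[[Candidate], float],
    keep_top: int,
) -> List[Candidate]:
    """Score every candidate and keep the `keep_top` highest-scoring ones."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:keep_top]


def toy_score(candidate: Candidate) -> float:
    # Stand-in scorer (assumption): longer answers score higher.
    # A real pipeline would query a reward model such as MDCureRM here.
    return float(len(candidate.answer))


pool = [
    Candidate("Compare the two reports.", "Report A covers 2019; report B covers 2020."),
    Candidate("Summarize.", "Short."),
    Candidate("Which document mentions the merger?", "Only the second document does."),
]
selected = filter_candidates(pool, toy_score, keep_top=2)
for item in selected:
    print(item.instruction)
```

Passing the scorer in as a function keeps the selection logic independent of whichever model produces the scores.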

Model Details

yale-nlp/MDCure-LLAMA3.1-70B-Instruct is initialized from meta-llama/Meta-Llama-3.1-70B-Instruct and fine-tuned on the MDCure-72k dataset.

Requirements

We recommend using the latest version of HF Transformers, or any transformers>=4.45.0, to avoid any potential errors when using this model.
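One way to confirm the installed version before loading the model is sketched below. The helper names (`release_tuple`, `meets_minimum`, `check_transformers`) are illustrative and not part of Transformers. The comparison looks only at the numeric major, minor, and patch fields, so a pre-release such as `4.45.0rc1` is treated as `4.45.0`.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.45.0"


def release_tuple(version_string: str) -> tuple:
    """Turn '4.46.0.dev0' into (4, 46, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare the numeric release fields of two version strings."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_transformers() -> str:
    """Return a short status message about the installed transformers version."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is recent enough"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; consider upgrading"


print(check_transformers())
```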

Quickstart

Below we provide a code snippet demonstrating how to load the tokenizer and model and generate content in response to an input context consisting of multiple source documents and a related question or instruction. We strongly recommend separating the texts and/or instruction using \n\n or <doc-sep> to maintain consistency with the format of the data used during training.

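from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer; device_map="auto" places the weights across the available devices
model = AutoModelForCausalLM.from_pretrained("yale-nlp/MDCure-LLAMA3.1-70B-Instruct", device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("yale-nlp/MDCure-LLAMA3.1-70B-Instruct")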

source_text_1 = ...
source_text_2 = ...
source_text_3 = ...
prompt = f"{source_text_1}\n\n{source_text_2}\n\n{source_text_3}\n\nWhat happened in CHAMPAIGN regarding Lovie Smith and the 2019 defense improvements? Respond with 1-2 sentences."

# Chat-format messages: a system prompt plus the multi-document user prompt
messages = [
    {"role": "system", "content": "You are an assistant with strong multi-document processing skills."},
    {"role": "user", "content": prompt},
]

# Render the chat template to text, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt", return_token_type_ids=False).to(model.device)

# Generate, then drop the prompt tokens so only the newly generated tokens remain
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
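
For an arbitrary number of source documents, the prompt can be assembled with a small helper. This is a minimal sketch: `build_md_prompt` is a hypothetical name, and placing the same separator between the last document and the instruction is an assumption based on the formatting recommendation in this section.

```python
from typing import Sequence


def build_md_prompt(documents: Sequence[str], instruction: str, separator: str = "\n\n") -> str:
    """Join source documents and a trailing instruction with one separator."""
    # Skip empty documents so the prompt never contains doubled separators.
    cleaned = [doc.strip() for doc in documents if doc and doc.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty document is required")
    return separator.join(cleaned + [instruction.strip()])


example = build_md_prompt(
    ["First article text.", "Second article text."],
    "Summarize the key differences in 1-2 sentences.",
)
print(example)
```

Passing `separator="<doc-sep>"` switches to the alternative delimiter without other changes.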

You can also run conversational inference with the model using the Transformers pipeline abstraction, described further in the official LLAMA3.1-70B-Instruct model card.
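
A minimal sketch of the pipeline route follows, reusing the system prompt from the Quickstart. The function names `build_messages` and `generate_with_pipeline` are illustrative. Loading the 70B checkpoint requires substantial accelerator memory, so the pipeline is built inside the function and nothing is downloaded until it is called.

```python
def build_messages(prompt: str) -> list:
    """Wrap a multi-document prompt in the chat format used in the Quickstart."""
    return [
        {"role": "system", "content": "You are an assistant with strong multi-document processing skills."},
        {"role": "user", "content": prompt},
    ]


def generate_with_pipeline(prompt: str, max_new_tokens: int = 512) -> str:
    """Run chat-style generation through the text-generation pipeline."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="yale-nlp/MDCure-LLAMA3.1-70B-Instruct",
        torch_dtype="auto",
        device_map="auto",
    )
    outputs = pipe(build_messages(prompt), max_new_tokens=max_new_tokens)
    # For chat input, generated_text holds the whole conversation;
    # the final entry is the assistant's reply.
    return outputs[0]["generated_text"][-1]["content"]
```

Calling `generate_with_pipeline(prompt)` with a prompt built as in the Quickstart returns the assistant's reply as a string.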

All MDCure Models

We open-source our custom multi-document instruction scoring model, MDCureRM, as well as our best MDCure'd models at the following links:

Model | Hugging Face Repo | Description
MDCureRM | 🤗 HF Repo | Multi-objective reward model to score and filter MD instruction data more cheaply and effectively than GPT-3.5-Turbo
MDCure-FlanT5-Base | 🤗 HF Repo | FlanT5-Base fine-tuned with MDCure-72k
MDCure-FlanT5-Large | 🤗 HF Repo | FlanT5-Large fine-tuned with MDCure-72k
MDCure-Qwen2-1.5B-Instruct | 🤗 HF Repo | Qwen2-1.5B-Instruct fine-tuned with MDCure-72k
MDCure-Qwen2-7B-Instruct | 🤗 HF Repo | Qwen2-7B-Instruct fine-tuned with MDCure-72k
MDCure-LLAMA3.1-8B-Instruct | 🤗 HF Repo | LLAMA3.1-8B-Instruct fine-tuned with MDCure-72k
MDCure-LLAMA3.1-70B-Instruct | 🤗 HF Repo | LLAMA3.1-70B-Instruct fine-tuned with MDCure-72k

Citation

If you find our work useful, please cite our paper as:

@article{liu2024mdcure,
    title={MDCure: A Scalable Pipeline for Multi-Document Instruction-Following},
    author={Gabrielle Kaili-May Liu and Bowen Shi and Avi Caciularu and Idan Szpektor and Arman Cohan},
    journal={arXiv preprint arXiv:2410.23463},
    year={2024},
    url={https://arxiv.org/abs/2410.23463}
}