
Mistral-ORPO-Capybara-7k (7B)

Mistral-ORPO is a fine-tuned version of mistralai/Mistral-7B-v0.1 trained with odds ratio preference optimization (ORPO). With ORPO, the model learns preferences directly, without a separate supervised fine-tuning warm-up phase.
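
To make the idea concrete, here is a minimal sketch of the ORPO objective as described in the paper listed in the Citation section: the usual negative log-likelihood on the chosen response plus a weighted odds-ratio penalty. It works on scalar, length-normalized log-probabilities rather than real model outputs. The function names and the weight `lam=0.1` are illustrative assumptions, not the training configuration used for this model.

```python
import math

def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is a length-normalized sequence log-probability and must be
    strictly negative (p < 1), otherwise log(1 - p) is undefined.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))

def odds_ratio_term(chosen_avg_logp: float, rejected_avg_logp: float) -> float:
    """-log(sigmoid(z)), where z is the log odds ratio of chosen vs. rejected."""
    z = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # Numerically stable softplus(-z) = log(1 + exp(-z))
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """NLL on the chosen response plus lam times the odds-ratio term.

    lam is a hypothetical default chosen for illustration only.
    """
    nll = -chosen_avg_logp
    return nll + lam * odds_ratio_term(chosen_avg_logp, rejected_avg_logp)

# The penalty is small when the chosen response is already more likely,
# and large when the rejected response is more likely.
print(odds_ratio_term(-0.5, -2.0))
print(odds_ratio_term(-2.0, -0.5))
```

Because the likelihood term and the preference term are summed into one loss, a single training stage covers both, which is why no separate warm-up phase is needed.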

Mistral-ORPO-Capybara-7k was fine-tuned for 2.5 hours on four A100 GPUs, exclusively on the 7k instances of argilla/distilabel-capybara-dpo-7k-binarized, Argilla's distilled Capybara dataset of paired multi-turn conversations.
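
As a rough illustration of what a paired multi-turn record looks like, the sketch below splits one preference pair into the shared conversation history and the two competing final replies. The record schema (`chosen` and `rejected` as lists of `role`/`content` messages), the helper name `split_pair`, and the sample record are assumptions made for illustration; check the dataset card for the actual column names.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]

def split_pair(record: Dict[str, List[Message]]) -> Tuple[List[Message], str, str]:
    """Split a preference pair into (shared history, chosen reply, rejected reply)."""
    chosen, rejected = record["chosen"], record["rejected"]
    # Both conversations are expected to share every turn except the final reply
    history = chosen[:-1]
    if history != rejected[:-1]:
        raise ValueError("chosen and rejected do not share the same history")
    return history, chosen[-1]["content"], rejected[-1]["content"]

# Hypothetical record, not taken from the real dataset
shared_turns = [
    {"role": "user", "content": "Name a prime number."},
    {"role": "assistant", "content": "7 is a prime number."},
    {"role": "user", "content": "And the next one after it?"},
]
sample = {
    "chosen": shared_turns + [{"role": "assistant", "content": "The next prime after 7 is 11."}],
    "rejected": shared_turns + [{"role": "assistant", "content": "The next prime after 7 is 9."}],
}

history, good, bad = split_pair(sample)
print(len(history), good, bad, sep="\n")
```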

πŸ‘ Model Performance

1) AlpacaEval & MT-Bench

| Model Name | Size | Align | MT-Bench | AlpacaEval 2.0 (LC) |
|---|---|---|---|---|
| Mistral-ORPO-Capybara-7k | 7B | ORPO | 7.44 | 15.9 |
| Mistral-ORPO-β | 7B | ORPO | 7.32 | 14.7 |
| Zephyr β | 7B | DPO | 7.34 | 13.2 |
| TULU-2-DPO | 13B | DPO | 7.00 | 11.6 |
| Llama-2-Chat | 7B | RLHF | 6.27 | 5.4 |
| Llama-2-Chat | 13B | RLHF | 6.65 | 8.4 |

2) IFEval

| Model Type | Prompt-Strict | Prompt-Loose | Inst-Strict | Inst-Loose |
|---|---|---|---|---|
| Mistral-ORPO-Capybara-7k | 0.5083 | 0.5083 | 0.5827 | 0.6127 |
| Mistral-ORPO-⍺ | 0.5009 | 0.5083 | 0.5995 | 0.6163 |
| Mistral-ORPO-β | 0.5287 | 0.5564 | 0.6355 | 0.6619 |

πŸ—ΊοΈ MT-Bench by Category

[Figure: MT-Bench scores by category]

🖥️ Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("kaist-ai/mistral-orpo-capybara-7k")
tokenizer = AutoTokenizer.from_pretrained("kaist-ai/mistral-orpo-capybara-7k")

# Apply the chat template to a single-turn conversation
query = [{'role': 'user', 'content': 'Hi! How are you doing?'}]
prompt = tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')

# Generate with sampling enabled
output = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7
)

# batch_decode returns one string per sequence in the batch
response = tokenizer.batch_decode(output)
print(response[0])

# Example output (sampling is stochastic, so the reply will vary):
#<|user|>
#Hi! How are you doing?</s>
#<|assistant|>
#I'm doing well, thank you! How are you?</s>
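
The decoded string contains the whole conversation, including the `<|user|>` and `<|assistant|>` markers and the `</s>` end-of-sequence token shown in the sample output above. A small helper can pull out just the model's reply. This is a sketch that assumes the marker strings are exactly those shown in that sample; `extract_assistant_reply` is a hypothetical name, not part of transformers.

```python
def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded conversation."""
    # Keep only what follows the last assistant marker; if the marker is
    # missing, rpartition leaves the whole string in `tail`.
    _, _, tail = decoded.rpartition("<|assistant|>")
    # Drop the end-of-sequence token and surrounding whitespace
    return tail.replace("</s>", "").strip()

# Sample text copied from the example output above
decoded = (
    "<|user|>\n"
    "Hi! How are you doing?</s>\n"
    "<|assistant|>\n"
    "I'm doing well, thank you! How are you?</s>"
)
reply = extract_assistant_reply(decoded)
print(reply)
```

An alternative that does not depend on the marker strings is to slice the generated token ids at the prompt length before decoding.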

📎 Citation

@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model}, 
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}