---
language:
- en
license: mit
base_model:
- mistralai/Mistral-7B-v0.1
datasets:
- HuggingFaceH4/ultrafeedback_binarized
pipeline_tag: text-generation
model-index:
- name: Mistral-ORPO-⍺
  results:
  - task:
      type: text-generation
    dataset:
      name: AlpacaEval 1
      type: AlpacaEval
    metrics:
    - type: AlpacaEval 1.0
      value: 87.92%
      name: Win Rate
    source:
      url: https://github.com/tatsu-lab/alpaca_eval
      name: self-reported
  - task:
      type: text-generation
    dataset:
      name: AlpacaEval 2
      type: AlpacaEval
    metrics:
    - type: AlpacaEval 2.0
      value: 11.33%
      name: Win Rate
    source:
      url: https://github.com/tatsu-lab/alpaca_eval
      name: self-reported
  - task:
      type: text-generation
    dataset:
      name: MT-Bench
      type: MT-Bench
    metrics:
    - type: MT-Bench
      value: 7.23
      name: Score
    source:
      url: https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/
      name: self-reported
---
# **Mistral-ORPO-⍺ (7B)**

**Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) using *odds ratio preference optimization (ORPO)*. With ORPO, the model learns preferences directly, without a separate supervised fine-tuning warm-up phase. **Mistral-ORPO-⍺** is fine-tuned exclusively on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
- **Github Repository**: https://github.com/xfactlab/orpo
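
For context, the ORPO objective augments the standard language-modeling (SFT) loss on the chosen responses with an odds-ratio penalty that pushes the policy's odds of the chosen response above those of the rejected one. The following is a minimal, illustrative sketch of that loss term, not the training code from the repository; the function name, tensor shapes, and the weight `lam` are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
    # chosen_logps / rejected_logps: mean per-token log-probabilities of the
    # chosen and rejected responses under the policy, shape [batch].
    # nll_chosen: standard SFT negative log-likelihood on the chosen responses.
    # lam: weight of the odds-ratio term (hypothetical default).

    # log odds(y|x) = log p - log(1 - p), computed in log space for stability
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: -log sigmoid of the log-odds gap between chosen and rejected
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Total loss: SFT term plus the weighted odds-ratio penalty
    return nll_chosen + lam * l_or.mean()
```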

## 👍 **Model Performance**

### 1) AlpacaEval & MT-Bench

|Model Name|Size|Alignment|MT-Bench|AlpacaEval 1.0|AlpacaEval 2.0|
|:--------|:--------------:|:--------------:|:-------------------:|:------------:|:------------:|
|**Mistral-<tt>ORPO</tt>-⍺**|7B|<tt>ORPO</tt>|7.23|87.92|11.33|
|**Mistral-<tt>ORPO</tt>-β**|7B|<tt>ORPO</tt>|7.32|91.41|12.20|
|Zephyr β |7B|DPO|7.34|90.60|10.99|
|TULU-2-DPO |13B|DPO|7.00|89.5|10.12|
|Llama-2-Chat |7B|RLHF|6.27|71.37|4.96|
|Llama-2-Chat |13B|RLHF|6.65|81.09|7.70|

### 2) IFEval

| **Model Type**     | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
|--------------------|:-----------------:|:----------------:|:---------------:|:--------------:|
| **Mistral-ORPO-⍺** |       0.5009      |      0.5083      |      0.5995     |     0.6163     |
| **Mistral-ORPO-β** |       0.5287      |      0.5564      |      0.6355     |     0.6619     |
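
In the table above, Prompt and Inst refer to IFEval's prompt-level and instruction-level accuracy, and Strict/Loose to its strict and loose verification modes.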

## 🗺️ **MT-Bench by Category**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6415c043486c7c9a5d151583/1Ifpt0ljCfJPEoZAqlqqy.png)

## 🖥️ **Inference**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kaist-ai/mistral-orpo-alpha")
tokenizer = AutoTokenizer.from_pretrained("kaist-ai/mistral-orpo-alpha")

# Build the prompt with the model's chat template
query = [{'role': 'user', 'content': 'Hi! How are you doing?'}]
prompt = tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')

# Sampled generation; batch_decode returns one string per sequence
output = model.generate(
  **inputs,
  max_new_tokens=128,
  do_sample=True,
  temperature=0.7
)
response = tokenizer.batch_decode(output)[0]

# Example response:
# <|user|>
# Hi! How are you doing?</s>
# <|assistant|>
# I'm doing well, thank you! How are you?</s>
```
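
To strip the chat-template special tokens (`<|user|>`, `</s>`, etc.) from the decoded text, `skip_special_tokens=True` can be passed to `batch_decode`:

```python
# Decode only the text, dropping special tokens
response = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
```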

## 📎 **Citation**

```bibtex
@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model}, 
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```