metadata

language:
  - en
license: apache-2.0
base_model:
  - mistralai/Mistral-7B-v0.1
datasets:
  - argilla/ultrafeedback-binarized-preferences-cleaned
pipeline_tag: text-generation
model-index:
  - name: Mistral-ORPO-β
    results:
      - task:
          type: text-generation
        dataset:
          name: AlpacaEval 1
          type: AlpacaEval
        metrics:
          - type: AlpacaEval 1.0
            value: 91.41%
            name: Win Rate
          - type: AlpacaEval 2.0
            value: 12.20%
            name: Win Rate
        source:
          url: https://github.com/tatsu-lab/alpaca_eval
          name: self-reported
      - task:
          type: text-generation
        dataset:
          name: MT-Bench
          type: MT-Bench
        metrics:
          - type: MT-Bench
            value: 7.322
            name: Score
        source:
          url: https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/
          name: self-reported

Mistral-ORPO-β (7B)

Mistral-ORPO is a version of mistralai/Mistral-7B-v0.1 fine-tuned with odds ratio preference optimization (ORPO). With ORPO, the model learns preferences directly, without a supervised fine-tuning warm-up phase. Mistral-ORPO-β is fine-tuned exclusively on the 61k instances of argilla/ultrafeedback-binarized-preferences-cleaned, Argilla's cleaned version of UltraFeedback.
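As a rough illustration of how a single objective can replace the separate warm-up phase, the sketch below computes an ORPO-style loss for one preference pair: a negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty. The function names, the toy log-probabilities, and the weight `lam=0.1` are illustrative assumptions, not the settings used to train this model.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for a length-normalized log-probability.

    avg_logp must be strictly negative (p < 1), otherwise log(1 - p) is undefined.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """ORPO-style loss for one (chosen, rejected) pair of responses.

    Inputs are per-token log-probabilities of each response under the model.
    lam is a hypothetical weight on the odds-ratio term.
    """
    # Length-normalized sequence log-likelihoods.
    avg_chosen = sum(chosen_token_logps) / len(chosen_token_logps)
    avg_rejected = sum(rejected_token_logps) / len(rejected_token_logps)

    # Supervised term: negative log-likelihood of the chosen response.
    sft_term = -avg_chosen

    # Odds-ratio term: -log(sigmoid(z)) = log(1 + exp(-z)),
    # where z is the log odds ratio of chosen over rejected.
    z = log_odds(avg_chosen) - log_odds(avg_rejected)
    odds_ratio_term = math.log1p(math.exp(-z))

    return sft_term + lam * odds_ratio_term


# Toy example: the chosen response is already more likely than the rejected one.
loss = orpo_loss([-0.1, -0.2], [-1.0, -2.0])
print(f"{loss:.4f}")
```

The penalty shrinks as the rejected response becomes less likely relative to the chosen one, so preference learning and likelihood training happen in the same step.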

Model Performance

Model Name	Size	Alignment	MT-Bench	AlpacaEval 1.0 (win rate, %)	AlpacaEval 2.0 (win rate, %)
Mistral-ORPO-α	7B	ORPO	7.23	87.92	11.33
Mistral-ORPO-β	7B	ORPO	7.32	91.41	12.20
Zephyr (β)	7B	DPO	7.34	90.60	10.99
TULU-2-DPO	13B	DPO	7.00	89.5	10.12
Llama-2-Chat	7B	RLHF	6.27	71.37	4.96
Llama-2-Chat	13B	RLHF	6.65	81.09	7.70

Chat Template

<|user|>
Hi! How are you doing?</s>
<|assistant|>
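
The template above can be reproduced with plain string formatting. The following is a minimal sketch, not the tokenizer's own implementation: `format_chat` is a hypothetical helper, and the exact whitespace around each turn is assumed from the example shown here. When loading the model with transformers, the tokenizer's `apply_chat_template` method is the usual route, provided the shipped tokenizer defines a chat template.

```python
def format_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the template shown above.

    Hypothetical helper: each turn becomes "<|role|>\\ncontent</s>\\n", and an
    open "<|assistant|>" tag is appended to cue the model's reply.
    """
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    if add_generation_prompt:
        parts.append("<|assistant|>")
    return "".join(parts)


prompt = format_chat([{"role": "user", "content": "Hi! How are you doing?"}])
print(prompt)
```

For multi-turn conversations, pass earlier assistant replies as messages with the role `assistant`; each is closed with `</s>` in the same way as user turns.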