Phi-3-mini-128K-instruct with CPO-SimPO

This repository contains the Phi-3-mini-128K-instruct model enhanced with the CPO-SimPO technique. CPO-SimPO combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO).

Introduction

Phi-3-mini-128K-instruct is a language model tuned for instruction following. Fine-tuning it with CPO-SimPO raised its scores on five of the six benchmarks listed under Model Performance, with the largest gain on GSM8K.

What is CPO-SimPO?

CPO-SimPO is a technique that combines elements of two preference-optimization methods:

Contrastive Preference Optimization (CPO): Adds a behavior cloning regularizer to ensure the model remains close to the preferred data distribution.
Simple Preference Optimization (SimPO): Incorporates length normalization and target reward margins to prevent the generation of long but low-quality sequences.
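
As a rough illustration of how the two pieces fit together, the sketch below computes a combined loss for a single preference pair in plain Python. It assumes the commonly stated forms of the two objectives: a SimPO term built from length-normalized log-probabilities with a target margin, plus a CPO-style behavior-cloning term (the negative log-likelihood of the preferred response, averaged per token here). The function names, the hyperparameter names beta, gamma and alpha, and their default values are illustrative assumptions, not values taken from this repository.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_simpo_loss(chosen_token_logprobs, rejected_token_logprobs,
                   beta=2.0, gamma=1.0, alpha=1.0):
    """Illustrative CPO-SimPO loss for one (chosen, rejected) pair.

    beta, gamma and alpha are hypothetical hyperparameters for this sketch.
    """
    # SimPO: use the average per-token log-probability as the implicit reward,
    # so longer responses are not favored just for being long.
    avg_chosen = sum(chosen_token_logprobs) / len(chosen_token_logprobs)
    avg_rejected = sum(rejected_token_logprobs) / len(rejected_token_logprobs)

    # SimPO: the scaled reward gap is compared against a target margin gamma.
    margin = beta * (avg_chosen - avg_rejected) - gamma
    preference_loss = -log_sigmoid(margin)

    # CPO: behavior-cloning regularizer, i.e. negative log-likelihood of the
    # chosen response (averaged per token in this sketch).
    bc_loss = -avg_chosen

    return preference_loss + alpha * bc_loss


# Toy per-token log-probabilities (made up for illustration).
chosen = [-0.2, -0.3, -0.1]
rejected = [-1.0, -1.2, -0.8, -1.0]

loss = cpo_simpo_loss(chosen, rejected)
swapped_loss = cpo_simpo_loss(rejected, chosen)
print(f"loss = {loss:.4f}, swapped = {swapped_loss:.4f}")
```

Swapping the two responses makes the loss larger, which is the expected direction: the objective rewards assigning higher average likelihood to the preferred response.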

GitHub

CPO-SIMPO

Model Performance

Base Scores:

MMLU: 68.7
HellaSwag: 80.09
GSM8K: 69.52
ARC: 63.14
Winogrande: 72.85
TruthfulQA: 54.12

New Scores after CPO-SimPO:

MMLU: 68.79
HellaSwag: 80.78
GSM8K: 78.01
ARC: 62.97
Winogrande: 74.47
TruthfulQA: 56.19

Key Improvements:

Enhanced Model Performance: Scores improved on five of the six benchmarks, most notably GSM8K (up 8.49 points) and TruthfulQA (up 2.07 points); ARC dipped slightly (down 0.17 points).
Quality Control: Improved generation of high-quality sequences through length normalization and reward margins.
Balanced Optimization: The BC regularizer helps maintain the integrity of learned preferences without deviating from the preferred data distribution.
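
The per-benchmark changes can be recomputed directly from the two score lists above. The short script below does only that arithmetic; the variable names are illustrative.

```python
# Scores copied from the "Base Scores" and "New Scores after CPO-SimPO" lists.
base = {
    "MMLU": 68.7,
    "HellaSwag": 80.09,
    "GSM8K": 69.52,
    "ARC": 63.14,
    "Winogrande": 72.85,
    "TruthfulQA": 54.12,
}
tuned = {
    "MMLU": 68.79,
    "HellaSwag": 80.78,
    "GSM8K": 78.01,
    "ARC": 62.97,
    "Winogrande": 74.47,
    "TruthfulQA": 56.19,
}

# Difference in points, rounded to two decimals to hide floating-point noise.
deltas = {name: round(tuned[name] - base[name], 2) for name in base}

for name, delta in deltas.items():
    print(f"{name:<11} {base[name]:>6.2f} -> {tuned[name]:>6.2f}  ({delta:+.2f})")
```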

Usage

Installation

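To use this model, install the transformers library from Hugging Face, together with torch and accelerate (the inference example below imports torch and passes device_map, which relies on accelerate).

pip install transformers torch accelerate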

Inference

Here's an example of how to perform inference with the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# Load the model and tokenizer from the same repository so they stay in sync.
model_id = "QueryloopAI/Phi-3-mini-128K-instruct-cpo-simpo"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    # Greedy decoding: with do_sample=False, temperature has no effect, so it is omitted.
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])