Syed-Hasan-8503 committed on
Commit
4c0f5ef
1 Parent(s): dbc12bd

Update README.md

Files changed (1): README.md (+83 -1)
---
license: apache-2.0
---

# Phi-3-mini-128K-instruct with CPO-SimPO

This repository contains the Phi-3-mini-128K-instruct model enhanced with the CPO-SimPO technique, which combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO).

## Introduction

Phi-3-mini-128K-instruct is a model optimized for instruction-following tasks. Applying CPO-SimPO to it has produced notable improvements on key benchmarks, pushing the boundaries of AI preference learning.

### What is CPO-SimPO?

CPO-SimPO is a novel technique that combines elements of CPO and SimPO (a schematic sketch of the combined objective follows the list below):

- **Contrastive Preference Optimization (CPO):** Adds a behavior cloning (BC) regularizer to ensure the model remains close to the preferred data distribution.
- **Simple Preference Optimization (SimPO):** Incorporates length normalization and a target reward margin to prevent the generation of long but low-quality sequences.
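
To make the combination concrete, here is a minimal sketch of what a CPO-SimPO-style training loss can look like. This is not the reference implementation from the repository linked below; the function name `cpo_simpo_loss`, its arguments, and the hyperparameter values (`beta`, `gamma`, `bc_weight`) are illustrative assumptions.

```python
# Minimal sketch (assumption, not the official CPO-SimPO implementation).
# `policy_chosen_logps` / `policy_rejected_logps`: summed token log-probabilities of the
# preferred and rejected responses under the policy being trained.
# `chosen_lengths` / `rejected_lengths`: token counts used for length normalization.
import torch
import torch.nn.functional as F


def cpo_simpo_loss(policy_chosen_logps, policy_rejected_logps,
                   chosen_lengths, rejected_lengths,
                   beta=2.0, gamma=0.5, bc_weight=1.0):
    # SimPO: length-normalized log-probabilities act as implicit rewards,
    # so no reference model is required.
    chosen_rewards = beta * policy_chosen_logps / chosen_lengths
    rejected_rewards = beta * policy_rejected_logps / rejected_lengths

    # Preference term with a target reward margin `gamma` (SimPO): the chosen
    # response must beat the rejected one by at least the margin.
    preference_loss = -F.logsigmoid(chosen_rewards - rejected_rewards - gamma)

    # Behavior-cloning regularizer (CPO): NLL on the chosen responses keeps the
    # policy close to the preferred data distribution.
    bc_loss = -policy_chosen_logps / chosen_lengths

    return (preference_loss + bc_weight * bc_loss).mean()
```

In this reading, the margin and length normalization discourage long but low-quality generations, while the BC term anchors the model to the chosen responses.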

### GitHub

**[CPO-SIMPO](https://github.com/fe1ixxu/CPO_SIMPO)**

## Model Performance

Full benchmark results are COMING SOON! An initial score:

- **TruthfulQA:** 56.19

### Key Improvements:

- **Enhanced Model Performance:** Significant score improvements, particularly on GSM8K (up by 8.49 points!) and TruthfulQA (up by 2.07 points).
- **Quality Control:** Improved generation of high-quality sequences through length normalization and target reward margins.
- **Balanced Optimization:** The BC regularizer helps maintain the integrity of learned preferences without deviating from the preferred data distribution.

## Usage

### Installation

To run the example below you need the `transformers` library from Hugging Face, along with `torch` and `accelerate` (the latter is required for `device_map`-based loading).

```bash
pip install transformers torch accelerate
```

### Inference

Here's an example of how to perform inference with the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# Load the model and tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(
    "Syed-Hasan-8503/Phi-3-mini-128K-instruct-cpo-simpo",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-128K-instruct-cpo-simpo")

# Chat history in the standard messages format; the last user turn is the prompt to answer.
messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding (do_sample=False); temperature has no effect when sampling is disabled.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
```
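
Optionally (this is an assumption, not part of the original example), the same conversation can be run without the `pipeline` helper by applying the tokenizer's chat template and calling `model.generate` directly, reusing `model`, `tokenizer`, and `messages` from the block above:

```python
# Assumed alternative to the pipeline call above; reuses model, tokenizer, and messages.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant prefix so the model answers the last user turn
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=500, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```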