Commit a402e8a (1 parent: 7693f67)
argilla/phi2-lora-distilabel-intel-orca-dpo-pairs

Files changed:
- README.md (+32, -118)
- adapter_config.json (+4, -4)
- adapter_model.safetensors (+1, -1)
- training_args.bin (+1, -1)
README.md:

````diff
@@ -5,126 +5,40 @@ tags:
 - trl
 - dpo
 - generated_from_trainer
-- distilabel
-- argilla
 base_model: microsoft/phi-2
 model-index:
-- name: phi2-lora-
+- name: phi2-lora-distilabel-intel-orca-dpo-pairs
   results: []
-datasets:
-- argilla/distilabel-intel-orca-dpo-pairs
-language:
-- en
-pipeline_tag: text-generation
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# phi2-lora-
+# phi2-lora-distilabel-intel-orca-dpo-pairs
 
-This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).
-The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).
+This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Rewards/chosen: 0.
-- Rewards/rejected: -
-- Rewards/accuracies: 0.
-- Rewards/margins:
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: 0.
-- Logits/chosen: 0.
+- Loss: 0.4537
+- Rewards/chosen: -0.0837
+- Rewards/rejected: -1.2628
+- Rewards/accuracies: 0.8301
+- Rewards/margins: 1.1791
+- Logps/rejected: -224.8409
+- Logps/chosen: -203.2228
+- Logits/rejected: 0.4773
+- Logits/chosen: 0.3062
 
 ## Model description
 
-
-You can play around with the model as shown below. We load the LoRA adapter and a bitsandbytes config (only when CUDA is available).
-
-```python
-import torch
-from transformers import (
-    AutoModelForCausalLM,
-    AutoTokenizer,
-    BitsAndBytesConfig,
-)
-from peft import PeftModel, PeftConfig
-
-# template used for fine-tuning
-# template = """\
-# Instruct: {instruction}\n
-# Output: {response}"""
-
-if torch.cuda.is_available():
-    device = torch.device("cuda")
-    print(f"Using {torch.cuda.get_device_name(0)}")
-    bnb_config = BitsAndBytesConfig(
-        load_in_4bit=True,
-        bnb_4bit_quant_type='nf4',
-        bnb_4bit_compute_dtype='float16',
-        bnb_4bit_use_double_quant=False,
-    )
-elif torch.backends.mps.is_available():
-    device = torch.device("mps")
-    bnb_config = None
-else:
-    device = torch.device("cpu")
-    bnb_config = None
-    print("No GPU available, using CPU instead.")
-
-tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
-config = PeftConfig.from_pretrained("davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
-model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
-model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs").to(device)
-
-prompt = "Instruct: What is the capital of France?\nOutput:"
-inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)
-
-outputs = model.generate(**inputs)
-text = tokenizer.batch_decode(outputs)[0]
-```
+More information needed
 
 ## Intended uses & limitations
 
-
+More information needed
 
 ## Training and evaluation data
 
-
-```python
-peft_config = LoraConfig(
-    lora_alpha=16,
-    lora_dropout=0.5,
-    r=32,
-    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
-    bias="none",
-    task_type="CAUSAL_LM",
-)
-```
-
-```python
-training_arguments = TrainingArguments(
-    output_dir=f"./{model_name}",
-    evaluation_strategy="steps",
-    do_eval=True,
-    optim="paged_adamw_8bit",
-    per_device_train_batch_size=2,
-    gradient_accumulation_steps=16,
-    per_device_eval_batch_size=2,
-    log_level="debug",
-    save_steps=20,
-    logging_steps=20,
-    learning_rate=1e-5,
-    eval_steps=20,
-    num_train_epochs=1,  # Modified for tutorial purposes
-    max_steps=100,
-    warmup_steps=20,
-    lr_scheduler_type="linear",
-)
-```
+More information needed
 
 ## Training procedure
 
@@ -146,28 +60,28 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| 0.6853 | 0.06 | 20 | 0.6701 | 0.0133 | -0.0368 | 0.6905 | 0.0501 | -212.5803 | -202.2522 | 0.3853 | 0.2532 |
+| 0.6312 | 0.12 | 40 | 0.5884 | 0.0422 | -0.2208 | 0.8138 | 0.2630 | -214.4207 | -201.9638 | 0.4254 | 0.2816 |
+| 0.547 | 0.19 | 60 | 0.5146 | 0.0172 | -0.5786 | 0.8278 | 0.5958 | -217.9983 | -202.2132 | 0.4699 | 0.3110 |
+| 0.4388 | 0.25 | 80 | 0.4893 | -0.0808 | -1.0789 | 0.8293 | 0.9981 | -223.0014 | -203.1934 | 0.5158 | 0.3396 |
+| 0.4871 | 0.31 | 100 | 0.4818 | -0.1298 | -1.2346 | 0.8297 | 1.1048 | -224.5586 | -203.6837 | 0.5133 | 0.3340 |
+| 0.4863 | 0.37 | 120 | 0.4723 | -0.1230 | -1.1718 | 0.8301 | 1.0488 | -223.9305 | -203.6159 | 0.4910 | 0.3167 |
+| 0.4578 | 0.44 | 140 | 0.4666 | -0.1257 | -1.1772 | 0.8301 | 1.0515 | -223.9844 | -203.6428 | 0.4795 | 0.3078 |
+| 0.4587 | 0.5 | 160 | 0.4625 | -0.0746 | -1.1272 | 0.8301 | 1.0526 | -223.4841 | -203.1310 | 0.4857 | 0.3139 |
+| 0.4688 | 0.56 | 180 | 0.4595 | -0.0584 | -1.1194 | 0.8297 | 1.0610 | -223.4062 | -202.9692 | 0.4890 | 0.3171 |
+| 0.4189 | 0.62 | 200 | 0.4579 | -0.0666 | -1.1647 | 0.8297 | 1.0982 | -223.8598 | -203.0511 | 0.4858 | 0.3138 |
+| 0.4392 | 0.68 | 220 | 0.4564 | -0.0697 | -1.1915 | 0.8301 | 1.1219 | -224.1278 | -203.0823 | 0.4824 | 0.3110 |
+| 0.4659 | 0.75 | 240 | 0.4554 | -0.0826 | -1.2245 | 0.8301 | 1.1419 | -224.4574 | -203.2112 | 0.4761 | 0.3052 |
+| 0.4075 | 0.81 | 260 | 0.4544 | -0.0823 | -1.2328 | 0.8301 | 1.1504 | -224.5403 | -203.2089 | 0.4749 | 0.3044 |
+| 0.4015 | 0.87 | 280 | 0.4543 | -0.0833 | -1.2590 | 0.8301 | 1.1757 | -224.8026 | -203.2188 | 0.4779 | 0.3067 |
+| 0.4365 | 0.93 | 300 | 0.4539 | -0.0846 | -1.2658 | 0.8301 | 1.1812 | -224.8702 | -203.2313 | 0.4780 | 0.3067 |
+| 0.4589 | 1.0 | 320 | 0.4537 | -0.0837 | -1.2628 | 0.8301 | 1.1791 | -224.8409 | -203.2228 | 0.4773 | 0.3062 |
 
 
 ### Framework versions
 
 - PEFT 0.7.1
 - Transformers 4.37.1
-- Pytorch 2.1.0+
+- Pytorch 2.1.0+cu118
 - Datasets 2.16.1
 - Tokenizers 0.15.1
````
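In trl's DPO reporting, the Rewards/margins column is simply Rewards/chosen minus Rewards/rejected. A minimal sanity check against the final evaluation row of the updated card (step 320); the variable names here are ours:

```python
# Final evaluation row of the updated model card (step 320).
rewards_chosen = -0.0837
rewards_rejected = -1.2628
reported_margin = 1.1791

# DPO margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

# The card rounds to 4 decimals, so compare with a small tolerance.
assert abs(margin - reported_margin) < 1e-3
print(round(margin, 4))  # 1.1791
```

The same identity holds, up to rounding, for every row of the table (e.g. row one: 0.0133 − (−0.0368) = 0.0501).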
adapter_config.json:

```diff
@@ -19,11 +19,11 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "q_proj",
-    "fc2",
-    "fc1",
     "v_proj",
-    "
+    "fc1",
+    "k_proj",
+    "q_proj",
+    "fc2"
   ],
   "task_type": "CAUSAL_LM"
 }
```
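The five target modules above match the card's LoraConfig (r=32). As a back-of-the-envelope sketch of the adapter size, assuming phi-2's published dimensions (32 layers, hidden size 2560, MLP intermediate size 10240 — assumptions here, not values read from the checkpoint):

```python
# Rough LoRA trainable-parameter count for the adapter in this commit.
# Each adapted weight W (d_out x d_in) gains A (r x d_in) and B (d_out x r),
# i.e. r * (d_in + d_out) extra parameters.
r = 32                                           # rank, from the card's LoraConfig
hidden, intermediate, layers = 2560, 10240, 32   # assumed phi-2 dimensions

per_layer = (
    3 * r * (hidden + hidden)        # q_proj, k_proj, v_proj (hidden -> hidden)
    + r * (hidden + intermediate)    # fc1 (hidden -> intermediate)
    + r * (intermediate + hidden)    # fc2 (intermediate -> hidden)
)
total = layers * per_layer
print(total)  # 41943040

# Stored as float32 that is ~167.8 MB, in line with the 167814424-byte
# adapter_model.safetensors in this commit (the remainder is file metadata).
print(total * 4)  # 167772160
```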
adapter_model.safetensors:

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:400d50a0b60714c20aeb04ec937d32bd6999d465d274b4fe2001c9602af90dda
 size 167814424
```
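Both binary files in this commit are stored as Git LFS pointer files with the three-line format shown above (version, oid, size). A small illustrative parser (the `parse_lfs_pointer` helper is ours, not part of any LFS tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its 'key value' fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:400d50a0b60714c20aeb04ec937d32bd6999d465d274b4fe2001c9602af90dda
size 167814424
"""

info = parse_lfs_pointer(pointer)
assert info["oid"].startswith("sha256:")
print(int(info["size"]))  # 167814424
```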
training_args.bin:

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:05bbe3be44fd10a655baf9168292abd8adb13b568673063a090ba95efef33405
 size 4728
```
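training_args.bin serializes the TrainingArguments from the old card (learning_rate=1e-5, warmup_steps=20, max_steps=100, lr_scheduler_type="linear"). The resulting learning-rate schedule can be sketched as follows — a simplified re-implementation for illustration, not the transformers internals:

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=20, max_steps=100):
    """Linear warmup to base_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

print(linear_schedule_lr(0))    # 0.0   (start of warmup)
print(linear_schedule_lr(20))   # 1e-05 (peak, end of warmup)
print(linear_schedule_lr(60))   # 5e-06 (halfway through the decay)
print(linear_schedule_lr(100))  # 0.0   (fully decayed)
```

Note that with per_device_train_batch_size=2 and gradient_accumulation_steps=16, each of these optimizer steps corresponds to an effective batch of 32 examples.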