davidberenstein1957 HF staff committed on
Commit
a402e8a
1 Parent(s): 7693f67

argilla/phi2-lora-distilabel-intel-orca-dpo-pairs

Files changed (4)
  1. README.md +32 -118
  2. adapter_config.json +4 -4
  3. adapter_model.safetensors +1 -1
  4. training_args.bin +1 -1
README.md CHANGED
@@ -5,126 +5,40 @@ tags:
5
  - trl
6
  - dpo
7
  - generated_from_trainer
8
- - distilabel
9
- - argilla
10
  base_model: microsoft/phi-2
11
  model-index:
12
- - name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
13
  results: []
14
- datasets:
15
- - argilla/distilabel-intel-orca-dpo-pairs
16
- language:
17
- - en
18
- pipeline_tag: text-generation
19
  ---
20
 
21
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
22
  should probably proofread and complete it, then remove this comment. -->
23
 
24
- # phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
25
-
26
- This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).
27
- The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).
28
 
 
29
  It achieves the following results on the evaluation set:
30
- - Loss: 0.0972
31
- - Rewards/chosen: 0.2699
32
- - Rewards/rejected: -5.8246
33
- - Rewards/accuracies: 0.9623
34
- - Rewards/margins: 6.0944
35
- - Logps/rejected: -311.1872
36
- - Logps/chosen: -115.6127
37
- - Logits/rejected: 0.0766
38
- - Logits/chosen: 0.0242
39
 
40
  ## Model description
41
 
42
- The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs). In order to scale LoRa approached for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).
43
-
44
- You can play around with the model shown below. We load the LoRa adapter and bits_n_bytes config (only when CUDA is available).
45
-
46
- ```python
47
- import torch
48
- import torch
49
- from transformers import (
50
- AutoModelForCausalLM,
51
- AutoTokenizer,
52
- BitsAndBytesConfig
53
- )
54
- from peft import PeftModel
55
-
56
- # template used for fine-tune
57
- # template = """\
58
- # Instruct: {instruction}\n
59
- # Output: {response}"""
60
-
61
- if torch.cuda.is_available():
62
- device = torch.device("cuda")
63
- print(f"Using {torch.cuda.get_device_name(0)}")
64
- bnb_config = BitsAndBytesConfig(
65
- load_in_4bit=True,
66
- bnb_4bit_quant_type='nf4',
67
- bnb_4bit_compute_dtype='float16',
68
- bnb_4bit_use_double_quant=False,
69
- )
70
- elif torch.backends.mps.is_available():
71
- device = torch.device("mps")
72
- bnb_config = None
73
- else:
74
- device = torch.device("cpu")
75
- bnb_config = None
76
- print("No GPU available, using CPU instead.")
77
-
78
- config = PeftConfig.from_pretrained("davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
79
- model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
80
- model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs").to(device)
81
-
82
- prompt = "Instruct: What is the capital of France? \nOutput:""
83
- inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
84
-
85
- outputs = model.generate(**inputs)
86
- text = tokenizer.batch_decode(outputs)[0]
87
- ```
88
 
89
  ## Intended uses & limitations
90
 
91
- This is a LoRa adapter fine-tine for phi-2 and not a full fine-tune of the model. Additionally, I did not spend time updating parameters.
92
 
93
  ## Training and evaluation data
94
 
95
- The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs). The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing). Underneath, there are some configs for the adapter and the trainer.
96
-
97
- ```python
98
- peft_config = LoraConfig(
99
- lora_alpha=16,
100
- lora_dropout=0.5,
101
- r=32,
102
- target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
103
- bias="none",
104
- task_type="CAUSAL_LM",
105
- )
106
- ```
107
-
108
- ```python
109
- training_arguments = TrainingArguments(
110
- output_dir=f"./{model_name}",
111
- evaluation_strategy="steps",
112
- do_eval=True,
113
- optim="paged_adamw_8bit",
114
- per_device_train_batch_size=2,
115
- gradient_accumulation_steps=16,
116
- per_device_eval_batch_size=2,
117
- log_level="debug",
118
- save_steps=20,
119
- logging_steps=20,
120
- learning_rate=1e-5,
121
- eval_steps=20,
122
- num_train_epochs=1, # Modified for tutorial purposes
123
- max_steps=100,
124
- warmup_steps=20,
125
- lr_scheduler_type="linear",
126
- )
127
- ```
128
 
129
  ## Training procedure
130
 
@@ -146,28 +60,28 @@ The following hyperparameters were used during training:
146
 
147
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
148
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
149
- | 0.6805 | 0.06 | 20 | 0.6540 | 0.0096 | -0.0728 | 0.8367 | 0.0824 | -253.6698 | -118.2153 | 0.3760 | 0.3395 |
150
- | 0.5821 | 0.12 | 40 | 0.4977 | 0.0383 | -0.4385 | 0.9199 | 0.4768 | -257.3268 | -117.9285 | 0.3836 | 0.3356 |
151
- | 0.4163 | 0.19 | 60 | 0.3225 | 0.0641 | -1.1656 | 0.9257 | 1.2298 | -264.5979 | -117.6701 | 0.3836 | 0.3192 |
152
- | 0.275 | 0.25 | 80 | 0.2245 | 0.0476 | -2.1180 | 0.9316 | 2.1656 | -274.1212 | -117.8351 | 0.3399 | 0.2698 |
153
- | 0.1808 | 0.31 | 100 | 0.1771 | -0.0012 | -3.2019 | 0.9366 | 3.2007 | -284.9609 | -118.3238 | 0.2615 | 0.1964 |
154
- | 0.1405 | 0.37 | 120 | 0.1528 | 0.0185 | -4.0396 | 0.9425 | 4.0581 | -293.3371 | -118.1262 | 0.1983 | 0.1407 |
155
- | 0.1121 | 0.44 | 140 | 0.1389 | 0.0285 | -4.6518 | 0.9471 | 4.6802 | -299.4591 | -118.0267 | 0.1493 | 0.0980 |
156
- | 0.1544 | 0.5 | 160 | 0.1289 | 0.0745 | -4.9025 | 0.9506 | 4.9771 | -301.9670 | -117.5659 | 0.1257 | 0.0785 |
157
- | 0.1594 | 0.56 | 180 | 0.1204 | 0.1435 | -4.8770 | 0.9561 | 5.0205 | -301.7119 | -116.8765 | 0.1168 | 0.0696 |
158
- | 0.0988 | 0.62 | 200 | 0.1136 | 0.1830 | -5.1569 | 0.9576 | 5.3400 | -304.5108 | -116.4809 | 0.1078 | 0.0579 |
159
- | 0.1141 | 0.68 | 220 | 0.1080 | 0.2052 | -5.4532 | 0.9580 | 5.6584 | -307.4731 | -116.2591 | 0.0962 | 0.0460 |
160
- | 0.0943 | 0.75 | 240 | 0.1037 | 0.2326 | -5.6061 | 0.9592 | 5.8387 | -309.0026 | -115.9850 | 0.0913 | 0.0393 |
161
- | 0.1108 | 0.81 | 260 | 0.1008 | 0.2500 | -5.7399 | 0.9607 | 5.9900 | -310.3409 | -115.8109 | 0.0827 | 0.0316 |
162
- | 0.1088 | 0.87 | 280 | 0.0987 | 0.2677 | -5.7068 | 0.9619 | 5.9745 | -310.0096 | -115.6346 | 0.0825 | 0.0301 |
163
- | 0.0741 | 0.93 | 300 | 0.0975 | 0.2701 | -5.7873 | 0.9623 | 6.0574 | -310.8145 | -115.6102 | 0.0788 | 0.0261 |
164
- | 0.1059 | 1.0 | 320 | 0.0972 | 0.2699 | -5.8246 | 0.9623 | 6.0944 | -311.1872 | -115.6127 | 0.0766 | 0.0242 |
165
 
166
 
167
  ### Framework versions
168
 
169
  - PEFT 0.7.1
170
  - Transformers 4.37.1
171
- - Pytorch 2.1.0+cu121
172
  - Datasets 2.16.1
173
  - Tokenizers 0.15.1
 
5
  - trl
6
  - dpo
7
  - generated_from_trainer
 
 
8
  base_model: microsoft/phi-2
9
  model-index:
10
+ - name: phi2-lora-distilabel-intel-orca-dpo-pairs
11
  results: []
 
 
 
 
 
12
  ---
13
 
14
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
  should probably proofread and complete it, then remove this comment. -->
16
 
17
+ # phi2-lora-distilabel-intel-orca-dpo-pairs
 
 
 
18
 
19
+ This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
20
  It achieves the following results on the evaluation set:
21
+ - Loss: 0.4537
22
+ - Rewards/chosen: -0.0837
23
+ - Rewards/rejected: -1.2628
24
+ - Rewards/accuracies: 0.8301
25
+ - Rewards/margins: 1.1791
26
+ - Logps/rejected: -224.8409
27
+ - Logps/chosen: -203.2228
28
+ - Logits/rejected: 0.4773
29
+ - Logits/chosen: 0.3062
30
 
31
  ## Model description
32
 
33
+ More information needed
34
 
35
  ## Intended uses & limitations
36
 
37
+ More information needed
38
 
39
  ## Training and evaluation data
40
 
41
+ More information needed
42
 
43
  ## Training procedure
44
 
 
60
 
61
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
62
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
63
+ | 0.6853 | 0.06 | 20 | 0.6701 | 0.0133 | -0.0368 | 0.6905 | 0.0501 | -212.5803 | -202.2522 | 0.3853 | 0.2532 |
64
+ | 0.6312 | 0.12 | 40 | 0.5884 | 0.0422 | -0.2208 | 0.8138 | 0.2630 | -214.4207 | -201.9638 | 0.4254 | 0.2816 |
65
+ | 0.547 | 0.19 | 60 | 0.5146 | 0.0172 | -0.5786 | 0.8278 | 0.5958 | -217.9983 | -202.2132 | 0.4699 | 0.3110 |
66
+ | 0.4388 | 0.25 | 80 | 0.4893 | -0.0808 | -1.0789 | 0.8293 | 0.9981 | -223.0014 | -203.1934 | 0.5158 | 0.3396 |
67
+ | 0.4871 | 0.31 | 100 | 0.4818 | -0.1298 | -1.2346 | 0.8297 | 1.1048 | -224.5586 | -203.6837 | 0.5133 | 0.3340 |
68
+ | 0.4863 | 0.37 | 120 | 0.4723 | -0.1230 | -1.1718 | 0.8301 | 1.0488 | -223.9305 | -203.6159 | 0.4910 | 0.3167 |
69
+ | 0.4578 | 0.44 | 140 | 0.4666 | -0.1257 | -1.1772 | 0.8301 | 1.0515 | -223.9844 | -203.6428 | 0.4795 | 0.3078 |
70
+ | 0.4587 | 0.5 | 160 | 0.4625 | -0.0746 | -1.1272 | 0.8301 | 1.0526 | -223.4841 | -203.1310 | 0.4857 | 0.3139 |
71
+ | 0.4688 | 0.56 | 180 | 0.4595 | -0.0584 | -1.1194 | 0.8297 | 1.0610 | -223.4062 | -202.9692 | 0.4890 | 0.3171 |
72
+ | 0.4189 | 0.62 | 200 | 0.4579 | -0.0666 | -1.1647 | 0.8297 | 1.0982 | -223.8598 | -203.0511 | 0.4858 | 0.3138 |
73
+ | 0.4392 | 0.68 | 220 | 0.4564 | -0.0697 | -1.1915 | 0.8301 | 1.1219 | -224.1278 | -203.0823 | 0.4824 | 0.3110 |
74
+ | 0.4659 | 0.75 | 240 | 0.4554 | -0.0826 | -1.2245 | 0.8301 | 1.1419 | -224.4574 | -203.2112 | 0.4761 | 0.3052 |
75
+ | 0.4075 | 0.81 | 260 | 0.4544 | -0.0823 | -1.2328 | 0.8301 | 1.1504 | -224.5403 | -203.2089 | 0.4749 | 0.3044 |
76
+ | 0.4015 | 0.87 | 280 | 0.4543 | -0.0833 | -1.2590 | 0.8301 | 1.1757 | -224.8026 | -203.2188 | 0.4779 | 0.3067 |
77
+ | 0.4365 | 0.93 | 300 | 0.4539 | -0.0846 | -1.2658 | 0.8301 | 1.1812 | -224.8702 | -203.2313 | 0.4780 | 0.3067 |
78
+ | 0.4589 | 1.0 | 320 | 0.4537 | -0.0837 | -1.2628 | 0.8301 | 1.1791 | -224.8409 | -203.2228 | 0.4773 | 0.3062 |
79
 
80
 
81
  ### Framework versions
82
 
83
  - PEFT 0.7.1
84
  - Transformers 4.37.1
85
+ - Pytorch 2.1.0+cu118
86
  - Datasets 2.16.1
87
  - Tokenizers 0.15.1
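As a quick sanity check on the evaluation figures in this README diff, the reported `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected`, up to rounding in the logged values. The sketch below uses the final-step (step 320) numbers from both sides of the diff; the helper name `margin` and the dictionary layout are assumptions made for illustration, not part of the training code.

```python
# Final-step (step 320) evaluation metrics copied from the two README versions.
old = {"chosen": 0.2699, "rejected": -5.8246, "margin": 6.0944}
new = {"chosen": -0.0837, "rejected": -1.2628, "margin": 1.1791}


def margin(run):
    """Recompute the reward margin as chosen minus rejected."""
    return run["chosen"] - run["rejected"]


for label, run in (("old", old), ("new", new)):
    # The gap between recomputed and reported values should be rounding noise.
    gap = abs(margin(run) - run["margin"])
    print(f"{label}: recomputed={margin(run):.4f} "
          f"reported={run['margin']:.4f} gap={gap:.4f}")
```

Both versions are internally consistent by this check, so the drop in margin between the two cards reflects the runs themselves rather than a reporting slip.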
adapter_config.json CHANGED
@@ -19,11 +19,11 @@
19
  "rank_pattern": {},
20
  "revision": null,
21
  "target_modules": [
22
- "q_proj",
23
- "fc2",
24
- "fc1",
25
  "v_proj",
26
- "k_proj"
 
 
 
27
  ],
28
  "task_type": "CAUSAL_LM"
29
  }
 
19
  "rank_pattern": {},
20
  "revision": null,
21
  "target_modules": [
 
 
 
22
  "v_proj",
23
+ "fc1",
24
+ "k_proj",
25
+ "q_proj",
26
+ "fc2"
27
  ],
28
  "task_type": "CAUSAL_LM"
29
  }
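The `target_modules` change in `adapter_config.json` appears to be a reordering only. A minimal sketch of an order-insensitive comparison, with both lists copied from the two sides of the diff above (the variable names are illustrative):

```python
# target_modules as listed before and after this commit.
old_target_modules = ["q_proj", "fc2", "fc1", "v_proj", "k_proj"]
new_target_modules = ["v_proj", "fc1", "k_proj", "q_proj", "fc2"]

# Comparing as sets ignores list order and looks only at membership.
same_modules = set(old_target_modules) == set(new_target_modules)
print("same set of target modules:", same_modules)
```

If the comparison holds, the ordering difference on its own does not indicate a change in which layers the adapter targets.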
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:248d0921fbc1aa3d12aca2611a87cc405fae69ba5541a37b91e0bce2b6755f3c
3
  size 167814424
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:400d50a0b60714c20aeb04ec937d32bd6999d465d274b4fe2001c9602af90dda
3
  size 167814424
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b4524aa2397f11b2f7313454b12c5d85ab3176d5be70fcbc007e86c2ac9bbdd6
3
  size 4728
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:05bbe3be44fd10a655baf9168292abd8adb13b568673063a090ba95efef33405
3
  size 4728
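The `adapter_model.safetensors` and `training_args.bin` entries are Git LFS pointer files, so the diff shows only a new `oid` next to an unchanged `size`. The sketch below parses the two `adapter_model.safetensors` pointers copied from above; `parse_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS tooling.

```python
def parse_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


old_ptr = parse_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:248d0921fbc1aa3d12aca2611a87cc405fae69ba5541a37b91e0bce2b6755f3c
size 167814424
""")
new_ptr = parse_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:400d50a0b60714c20aeb04ec937d32bd6999d465d274b4fe2001c9602af90dda
size 167814424
""")

# Same byte size, different hash: the weights changed but the tensor layout
# (and therefore the file size) presumably did not.
size_unchanged = old_ptr["size"] == new_ptr["size"]
content_changed = old_ptr["oid"] != new_ptr["oid"]
print("size unchanged:", size_unchanged, "| content changed:", content_changed)
```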