argilla/phi2-lora-distilabel-intel-orca-dpo-pairs

@@ -5,126 +5,40 @@ tags:
 - trl
 - dpo
 - generated_from_trainer
-- distilabel
-- argilla
 base_model: microsoft/phi-2
 model-index:
-- name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
   results: []
-datasets:
-- argilla/distilabel-intel-orca-dpo-pairs
-language:
-- en
-pipeline_tag: text-generation
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
-This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).
-The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing).
 It achieves the following results on the evaluation set:
-- Loss: 0.0972
-- Rewards/chosen: 0.2699
-- Rewards/rejected: -5.8246
-- Rewards/accuracies: 0.9623
-- Rewards/margins: 6.0944
-- Logps/rejected: -311.1872
-- Logps/chosen: -115.6127
-- Logits/rejected: 0.0766
-- Logits/chosen: 0.0242
 ## Model description
-The adapter was fine-tuned on a Google Colab A100 GPU using DPO on the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. To scale LoRA approaches for LLMs, I recommend looking at [predibase/lorax](https://github.com/predibase/lorax).
-You can try the model with the snippet below. It loads the LoRA adapter and applies a bitsandbytes quantization config only when CUDA is available.
-```python
-import torch
-from transformers import (
-    AutoModelForCausalLM,
-    AutoTokenizer,
-    BitsAndBytesConfig
-)
-from peft import PeftConfig, PeftModel
-# template used for fine-tune
-# template = """\
-# Instruct: {instruction}\n
-# Output: {response}"""
-# Quantize to 4-bit only on CUDA; no bitsandbytes config is used on MPS or CPU.
-if torch.cuda.is_available():
-    device = torch.device("cuda")
-    print(f"Using {torch.cuda.get_device_name(0)}")
-    bnb_config = BitsAndBytesConfig(
-        load_in_4bit=True,
-        bnb_4bit_quant_type='nf4',
-        bnb_4bit_compute_dtype='float16',
-        bnb_4bit_use_double_quant=False,
-    )
-elif torch.backends.mps.is_available():
-    device = torch.device("mps")
-    bnb_config = None
-else:
-    device = torch.device("cpu")
-    bnb_config = None
-    print("No GPU available, using CPU instead.")
-# Adapter config (records the base model and the LoRA settings).
-config = PeftConfig.from_pretrained("davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
-# Load the tokenizer and base model, then attach the LoRA adapter.
-tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
-model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
-model = PeftModel.from_pretrained(model, "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs")
-# The 4-bit model is already on the GPU after loading; move it explicitly only when unquantized.
-if bnb_config is None:
-    model = model.to(device)
-prompt = "Instruct: What is the capital of France? \nOutput:"
-# Inputs must live on the same device as the model.
-inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)
-outputs = model.generate(**inputs)
-text = tokenizer.batch_decode(outputs)[0]
-```
 ## Intended uses & limitations
-This is a LoRA adapter fine-tuned for phi-2, not a full fine-tune of the model. Additionally, I did not spend time tuning the hyperparameters.
 ## Training and evaluation data
-The adapter was fine-tuned on a Google Colab A100 GPU using DPO on the [distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset. The full training notebook can be found [here](https://colab.research.google.com/drive/1PGMj7jlkJaCiSNNihA2NtpILsRgkRXrJ?usp=sharing). The adapter and trainer configs are shown below.
-```python
-from peft import LoraConfig
-peft_config = LoraConfig(
-    lora_alpha=16,
-    lora_dropout=0.5,
-    r=32,
-    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
-    bias="none",
-    task_type="CAUSAL_LM",
-)
-```
-```python
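-from transformers import TrainingArguments
-# model_name is assumed to be defined earlier in the notebook
-training_arguments = TrainingArguments(
-    output_dir=f"./{model_name}",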
-    evaluation_strategy="steps",
-    do_eval=True,
-    optim="paged_adamw_8bit",
-    per_device_train_batch_size=2,
-    gradient_accumulation_steps=16,
-    per_device_eval_batch_size=2,
-    log_level="debug",
-    save_steps=20,
-    logging_steps=20,
-    learning_rate=1e-5,
-    eval_steps=20,
-    num_train_epochs=1, # Modified for tutorial purposes
-    max_steps=100,
-    warmup_steps=20,
-    lr_scheduler_type="linear",
-)
-```
 ## Training procedure
@@ -146,28 +60,28 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6805        | 0.06  | 20   | 0.6540          | 0.0096         | -0.0728          | 0.8367             | 0.0824          | -253.6698      | -118.2153    | 0.3760          | 0.3395        |
-| 0.5821        | 0.12  | 40   | 0.4977          | 0.0383         | -0.4385          | 0.9199             | 0.4768          | -257.3268      | -117.9285    | 0.3836          | 0.3356        |
-| 0.4163        | 0.19  | 60   | 0.3225          | 0.0641         | -1.1656          | 0.9257             | 1.2298          | -264.5979      | -117.6701    | 0.3836          | 0.3192        |
-| 0.275         | 0.25  | 80   | 0.2245          | 0.0476         | -2.1180          | 0.9316             | 2.1656          | -274.1212      | -117.8351    | 0.3399          | 0.2698        |
-| 0.1808        | 0.31  | 100  | 0.1771          | -0.0012        | -3.2019          | 0.9366             | 3.2007          | -284.9609      | -118.3238    | 0.2615          | 0.1964        |
-| 0.1405        | 0.37  | 120  | 0.1528          | 0.0185         | -4.0396          | 0.9425             | 4.0581          | -293.3371      | -118.1262    | 0.1983          | 0.1407        |
-| 0.1121        | 0.44  | 140  | 0.1389          | 0.0285         | -4.6518          | 0.9471             | 4.6802          | -299.4591      | -118.0267    | 0.1493          | 0.0980        |
-| 0.1544        | 0.5   | 160  | 0.1289          | 0.0745         | -4.9025          | 0.9506             | 4.9771          | -301.9670      | -117.5659    | 0.1257          | 0.0785        |
-| 0.1594        | 0.56  | 180  | 0.1204          | 0.1435         | -4.8770          | 0.9561             | 5.0205          | -301.7119      | -116.8765    | 0.1168          | 0.0696        |
-| 0.0988        | 0.62  | 200  | 0.1136          | 0.1830         | -5.1569          | 0.9576             | 5.3400          | -304.5108      | -116.4809    | 0.1078          | 0.0579        |
-| 0.1141        | 0.68  | 220  | 0.1080          | 0.2052         | -5.4532          | 0.9580             | 5.6584          | -307.4731      | -116.2591    | 0.0962          | 0.0460        |
-| 0.0943        | 0.75  | 240  | 0.1037          | 0.2326         | -5.6061          | 0.9592             | 5.8387          | -309.0026      | -115.9850    | 0.0913          | 0.0393        |
-| 0.1108        | 0.81  | 260  | 0.1008          | 0.2500         | -5.7399          | 0.9607             | 5.9900          | -310.3409      | -115.8109    | 0.0827          | 0.0316        |
-| 0.1088        | 0.87  | 280  | 0.0987          | 0.2677         | -5.7068          | 0.9619             | 5.9745          | -310.0096      | -115.6346    | 0.0825          | 0.0301        |
-| 0.0741        | 0.93  | 300  | 0.0975          | 0.2701         | -5.7873          | 0.9623             | 6.0574          | -310.8145      | -115.6102    | 0.0788          | 0.0261        |
-| 0.1059        | 1.0   | 320  | 0.0972          | 0.2699         | -5.8246          | 0.9623             | 6.0944          | -311.1872      | -115.6127    | 0.0766          | 0.0242        |
 ### Framework versions
 - PEFT 0.7.1
 - Transformers 4.37.1
-- Pytorch 2.1.0+cu121
 - Datasets 2.16.1
 - Tokenizers 0.15.1

 - trl
 - dpo
 - generated_from_trainer
 base_model: microsoft/phi-2
 model-index:
+- name: phi2-lora-distilabel-intel-orca-dpo-pairs
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# phi2-lora-distilabel-intel-orca-dpo-pairs
+This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.4537
+- Rewards/chosen: -0.0837
+- Rewards/rejected: -1.2628
+- Rewards/accuracies: 0.8301
+- Rewards/margins: 1.1791
+- Logps/rejected: -224.8409
+- Logps/chosen: -203.2228
+- Logits/rejected: 0.4773
+- Logits/chosen: 0.3062
 ## Model description
+More information needed
 ## Intended uses & limitations
+More information needed
 ## Training and evaluation data
+More information needed
 ## Training procedure
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6853        | 0.06  | 20   | 0.6701          | 0.0133         | -0.0368          | 0.6905             | 0.0501          | -212.5803      | -202.2522    | 0.3853          | 0.2532        |
+| 0.6312        | 0.12  | 40   | 0.5884          | 0.0422         | -0.2208          | 0.8138             | 0.2630          | -214.4207      | -201.9638    | 0.4254          | 0.2816        |
+| 0.547         | 0.19  | 60   | 0.5146          | 0.0172         | -0.5786          | 0.8278             | 0.5958          | -217.9983      | -202.2132    | 0.4699          | 0.3110        |
+| 0.4388        | 0.25  | 80   | 0.4893          | -0.0808        | -1.0789          | 0.8293             | 0.9981          | -223.0014      | -203.1934    | 0.5158          | 0.3396        |
+| 0.4871        | 0.31  | 100  | 0.4818          | -0.1298        | -1.2346          | 0.8297             | 1.1048          | -224.5586      | -203.6837    | 0.5133          | 0.3340        |
+| 0.4863        | 0.37  | 120  | 0.4723          | -0.1230        | -1.1718          | 0.8301             | 1.0488          | -223.9305      | -203.6159    | 0.4910          | 0.3167        |
+| 0.4578        | 0.44  | 140  | 0.4666          | -0.1257        | -1.1772          | 0.8301             | 1.0515          | -223.9844      | -203.6428    | 0.4795          | 0.3078        |
+| 0.4587        | 0.5   | 160  | 0.4625          | -0.0746        | -1.1272          | 0.8301             | 1.0526          | -223.4841      | -203.1310    | 0.4857          | 0.3139        |
+| 0.4688        | 0.56  | 180  | 0.4595          | -0.0584        | -1.1194          | 0.8297             | 1.0610          | -223.4062      | -202.9692    | 0.4890          | 0.3171        |
+| 0.4189        | 0.62  | 200  | 0.4579          | -0.0666        | -1.1647          | 0.8297             | 1.0982          | -223.8598      | -203.0511    | 0.4858          | 0.3138        |
+| 0.4392        | 0.68  | 220  | 0.4564          | -0.0697        | -1.1915          | 0.8301             | 1.1219          | -224.1278      | -203.0823    | 0.4824          | 0.3110        |
+| 0.4659        | 0.75  | 240  | 0.4554          | -0.0826        | -1.2245          | 0.8301             | 1.1419          | -224.4574      | -203.2112    | 0.4761          | 0.3052        |
+| 0.4075        | 0.81  | 260  | 0.4544          | -0.0823        | -1.2328          | 0.8301             | 1.1504          | -224.5403      | -203.2089    | 0.4749          | 0.3044        |
+| 0.4015        | 0.87  | 280  | 0.4543          | -0.0833        | -1.2590          | 0.8301             | 1.1757          | -224.8026      | -203.2188    | 0.4779          | 0.3067        |
+| 0.4365        | 0.93  | 300  | 0.4539          | -0.0846        | -1.2658          | 0.8301             | 1.1812          | -224.8702      | -203.2313    | 0.4780          | 0.3067        |
+| 0.4589        | 1.0   | 320  | 0.4537          | -0.0837        | -1.2628          | 0.8301             | 1.1791          | -224.8409      | -203.2228    | 0.4773          | 0.3062        |
 ### Framework versions
 - PEFT 0.7.1
 - Transformers 4.37.1
+- Pytorch 2.1.0+cu118
 - Datasets 2.16.1
 - Tokenizers 0.15.1
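
As a quick consistency check on the two sets of evaluation metrics in this diff, the reported `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected`. This assumes the margin is logged as the mean of per-example differences, which is how TRL's DPO trainer defines it as far as I know. The sketch below uses only the final-step values from both versions of the card; the dictionary and function names are illustrative.

```python
# Final evaluation metrics copied from the two versions of the model card.
old_card = {"chosen": 0.2699, "rejected": -5.8246, "margin": 6.0944}
new_card = {"chosen": -0.0837, "rejected": -1.2628, "margin": 1.1791}


def margin_gap(metrics):
    """Return |reported margin - (chosen - rejected)|."""
    return abs(metrics["margin"] - (metrics["chosen"] - metrics["rejected"]))


for name, metrics in (("old", old_card), ("new", new_card)):
    # Allow for rounding to four decimals in the logged values.
    print(name, margin_gap(metrics) < 1e-3)
```

Both versions agree within rounding, so the drop in margin from 6.0944 to 1.1791 reflects the logged rewards themselves rather than a reporting inconsistency.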

adapter_config.json

@@ -19,11 +19,11 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "q_proj",
-    "fc2",
-    "fc1",
     "v_proj",
-    "k_proj"
   ],
   "task_type": "CAUSAL_LM"
 }

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "v_proj",
+    "fc1",
+    "k_proj",
+    "q_proj",
+    "fc2"
   ],
   "task_type": "CAUSAL_LM"
 }
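
The `target_modules` hunk above appears to change only the order of the entries. A minimal sketch that compares the two lists as sets confirms this; the variable names are illustrative. One plausible cause, stated here as an assumption, is that PEFT holds `target_modules` as a set, so the order written to JSON can differ between saves.

```python
# Module lists copied from the removed and added sides of the hunk.
old_targets = ["q_proj", "fc2", "fc1", "v_proj", "k_proj"]
new_targets = ["v_proj", "fc1", "k_proj", "q_proj", "fc2"]

# Order is irrelevant to which layers receive LoRA weights, so compare as sets.
same_modules = set(old_targets) == set(new_targets)
print(same_modules)  # True
```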

adapter_model.safetensors

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:248d0921fbc1aa3d12aca2611a87cc405fae69ba5541a37b91e0bce2b6755f3c
 size 167814424

 version https://git-lfs.github.com/spec/v1
+oid sha256:400d50a0b60714c20aeb04ec937d32bd6999d465d274b4fe2001c9602af90dda
 size 167814424
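
The two pointer files above differ only in their `oid`, while `size` stays at 167814424 bytes, which is consistent with retrained weights of the same shape. The following is a minimal sketch of parsing such a Git LFS pointer and checking downloaded bytes against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data, pointer):
    """Check raw file bytes against the pointer's size and sha256 oid."""
    algo, _, digest = pointer["oid"].partition(":")
    return (
        algo == "sha256"
        and len(data) == int(pointer["size"])
        and hashlib.sha256(data).hexdigest() == digest
    )


# Pointer for the new adapter weights, copied from the added side of the diff.
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:400d50a0b60714c20aeb04ec937d32bd6999d465d274b4fe2001c9602af90dda
size 167814424
""")
print(new_pointer["size"])
```

To verify a local copy, read the downloaded `adapter_model.safetensors` in binary mode and pass its bytes to `matches_pointer` together with `new_pointer`.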

training_args.bin

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b4524aa2397f11b2f7313454b12c5d85ab3176d5be70fcbc007e86c2ac9bbdd6
 size 4728

 version https://git-lfs.github.com/spec/v1
+oid sha256:05bbe3be44fd10a655baf9168292abd8adb13b568673063a090ba95efef33405
 size 4728