martimfasantos committed
Commit • 0ee59f2
1 Parent(s): c13950b
Model save
Browse files
- README.md +128 -0
- all_results.json +9 -0
- generation_config.json +7 -0
- model.safetensors +1 -1
- runs/Jun22_09-07-01_poseidon/events.out.tfevents.1719047556.poseidon.188938.0 +2 -2
- train_results.json +9 -0
- trainer_state.json +0 -0
README.md
ADDED
@@ -0,0 +1,128 @@
---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on an unknown dataset.
It achieves the following results on the evaluation set (how the reward columns are defined is sketched right after this list):
- Loss: 0.6856
- Rewards/chosen: -0.0618
- Rewards/rejected: -0.0788
- Rewards/accuracies: 0.5955
- Rewards/margins: 0.0169
- Logps/rejected: -71.0584
- Logps/chosen: -64.8961
- Logits/rejected: -3.0381
- Logits/chosen: -3.0439

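For readers unfamiliar with the DPO columns: below is a minimal sketch, assuming TRL's standard sigmoid DPO formulation, of how these metrics are defined. The `beta` value and the toy log-probabilities are illustrative assumptions; the card does not record them.

```python
import torch
import torch.nn.functional as F

# Sketch of the standard DPO metric definitions (not the exact TRL internals).
# The implicit reward of a completion is beta * (policy logp - reference logp);
# "Rewards/margins" is chosen minus rejected, and "Rewards/accuracies" is the
# fraction of pairs whose margin is positive.
def dpo_metrics(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    chosen_rewards = beta * (pol_chosen - ref_chosen)
    rejected_rewards = beta * (pol_rejected - ref_rejected)
    margins = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margins).mean()  # sigmoid DPO loss
    return {
        "loss": loss.item(),
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
    }

# Toy example: summed per-sequence log-probs for a batch of 4 pairs (made up).
pol_c = torch.tensor([-64.8, -60.2, -70.1, -63.5])
pol_r = torch.tensor([-71.1, -65.4, -69.8, -72.2])
ref_c = torch.tensor([-64.2, -60.0, -70.5, -63.0])
ref_r = torch.tensor([-70.3, -64.9, -70.1, -71.5])
print(dpo_metrics(pol_c, pol_r, ref_c, ref_r))
```
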
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction in code follows the list):
- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

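The training script itself is not part of this commit. As a rough sketch, the listed values map onto `transformers.TrainingArguments` as typically passed to TRL's `DPOTrainer`; `output_dir`, the evaluation cadence, and the DPO `beta` are assumptions. The listed Adam betas and epsilon are the AdamW defaults, so they need no extra flags.

```python
from transformers import TrainingArguments

# A minimal sketch, assuming a TRL DPOTrainer-style run; only the values from
# the list above are taken from this card, the rest is illustrative.
training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old",  # assumption
    learning_rate=5e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 x 4 = total_train_batch_size 32
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,   # matches the 100-step cadence in the results table below
    logging_steps=100,
)

# trainer = DPOTrainer(model, ref_model, args=training_args, beta=0.1, ...)
# (the exact DPOTrainer signature and the beta value vary by trl version;
#  both are assumptions here, not recorded in the card)
```
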
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932 | 0.0345 | 100 | 0.6932 | 0.0000 | 0.0001 | 0.4805 | -0.0001 | -63.1716 | -58.7091 | -3.1575 | -3.1632 |
| 0.6931 | 0.0689 | 200 | 0.6932 | -0.0000 | 0.0000 | 0.4863 | -0.0000 | -63.1768 | -58.7119 | -3.1575 | -3.1632 |
| 0.6931 | 0.1034 | 300 | 0.6932 | 0.0001 | 0.0002 | 0.4756 | -0.0001 | -63.1627 | -58.7008 | -3.1575 | -3.1632 |
| 0.693 | 0.1378 | 400 | 0.6931 | 0.0002 | 0.0002 | 0.5007 | 0.0000 | -63.1637 | -58.6940 | -3.1572 | -3.1629 |
| 0.6931 | 0.1723 | 500 | 0.6931 | 0.0003 | 0.0002 | 0.4942 | 0.0001 | -63.1590 | -58.6825 | -3.1569 | -3.1625 |
| 0.6928 | 0.2068 | 600 | 0.6931 | 0.0006 | 0.0005 | 0.5023 | 0.0002 | -63.1320 | -58.6476 | -3.1556 | -3.1613 |
| 0.692 | 0.2412 | 700 | 0.6930 | 0.0010 | 0.0006 | 0.5414 | 0.0004 | -63.1153 | -58.6091 | -3.1543 | -3.1599 |
| 0.6923 | 0.2757 | 800 | 0.6928 | 0.0013 | 0.0006 | 0.5588 | 0.0007 | -63.1219 | -58.5861 | -3.1529 | -3.1585 |
| 0.6912 | 0.3101 | 900 | 0.6927 | 0.0017 | 0.0007 | 0.5660 | 0.0010 | -63.1103 | -58.5464 | -3.1501 | -3.1558 |
| 0.6909 | 0.3446 | 1000 | 0.6925 | 0.0018 | 0.0005 | 0.5646 | 0.0013 | -63.1285 | -58.5271 | -3.1481 | -3.1538 |
| 0.6907 | 0.3790 | 1100 | 0.6924 | 0.0020 | 0.0003 | 0.5604 | 0.0016 | -63.1469 | -58.5154 | -3.1457 | -3.1513 |
| 0.6898 | 0.4135 | 1200 | 0.6921 | 0.0018 | -0.0003 | 0.5743 | 0.0022 | -63.2143 | -58.5306 | -3.1424 | -3.1480 |
| 0.688 | 0.4480 | 1300 | 0.6919 | 0.0018 | -0.0008 | 0.5741 | 0.0026 | -63.2606 | -58.5351 | -3.1392 | -3.1448 |
| 0.6888 | 0.4824 | 1400 | 0.6917 | 0.0011 | -0.0019 | 0.5723 | 0.0030 | -63.3749 | -58.6054 | -3.1364 | -3.1420 |
| 0.6886 | 0.5169 | 1500 | 0.6915 | 0.0002 | -0.0033 | 0.5737 | 0.0035 | -63.5057 | -58.6878 | -3.1325 | -3.1382 |
| 0.6885 | 0.5513 | 1600 | 0.6912 | -0.0003 | -0.0043 | 0.5769 | 0.0040 | -63.6057 | -58.7407 | -3.1295 | -3.1351 |
| 0.6861 | 0.5858 | 1700 | 0.6910 | -0.0016 | -0.0062 | 0.5746 | 0.0046 | -63.8004 | -58.8729 | -3.1253 | -3.1310 |
| 0.6872 | 0.6203 | 1800 | 0.6908 | -0.0035 | -0.0085 | 0.5839 | 0.0050 | -64.0325 | -59.0604 | -3.1214 | -3.1270 |
| 0.6862 | 0.6547 | 1900 | 0.6905 | -0.0054 | -0.0110 | 0.5802 | 0.0057 | -64.2826 | -59.2489 | -3.1157 | -3.1214 |
| 0.6859 | 0.6892 | 2000 | 0.6903 | -0.0080 | -0.0142 | 0.5869 | 0.0062 | -64.5982 | -59.5137 | -3.1119 | -3.1176 |
| 0.6846 | 0.7236 | 2100 | 0.6899 | -0.0107 | -0.0176 | 0.5829 | 0.0069 | -64.9428 | -59.7842 | -3.1059 | -3.1116 |
| 0.6861 | 0.7581 | 2200 | 0.6897 | -0.0133 | -0.0207 | 0.5869 | 0.0074 | -65.2491 | -60.0455 | -3.1025 | -3.1081 |
| 0.6836 | 0.7926 | 2300 | 0.6895 | -0.0168 | -0.0247 | 0.5922 | 0.0079 | -65.6530 | -60.3904 | -3.0987 | -3.1044 |
| 0.6847 | 0.8270 | 2400 | 0.6892 | -0.0209 | -0.0296 | 0.5869 | 0.0087 | -66.1402 | -60.8069 | -3.0949 | -3.1007 |
| 0.6838 | 0.8615 | 2500 | 0.6889 | -0.0250 | -0.0343 | 0.5904 | 0.0093 | -66.6113 | -61.2157 | -3.0910 | -3.0968 |
| 0.6841 | 0.8959 | 2600 | 0.6886 | -0.0284 | -0.0384 | 0.5955 | 0.0100 | -67.0226 | -61.5496 | -3.0877 | -3.0933 |
| 0.6824 | 0.9304 | 2700 | 0.6883 | -0.0321 | -0.0428 | 0.5855 | 0.0107 | -67.4593 | -61.9186 | -3.0839 | -3.0897 |
| 0.6824 | 0.9649 | 2800 | 0.6880 | -0.0334 | -0.0447 | 0.5929 | 0.0113 | -67.6515 | -62.0566 | -3.0811 | -3.0868 |
| 0.6812 | 0.9993 | 2900 | 0.6878 | -0.0363 | -0.0481 | 0.5906 | 0.0118 | -67.9890 | -62.3425 | -3.0775 | -3.0832 |
| 0.6819 | 1.0338 | 3000 | 0.6877 | -0.0373 | -0.0494 | 0.5932 | 0.0120 | -68.1166 | -62.4440 | -3.0740 | -3.0797 |
| 0.6796 | 1.0682 | 3100 | 0.6874 | -0.0392 | -0.0518 | 0.5987 | 0.0126 | -68.3560 | -62.6296 | -3.0701 | -3.0759 |
| 0.6776 | 1.1027 | 3200 | 0.6872 | -0.0409 | -0.0540 | 0.5906 | 0.0131 | -68.5819 | -62.8043 | -3.0674 | -3.0732 |
| 0.6824 | 1.1371 | 3300 | 0.6870 | -0.0436 | -0.0571 | 0.5946 | 0.0135 | -68.8899 | -63.0750 | -3.0643 | -3.0701 |
| 0.6787 | 1.1716 | 3400 | 0.6869 | -0.0458 | -0.0596 | 0.5941 | 0.0138 | -69.1415 | -63.2913 | -3.0611 | -3.0668 |
| 0.6801 | 1.2061 | 3500 | 0.6867 | -0.0482 | -0.0624 | 0.5929 | 0.0142 | -69.4185 | -63.5317 | -3.0588 | -3.0646 |
| 0.6797 | 1.2405 | 3600 | 0.6866 | -0.0499 | -0.0644 | 0.5915 | 0.0145 | -69.6206 | -63.6998 | -3.0559 | -3.0616 |
| 0.6783 | 1.2750 | 3700 | 0.6864 | -0.0511 | -0.0659 | 0.5904 | 0.0149 | -69.7728 | -63.8172 | -3.0542 | -3.0599 |
| 0.6771 | 1.3094 | 3800 | 0.6864 | -0.0521 | -0.0672 | 0.5920 | 0.0151 | -69.8981 | -63.9235 | -3.0522 | -3.0580 |
| 0.6785 | 1.3439 | 3900 | 0.6862 | -0.0536 | -0.0690 | 0.5922 | 0.0154 | -70.0814 | -64.0693 | -3.0499 | -3.0556 |
| 0.6807 | 1.3784 | 4000 | 0.6861 | -0.0551 | -0.0708 | 0.5908 | 0.0157 | -70.2593 | -64.2214 | -3.0484 | -3.0541 |
| 0.6769 | 1.4128 | 4100 | 0.6860 | -0.0563 | -0.0722 | 0.5929 | 0.0159 | -70.3988 | -64.3376 | -3.0467 | -3.0525 |
| 0.6722 | 1.4473 | 4200 | 0.6859 | -0.0577 | -0.0738 | 0.5946 | 0.0161 | -70.5629 | -64.4845 | -3.0456 | -3.0513 |
| 0.6769 | 1.4817 | 4300 | 0.6858 | -0.0582 | -0.0745 | 0.5939 | 0.0163 | -70.6349 | -64.5350 | -3.0442 | -3.0499 |
| 0.6785 | 1.5162 | 4400 | 0.6858 | -0.0586 | -0.0750 | 0.5955 | 0.0164 | -70.6776 | -64.5703 | -3.0432 | -3.0490 |
| 0.6735 | 1.5507 | 4500 | 0.6858 | -0.0597 | -0.0762 | 0.5920 | 0.0164 | -70.7972 | -64.6853 | -3.0421 | -3.0479 |
| 0.6786 | 1.5851 | 4600 | 0.6857 | -0.0603 | -0.0769 | 0.5967 | 0.0166 | -70.8698 | -64.7462 | -3.0414 | -3.0471 |
| 0.6803 | 1.6196 | 4700 | 0.6857 | -0.0603 | -0.0770 | 0.5978 | 0.0167 | -70.8781 | -64.7435 | -3.0408 | -3.0466 |
| 0.6789 | 1.6540 | 4800 | 0.6856 | -0.0607 | -0.0775 | 0.5929 | 0.0168 | -70.9263 | -64.7804 | -3.0399 | -3.0457 |
| 0.6723 | 1.6885 | 4900 | 0.6856 | -0.0611 | -0.0779 | 0.5985 | 0.0168 | -70.9741 | -64.8213 | -3.0390 | -3.0448 |
| 0.6767 | 1.7229 | 5000 | 0.6856 | -0.0613 | -0.0781 | 0.5960 | 0.0169 | -70.9925 | -64.8377 | -3.0388 | -3.0446 |
| 0.6774 | 1.7574 | 5100 | 0.6856 | -0.0615 | -0.0784 | 0.5939 | 0.0168 | -71.0176 | -64.8661 | -3.0387 | -3.0445 |
| 0.6748 | 1.7919 | 5200 | 0.6855 | -0.0616 | -0.0786 | 0.5939 | 0.0170 | -71.0377 | -64.8736 | -3.0383 | -3.0441 |
| 0.6761 | 1.8263 | 5300 | 0.6855 | -0.0617 | -0.0787 | 0.5950 | 0.0170 | -71.0469 | -64.8778 | -3.0380 | -3.0439 |
| 0.6738 | 1.8608 | 5400 | 0.6855 | -0.0618 | -0.0788 | 0.5985 | 0.0171 | -71.0633 | -64.8885 | -3.0380 | -3.0438 |
| 0.6821 | 1.8952 | 5500 | 0.6855 | -0.0618 | -0.0788 | 0.5934 | 0.0170 | -71.0638 | -64.8919 | -3.0379 | -3.0437 |
| 0.6724 | 1.9297 | 5600 | 0.6855 | -0.0619 | -0.0788 | 0.5955 | 0.0170 | -71.0635 | -64.8979 | -3.0379 | -3.0437 |
| 0.6745 | 1.9642 | 5700 | 0.6855 | -0.0619 | -0.0790 | 0.5957 | 0.0171 | -71.0788 | -64.9037 | -3.0380 | -3.0438 |
| 0.6767 | 1.9986 | 5800 | 0.6856 | -0.0618 | -0.0788 | 0.5955 | 0.0169 | -71.0584 | -64.8961 | -3.0381 | -3.0439 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1
all_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 2.0,
    "total_flos": 0.0,
    "train_loss": 0.6830481951767456,
    "train_runtime": 69306.1943,
    "train_samples": 92858,
    "train_samples_per_second": 2.68,
    "train_steps_per_second": 0.084
}
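As a sanity check, these throughput figures are mutually consistent with the hyperparameters in the README (derived arithmetic, not data from the commit):

```python
# 92858 samples x 2 epochs / effective batch 32 ≈ 5804 optimizer steps,
# matching the training table ending near step 5800.
train_samples, epochs, total_batch, runtime_s = 92858, 2, 32, 69306.1943
print(train_samples * epochs / total_batch)              # ≈ 5803.6 steps
print(train_samples * epochs / runtime_s)                # ≈ 2.68 samples/sec
print(train_samples * epochs / total_batch / runtime_s)  # ≈ 0.084 steps/sec
```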
generation_config.json
ADDED
@@ -0,0 +1,7 @@
{
    "bos_token_id": 1,
    "eos_token_id": 2,
    "max_length": 2048,
    "pad_token_id": 0,
    "transformers_version": "4.41.2"
}
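For completeness, a minimal inference sketch against this checkpoint: only `max_length: 2048` and the special-token ids come from the config above; the TL;DR-style summarization prompt is an assumption, since the card does not document a prompt format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_BS32_2epochs_old"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Assumed TL;DR-style prompt; the training data/format is not recorded here.
prompt = "POST: <text to summarize>\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)  # well under max_length 2048
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```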
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:938403cdc0f1f6ae89a4c8a68919b6b0997b9b6d8958f7335140961aade22997
 size 4400216536
runs/Jun22_09-07-01_poseidon/events.out.tfevents.1719047556.poseidon.188938.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:2d8faa775097edbdc53c339d300bf4a0a0f7df2143d817156f8146605a0c005b
+size 448012
train_results.json
ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 2.0,
    "total_flos": 0.0,
    "train_loss": 0.6830481951767456,
    "train_runtime": 69306.1943,
    "train_samples": 92858,
    "train_samples_per_second": 2.68,
    "train_steps_per_second": 0.084
}
trainer_state.json
ADDED
The diff for this file is too large to render.
See raw diff