End of training


Files changed (10)

README.md +49 -49
final_checkpoint/model-00001-of-00004.safetensors +1 -1
final_checkpoint/model-00002-of-00004.safetensors +1 -1
final_checkpoint/model-00003-of-00004.safetensors +1 -1
final_checkpoint/model-00004-of-00004.safetensors +1 -1
model-00001-of-00004.safetensors +1 -1
model-00002-of-00004.safetensors +1 -1
model-00003-of-00004.safetensors +1 -1
model-00004-of-00004.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6931
-- Rewards/chosen: 0.0
-- Rewards/rejected: 0.0
-- Rewards/accuracies: 0.0
-- Rewards/margins: 0.0
-- Logps/rejected: 0.0
-- Logps/chosen: 0.0
-- Logits/rejected: -1.1794
-- Logits/chosen: -1.1794
 ## Model description
@@ -59,46 +59,46 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6931        | 0.3333  | 25   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 0.6667  | 50   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 1.0     | 75   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 1.3333  | 100  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 1.6667  | 125  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 2.0     | 150  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 2.3333  | 175  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 2.6667  | 200  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 3.0     | 225  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 3.3333  | 250  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 3.6667  | 275  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 4.0     | 300  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 4.3333  | 325  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 4.6667  | 350  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 5.0     | 375  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 5.3333  | 400  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 5.6667  | 425  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 6.0     | 450  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 6.3333  | 475  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 6.6667  | 500  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 7.0     | 525  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 7.3333  | 550  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 7.6667  | 575  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 8.0     | 600  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 8.3333  | 625  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 8.6667  | 650  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 9.0     | 675  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 9.3333  | 700  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 9.6667  | 725  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 10.0    | 750  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 10.3333 | 775  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 10.6667 | 800  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 11.0    | 825  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 11.3333 | 850  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 11.6667 | 875  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 12.0    | 900  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 12.3333 | 925  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 12.6667 | 950  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 13.0    | 975  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 13.3333 | 1000 | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
 ### Framework versions

 This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0094
+- Rewards/chosen: 3.5398
+- Rewards/rejected: -9.3115
+- Rewards/accuracies: 0.9900
+- Rewards/margins: 12.8514
+- Logps/rejected: -61.8926
+- Logps/chosen: -22.1453
+- Logits/rejected: -1.1592
+- Logits/chosen: -1.1419
 ## Model description
 | Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5446        | 0.3333  | 25   | 0.2409          | 0.7934         | -0.6030          | 0.9800             | 1.3964          | -44.4754       | -27.6381     | -1.1424         | -1.1365       |
+| 0.0009        | 0.6667  | 50   | 0.0261          | 2.3344         | -5.4705          | 0.9800             | 7.8050          | -54.2106       | -24.5560     | -1.1516         | -1.1414       |
+| 0.0001        | 1.0     | 75   | 0.0417          | 2.5058         | -6.7400          | 0.9700             | 9.2458          | -56.7494       | -24.2133     | -1.1557         | -1.1437       |
+| 0.0           | 1.3333  | 100  | 0.0150          | 2.4614         | -7.0530          | 0.9900             | 9.5144          | -57.3755       | -24.3022     | -1.1580         | -1.1452       |
+| 0.0           | 1.6667  | 125  | 0.0165          | 2.8817         | -7.5738          | 0.9900             | 10.4555         | -58.4170       | -23.4615     | -1.1561         | -1.1425       |
+| 0.0175        | 2.0     | 150  | 0.0077          | 2.7979         | -9.1294          | 0.9900             | 11.9273         | -61.5282       | -23.6290     | -1.1592         | -1.1446       |
+| 0.0           | 2.3333  | 175  | 0.0077          | 2.8004         | -9.1559          | 0.9900             | 11.9563         | -61.5813       | -23.6240     | -1.1592         | -1.1445       |
+| 0.0           | 2.6667  | 200  | 0.0098          | 3.5386         | -9.1468          | 0.9900             | 12.6854         | -61.5630       | -22.1478     | -1.1582         | -1.1411       |
+| 0.0           | 3.0     | 225  | 0.0098          | 3.5323         | -9.1598          | 0.9900             | 12.6921         | -61.5891       | -22.1603     | -1.1583         | -1.1411       |
+| 0.0           | 3.3333  | 250  | 0.0099          | 3.5384         | -9.1504          | 0.9900             | 12.6888         | -61.5704       | -22.1482     | -1.1580         | -1.1408       |
+| 0.0           | 3.6667  | 275  | 0.0101          | 3.5390         | -9.1521          | 0.9900             | 12.6912         | -61.5738       | -22.1469     | -1.1582         | -1.1410       |
+| 0.0173        | 4.0     | 300  | 0.0102          | 3.5300         | -9.1689          | 0.9900             | 12.6988         | -61.6072       | -22.1650     | -1.1582         | -1.1410       |
+| 0.0           | 4.3333  | 325  | 0.0095          | 3.5391         | -9.1723          | 0.9900             | 12.7114         | -61.6141       | -22.1467     | -1.1582         | -1.1411       |
+| 0.0173        | 4.6667  | 350  | 0.0098          | 3.5336         | -9.1774          | 0.9900             | 12.7110         | -61.6242       | -22.1576     | -1.1582         | -1.1411       |
+| 0.0           | 5.0     | 375  | 0.0100          | 3.5413         | -9.1860          | 0.9900             | 12.7273         | -61.6416       | -22.1423     | -1.1584         | -1.1412       |
+| 0.0173        | 5.3333  | 400  | 0.0097          | 3.5385         | -9.1956          | 0.9900             | 12.7342         | -61.6608       | -22.1479     | -1.1586         | -1.1414       |
+| 0.0173        | 5.6667  | 425  | 0.0099          | 3.5458         | -9.1729          | 0.9900             | 12.7188         | -61.6153       | -22.1332     | -1.1581         | -1.1409       |
+| 0.0           | 6.0     | 450  | 0.0095          | 3.5342         | -9.2206          | 0.9900             | 12.7548         | -61.7106       | -22.1565     | -1.1583         | -1.1411       |
+| 0.0           | 6.3333  | 475  | 0.0096          | 3.5378         | -9.2207          | 0.9900             | 12.7585         | -61.7109       | -22.1492     | -1.1585         | -1.1413       |
+| 0.0173        | 6.6667  | 500  | 0.0098          | 3.5344         | -9.2288          | 0.9900             | 12.7632         | -61.7271       | -22.1561     | -1.1588         | -1.1415       |
+| 0.0           | 7.0     | 525  | 0.0090          | 3.5387         | -9.2492          | 0.9900             | 12.7878         | -61.7678       | -22.1475     | -1.1587         | -1.1414       |
+| 0.0           | 7.3333  | 550  | 0.0092          | 3.5377         | -9.2629          | 0.9900             | 12.8006         | -61.7953       | -22.1496     | -1.1589         | -1.1417       |
+| 0.0173        | 7.6667  | 575  | 0.0093          | 3.5369         | -9.2697          | 0.9900             | 12.8066         | -61.8089       | -22.1510     | -1.1590         | -1.1418       |
+| 0.0           | 8.0     | 600  | 0.0094          | 3.5387         | -9.2877          | 0.9900             | 12.8264         | -61.8448       | -22.1475     | -1.1587         | -1.1414       |
+| 0.0347        | 8.3333  | 625  | 0.0098          | 3.5219         | -9.2959          | 0.9900             | 12.8178         | -61.8614       | -22.1812     | -1.1590         | -1.1418       |
+| 0.0           | 8.6667  | 650  | 0.0092          | 3.5332         | -9.2917          | 0.9900             | 12.8249         | -61.8529       | -22.1584     | -1.1589         | -1.1416       |
+| 0.0           | 9.0     | 675  | 0.0091          | 3.5324         | -9.3041          | 0.9900             | 12.8365         | -61.8776       | -22.1600     | -1.1591         | -1.1418       |
+| 0.0           | 9.3333  | 700  | 0.0096          | 3.5277         | -9.3067          | 0.9900             | 12.8344         | -61.8829       | -22.1695     | -1.1591         | -1.1418       |
+| 0.0           | 9.6667  | 725  | 0.0092          | 3.5429         | -9.3040          | 0.9900             | 12.8470         | -61.8776       | -22.1390     | -1.1591         | -1.1418       |
+| 0.0           | 10.0    | 750  | 0.0096          | 3.5350         | -9.3114          | 0.9900             | 12.8464         | -61.8923       | -22.1549     | -1.1588         | -1.1415       |
+| 0.0           | 10.3333 | 775  | 0.0094          | 3.5320         | -9.3159          | 0.9900             | 12.8479         | -61.9013       | -22.1609     | -1.1590         | -1.1416       |
+| 0.0           | 10.6667 | 800  | 0.0092          | 3.5430         | -9.3106          | 0.9900             | 12.8535         | -61.8906       | -22.1389     | -1.1591         | -1.1418       |
+| 0.0           | 11.0    | 825  | 0.0090          | 3.5293         | -9.3094          | 0.9900             | 12.8387         | -61.8883       | -22.1663     | -1.1589         | -1.1416       |
+| 0.0           | 11.3333 | 850  | 0.0093          | 3.5309         | -9.3281          | 0.9900             | 12.8591         | -61.9258       | -22.1630     | -1.1590         | -1.1417       |
+| 0.0173        | 11.6667 | 875  | 0.0093          | 3.5340         | -9.3279          | 0.9900             | 12.8618         | -61.9252       | -22.1570     | -1.1592         | -1.1419       |
+| 0.0           | 12.0    | 900  | 0.0092          | 3.5268         | -9.3258          | 0.9900             | 12.8526         | -61.9212       | -22.1713     | -1.1590         | -1.1416       |
+| 0.0           | 12.3333 | 925  | 0.0089          | 3.5337         | -9.3216          | 0.9900             | 12.8553         | -61.9127       | -22.1576     | -1.1590         | -1.1417       |
+| 0.0173        | 12.6667 | 950  | 0.0093          | 3.5404         | -9.3113          | 0.9900             | 12.8518         | -61.8922       | -22.1440     | -1.1591         | -1.1419       |
+| 0.0173        | 13.0    | 975  | 0.0094          | 3.5398         | -9.3115          | 0.9900             | 12.8514         | -61.8926       | -22.1453     | -1.1592         | -1.1419       |
+| 0.0           | 13.3333 | 1000 | 0.0094          | 3.5398         | -9.3115          | 0.9900             | 12.8514         | -61.8926       | -22.1453     | -1.1592         | -1.1419       |
 ### Framework versions
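The metric names above (`Rewards/chosen`, `Rewards/margins`, `Rewards/accuracies`, ...) match those logged by TRL's `DPOTrainer`. The card does not name the trainer, so that is an assumption. Under the standard DPO sigmoid loss, each pair's implicit reward is beta times the policy-versus-reference log-probability ratio. The margin is chosen minus rejected, and the per-pair loss is `-log sigmoid(margin)`. The minimal plain-Python sketch below explains the old rows, where the loss is 0.6931 and every reward is 0.0. A zero margin gives `-log sigmoid(0) = ln 2 ≈ 0.6931`, and a strict `>` comparison scores the tie as a miss. The helper names are hypothetical, not TRL's API, and the run's beta is not stated in the card.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """Hypothetical helper: DPO implicit reward beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def dpo_pair_metrics(chosen_reward: float, rejected_reward: float) -> dict:
    """Per-pair DPO sigmoid loss, reward margin and accuracy (assumed TRL-style definitions)."""
    margin = chosen_reward - rejected_reward
    return {
        "loss": -log_sigmoid(margin),
        "margin": margin,
        # Strict comparison: a tie (both rewards 0.0) counts as inaccurate.
        "accuracy": 1.0 if chosen_reward > rejected_reward else 0.0,
    }


if __name__ == "__main__":
    before = dpo_pair_metrics(0.0, 0.0)         # old evaluation row
    after = dpo_pair_metrics(3.5398, -9.3115)   # new evaluation summary
    print(round(before["loss"], 4), before["accuracy"])  # 0.6931 0.0
    print(round(after["margin"], 4), after["accuracy"])  # 12.8513 1.0
```

The reported figures are means over the evaluation set. The margin rebuilt from the averaged rewards (12.8513) differs from the logged 12.8514 only by rounding. The loss is different: the mean of per-pair losses is not the loss of the mean margin. `-log sigmoid(12.85)` is orders of magnitude below the reported 0.0094, so the eval loss cannot be recovered from the averaged rewards alone.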

final_checkpoint/model-00001-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ecfc213ef10a403002239cf57c82d2408559d233d700db44bf415e7cd82efde
 size 4976698592

 version https://git-lfs.github.com/spec/v1
+oid sha256:7cd33b7e87ade284781bec739497a364b59f256b9c0bf5c29cf0cc233d320391
 size 4976698592
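Each `.safetensors` and `training_args.bin` entry in this commit is a Git LFS pointer, not the file itself. A pointer holds three fields: `version`, `oid` (the SHA-256 of the file contents) and `size` in bytes. Here only the `oid` line changes while `size` stays the same. That is consistent with the weights being overwritten by tensors of the same shapes and dtypes. The top-level `model-0000N-of-00004.safetensors` pointers carry the same oids as their `final_checkpoint/` counterparts, so those copies are byte-identical. Below is a minimal sketch for checking a downloaded file against its pointer, assuming the file is already on local disk. The function names and the local path are hypothetical, not part of the `git-lfs` or `huggingface_hub` APIs.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def file_matches_pointer(path: str, pointer: dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks (shards are ~5 GB) and compare size and SHA-256 to the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == int(pointer["size"]) and digest.hexdigest() == expected


if __name__ == "__main__":
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:7cd33b7e87ade284781bec739497a364b59f256b9c0bf5c29cf0cc233d320391\n"
        "size 4976698592\n"
    )
    print(pointer["size"])  # 4976698592
    # Hypothetical local path to the downloaded shard:
    # print(file_matches_pointer("final_checkpoint/model-00001-of-00004.safetensors", pointer))
```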

final_checkpoint/model-00002-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3b0066b37f89800bfcc599e592fb78145d660e2329cf0599a9baf232cbcd5b80
 size 4999802616

 version https://git-lfs.github.com/spec/v1
+oid sha256:fb69815e3ef4cec90fdaf73b718ff69a6b07107b1a21f1dc00a6828c48cfb789
 size 4999802616

final_checkpoint/model-00003-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0cf8bd17b66dd5609c3dec9f2df1eec5dd06e5451980b10e07de25c01aa08049
 size 4915916080

 version https://git-lfs.github.com/spec/v1
+oid sha256:a0fd7073e29d2b5ecefe039b88df9d8e21f367cb37cd22ef4fd0ad259e7ba528
 size 4915916080

final_checkpoint/model-00004-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:19c684aa227481a47d3e1af16e899ea6ef5326d3852f2e5c226da91b27bbc2ce
 size 1168138808

 version https://git-lfs.github.com/spec/v1
+oid sha256:14a1fc028f6a748b7cf1d3862c384518aa6c7f0c1f7cb6a0f3642d3e3056b185
 size 1168138808

model-00001-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ecfc213ef10a403002239cf57c82d2408559d233d700db44bf415e7cd82efde
 size 4976698592

 version https://git-lfs.github.com/spec/v1
+oid sha256:7cd33b7e87ade284781bec739497a364b59f256b9c0bf5c29cf0cc233d320391
 size 4976698592

model-00002-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3b0066b37f89800bfcc599e592fb78145d660e2329cf0599a9baf232cbcd5b80
 size 4999802616

 version https://git-lfs.github.com/spec/v1
+oid sha256:fb69815e3ef4cec90fdaf73b718ff69a6b07107b1a21f1dc00a6828c48cfb789
 size 4999802616

model-00003-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0cf8bd17b66dd5609c3dec9f2df1eec5dd06e5451980b10e07de25c01aa08049
 size 4915916080

 version https://git-lfs.github.com/spec/v1
+oid sha256:a0fd7073e29d2b5ecefe039b88df9d8e21f367cb37cd22ef4fd0ad259e7ba528
 size 4915916080

model-00004-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:19c684aa227481a47d3e1af16e899ea6ef5326d3852f2e5c226da91b27bbc2ce
 size 1168138808

 version https://git-lfs.github.com/spec/v1
+oid sha256:14a1fc028f6a748b7cf1d3862c384518aa6c7f0c1f7cb6a0f3642d3e3056b185
 size 1168138808

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ca8d4e9f26c94dfe09d58725653432d04ec6bfa2fa33636fb0a9560f934fa932
 size 4667

 version https://git-lfs.github.com/spec/v1
+oid sha256:e7a5a72ab82918ece0b88a193f0f6a4b6c3d3554380d2de6c946378d5a5fdb24
 size 4667