End of training

Files changed (10)

README.md +49 -49
final_checkpoint/model-00001-of-00004.safetensors +1 -1
final_checkpoint/model-00002-of-00004.safetensors +1 -1
final_checkpoint/model-00003-of-00004.safetensors +1 -1
final_checkpoint/model-00004-of-00004.safetensors +1 -1
model-00001-of-00004.safetensors +1 -1
model-00002-of-00004.safetensors +1 -1
model-00003-of-00004.safetensors +1 -1
model-00004-of-00004.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6931
-- Rewards/chosen: 0.0
-- Rewards/rejected: 0.0
-- Rewards/accuracies: 0.0
-- Rewards/margins: 0.0
-- Logps/rejected: 0.0
-- Logps/chosen: 0.0
-- Logits/rejected: -1.1794
-- Logits/chosen: -1.1794
 ## Model description
@@ -59,46 +59,46 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6931        | 0.3333  | 25   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 0.6667  | 50   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 1.0     | 75   | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 1.3333  | 100  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 1.6667  | 125  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 2.0     | 150  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 2.3333  | 175  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 2.6667  | 200  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 3.0     | 225  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 3.3333  | 250  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 3.6667  | 275  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 4.0     | 300  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 4.3333  | 325  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 4.6667  | 350  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 5.0     | 375  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 5.3333  | 400  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 5.6667  | 425  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 6.0     | 450  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 6.3333  | 475  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 6.6667  | 500  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 7.0     | 525  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 7.3333  | 550  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 7.6667  | 575  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 8.0     | 600  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 8.3333  | 625  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 8.6667  | 650  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 9.0     | 675  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 9.3333  | 700  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 9.6667  | 725  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 10.0    | 750  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 10.3333 | 775  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 10.6667 | 800  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 11.0    | 825  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 11.3333 | 850  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 11.6667 | 875  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 12.0    | 900  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 12.3333 | 925  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 12.6667 | 950  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 13.0    | 975  | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
-| 0.6931        | 13.3333 | 1000 | 0.6931          | 0.0            | 0.0              | 0.0                | 0.0             | 0.0            | 0.0          | -1.1794         | -1.1794       |
 ### Framework versions

 This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.2741
+- Rewards/chosen: -0.0170
+- Rewards/rejected: -6.7809
+- Rewards/accuracies: 0.6400
+- Rewards/margins: 6.7639
+- Logps/rejected: -96.2941
+- Logps/chosen: -19.2736
+- Logits/rejected: -1.2664
+- Logits/chosen: -1.2475
 ## Model description
 | Training Loss | Epoch   | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.67          | 0.3333  | 25   | 0.6075          | 0.1072         | -0.0786          | 0.6400             | 0.1858          | -29.2710       | -18.0315     | -1.1541         | -1.1497       |
+| 0.3388        | 0.6667  | 50   | 0.3079          | 0.3701         | -1.1689          | 0.6500             | 1.5390          | -40.1739       | -15.4027     | -1.1704         | -1.1602       |
+| 0.1782        | 1.0     | 75   | 0.2489          | 0.3405         | -3.3088          | 0.6500             | 3.6493          | -61.5725       | -15.6982     | -1.2173         | -1.2009       |
+| 0.1047        | 1.3333  | 100  | 0.2514          | 0.3299         | -4.1473          | 0.6500             | 4.4772          | -69.9577       | -15.8048     | -1.2277         | -1.2096       |
+| 0.1909        | 1.6667  | 125  | 0.2649          | 0.2370         | -4.5013          | 0.6400             | 4.7383          | -73.4979       | -16.7332     | -1.2311         | -1.2144       |
+| 0.364         | 2.0     | 150  | 0.2617          | 0.2324         | -4.8873          | 0.6400             | 5.1197          | -77.3577       | -16.7794     | -1.2337         | -1.2169       |
+| 0.26          | 2.3333  | 175  | 0.2628          | 0.1974         | -5.1469          | 0.6400             | 5.3443          | -79.9539       | -17.1290     | -1.2363         | -1.2194       |
+| 0.2253        | 2.6667  | 200  | 0.2643          | 0.1698         | -5.3745          | 0.6400             | 5.5443          | -82.2301       | -17.4054     | -1.2386         | -1.2217       |
+| 0.208         | 3.0     | 225  | 0.2660          | 0.1513         | -5.5214          | 0.6400             | 5.6727          | -83.6984       | -17.5904     | -1.2407         | -1.2238       |
+| 0.2253        | 3.3333  | 250  | 0.2667          | 0.1290         | -5.6833          | 0.6400             | 5.8124          | -85.3180       | -17.8128     | -1.2430         | -1.2261       |
+| 0.1733        | 3.6667  | 275  | 0.2681          | 0.1116         | -5.8186          | 0.6400             | 5.9301          | -86.6704       | -17.9877     | -1.2452         | -1.2281       |
+| 0.2773        | 4.0     | 300  | 0.2686          | 0.1005         | -5.9317          | 0.6400             | 6.0322          | -87.8013       | -18.0979     | -1.2472         | -1.2299       |
+| 0.2426        | 4.3333  | 325  | 0.2690          | 0.0844         | -6.0431          | 0.6400             | 6.1276          | -88.9161       | -18.2589     | -1.2493         | -1.2319       |
+| 0.156         | 4.6667  | 350  | 0.2692          | 0.0741         | -6.1302          | 0.6400             | 6.2043          | -89.7871       | -18.3627     | -1.2509         | -1.2333       |
+| 0.2253        | 5.0     | 375  | 0.2715          | 0.0625         | -6.2127          | 0.6400             | 6.2752          | -90.6117       | -18.4779     | -1.2530         | -1.2353       |
+| 0.2253        | 5.3333  | 400  | 0.2713          | 0.0535         | -6.2910          | 0.6400             | 6.3446          | -91.3949       | -18.5679     | -1.2545         | -1.2367       |
+| 0.2253        | 5.6667  | 425  | 0.2724          | 0.0411         | -6.3668          | 0.6400             | 6.4079          | -92.1528       | -18.6919     | -1.2563         | -1.2383       |
+| 0.208         | 6.0     | 450  | 0.2729          | 0.0353         | -6.4187          | 0.6400             | 6.4541          | -92.6719       | -18.7501     | -1.2573         | -1.2392       |
+| 0.2773        | 6.3333  | 475  | 0.2736          | 0.0283         | -6.4704          | 0.6400             | 6.4987          | -93.1886       | -18.8205     | -1.2582         | -1.2400       |
+| 0.3119        | 6.6667  | 500  | 0.2725          | 0.0224         | -6.5105          | 0.6400             | 6.5329          | -93.5893       | -18.8791     | -1.2592         | -1.2409       |
+| 0.208         | 7.0     | 525  | 0.2719          | 0.0140         | -6.5739          | 0.6400             | 6.5880          | -94.2240       | -18.9630     | -1.2606         | -1.2422       |
+| 0.1733        | 7.3333  | 550  | 0.2740          | 0.0094         | -6.6118          | 0.6400             | 6.6212          | -94.6024       | -19.0092     | -1.2618         | -1.2433       |
+| 0.2599        | 7.6667  | 575  | 0.2728          | 0.0021         | -6.6411          | 0.6400             | 6.6432          | -94.8961       | -19.0825     | -1.2622         | -1.2436       |
+| 0.2599        | 8.0     | 600  | 0.2736          | -0.0003        | -6.6671          | 0.6400             | 6.6668          | -95.1557       | -19.1060     | -1.2631         | -1.2444       |
+| 0.2253        | 8.3333  | 625  | 0.2728          | -0.0010        | -6.6895          | 0.6400             | 6.6884          | -95.3796       | -19.1137     | -1.2634         | -1.2447       |
+| 0.104         | 8.6667  | 650  | 0.2735          | -0.0019        | -6.7075          | 0.6400             | 6.7056          | -95.5598       | -19.1222     | -1.2641         | -1.2453       |
+| 0.2253        | 9.0     | 675  | 0.2726          | -0.0051        | -6.7243          | 0.6400             | 6.7192          | -95.7281       | -19.1544     | -1.2648         | -1.2460       |
+| 0.2253        | 9.3333  | 700  | 0.2736          | -0.0097        | -6.7446          | 0.6400             | 6.7348          | -95.9304       | -19.2006     | -1.2653         | -1.2465       |
+| 0.2253        | 9.6667  | 725  | 0.2740          | -0.0130        | -6.7590          | 0.6400             | 6.7460          | -96.0751       | -19.2334     | -1.2655         | -1.2466       |
+| 0.3119        | 10.0    | 750  | 0.2742          | -0.0140        | -6.7661          | 0.6400             | 6.7520          | -96.1452       | -19.2434     | -1.2656         | -1.2466       |
+| 0.208         | 10.3333 | 775  | 0.2741          | -0.0154        | -6.7688          | 0.6400             | 6.7534          | -96.1727       | -19.2569     | -1.2660         | -1.2470       |
+| 0.2253        | 10.6667 | 800  | 0.2728          | -0.0133        | -6.7751          | 0.6400             | 6.7618          | -96.2353       | -19.2360     | -1.2661         | -1.2471       |
+| 0.2426        | 11.0    | 825  | 0.2734          | -0.0133        | -6.7787          | 0.6400             | 6.7654          | -96.2719       | -19.2365     | -1.2662         | -1.2473       |
+| 0.2946        | 11.3333 | 850  | 0.2743          | -0.0138        | -6.7737          | 0.6400             | 6.7599          | -96.2217       | -19.2417     | -1.2663         | -1.2474       |
+| 0.1733        | 11.6667 | 875  | 0.2739          | -0.0147        | -6.7807          | 0.6400             | 6.7660          | -96.2913       | -19.2500     | -1.2662         | -1.2472       |
+| 0.156         | 12.0    | 900  | 0.2751          | -0.0158        | -6.7820          | 0.6400             | 6.7661          | -96.3044       | -19.2615     | -1.2664         | -1.2475       |
+| 0.1906        | 12.3333 | 925  | 0.2747          | -0.0152        | -6.7835          | 0.6400             | 6.7682          | -96.3194       | -19.2557     | -1.2663         | -1.2474       |
+| 0.2426        | 12.6667 | 950  | 0.2741          | -0.0190        | -6.7817          | 0.6400             | 6.7627          | -96.3018       | -19.2932     | -1.2665         | -1.2475       |
+| 0.208         | 13.0    | 975  | 0.2741          | -0.0170        | -6.7809          | 0.6400             | 6.7639          | -96.2941       | -19.2736     | -1.2664         | -1.2475       |
+| 0.3119        | 13.3333 | 1000 | 0.2741          | -0.0170        | -6.7809          | 0.6400             | 6.7639          | -96.2941       | -19.2736     | -1.2664         | -1.2475       |
 ### Framework versions
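
The removed and added metrics can be sanity-checked with a little arithmetic. The sketch below assumes the loss is a DPO-style `-log(sigmoid(margin))` with `margin = Rewards/chosen - Rewards/rejected`; the diff does not name the trainer, so that is an assumption, and `pairwise_loss` is a hypothetical helper. Under that assumption a zero margin gives ln 2, which matches the constant 0.6931 in the removed rows and suggests the earlier run never moved away from its reference. The added final row is also internally consistent: the margin equals chosen minus rejected, and 25 steps per 0.3333 epoch implies 75 steps per epoch.

```python
import math

def pairwise_loss(margin: float) -> float:
    """Numerically stable -log(sigmoid(margin)), i.e. softplus(-margin).

    Assumption: a DPO-style objective where margin is the chosen reward
    minus the rejected reward. Hypothetical helper, not taken from the repo.
    """
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)

# Removed rows: both rewards are 0.0, so the margin is 0.0.
old_loss = pairwise_loss(0.0 - 0.0)

# Added final row: Rewards/chosen = -0.0170, Rewards/rejected = -6.7809.
final_margin = -0.0170 - (-6.7809)

# Step 25 is logged at epoch 0.3333, so one epoch is about 75 steps.
steps_per_epoch = round(25 / 0.3333)

print(f"loss at zero margin: {old_loss:.4f}")
print(f"final margin:        {final_margin:.4f}")
print(f"steps per epoch:     {steps_per_epoch}")
```

The reported validation loss of 0.2741 is an average over evaluation pairs, so it cannot be recomputed from the mean margin alone; with Rewards/accuracies at 0.6400, some pairs presumably still carry a loss near ln 2.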

final_checkpoint/model-00001-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ecfc213ef10a403002239cf57c82d2408559d233d700db44bf415e7cd82efde
 size 4976698592

 version https://git-lfs.github.com/spec/v1
+oid sha256:00849e10fe7d1756fef24d9702058758ca73bd793aeb546b5de6121412bbaa6e
 size 4976698592

final_checkpoint/model-00002-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3b0066b37f89800bfcc599e592fb78145d660e2329cf0599a9baf232cbcd5b80
 size 4999802616

 version https://git-lfs.github.com/spec/v1
+oid sha256:0eecb6b2dbc5031a12c263f63bf09ca81a683910c2935a687864dcd0eacf3acc
 size 4999802616

final_checkpoint/model-00003-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0cf8bd17b66dd5609c3dec9f2df1eec5dd06e5451980b10e07de25c01aa08049
 size 4915916080

 version https://git-lfs.github.com/spec/v1
+oid sha256:6e9ddb4d44980e70499a948bf9ad15a41fb8eb6b956b75233c3a980fa1ca7dab
 size 4915916080

final_checkpoint/model-00004-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:19c684aa227481a47d3e1af16e899ea6ef5326d3852f2e5c226da91b27bbc2ce
 size 1168138808

 version https://git-lfs.github.com/spec/v1
+oid sha256:2fcfa7009d8b7d50d3d7157a849fba13424250d3cd0786ea57d9ad5078443986
 size 1168138808

model-00001-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ecfc213ef10a403002239cf57c82d2408559d233d700db44bf415e7cd82efde
 size 4976698592

 version https://git-lfs.github.com/spec/v1
+oid sha256:00849e10fe7d1756fef24d9702058758ca73bd793aeb546b5de6121412bbaa6e
 size 4976698592

model-00002-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3b0066b37f89800bfcc599e592fb78145d660e2329cf0599a9baf232cbcd5b80
 size 4999802616

 version https://git-lfs.github.com/spec/v1
+oid sha256:0eecb6b2dbc5031a12c263f63bf09ca81a683910c2935a687864dcd0eacf3acc
 size 4999802616

model-00003-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0cf8bd17b66dd5609c3dec9f2df1eec5dd06e5451980b10e07de25c01aa08049
 size 4915916080

 version https://git-lfs.github.com/spec/v1
+oid sha256:6e9ddb4d44980e70499a948bf9ad15a41fb8eb6b956b75233c3a980fa1ca7dab
 size 4915916080

model-00004-of-00004.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:19c684aa227481a47d3e1af16e899ea6ef5326d3852f2e5c226da91b27bbc2ce
 size 1168138808

 version https://git-lfs.github.com/spec/v1
+oid sha256:2fcfa7009d8b7d50d3d7157a849fba13424250d3cd0786ea57d9ad5078443986
 size 1168138808

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6048ed1d6e6deafc9c6d81954f4c2aa8994e0717cc0ea44687ed825627b15651
 size 4667

 version https://git-lfs.github.com/spec/v1
+oid sha256:effa96b3a74279136b85a063e573b182f0d8c68107e9947c7867cacad6fd3022
 size 4667
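
Each binary file above is stored as a Git LFS pointer: a `version` line, an `oid` line with a SHA-256 digest, and a `size` line in bytes. As a minimal sketch (the `parse_lfs_pointer` helper is hypothetical, not part of any library), the pointers can be parsed to confirm what the diff shows: sizes are unchanged while digests differ, and each `final_checkpoint/` shard has the same new digest as the matching root-level shard.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

# Old and new pointers for model-00001-of-00004.safetensors, copied from the diff.
old_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4ecfc213ef10a403002239cf57c82d2408559d233d700db44bf415e7cd82efde\n"
    "size 4976698592\n"
)
new_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:00849e10fe7d1756fef24d9702058758ca73bd793aeb546b5de6121412bbaa6e\n"
    "size 4976698592\n"
)

# Same byte size, different content hash: the weights changed, the tensor layout did not.
same_size = old_ptr["size"] == new_ptr["size"]
changed = old_ptr["digest"] != new_ptr["digest"]

# Shard sizes listed in the diff, summed to get the checkpoint size in bytes.
total_bytes = sum([4976698592, 4999802616, 4915916080, 1168138808])
print(same_size, changed, total_bytes)
```

An identical size with a different digest is what one would expect when only the weight values are retrained and the shapes and dtypes stay the same.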