afaji/fresh-2-layer-medmcqa50000-distill-of-fresh-2-layer-gpqa_EVAL_gpqa

@@ -15,8 +15,8 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 4.6855
-- Accuracy: 0.7828
 ## Model description
@@ -48,42 +48,28 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|
-| No log        | 0.06  | 100  | 14.3260         | 0.3333   |
-| No log        | 0.13  | 200  | 12.1073         | 0.4899   |
-| No log        | 0.19  | 300  | 11.0435         | 0.5101   |
-| No log        | 0.26  | 400  | 9.6543          | 0.5808   |
-| 3.466         | 0.32  | 500  | 9.4758          | 0.5960   |
-| 3.466         | 0.38  | 600  | 8.5372          | 0.6263   |
-| 3.466         | 0.45  | 700  | 8.3611          | 0.6566   |
-| 3.466         | 0.51  | 800  | 7.3273          | 0.6919   |
-| 3.466         | 0.58  | 900  | 8.0522          | 0.6414   |
-| 1.1408        | 0.64  | 1000 | 7.5545          | 0.6515   |
-| 1.1408        | 0.7   | 1100 | 6.9424          | 0.7020   |
-| 1.1408        | 0.77  | 1200 | 6.5618          | 0.6869   |
-| 1.1408        | 0.83  | 1300 | 6.1301          | 0.7121   |
-| 1.1408        | 0.9   | 1400 | 7.3708          | 0.7121   |
-| 0.7156        | 0.96  | 1500 | 5.9791          | 0.7172   |
-| 0.7156        | 1.02  | 1600 | 6.0925          | 0.7172   |
-| 0.7156        | 1.09  | 1700 | 6.1228          | 0.7121   |
-| 0.7156        | 1.15  | 1800 | 6.2473          | 0.7222   |
-| 0.7156        | 1.22  | 1900 | 6.3483          | 0.7172   |
-| 0.4805        | 1.28  | 2000 | 5.6959          | 0.7071   |
-| 0.4805        | 1.34  | 2100 | 5.5578          | 0.7424   |
-| 0.4805        | 1.41  | 2200 | 5.2385          | 0.7626   |
-| 0.4805        | 1.47  | 2300 | 5.6583          | 0.7374   |
-| 0.4805        | 1.54  | 2400 | 5.1442          | 0.7475   |
-| 0.3914        | 1.6   | 2500 | 5.0866          | 0.7677   |
-| 0.3914        | 1.66  | 2600 | 5.0077          | 0.7626   |
-| 0.3914        | 1.73  | 2700 | 4.6813          | 0.7778   |
-| 0.3914        | 1.79  | 2800 | 4.8810          | 0.7677   |
-| 0.3914        | 1.86  | 2900 | 4.6941          | 0.7626   |
-| 0.3368        | 1.92  | 3000 | 4.8332          | 0.7727   |
-| 0.3368        | 1.98  | 3100 | 4.6855          | 0.7828   |
-| 0.3368        | 2.05  | 3200 | 4.7359          | 0.7778   |
-| 0.3368        | 2.11  | 3300 | 4.5992          | 0.7778   |
-| 0.3368        | 2.18  | 3400 | 4.5406          | 0.7677   |
-| 0.2459        | 2.24  | 3500 | 4.8480          | 0.7828   |
-| 0.2459        | 2.3   | 3600 | 4.6215          | 0.7677   |
 ### Framework versions

 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 6.3481
+- Accuracy: 0.7677
 ## Model description
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|
+| No log        | 0.06  | 100  | 14.8949         | 0.4141   |
+| No log        | 0.13  | 200  | 11.8675         | 0.4697   |
+| No log        | 0.19  | 300  | 10.6894         | 0.5556   |
+| No log        | 0.26  | 400  | 9.8194          | 0.5404   |
+| 3.5537        | 0.32  | 500  | 9.0542          | 0.5556   |
+| 3.5537        | 0.38  | 600  | 9.0155          | 0.6061   |
+| 3.5537        | 0.45  | 700  | 8.1758          | 0.6768   |
+| 3.5537        | 0.51  | 800  | 7.6983          | 0.6970   |
+| 3.5537        | 0.58  | 900  | 7.6211          | 0.6818   |
+| 1.0971        | 0.64  | 1000 | 7.1361          | 0.6919   |
+| 1.0971        | 0.7   | 1100 | 7.1059          | 0.6717   |
+| 1.0971        | 0.77  | 1200 | 6.9443          | 0.6919   |
+| 1.0971        | 0.83  | 1300 | 6.7089          | 0.7273   |
+| 1.0971        | 0.9   | 1400 | 6.5064          | 0.7172   |
+| 0.699         | 0.96  | 1500 | 5.9161          | 0.7273   |
+| 0.699         | 1.02  | 1600 | 6.6374          | 0.7525   |
+| 0.699         | 1.09  | 1700 | 6.3481          | 0.7677   |
+| 0.699         | 1.15  | 1800 | 5.9385          | 0.7323   |
+| 0.699         | 1.22  | 1900 | 6.2063          | 0.7374   |
+| 0.4733        | 1.28  | 2000 | 5.9173          | 0.7273   |
+| 0.4733        | 1.34  | 2100 | 5.8466          | 0.7626   |
+| 0.4733        | 1.41  | 2200 | 5.6702          | 0.7374   |
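
The headline metrics in the updated card (Loss 6.3481, Accuracy 0.7677) match the step-1700 row, which has the highest accuracy in the new table. This suggests the reported checkpoint was chosen by best accuracy rather than taken from the last logged step, although the card does not say so. The sketch below shows one way to check this from the table text. The helper names `parse_rows` and `best_row` are hypothetical, and the sample contains only a few rows copied from the table above.

```python
# Hypothetical helpers: parse rows of the training-results table (with or
# without leading diff markers) and pick the row with the highest accuracy.

SAMPLE = """
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|
+| 0.699         | 0.96  | 1500 | 5.9161          | 0.7273   |
+| 0.699         | 1.02  | 1600 | 6.6374          | 0.7525   |
+| 0.699         | 1.09  | 1700 | 6.3481          | 0.7677   |
+| 0.699         | 1.15  | 1800 | 5.9385          | 0.7323   |
+| 0.4733        | 1.34  | 2100 | 5.8466          | 0.7626   |
"""


def parse_rows(table_text):
    """Return one dict per data row; header and separator rows are skipped."""
    rows = []
    for raw in table_text.splitlines():
        # Drop surrounding whitespace and a leading '+' or '-' diff marker.
        line = raw.strip().lstrip("+-").strip()
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 5:
            continue
        try:
            row = {
                "epoch": float(cells[1]),
                "step": int(cells[2]),
                "val_loss": float(cells[3]),
                "accuracy": float(cells[4]),
            }
        except ValueError:
            # Header or separator row: the cells are not numeric.
            continue
        # "No log" means no training loss had been recorded yet.
        row["train_loss"] = None if cells[0] == "No log" else float(cells[0])
        rows.append(row)
    return rows


def best_row(rows):
    """Row with the highest accuracy; the earliest row wins on ties."""
    return max(rows, key=lambda r: r["accuracy"])


best = best_row(parse_rows(SAMPLE))
print(best["step"], best["val_loss"], best["accuracy"])
```

Note that the selected row is not the one with the lowest validation loss (step 1500 in this sample), so ranking by loss would pick a different checkpoint.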
 ### Framework versions

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b842d80f44994705f41eeab14f6ec3f508be602adf5a8c24a45d43d4c1d5e7ab
 size 98247916

 version https://git-lfs.github.com/spec/v1
+oid sha256:7e0aef65b647dc4c499530b1a8ae8171fac88507448aa818f7a26334a1fd967b
 size 98247916
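
Both pointers record the same size (98247916 bytes) but different `oid` values, so the weights changed while the file length stayed the same. A downloaded `pytorch_model.bin` can be checked against its pointer by comparing the byte count and the SHA-256 digest. The following is a minimal sketch using only the standard library. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the demo payload is a stand-in, not the real weights.

```python
import hashlib

# Pointer text for the new revision, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7e0aef65b647dc4c499530b1a8ae8171fac88507448aa818f7a26334a1fd967b
size 98247916
"""


def parse_lfs_pointer(text):
    """Parse the 'version', 'oid' and 'size' lines of a Git LFS pointer."""
    fields = {}
    for raw in text.splitlines():
        # Tolerate leading whitespace and '+'/'-' markers from a diff view.
        line = raw.strip().lstrip("+-").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash: " + pointer["algo"])
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


new_pointer = parse_lfs_pointer(NEW_POINTER)
print(new_pointer["algo"], new_pointer["size"])

# Demo with a stand-in payload: build a pointer for it, then verify it.
payload = b"stand-in for model weights"
demo_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + hashlib.sha256(payload).hexdigest() + "\n"
    "size " + str(len(payload)) + "\n"
)
print(matches_pointer(payload, demo_pointer))
```

For a file the size of the real checkpoint, it would be more memory-friendly to feed the hash in chunks with `hashlib.sha256().update(...)` instead of reading the whole file at once.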

tokenizer.json CHANGED

@@ -1,21 +1,7 @@
 {
   "version": "1.0",
-  "truncation": {
-    "direction": "Right",
-    "max_length": 512,
-    "strategy": "LongestFirst",
-    "stride": 0
-  },
-  "padding": {
-    "strategy": {
-      "Fixed": 512
-    },
-    "direction": "Right",
-    "pad_to_multiple_of": null,
-    "pad_id": 0,
-    "pad_type_id": 0,
-    "pad_token": "[PAD]"
-  },
   "added_tokens": [
     {
       "id": 0,

 {
   "version": "1.0",
+  "truncation": null,
+  "padding": null,
   "added_tokens": [
     {
       "id": 0,