dafrimi
/

starcoderbase7b_2048_context_length_lr_0.0005

Safetensors

gpt_bigcode

Generated from Trainer

dafrimi committed on Aug 11, 2024

Commit

e1873da

verified

1 Parent(s): 60d9922

End of training

Files changed (2)

README.md +84 -24
generation_config.json +1 -1

README.md CHANGED

@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [bigcode/starcoderbase-7b](https://huggingface.co/bigcode/starcoderbase-7b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.8933
 ## Model description
@@ -50,33 +50,93 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 0.7067        | 0.05  | 100  | 0.7675          |
-| 0.4933        | 0.1   | 200  | 0.6368          |
-| 0.3121        | 0.15  | 300  | 0.6059          |
-| 0.2769        | 0.2   | 400  | 0.6548          |
-| 0.206         | 0.25  | 500  | 0.7635          |
-| 0.1528        | 0.3   | 600  | 0.6882          |
-| 0.1119        | 0.35  | 700  | 0.6780          |
-| 0.1059        | 0.4   | 800  | 0.7212          |
-| 0.0933        | 0.45  | 900  | 0.6975          |
-| 0.0652        | 0.5   | 1000 | 0.7074          |
-| 0.0556        | 0.55  | 1100 | 0.7506          |
-| 0.0432        | 0.6   | 1200 | 0.7520          |
-| 0.0414        | 0.65  | 1300 | 0.7630          |
-| 0.0475        | 0.7   | 1400 | 0.7558          |
-| 0.0332        | 0.75  | 1500 | 0.8318          |
-| 0.0307        | 0.8   | 1600 | 0.8102          |
-| 0.0283        | 0.85  | 1700 | 0.8601          |
-| 0.0267        | 0.9   | 1800 | 0.8806          |
-| 0.0269        | 0.95  | 1900 | 0.8876          |
-| 0.0272        | 1.0   | 2000 | 0.8933          |
 ### Framework versions
-- Transformers 4.43.3
 - Pytorch 2.4.0a0+07cecf4168.nv24.05
 - Datasets 2.20.0
 - Tokenizers 0.19.1

 This model is a fine-tuned version of [bigcode/starcoderbase-7b](https://huggingface.co/bigcode/starcoderbase-7b) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.0501
 ## Model description
 ### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.6244        | 0.0125 | 25   | 0.5402          |
+| 1.0172        | 0.025  | 50   | 1.4486          |
+| 0.9991        | 0.0375 | 75   | 1.0535          |
+| 0.715         | 0.05   | 100  | 1.6262          |
+| 0.6957        | 0.0625 | 125  | 0.6796          |
+| 0.5182        | 0.075  | 150  | 0.6086          |
+| 0.497         | 0.0875 | 175  | 0.5938          |
+| 0.4611        | 0.1    | 200  | 0.6104          |
+| 0.4046        | 0.1125 | 225  | 0.5857          |
+| 0.3753        | 0.125  | 250  | 0.6633          |
+| 0.3517        | 0.1375 | 275  | 0.6479          |
+| 0.2758        | 0.15   | 300  | 0.5788          |
+| 0.2928        | 0.1625 | 325  | 0.6429          |
+| 0.2669        | 0.175  | 350  | 0.5874          |
+| 0.2608        | 0.1875 | 375  | 0.5497          |
+| 0.2049        | 0.2    | 400  | 0.6268          |
+| 0.2006        | 0.2125 | 425  | 0.6265          |
+| 0.197         | 0.225  | 450  | 0.6236          |
+| 0.177         | 0.2375 | 475  | 0.6124          |
+| 0.1774        | 0.25   | 500  | 0.6231          |
+| 0.1509        | 0.2625 | 525  | 0.5864          |
+| 0.1389        | 0.275  | 550  | 0.6161          |
+| 0.8679        | 0.2875 | 575  | 11.4657         |
+| 6.5575        | 0.3    | 600  | 6.4917          |
+| 6.0031        | 0.3125 | 625  | 5.5229          |
+| 5.1391        | 0.325  | 650  | 5.2191          |
+| 4.4917        | 0.3375 | 675  | 4.6562          |
+| 3.9199        | 0.35   | 700  | 4.2153          |
+| 3.855         | 0.3625 | 725  | 4.0902          |
+| 3.5441        | 0.375  | 750  | 4.0601          |
+| 3.3835        | 0.3875 | 775  | 3.8844          |
+| 3.1663        | 0.4    | 800  | 3.8223          |
+| 2.9285        | 0.4125 | 825  | 3.4541          |
+| 3.0088        | 0.425  | 850  | 3.5302          |
+| 2.9083        | 0.4375 | 875  | 3.3347          |
+| 2.8438        | 0.45   | 900  | 3.3962          |
+| 2.663         | 0.4625 | 925  | 3.0955          |
+| 2.5084        | 0.475  | 950  | 3.0454          |
+| 2.5818        | 0.4875 | 975  | 3.0131          |
+| 2.4068        | 0.5    | 1000 | 3.0179          |
+| 2.3994        | 0.5125 | 1025 | 2.8273          |
+| 2.1942        | 0.525  | 1050 | 2.7333          |
+| 2.1041        | 0.5375 | 1075 | 2.6163          |
+| 2.0861        | 0.55   | 1100 | 2.6006          |
+| 1.9868        | 0.5625 | 1125 | 2.5482          |
+| 1.9496        | 0.575  | 1150 | 2.6079          |
+| 1.8099        | 0.5875 | 1175 | 2.3777          |
+| 1.6454        | 0.6    | 1200 | 2.2547          |
+| 1.6484        | 0.6125 | 1225 | 2.3254          |
+| 1.5729        | 0.625  | 1250 | 2.2835          |
+| 1.5635        | 0.6375 | 1275 | 2.2167          |
+| 1.3961        | 0.65   | 1300 | 2.2751          |
+| 1.3495        | 0.6625 | 1325 | 2.1755          |
+| 1.3524        | 0.675  | 1350 | 2.1377          |
+| 1.3116        | 0.6875 | 1375 | 2.1407          |
+| 1.282         | 0.7    | 1400 | 2.0955          |
+| 1.114         | 0.7125 | 1425 | 2.0334          |
+| 1.0985        | 0.725  | 1450 | 2.0133          |
+| 1.1216        | 0.7375 | 1475 | 2.0139          |
+| 1.0544        | 0.75   | 1500 | 2.0464          |
+| 1.0221        | 0.7625 | 1525 | 1.9984          |
+| 0.9368        | 0.775  | 1550 | 2.0069          |
+| 0.8973        | 0.7875 | 1575 | 1.9595          |
+| 0.9332        | 0.8    | 1600 | 1.9372          |
+| 0.9227        | 0.8125 | 1625 | 1.9910          |
+| 0.8507        | 0.825  | 1650 | 2.0251          |
+| 0.8242        | 0.8375 | 1675 | 1.9892          |
+| 0.7571        | 0.85   | 1700 | 2.0327          |
+| 0.7519        | 0.8625 | 1725 | 1.9949          |
+| 0.7209        | 0.875  | 1750 | 2.0050          |
+| 0.7315        | 0.8875 | 1775 | 2.0076          |
+| 0.77          | 0.9    | 1800 | 2.0315          |
+| 0.7719        | 0.9125 | 1825 | 2.0241          |
+| 0.681         | 0.925  | 1850 | 2.0440          |
+| 0.7371        | 0.9375 | 1875 | 2.0380          |
+| 0.6823        | 0.95   | 1900 | 2.0392          |
+| 0.6891        | 0.9625 | 1925 | 2.0563          |
+| 0.7266        | 0.975  | 1950 | 2.0511          |
+| 0.6888        | 0.9875 | 1975 | 2.0501          |
+| 0.6663        | 1.0    | 2000 | 2.0501          |
 ### Framework versions
+- Transformers 4.44.0
 - Pytorch 2.4.0a0+07cecf4168.nv24.05
 - Datasets 2.20.0
 - Tokenizers 0.19.1
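
The added training-results table shows validation loss jumping from 0.6161 at step 550 to 11.4657 at step 575 and ending at 2.0501. A minimal sketch of how such a log could be scanned for its best evaluation point and its first loss spike follows. It uses only a handful of rows copied from the table; the helper names `best_step` and `first_spike` and the `factor=5.0` threshold are illustrative assumptions, not part of the training setup.

```python
# (step, training_loss, validation_loss) rows copied from the added table above.
# Only a subset is listed here to keep the sketch short.
ROWS = [
    (25, 0.6244, 0.5402),
    (375, 0.2608, 0.5497),
    (525, 0.1509, 0.5864),
    (550, 0.1389, 0.6161),
    (575, 0.8679, 11.4657),
    (600, 6.5575, 6.4917),
    (2000, 0.6663, 2.0501),
]


def best_step(rows):
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    step, _, val_loss = min(rows, key=lambda row: row[2])
    return step, val_loss


def first_spike(rows, factor=5.0):
    """Return the first step whose validation loss exceeds `factor` times
    the previously logged validation loss, or None if there is no such step.

    `factor` is a hypothetical threshold chosen for illustration.
    """
    for previous, current in zip(rows, rows[1:]):
        if current[2] > factor * previous[2]:
            return current[0]
    return None


if __name__ == "__main__":
    print("best:", best_step(ROWS))
    print("first spike at step:", first_spike(ROWS))
```

On these rows the lowest validation loss is at step 25 and the first flagged spike is at step 575, which suggests that the final checkpoint is not the best-scoring one in this run.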

generation_config.json CHANGED

@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 0,
   "eos_token_id": 0,
-  "transformers_version": "4.43.3"
 }

   "_from_model_config": true,
   "bos_token_id": 0,
   "eos_token_id": 0,
+  "transformers_version": "4.44.0"
 }
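
The only change in this hunk is the recorded `transformers_version`. As a rough sketch, the two sides can be compared with the standard `json` module. The opening brace of each document is assumed, because the hunk starts at line 2 of the file, and `changed_keys` is a hypothetical helper name.

```python
import json

# Both sides of the hunk. The opening "{" is assumed: the hunk begins at line 2.
OLD = """{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.43.3"
}"""

NEW = """{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.44.0"
}"""


def changed_keys(old_text, new_text):
    """Map each key whose value differs to its (old, new) pair."""
    old, new = json.loads(old_text), json.loads(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    print(changed_keys(OLD, NEW))
```

The token IDs are identical on both sides, so generation behaviour defined by this file should be unaffected by the commit.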