e-hossam96 committed
Commit 326d518
1 Parent(s): 626fd1a

End of training

Files changed (3)
1. README.md +159 -21
2. model.safetensors +1 -1
3. training_args.bin +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 4.9499
+- Loss: 3.2854
 
 ## Model description
 
@@ -35,35 +35,173 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0006
-- train_batch_size: 32
+- learning_rate: 0.001
+- train_batch_size: 64
 - eval_batch_size: 64
 - seed: 42
-- gradient_accumulation_steps: 8
+- gradient_accumulation_steps: 4
 - total_train_batch_size: 256
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.01
-- num_epochs: 2
+- num_epochs: 24
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 7.6152 | 0.1422 | 100 | 6.9246 |
-| 6.6089 | 0.2844 | 200 | 6.3326 |
-| 6.1811 | 0.4266 | 300 | 5.9524 |
-| 5.8677 | 0.5688 | 400 | 5.6719 |
-| 5.6433 | 0.7110 | 500 | 5.4863 |
-| 5.503 | 0.8532 | 600 | 5.3572 |
-| 5.3964 | 0.9954 | 700 | 5.2521 |
-| 5.2963 | 1.1376 | 800 | 5.1742 |
-| 5.2239 | 1.2798 | 900 | 5.1095 |
-| 5.1744 | 1.4220 | 1000 | 5.0590 |
-| 5.1376 | 1.5642 | 1100 | 5.0150 |
-| 5.1061 | 1.7064 | 1200 | 4.9836 |
-| 5.0786 | 1.8486 | 1300 | 4.9605 |
-| 5.0725 | 1.9908 | 1400 | 4.9499 |
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:------:|:------:|:---------------:|
+| 5.62 | 0.0585 | 1000 | 5.3754 |
+| 4.6527 | 0.1170 | 2000 | 4.4918 |
+| 4.2818 | 0.1755 | 3000 | 4.1137 |
+| 4.1289 | 0.2340 | 4000 | 3.9388 |
+| 4.0021 | 0.2924 | 5000 | 3.8274 |
+| 3.9301 | 0.3509 | 6000 | 3.7534 |
+| 3.8822 | 0.4094 | 7000 | 3.6986 |
+| 3.8375 | 0.4679 | 8000 | 3.6557 |
+| 3.7918 | 0.5264 | 9000 | 3.6266 |
+| 3.7723 | 0.5849 | 10000 | 3.5994 |
+| 3.7549 | 0.6434 | 11000 | 3.5787 |
+| 3.7324 | 0.7019 | 12000 | 3.5612 |
+| 3.7249 | 0.7604 | 13000 | 3.5436 |
+| 3.6989 | 0.8188 | 14000 | 3.5323 |
+| 3.7003 | 0.8773 | 15000 | 3.5169 |
+| 3.6919 | 0.9358 | 16000 | 3.5055 |
+| 3.6717 | 0.9943 | 17000 | 3.4966 |
+| 3.6612 | 1.0528 | 18000 | 3.4868 |
+| 3.6467 | 1.1113 | 19000 | 3.4787 |
+| 3.6497 | 1.1698 | 20000 | 3.4707 |
+| 3.6193 | 1.2283 | 21000 | 3.4639 |
+| 3.6302 | 1.2868 | 22000 | 3.4572 |
+| 3.6225 | 1.3452 | 23000 | 3.4516 |
+| 3.635 | 1.4037 | 24000 | 3.4458 |
+| 3.6115 | 1.4622 | 25000 | 3.4416 |
+| 3.6162 | 1.5207 | 26000 | 3.4348 |
+| 3.6142 | 1.5792 | 27000 | 3.4329 |
+| 3.5956 | 1.6377 | 28000 | 3.4293 |
+| 3.5885 | 1.6962 | 29000 | 3.4226 |
+| 3.603 | 1.7547 | 30000 | 3.4195 |
+| 3.5947 | 1.8132 | 31000 | 3.4142 |
+| 3.588 | 1.8716 | 32000 | 3.4113 |
+| 3.5803 | 1.9301 | 33000 | 3.4065 |
+| 3.5891 | 1.9886 | 34000 | 3.4044 |
+| 3.5801 | 2.0471 | 35000 | 3.4032 |
+| 3.5739 | 2.1056 | 36000 | 3.3988 |
+| 3.5661 | 2.1641 | 37000 | 3.3981 |
+| 3.5657 | 2.2226 | 38000 | 3.3934 |
+| 3.5727 | 2.2811 | 39000 | 3.3907 |
+| 3.5617 | 2.3396 | 40000 | 3.3885 |
+| 3.5579 | 2.3980 | 41000 | 3.3855 |
+| 3.5553 | 2.4565 | 42000 | 3.3816 |
+| 3.5647 | 2.5150 | 43000 | 3.3803 |
+| 3.5531 | 2.5735 | 44000 | 3.3799 |
+| 3.5494 | 2.6320 | 45000 | 3.3777 |
+| 3.5525 | 2.6905 | 46000 | 3.3759 |
+| 3.5487 | 2.7490 | 47000 | 3.3725 |
+| 3.5551 | 2.8075 | 48000 | 3.3711 |
+| 3.5511 | 2.8660 | 49000 | 3.3681 |
+| 3.5463 | 2.9244 | 50000 | 3.3695 |
+| 3.5419 | 2.9829 | 51000 | 3.3660 |
+| 3.5414 | 3.0414 | 52000 | 3.3648 |
+| 3.5388 | 3.0999 | 53000 | 3.3605 |
+| 3.5333 | 3.1584 | 54000 | 3.3619 |
+| 3.525 | 3.2169 | 55000 | 3.3588 |
+| 3.5361 | 3.2754 | 56000 | 3.3572 |
+| 3.5302 | 3.3339 | 57000 | 3.3540 |
+| 3.5355 | 3.3924 | 58000 | 3.3553 |
+| 3.5391 | 3.4508 | 59000 | 3.3504 |
+| 3.531 | 3.5093 | 60000 | 3.3495 |
+| 3.5293 | 3.5678 | 61000 | 3.3483 |
+| 3.5269 | 3.6263 | 62000 | 3.3489 |
+| 3.5181 | 3.6848 | 63000 | 3.3494 |
+| 3.5205 | 3.7433 | 64000 | 3.3480 |
+| 3.5237 | 3.8018 | 65000 | 3.3440 |
+| 3.5316 | 3.8603 | 66000 | 3.3417 |
+| 3.5222 | 3.9188 | 67000 | 3.3433 |
+| 3.5174 | 3.9772 | 68000 | 3.3418 |
+| 3.518 | 4.0357 | 69000 | 3.3414 |
+| 3.5036 | 4.0942 | 70000 | 3.3365 |
+| 3.5101 | 4.1527 | 71000 | 3.3367 |
+| 3.5145 | 4.2112 | 72000 | 3.3361 |
+| 3.5053 | 4.2697 | 73000 | 3.3355 |
+| 3.5153 | 4.3282 | 74000 | 3.3334 |
+| 3.5003 | 4.3867 | 75000 | 3.3334 |
+| 3.5001 | 4.4452 | 76000 | 3.3326 |
+| 3.5114 | 4.5036 | 77000 | 3.3298 |
+| 3.5108 | 4.5621 | 78000 | 3.3292 |
+| 3.4985 | 4.6206 | 79000 | 3.3288 |
+| 3.497 | 4.6791 | 80000 | 3.3303 |
+| 3.4982 | 4.7376 | 81000 | 3.3291 |
+| 3.5068 | 4.7961 | 82000 | 3.3272 |
+| 3.4915 | 4.8546 | 83000 | 3.3244 |
+| 3.5036 | 4.9131 | 84000 | 3.3214 |
+| 3.5027 | 4.9716 | 85000 | 3.3214 |
+| 3.5078 | 5.0300 | 86000 | 3.3225 |
+| 3.5112 | 5.0885 | 87000 | 3.3243 |
+| 3.5049 | 5.1470 | 88000 | 3.3216 |
+| 3.4917 | 5.2055 | 89000 | 3.3192 |
+| 3.4802 | 5.2640 | 90000 | 3.3188 |
+| 3.4971 | 5.3225 | 91000 | 3.3201 |
+| 3.4941 | 5.3810 | 92000 | 3.3175 |
+| 3.4998 | 5.4395 | 93000 | 3.3179 |
+| 3.5011 | 5.4980 | 94000 | 3.3164 |
+| 3.4912 | 5.5564 | 95000 | 3.3180 |
+| 3.4961 | 5.6149 | 96000 | 3.3168 |
+| 3.4833 | 5.6734 | 97000 | 3.3148 |
+| 3.498 | 5.7319 | 98000 | 3.3133 |
+| 3.4892 | 5.7904 | 99000 | 3.3142 |
+| 3.4967 | 5.8489 | 100000 | 3.3142 |
+| 3.4847 | 5.9074 | 101000 | 3.3094 |
+| 3.4899 | 5.9659 | 102000 | 3.3102 |
+| 3.4774 | 6.0244 | 103000 | 3.3110 |
+| 3.4854 | 6.0828 | 104000 | 3.3106 |
+| 3.4873 | 6.1413 | 105000 | 3.3087 |
+| 3.4869 | 6.1998 | 106000 | 3.3102 |
+| 3.4833 | 6.2583 | 107000 | 3.3063 |
+| 3.491 | 6.3168 | 108000 | 3.3082 |
+| 3.4776 | 6.3753 | 109000 | 3.3075 |
+| 3.4924 | 6.4338 | 110000 | 3.3068 |
+| 3.4804 | 6.4923 | 111000 | 3.3050 |
+| 3.4805 | 6.5508 | 112000 | 3.3041 |
+| 3.4892 | 6.6093 | 113000 | 3.3031 |
+| 3.4775 | 6.6677 | 114000 | 3.3032 |
+| 3.481 | 6.7262 | 115000 | 3.3036 |
+| 3.4782 | 6.7847 | 116000 | 3.3025 |
+| 3.4804 | 6.8432 | 117000 | 3.3017 |
+| 3.4841 | 6.9017 | 118000 | 3.2999 |
+| 3.4784 | 6.9602 | 119000 | 3.3008 |
+| 3.4821 | 7.0187 | 120000 | 3.3001 |
+| 3.4671 | 7.0772 | 121000 | 3.3008 |
+| 3.485 | 7.1357 | 122000 | 3.2976 |
+| 3.4737 | 7.1941 | 123000 | 3.2985 |
+| 3.4793 | 7.2526 | 124000 | 3.2979 |
+| 3.4651 | 7.3111 | 125000 | 3.2968 |
+| 3.4847 | 7.3696 | 126000 | 3.2974 |
+| 3.474 | 7.4281 | 127000 | 3.2973 |
+| 3.4769 | 7.4866 | 128000 | 3.2955 |
+| 3.486 | 7.5451 | 129000 | 3.2953 |
+| 3.4684 | 7.6036 | 130000 | 3.2944 |
+| 3.4826 | 7.6621 | 131000 | 3.2949 |
+| 3.4685 | 7.7205 | 132000 | 3.2944 |
+| 3.4608 | 7.7790 | 133000 | 3.2931 |
+| 3.4655 | 7.8375 | 134000 | 3.2953 |
+| 3.4648 | 7.8960 | 135000 | 3.2928 |
+| 3.4632 | 7.9545 | 136000 | 3.2936 |
+| 3.4666 | 8.0130 | 137000 | 3.2902 |
+| 3.4663 | 8.0715 | 138000 | 3.2939 |
+| 3.4713 | 8.1300 | 139000 | 3.2904 |
+| 3.4654 | 8.1885 | 140000 | 3.2917 |
+| 3.466 | 8.2469 | 141000 | 3.2913 |
+| 3.4724 | 8.3054 | 142000 | 3.2889 |
+| 3.4695 | 8.3639 | 143000 | 3.2890 |
+| 3.4729 | 8.4224 | 144000 | 3.2876 |
+| 3.4551 | 8.4809 | 145000 | 3.2898 |
+| 3.4652 | 8.5394 | 146000 | 3.2885 |
+| 3.4689 | 8.5979 | 147000 | 3.2854 |
+| 3.4647 | 8.6564 | 148000 | 3.2857 |
+| 3.4653 | 8.7149 | 149000 | 3.2857 |
+| 3.4552 | 8.7733 | 150000 | 3.2861 |
+| 3.47 | 8.8318 | 151000 | 3.2868 |
+| 3.4627 | 8.8903 | 152000 | 3.2854 |
 
 
 ### Framework versions
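The updated hyperparameters are internally consistent, which can be checked with a few lines of Python: the effective batch size (per-device batch size times gradient accumulation steps) matches the reported `total_train_batch_size`, and the perplexity implied by the final evaluation loss can be derived from it (the perplexity figure below is computed here, not reported in the card):

```python
import math

# Effective batch size from the new hyperparameters.
train_batch_size = 64
gradient_accumulation_steps = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 256, matching the card

# Perplexity implied by the final eval loss (cross-entropy in nats).
eval_loss = 3.2854
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # 26.7
```

For comparison, the previous run's final loss of 4.9499 corresponds to a perplexity of roughly 141, so the new configuration is a substantial improvement.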
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8aea299f4f7e52d3140e17c3e44315e4867c88c09826f7d688d7736005ead2be
+oid sha256:f05d284de06f966859d50432ebaa80cf6d6bd6b9485a9984695ea86e6fc9dbda
 size 22080496
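The `model.safetensors` entry is a Git LFS pointer file rather than the weights themselves: the `oid` is the SHA-256 of the actual blob and `size` is its length in bytes, so the pointer changes whenever the weights change even though the file size stays constant. A minimal sketch of how such a pointer is derived (the `blob` contents here are a made-up stand-in for the real 22 MB safetensors file):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    """Build a Git LFS v1 pointer file for the given blob contents."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )

# Hypothetical blob standing in for the real model.safetensors contents.
blob = b"example weights"
print(lfs_pointer(blob))
```

Comparing the `oid` of a downloaded file against the pointer is a quick integrity check.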
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9816fd657dc3ca74aba11ed7ddcfb22cd9813231085213afa94832c8f93cd28d
+oid sha256:a5c749c246d14a02ab2c5b292a6312faf35f70556ea7daf54af9e1303e431065
 size 5240
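The `lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.01` stored in `training_args.bin` describes a schedule that ramps the learning rate linearly from zero to its peak over the first 1% of steps, then decays it linearly back to zero. A minimal sketch of that shape, assuming the standard warmup-then-decay form; the 400,000-step horizon is an illustrative estimate from the logged ~17,000 steps per epoch times the 24 scheduled epochs (the log above stops near epoch 9):

```python
def linear_schedule_lr(step: int, total_steps: int,
                       peak_lr: float = 1e-3, warmup_ratio: float = 0.01) -> float:
    """Linear warmup to peak_lr over warmup_ratio of training, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 400_000  # illustrative planned horizon, not a logged value
print(linear_schedule_lr(0, total))      # 0.0 at the first step
print(linear_schedule_lr(4_000, total))  # 0.001, the peak, at the end of warmup
print(linear_schedule_lr(total, total))  # 0.0 at the scheduled end
```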