gpt_train_12_384

This model is a fine-tuned version of openai-community/gpt2 on the gokuls/wiki_book_corpus_raw_dataset_tiny dataset. It achieves the following results on the evaluation set:

  • Loss: 8.8125
  • Accuracy: 0.1024
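
For intuition, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language-model training but is not stated in this card.

```python
import math

# Evaluation loss reported above; ASSUMED to be mean cross-entropy in nats.
eval_loss = 8.8125

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.0f}")
```

Under that assumption, a perplexity in the thousands alongside roughly 10% token accuracy would suggest the model is still at an early stage of training.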

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
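
To illustrate what the linear scheduler listed above does to the learning rate, here is a minimal pure-Python sketch of linear warmup followed by linear decay to zero. The card does not list a warmup setting or a total step count, so `warmup_steps=0` and `TOTAL_STEPS` are assumptions chosen only for illustration, and `linear_lr` is a hypothetical helper rather than a library function.

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 1e-05, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly so the rate reaches 0 at total_steps and stays there.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical step budget; the real value is not given in this card.
TOTAL_STEPS = 1000

for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With no warmup, the rate starts at the configured 1e-05 and falls in proportion to the fraction of training remaining.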

Training results

Training Loss Epoch Step Validation Loss Accuracy
10.8984 0.0000 1 10.9062 0.0001
10.8984 0.0001 2 10.9062 0.0001
10.8984 0.0001 3 10.9062 0.0001
10.8984 0.0002 4 10.9062 0.0001
10.9062 0.0002 5 10.9062 0.0001
10.8984 0.0003 6 10.9062 0.0001
10.9062 0.0003 7 10.9062 0.0001
10.9062 0.0004 8 10.9062 0.0001
10.9062 0.0004 9 10.9062 0.0001
10.8984 0.0005 10 10.9062 0.0001
10.8984 0.0005 11 10.9062 0.0001
10.8984 0.0006 12 10.9062 0.0001
10.8984 0.0006 13 10.9062 0.0001
10.9062 0.0007 14 10.9062 0.0001
10.8984 0.0007 15 10.9062 0.0001
10.8984 0.0008 16 10.9062 0.0001
10.9062 0.0008 17 10.9062 0.0001
10.9062 0.0009 18 10.7578 0.0110
10.7734 0.0009 19 10.6562 0.0285
10.6797 0.0010 20 10.5781 0.0469
10.6016 0.0010 21 10.5234 0.0485
10.5234 0.0011 22 10.4766 0.0478
10.5 0.0011 23 10.4375 0.0483
10.4531 0.0012 24 10.4062 0.0507
10.4141 0.0012 25 10.3828 0.0531
10.3672 0.0013 26 10.3594 0.0556
10.3828 0.0013 27 10.3359 0.0562
10.3594 0.0014 28 10.3203 0.0562
10.3281 0.0014 29 10.3047 0.0559
10.3203 0.0015 30 10.2969 0.0563
10.3281 0.0015 31 10.2812 0.0566
10.3359 0.0015 32 10.2734 0.0566
10.2656 0.0016 33 10.2656 0.0570
10.2656 0.0016 34 10.2578 0.0561
10.2656 0.0017 35 10.2422 0.0562
10.2656 0.0017 36 10.2344 0.0575
10.2656 0.0018 37 10.2266 0.0586
10.2109 0.0018 38 10.2188 0.0593
10.2656 0.0019 39 10.2109 0.0596
10.2266 0.0019 40 10.2031 0.0599
10.2109 0.0020 41 10.1953 0.0601
10.2109 0.0020 42 10.1797 0.0604
10.2109 0.0021 43 10.1719 0.0608
10.1484 0.0021 44 10.1641 0.0610
10.1875 0.0022 45 10.1484 0.0611
10.1719 0.0022 46 10.1406 0.0612
10.1484 0.0023 47 10.1328 0.0615
10.1172 0.0023 48 10.1172 0.0622
10.1797 0.0024 49 10.1094 0.0632
10.1016 0.0024 50 10.1016 0.0642
10.1406 0.0025 51 10.0938 0.0651
10.1406 0.0025 52 10.0859 0.0658
10.1094 0.0026 53 10.0781 0.0663
10.1016 0.0026 54 10.0703 0.0669
10.0781 0.0027 55 10.0625 0.0672
10.0703 0.0027 56 10.0547 0.0678
10.0703 0.0028 57 10.0469 0.0681
10.0469 0.0028 58 10.0391 0.0686
10.1016 0.0029 59 10.0312 0.0689
10.0547 0.0029 60 10.0312 0.0694
10.0391 0.0030 61 10.0234 0.0695
10.0547 0.0030 62 10.0156 0.0692
10.0312 0.0031 63 10.0078 0.0688
10.0547 0.0031 64 10.0 0.0687
10.0547 0.0031 65 9.9922 0.0693
9.9922 0.0032 66 9.9844 0.0697
10.0234 0.0032 67 9.9766 0.0705
10.0 0.0033 68 9.9688 0.0711
10.0 0.0033 69 9.9609 0.0715
9.9688 0.0034 70 9.9609 0.0716
9.9922 0.0034 71 9.9531 0.0717
9.9844 0.0035 72 9.9453 0.0716
9.9688 0.0035 73 9.9375 0.0718
9.9453 0.0036 74 9.9297 0.0726
9.9375 0.0036 75 9.9219 0.0734
9.9141 0.0037 76 9.9141 0.0744
9.9062 0.0037 77 9.9062 0.0751
9.9219 0.0038 78 9.9062 0.0755
9.9219 0.0038 79 9.8984 0.0756
9.9219 0.0039 80 9.8906 0.0757
9.875 0.0039 81 9.8828 0.0759
9.9219 0.0040 82 9.875 0.0760
9.875 0.0040 83 9.875 0.0763
9.8672 0.0041 84 9.8672 0.0765
9.9062 0.0041 85 9.8594 0.0769
9.8828 0.0042 86 9.8516 0.0773
9.8594 0.0042 87 9.8516 0.0775
9.8906 0.0043 88 9.8438 0.0777
9.8047 0.0043 89 9.8359 0.0777
9.8203 0.0044 90 9.8359 0.0778
9.8594 0.0044 91 9.8281 0.0781
9.8438 0.0045 92 9.8203 0.0786
9.8438 0.0045 93 9.8203 0.0790
9.8438 0.0046 94 9.8125 0.0793
9.8359 0.0046 95 9.8047 0.0794
9.8281 0.0046 96 9.8047 0.0795
9.8516 0.0047 97 9.7969 0.0796
9.8281 0.0047 98 9.7891 0.0797
9.7734 0.0048 99 9.7891 0.0798
9.8125 0.0048 100 9.7812 0.0802
9.8203 0.0049 101 9.7734 0.0806
9.8281 0.0049 102 9.7734 0.0809
9.7734 0.0050 103 9.7656 0.0811
9.7891 0.0050 104 9.7578 0.0813
9.8047 0.0051 105 9.7578 0.0814
9.7578 0.0051 106 9.75 0.0815
9.7734 0.0052 107 9.75 0.0816
9.7891 0.0052 108 9.7422 0.0818
9.75 0.0053 109 9.7344 0.0819
9.75 0.0053 110 9.7344 0.0821
9.7266 0.0054 111 9.7266 0.0823
9.7656 0.0054 112 9.7188 0.0824
9.7812 0.0055 113 9.7188 0.0824
9.7734 0.0055 114 9.7109 0.0824
9.7266 0.0056 115 9.7109 0.0824
9.7266 0.0056 116 9.7031 0.0826
9.7109 0.0057 117 9.6953 0.0828
9.6719 0.0057 118 9.6953 0.0829
9.6953 0.0058 119 9.6875 0.0830
9.6719 0.0058 120 9.6875 0.0831
9.6953 0.0059 121 9.6797 0.0831
9.6875 0.0059 122 9.6797 0.0831
9.6719 0.0060 123 9.6719 0.0832
9.6719 0.0060 124 9.6641 0.0833
9.625 0.0061 125 9.6641 0.0833
9.6719 0.0061 126 9.6562 0.0834
9.6953 0.0062 127 9.6562 0.0836
9.6719 0.0062 128 9.6484 0.0837
9.6797 0.0062 129 9.6406 0.0838
9.6484 0.0063 130 9.6406 0.0839
9.6719 0.0063 131 9.6328 0.0839
9.6328 0.0064 132 9.6328 0.0839
9.6719 0.0064 133 9.625 0.0839
9.6484 0.0065 134 9.6172 0.0840
9.6406 0.0065 135 9.6172 0.0841
9.6094 0.0066 136 9.6094 0.0843
9.625 0.0066 137 9.6094 0.0845
9.6562 0.0067 138 9.6016 0.0846
9.6172 0.0067 139 9.6016 0.0847
9.6094 0.0068 140 9.5938 0.0847
9.6562 0.0068 141 9.5859 0.0847
9.6562 0.0069 142 9.5859 0.0847
9.6562 0.0069 143 9.5781 0.0848
9.6016 0.0070 144 9.5781 0.0849
9.6094 0.0070 145 9.5703 0.0850
9.5938 0.0071 146 9.5703 0.0851
9.5703 0.0071 147 9.5625 0.0851
9.5859 0.0072 148 9.5625 0.0851
9.625 0.0072 149 9.5547 0.0852
9.5859 0.0073 150 9.5469 0.0854
9.5625 0.0073 151 9.5469 0.0855
9.5547 0.0074 152 9.5391 0.0856
9.5703 0.0074 153 9.5391 0.0858
9.5391 0.0075 154 9.5312 0.0858
9.5391 0.0075 155 9.5312 0.0859
9.5 0.0076 156 9.5234 0.0861
9.5547 0.0076 157 9.5156 0.0863
9.5391 0.0077 158 9.5156 0.0863
9.5312 0.0077 159 9.5156 0.0864
9.5391 0.0077 160 9.5078 0.0864
9.4688 0.0078 161 9.5 0.0866
9.5547 0.0078 162 9.5 0.0867
9.5078 0.0079 163 9.4922 0.0869
9.5078 0.0079 164 9.4922 0.0870
9.5 0.0080 165 9.4844 0.0872
9.5312 0.0080 166 9.4844 0.0875
9.5156 0.0081 167 9.4766 0.0877
9.4844 0.0081 168 9.4766 0.0878
9.4688 0.0082 169 9.4688 0.0878
9.5156 0.0082 170 9.4609 0.0879
9.4922 0.0083 171 9.4609 0.0879
9.4844 0.0083 172 9.4531 0.0878
9.5234 0.0084 173 9.4531 0.0879
9.4844 0.0084 174 9.4453 0.0879
9.4219 0.0085 175 9.4453 0.0880
9.4062 0.0085 176 9.4375 0.0881
9.4375 0.0086 177 9.4375 0.0883
9.4375 0.0086 178 9.4297 0.0885
9.4688 0.0087 179 9.4297 0.0887
9.4453 0.0087 180 9.4219 0.0888
9.4219 0.0088 181 9.4219 0.0890
9.4141 0.0088 182 9.4141 0.0890
9.4375 0.0089 183 9.4062 0.0890
9.3984 0.0089 184 9.4062 0.0890
9.4297 0.0090 185 9.3984 0.0891
9.3984 0.0090 186 9.3984 0.0891
9.3906 0.0091 187 9.3906 0.0892
9.4219 0.0091 188 9.3906 0.0893
9.4062 0.0092 189 9.3828 0.0895
9.375 0.0092 190 9.3828 0.0897
9.3828 0.0093 191 9.375 0.0898
9.3906 0.0093 192 9.375 0.0898
9.3906 0.0093 193 9.3672 0.0899
9.4141 0.0094 194 9.3672 0.0898
9.3203 0.0094 195 9.3594 0.0898
9.3906 0.0095 196 9.3594 0.0898
9.3594 0.0095 197 9.3516 0.0900
9.3516 0.0096 198 9.3516 0.0901
9.3438 0.0096 199 9.3438 0.0902
9.3516 0.0097 200 9.3438 0.0904
9.3125 0.0097 201 9.3359 0.0906
9.3516 0.0098 202 9.3359 0.0907
9.3359 0.0098 203 9.3281 0.0908
9.3516 0.0099 204 9.3281 0.0907
9.3281 0.0099 205 9.3203 0.0906
9.375 0.0100 206 9.3125 0.0905
9.2812 0.0100 207 9.3125 0.0904
9.3281 0.0101 208 9.3047 0.0906
9.3281 0.0101 209 9.3047 0.0908
9.3594 0.0102 210 9.2969 0.0912
9.3438 0.0102 211 9.2969 0.0915
9.2891 0.0103 212 9.2891 0.0916
9.3438 0.0103 213 9.2891 0.0916
9.3047 0.0104 214 9.2812 0.0915
9.2656 0.0104 215 9.2812 0.0914
9.2734 0.0105 216 9.2734 0.0913
9.2891 0.0105 217 9.2734 0.0913
9.2969 0.0106 218 9.2656 0.0913
9.25 0.0106 219 9.2656 0.0914
9.2578 0.0107 220 9.2578 0.0915
9.25 0.0107 221 9.2578 0.0916
9.2656 0.0108 222 9.25 0.0920
9.2578 0.0108 223 9.25 0.0923
9.2734 0.0108 224 9.2422 0.0926
9.2891 0.0109 225 9.2422 0.0929
9.25 0.0109 226 9.2344 0.0928
9.2344 0.0110 227 9.2344 0.0928
9.2656 0.0110 228 9.2266 0.0927
9.2656 0.0111 229 9.2266 0.0928
9.2656 0.0111 230 9.2188 0.0930
9.25 0.0112 231 9.2188 0.0933
9.2891 0.0112 232 9.2109 0.0937
9.2188 0.0113 233 9.2031 0.0938
9.2578 0.0113 234 9.2031 0.0939
9.2422 0.0114 235 9.1953 0.0938
9.2109 0.0114 236 9.1953 0.0935
9.1797 0.0115 237 9.1953 0.0935
9.1953 0.0115 238 9.1875 0.0938
9.1797 0.0116 239 9.1875 0.0943
9.2266 0.0116 240 9.1797 0.0948
9.2109 0.0117 241 9.1719 0.0951
9.1719 0.0117 242 9.1719 0.0954
9.2031 0.0118 243 9.1719 0.0955
9.1953 0.0118 244 9.1641 0.0954
9.1875 0.0119 245 9.1641 0.0950
9.2031 0.0119 246 9.1562 0.0949
9.1797 0.0120 247 9.1484 0.0950
9.1484 0.0120 248 9.1484 0.0952
9.1406 0.0121 249 9.1484 0.0954
9.1641 0.0121 250 9.1406 0.0956
9.1406 0.0122 251 9.1406 0.0956
9.1719 0.0122 252 9.1328 0.0954
9.125 0.0123 253 9.1328 0.0953
9.1719 0.0123 254 9.125 0.0950
9.1797 0.0124 255 9.125 0.0950
9.0859 0.0124 256 9.1172 0.0951
9.1875 0.0124 257 9.1172 0.0957
9.1094 0.0125 258 9.1094 0.0963
9.0938 0.0125 259 9.1094 0.0968
9.1016 0.0126 260 9.1016 0.0969
9.1406 0.0126 261 9.1016 0.0969
9.0781 0.0127 262 9.0938 0.0966
9.1094 0.0127 263 9.0938 0.0963
9.1172 0.0128 264 9.0859 0.0959
9.1172 0.0128 265 9.0859 0.0956
9.125 0.0129 266 9.0859 0.0955
9.1094 0.0129 267 9.0781 0.0957
9.0781 0.0130 268 9.0781 0.0964
9.125 0.0130 269 9.0703 0.0973
9.0547 0.0131 270 9.0703 0.0980
9.0781 0.0131 271 9.0625 0.0983
9.1016 0.0132 272 9.0625 0.0981
9.0703 0.0132 273 9.0547 0.0975
9.0547 0.0133 274 9.0547 0.0969
9.0312 0.0133 275 9.0469 0.0964
9.0938 0.0134 276 9.0469 0.0964
9.0156 0.0134 277 9.0391 0.0967
9.1094 0.0135 278 9.0391 0.0973
9.0859 0.0135 279 9.0312 0.0980
9.0234 0.0136 280 9.0312 0.0984
9.0781 0.0136 281 9.0234 0.0984
9.0547 0.0137 282 9.0234 0.0983
9.0234 0.0137 283 9.0156 0.0979
9.0312 0.0138 284 9.0156 0.0978
9.0391 0.0138 285 9.0078 0.0978
9.0312 0.0139 286 9.0078 0.0980
9.0625 0.0139 287 9.0078 0.0982
9.0234 0.0139 288 9.0 0.0986
9.0078 0.0140 289 9.0 0.0990
9.0 0.0140 290 8.9922 0.0996
9.0078 0.0141 291 8.9922 0.0997
9.0 0.0141 292 8.9844 0.0999
9.0078 0.0142 293 8.9844 0.0999
8.9922 0.0142 294 8.9766 0.0995
9.0078 0.0143 295 8.9766 0.0990
8.9844 0.0143 296 8.9688 0.0985
8.9766 0.0144 297 8.9688 0.0983
8.9531 0.0144 298 8.9609 0.0985
8.9688 0.0145 299 8.9609 0.0988
9.0312 0.0145 300 8.9531 0.0994
9.0156 0.0146 301 8.9531 0.0998
8.9688 0.0146 302 8.9453 0.0999
9.0 0.0147 303 8.9453 0.0997
8.9375 0.0147 304 8.9375 0.0996
8.9766 0.0148 305 8.9375 0.0994
8.9375 0.0148 306 8.9375 0.0994
8.9688 0.0149 307 8.9297 0.0997
8.9531 0.0149 308 8.9297 0.0999
8.9531 0.0150 309 8.9219 0.1002
8.9062 0.0150 310 8.9219 0.1003
8.9375 0.0151 311 8.9141 0.1004
8.8828 0.0151 312 8.9141 0.1003
8.9219 0.0152 313 8.9062 0.1003
8.9219 0.0152 314 8.9062 0.1004
8.9297 0.0153 315 8.9062 0.1009
8.9922 0.0153 316 8.8984 0.1011
8.9062 0.0154 317 8.8984 0.1011
8.9297 0.0154 318 8.8906 0.1011
8.9531 0.0155 319 8.8906 0.1008
8.9531 0.0155 320 8.8828 0.1006
8.9375 0.0155 321 8.8828 0.1004
8.9219 0.0156 322 8.875 0.1002
8.9062 0.0156 323 8.875 0.1004
8.8906 0.0157 324 8.875 0.1006
8.8906 0.0157 325 8.8672 0.1011
8.8672 0.0158 326 8.8672 0.1016
8.875 0.0158 327 8.8594 0.1019
8.8516 0.0159 328 8.8594 0.1022
8.8672 0.0159 329 8.8516 0.1020
8.8984 0.0160 330 8.8516 0.1018
8.875 0.0160 331 8.8438 0.1016
8.8828 0.0161 332 8.8438 0.1014
8.8438 0.0161 333 8.8359 0.1014
8.7969 0.0162 334 8.8359 0.1017
8.8828 0.0162 335 8.8281 0.1020
8.8281 0.0163 336 8.8281 0.1025
8.8203 0.0163 337 8.8281 0.1027
8.8594 0.0164 338 8.8203 0.1028
8.8594 0.0164 339 8.8203 0.1027
8.8203 0.0165 340 8.8125 0.1025
8.8359 0.0165 341 8.8125 0.1024
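
The rows above are whitespace-separated, so they are easy to parse for a quick analysis. The sketch below uses four rows copied from the table. `LogRow` and `parse_row` are hypothetical names introduced for illustration. The steps-per-epoch figure is only a rough estimate, because the epoch column is rounded to four decimals.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


def parse_row(line: str) -> LogRow:
    """Parse one whitespace-separated row of the training results table."""
    train_loss, epoch, step, val_loss, accuracy = line.split()
    return LogRow(float(train_loss), float(epoch), int(step),
                  float(val_loss), float(accuracy))


# A few rows copied verbatim from the table above (steps 1, 18, 100, 341).
SAMPLE = """\
10.8984 0.0000 1 10.9062 0.0001
10.9062 0.0009 18 10.7578 0.0110
9.8125 0.0048 100 9.7812 0.0802
8.8359 0.0165 341 8.8125 0.1024
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
first, last = rows[0], rows[-1]

# Total drop in validation loss between the first and last logged step.
val_loss_drop = first.val_loss - last.val_loss

# Rough estimate only: the epoch column is rounded to four decimals.
approx_steps_per_epoch = last.step / last.epoch

print(f"validation loss drop: {val_loss_drop:.4f}")
print(f"approx. steps per epoch: {approx_steps_per_epoch:.0f}")
```

The last logged row is at epoch 0.0165, so the table covers only a small fraction of the first of the 100 configured epochs.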

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.0a0+32f93b1
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 60.3M params (Safetensors, tensor type FP16)

Evaluation results

  • Accuracy on gokuls/wiki_book_corpus_raw_dataset_tiny (self-reported): 0.102