metadata
license: mit
base_model: openai-community/gpt2
tags:
  - generated_from_trainer
datasets:
  - gokuls/wiki_book_corpus_raw_dataset_tiny
metrics:
  - accuracy
model-index:
  - name: gpt_train_12_128
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: gokuls/wiki_book_corpus_raw_dataset_tiny
          type: gokuls/wiki_book_corpus_raw_dataset_tiny
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.07807518032045319

gpt_train_12_128

This model is a fine-tuned version of openai-community/gpt2 on the gokuls/wiki_book_corpus_raw_dataset_tiny dataset. It achieves the following results on the evaluation set, which match the last logged evaluation (step 352) in the training results table:

  • Loss: 10.0781
  • Accuracy: 0.0781
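
Both numbers can be put in context with a little arithmetic. The loss is a mean per-token cross-entropy in nats, so its exponential is the perplexity. A model that predicts uniformly over GPT-2's 50,257-token vocabulary would score ln(50,257) ≈ 10.825. The sketch below computes both and adds a token-level accuracy helper. It assumes the GPT-2 tokenizer was used unchanged. It also assumes accuracy is computed the way the Transformers `run_clm.py` example does it: the argmax prediction at position i is compared with the token at position i + 1. The card states neither assumption, and the helper names are hypothetical.

```python
from __future__ import annotations

import math

EVAL_LOSS = 10.0781  # final evaluation loss on this card (mean per-token cross-entropy, nats)
GPT2_VOCAB_SIZE = 50257  # GPT-2 BPE vocabulary size (assumes the tokenizer was not modified)


def perplexity(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy of a model that gives every token probability 1 / vocab_size."""
    return math.log(vocab_size)


def next_token_accuracy(pred_ids: list[list[int]], label_ids: list[list[int]]) -> float:
    """Hypothetical helper: fraction of positions where the argmax prediction at
    position i equals the token at position i + 1 (run_clm.py-style shift).
    Skipping label -100 is an extra safeguard, not something the card describes."""
    correct = 0
    total = 0
    for preds, labels in zip(pred_ids, label_ids):
        for pred, label in zip(preds[:-1], labels[1:]):
            if label == -100:
                continue
            total += 1
            correct += int(pred == label)
    return correct / total if total else 0.0


print(f"uniform baseline loss: {uniform_baseline_loss(GPT2_VOCAB_SIZE):.4f}")  # 10.8249
print(f"eval perplexity: {perplexity(EVAL_LOSS):,.0f}")  # 23,816
# Toy example: tokens 10..13; the model predicts 11, 12, 99 for the first three positions.
print(f"{next_token_accuracy([[11, 12, 99, 0]], [[10, 11, 12, 13]]):.4f}")  # 0.6667
```

For comparison, the step-1 validation loss in the training results table (10.8438) sits almost exactly at the uniform baseline. That is hard to reconcile with starting from the pretrained GPT-2 weights. It looks more like a randomly initialized model with the GPT-2 configuration, but the card itself does not say which was used.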

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 48
  • eval_batch_size: 48
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
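
The hyperparameters listed above map fairly directly onto Hugging Face `TrainingArguments` keyword arguments. The sketch below writes that mapping as a plain dictionary, so it runs without `transformers` installed, and implements the linear schedule by hand. It rests on assumptions the card does not state:

- `train_batch_size` and `eval_batch_size` are per-device sizes.
- "Native AMP" corresponds to `fp16=True`.
- No warmup was used, since none is listed.
- The Trainer's default optimizer stands in for the listed Adam settings.

The `linear_lr` helper is hypothetical. It follows the same formula as `get_linear_schedule_with_warmup` in `transformers`.

```python
# Hypothetical mapping of this card's hyperparameters onto TrainingArguments
# keyword names (assumptions: per-device batch sizes, fp16 for "Native AMP",
# no warmup, default optimizer).
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 48,
    "per_device_eval_batch_size": 48,
    "seed": 10,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 100,
    "fp16": True,
}
# With transformers installed, these could be passed as
# TrainingArguments(output_dir="out", **TRAINING_KWARGS).


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Ramp up linearly over warmup_steps, then decay linearly to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total: ~13,750 steps/epoch (inferred from the epoch column) x 100 epochs.
TOTAL_STEPS = 1_375_000
print(f"{linear_lr(352, TOTAL_STEPS, TRAINING_KWARGS['learning_rate']):.4e}")  # 9.9974e-06
```

With no warmup and a run planned for roughly 1.4 million steps, the learning rate at the last logged step is still essentially the initial 1e-5.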

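Training results

Each row gives the training loss, the (fractional) epoch, the training step, and the validation loss and accuracy measured at that step. Evaluation ran at every step. The log ends at step 352 (epoch 0.0256) of the 100 configured epochs.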

Training Loss Epoch Step Validation Loss Accuracy
10.8438 0.0001 1 10.8438 0.0103
10.8359 0.0001 2 10.8438 0.0103
10.8438 0.0002 3 10.8438 0.0103
10.8359 0.0003 4 10.8438 0.0103
10.8438 0.0004 5 10.8438 0.0103
10.8438 0.0004 6 10.8438 0.0103
10.8359 0.0005 7 10.8438 0.0103
10.8359 0.0006 8 10.8438 0.0103
10.8438 0.0007 9 10.8438 0.0103
10.8359 0.0007 10 10.8438 0.0103
10.8359 0.0008 11 10.8438 0.0103
10.8438 0.0009 12 10.8438 0.0103
10.8438 0.0009 13 10.8438 0.0103
10.8359 0.0010 14 10.8438 0.0103
10.8359 0.0011 15 10.8438 0.0103
10.8438 0.0012 16 10.8438 0.0103
10.8359 0.0012 17 10.8438 0.0103
10.8438 0.0013 18 10.8281 0.0113
10.8203 0.0014 19 10.8125 0.0116
10.8203 0.0015 20 10.8047 0.0117
10.8047 0.0015 21 10.7891 0.0118
10.7969 0.0016 22 10.7734 0.0118
10.7812 0.0017 23 10.7656 0.0118
10.7656 0.0017 24 10.75 0.0119
10.7578 0.0018 25 10.7344 0.0121
10.75 0.0019 26 10.7266 0.0124
10.7344 0.0020 27 10.7188 0.0131
10.7266 0.0020 28 10.7031 0.0144
10.7109 0.0021 29 10.6953 0.0165
10.7031 0.0022 30 10.6875 0.0196
10.7031 0.0023 31 10.6797 0.0236
10.6875 0.0023 32 10.6719 0.0282
10.6797 0.0024 33 10.6641 0.0330
10.6719 0.0025 34 10.6641 0.0375
10.6719 0.0025 35 10.6562 0.0409
10.6719 0.0026 36 10.6484 0.0437
10.6484 0.0027 37 10.6484 0.0461
10.6562 0.0028 38 10.6406 0.0481
10.6484 0.0028 39 10.6406 0.0498
10.6484 0.0029 40 10.6328 0.0511
10.6406 0.0030 41 10.6328 0.0521
10.6406 0.0031 42 10.625 0.0529
10.6406 0.0031 43 10.625 0.0535
10.625 0.0032 44 10.6172 0.0539
10.625 0.0033 45 10.6172 0.0542
10.625 0.0033 46 10.6172 0.0543
10.6172 0.0034 47 10.6094 0.0544
10.625 0.0035 48 10.6094 0.0545
10.6172 0.0036 49 10.6016 0.0545
10.6016 0.0036 50 10.6016 0.0545
10.6016 0.0037 51 10.6016 0.0545
10.6016 0.0038 52 10.5938 0.0546
10.6016 0.0039 53 10.5938 0.0545
10.5938 0.0039 54 10.5938 0.0545
10.6016 0.0040 55 10.5859 0.0545
10.5859 0.0041 56 10.5859 0.0545
10.6016 0.0041 57 10.5859 0.0545
10.5859 0.0042 58 10.5859 0.0546
10.5859 0.0043 59 10.5781 0.0547
10.5781 0.0044 60 10.5781 0.0548
10.5781 0.0044 61 10.5781 0.0550
10.5781 0.0045 62 10.5703 0.0553
10.5781 0.0046 63 10.5703 0.0557
10.5703 0.0046 64 10.5703 0.0561
10.5781 0.0047 65 10.5625 0.0566
10.5625 0.0048 66 10.5625 0.0570
10.5781 0.0049 67 10.5625 0.0573
10.5703 0.0049 68 10.5547 0.0575
10.5625 0.0050 69 10.5547 0.0577
10.5625 0.0051 70 10.5547 0.0578
10.5625 0.0052 71 10.5547 0.0579
10.5547 0.0052 72 10.5469 0.0580
10.5469 0.0053 73 10.5469 0.0580
10.5469 0.0054 74 10.5469 0.0580
10.5547 0.0054 75 10.5391 0.0580
10.5547 0.0055 76 10.5391 0.0580
10.5469 0.0056 77 10.5391 0.0582
10.5469 0.0057 78 10.5391 0.0582
10.5312 0.0057 79 10.5312 0.0584
10.5312 0.0058 80 10.5312 0.0586
10.5312 0.0059 81 10.5312 0.0590
10.5312 0.0060 82 10.5312 0.0593
10.5312 0.0060 83 10.5234 0.0597
10.5234 0.0061 84 10.5234 0.0600
10.5312 0.0062 85 10.5234 0.0602
10.5312 0.0062 86 10.5234 0.0603
10.5234 0.0063 87 10.5156 0.0604
10.5156 0.0064 88 10.5156 0.0605
10.5234 0.0065 89 10.5156 0.0606
10.5156 0.0065 90 10.5156 0.0606
10.5156 0.0066 91 10.5078 0.0606
10.5156 0.0067 92 10.5078 0.0605
10.5156 0.0068 93 10.5078 0.0603
10.5156 0.0068 94 10.5078 0.0602
10.5234 0.0069 95 10.5 0.0601
10.5156 0.0070 96 10.5 0.0602
10.5078 0.0070 97 10.5 0.0603
10.5 0.0071 98 10.5 0.0603
10.5078 0.0072 99 10.5 0.0604
10.5078 0.0073 100 10.4922 0.0606
10.5 0.0073 101 10.4922 0.0607
10.4922 0.0074 102 10.4922 0.0609
10.4922 0.0075 103 10.4922 0.0612
10.4844 0.0076 104 10.4844 0.0614
10.4922 0.0076 105 10.4844 0.0617
10.4922 0.0077 106 10.4844 0.0619
10.4844 0.0078 107 10.4844 0.0622
10.4922 0.0078 108 10.4766 0.0625
10.4844 0.0079 109 10.4766 0.0628
10.4766 0.0080 110 10.4766 0.0630
10.4844 0.0081 111 10.4766 0.0632
10.4766 0.0081 112 10.4766 0.0634
10.4844 0.0082 113 10.4688 0.0636
10.4766 0.0083 114 10.4688 0.0638
10.4766 0.0084 115 10.4688 0.0640
10.4844 0.0084 116 10.4688 0.0643
10.4531 0.0085 117 10.4609 0.0644
10.4609 0.0086 118 10.4609 0.0647
10.4609 0.0086 119 10.4609 0.0648
10.4688 0.0087 120 10.4609 0.0649
10.4609 0.0088 121 10.4609 0.0651
10.4609 0.0089 122 10.4531 0.0653
10.4531 0.0089 123 10.4531 0.0656
10.4531 0.0090 124 10.4531 0.0659
10.4531 0.0091 125 10.4531 0.0660
10.4531 0.0092 126 10.4453 0.0662
10.4531 0.0092 127 10.4453 0.0664
10.4453 0.0093 128 10.4453 0.0667
10.4531 0.0094 129 10.4453 0.0670
10.4375 0.0094 130 10.4453 0.0673
10.4453 0.0095 131 10.4375 0.0676
10.4375 0.0096 132 10.4375 0.0678
10.4375 0.0097 133 10.4375 0.0679
10.4297 0.0097 134 10.4375 0.0679
10.4453 0.0098 135 10.4297 0.0678
10.4375 0.0099 136 10.4297 0.0677
10.4375 0.0100 137 10.4297 0.0677
10.4219 0.0100 138 10.4297 0.0677
10.4375 0.0101 139 10.4219 0.0678
10.4297 0.0102 140 10.4219 0.0680
10.4297 0.0102 141 10.4219 0.0682
10.4219 0.0103 142 10.4219 0.0684
10.4219 0.0104 143 10.4219 0.0687
10.4219 0.0105 144 10.4141 0.0689
10.4219 0.0105 145 10.4141 0.0692
10.4141 0.0106 146 10.4141 0.0693
10.4062 0.0107 147 10.4141 0.0695
10.4141 0.0108 148 10.4062 0.0696
10.4141 0.0108 149 10.4062 0.0697
10.4219 0.0109 150 10.4062 0.0697
10.4062 0.0110 151 10.4062 0.0698
10.4141 0.0110 152 10.4062 0.0700
10.4141 0.0111 153 10.3984 0.0701
10.4219 0.0112 154 10.3984 0.0702
10.4141 0.0113 155 10.3984 0.0704
10.4062 0.0113 156 10.3984 0.0705
10.4062 0.0114 157 10.3906 0.0707
10.3906 0.0115 158 10.3906 0.0708
10.3906 0.0116 159 10.3906 0.0710
10.3984 0.0116 160 10.3906 0.0711
10.3984 0.0117 161 10.3906 0.0711
10.3906 0.0118 162 10.3828 0.0712
10.3906 0.0118 163 10.3828 0.0712
10.3906 0.0119 164 10.3828 0.0714
10.3828 0.0120 165 10.3828 0.0715
10.375 0.0121 166 10.375 0.0716
10.3828 0.0121 167 10.375 0.0717
10.3828 0.0122 168 10.375 0.0718
10.3828 0.0123 169 10.375 0.0719
10.3828 0.0124 170 10.375 0.0721
10.3672 0.0124 171 10.3672 0.0721
10.375 0.0125 172 10.3672 0.0721
10.3594 0.0126 173 10.3672 0.0721
10.375 0.0126 174 10.3672 0.0720
10.3594 0.0127 175 10.3594 0.0721
10.3672 0.0128 176 10.3594 0.0722
10.375 0.0129 177 10.3594 0.0723
10.3672 0.0129 178 10.3594 0.0726
10.3672 0.0130 179 10.3594 0.0727
10.3594 0.0131 180 10.3516 0.0728
10.3672 0.0132 181 10.3516 0.0729
10.3594 0.0132 182 10.3516 0.0730
10.3516 0.0133 183 10.3516 0.0731
10.3594 0.0134 184 10.3516 0.0732
10.3516 0.0134 185 10.3438 0.0733
10.3516 0.0135 186 10.3438 0.0733
10.3438 0.0136 187 10.3438 0.0734
10.3516 0.0137 188 10.3438 0.0734
10.3516 0.0137 189 10.3359 0.0735
10.3438 0.0138 190 10.3359 0.0735
10.3516 0.0139 191 10.3359 0.0735
10.3359 0.0139 192 10.3359 0.0737
10.3359 0.0140 193 10.3359 0.0737
10.3359 0.0141 194 10.3281 0.0736
10.3359 0.0142 195 10.3281 0.0736
10.3359 0.0142 196 10.3281 0.0736
10.3281 0.0143 197 10.3281 0.0737
10.3359 0.0144 198 10.3281 0.0738
10.3203 0.0145 199 10.3203 0.0740
10.3359 0.0145 200 10.3203 0.0741
10.3359 0.0146 201 10.3203 0.0742
10.3281 0.0147 202 10.3203 0.0743
10.3203 0.0147 203 10.3125 0.0743
10.3203 0.0148 204 10.3125 0.0743
10.3281 0.0149 205 10.3125 0.0743
10.3125 0.0150 206 10.3125 0.0741
10.3125 0.0150 207 10.3125 0.0740
10.3047 0.0151 208 10.3047 0.0740
10.3125 0.0152 209 10.3047 0.0741
10.3125 0.0153 210 10.3047 0.0742
10.3203 0.0153 211 10.3047 0.0743
10.3047 0.0154 212 10.3047 0.0744
10.3203 0.0155 213 10.2969 0.0745
10.3125 0.0155 214 10.2969 0.0747
10.3047 0.0156 215 10.2969 0.0749
10.2969 0.0157 216 10.2969 0.0750
10.3047 0.0158 217 10.2969 0.0750
10.2969 0.0158 218 10.2891 0.0749
10.2891 0.0159 219 10.2891 0.0747
10.2969 0.0160 220 10.2891 0.0744
10.2969 0.0161 221 10.2891 0.0742
10.2891 0.0161 222 10.2891 0.0741
10.2891 0.0162 223 10.2812 0.0742
10.2891 0.0163 224 10.2812 0.0743
10.2891 0.0163 225 10.2812 0.0746
10.2969 0.0164 226 10.2812 0.0748
10.2812 0.0165 227 10.2734 0.0749
10.2891 0.0166 228 10.2734 0.0750
10.2734 0.0166 229 10.2734 0.0751
10.2969 0.0167 230 10.2734 0.0750
10.2656 0.0168 231 10.2734 0.0749
10.2734 0.0169 232 10.2656 0.0747
10.2734 0.0169 233 10.2656 0.0747
10.2734 0.0170 234 10.2656 0.0746
10.2656 0.0171 235 10.2656 0.0747
10.2656 0.0171 236 10.2656 0.0748
10.2734 0.0172 237 10.2578 0.0749
10.2656 0.0173 238 10.2578 0.0752
10.2734 0.0174 239 10.2578 0.0755
10.2578 0.0174 240 10.2578 0.0756
10.2734 0.0175 241 10.2578 0.0756
10.2656 0.0176 242 10.25 0.0756
10.2578 0.0177 243 10.25 0.0756
10.2578 0.0177 244 10.25 0.0756
10.2578 0.0178 245 10.25 0.0756
10.2578 0.0179 246 10.25 0.0756
10.2578 0.0179 247 10.2422 0.0757
10.2578 0.0180 248 10.2422 0.0758
10.2422 0.0181 249 10.2422 0.0759
10.2422 0.0182 250 10.2422 0.0759
10.2422 0.0182 251 10.2422 0.0759
10.2422 0.0183 252 10.2344 0.0759
10.2422 0.0184 253 10.2344 0.0759
10.2422 0.0185 254 10.2344 0.0759
10.2422 0.0185 255 10.2344 0.0761
10.2422 0.0186 256 10.2344 0.0761
10.2422 0.0187 257 10.2266 0.0760
10.2422 0.0187 258 10.2266 0.0760
10.2344 0.0188 259 10.2266 0.0759
10.2344 0.0189 260 10.2266 0.0759
10.2266 0.0190 261 10.2266 0.0760
10.2188 0.0190 262 10.2188 0.0760
10.2266 0.0191 263 10.2188 0.0762
10.2266 0.0192 264 10.2188 0.0762
10.2188 0.0193 265 10.2188 0.0762
10.2266 0.0193 266 10.2188 0.0762
10.2188 0.0194 267 10.2109 0.0762
10.2109 0.0195 268 10.2109 0.0763
10.2109 0.0195 269 10.2109 0.0762
10.2109 0.0196 270 10.2109 0.0761
10.2188 0.0197 271 10.2109 0.0761
10.2109 0.0198 272 10.2031 0.0760
10.2188 0.0198 273 10.2031 0.0761
10.2266 0.0199 274 10.2031 0.0762
10.2188 0.0200 275 10.2031 0.0762
10.2109 0.0201 276 10.1953 0.0761
10.2109 0.0201 277 10.1953 0.0762
10.1953 0.0202 278 10.1953 0.0762
10.2031 0.0203 279 10.1953 0.0763
10.2188 0.0203 280 10.1953 0.0765
10.1953 0.0204 281 10.1875 0.0766
10.1953 0.0205 282 10.1875 0.0767
10.2031 0.0206 283 10.1875 0.0767
10.1797 0.0206 284 10.1875 0.0766
10.1953 0.0207 285 10.1875 0.0765
10.1953 0.0208 286 10.1797 0.0764
10.1875 0.0209 287 10.1797 0.0764
10.1953 0.0209 288 10.1797 0.0765
10.1875 0.0210 289 10.1797 0.0765
10.1875 0.0211 290 10.1797 0.0768
10.1797 0.0211 291 10.1719 0.0770
10.1719 0.0212 292 10.1719 0.0771
10.1719 0.0213 293 10.1719 0.0772
10.1797 0.0214 294 10.1719 0.0773
10.1797 0.0214 295 10.1719 0.0773
10.1641 0.0215 296 10.1641 0.0773
10.1719 0.0216 297 10.1641 0.0773
10.1719 0.0217 298 10.1641 0.0773
10.1719 0.0217 299 10.1641 0.0773
10.1719 0.0218 300 10.1641 0.0773
10.1641 0.0219 301 10.1641 0.0773
10.1562 0.0219 302 10.1562 0.0772
10.1719 0.0220 303 10.1562 0.0771
10.1562 0.0221 304 10.1562 0.0772
10.1641 0.0222 305 10.1562 0.0773
10.1562 0.0222 306 10.1484 0.0773
10.1641 0.0223 307 10.1484 0.0773
10.1719 0.0224 308 10.1484 0.0775
10.1562 0.0224 309 10.1484 0.0775
10.1719 0.0225 310 10.1484 0.0775
10.1562 0.0226 311 10.1406 0.0774
10.1562 0.0227 312 10.1406 0.0774
10.1562 0.0227 313 10.1406 0.0773
10.1406 0.0228 314 10.1406 0.0774
10.1406 0.0229 315 10.1406 0.0774
10.1406 0.0230 316 10.1406 0.0774
10.1328 0.0230 317 10.1328 0.0775
10.1484 0.0231 318 10.1328 0.0775
10.1328 0.0232 319 10.1328 0.0775
10.1328 0.0232 320 10.1328 0.0775
10.125 0.0233 321 10.1328 0.0775
10.1406 0.0234 322 10.125 0.0776
10.1328 0.0235 323 10.125 0.0777
10.125 0.0235 324 10.125 0.0778
10.125 0.0236 325 10.125 0.0777
10.125 0.0237 326 10.125 0.0777
10.1328 0.0238 327 10.1172 0.0777
10.1172 0.0238 328 10.1172 0.0777
10.1172 0.0239 329 10.1172 0.0777
10.125 0.0240 330 10.1172 0.0778
10.1094 0.0240 331 10.1172 0.0778
10.1094 0.0241 332 10.1094 0.0777
10.1094 0.0242 333 10.1094 0.0776
10.1172 0.0243 334 10.1094 0.0775
10.125 0.0243 335 10.1094 0.0774
10.1172 0.0244 336 10.1094 0.0772
10.1016 0.0245 337 10.1016 0.0771
10.1094 0.0246 338 10.1016 0.0773
10.1172 0.0246 339 10.1016 0.0775
10.1094 0.0247 340 10.1016 0.0777
10.1172 0.0248 341 10.1016 0.0778
10.0938 0.0248 342 10.0938 0.0779
10.1016 0.0249 343 10.0938 0.0780
10.0938 0.0250 344 10.0938 0.0780
10.0938 0.0251 345 10.0938 0.0780
10.1016 0.0251 346 10.0938 0.0781
10.1094 0.0252 347 10.0859 0.0780
10.0938 0.0253 348 10.0859 0.0780
10.0938 0.0254 349 10.0859 0.0780
10.0859 0.0254 350 10.0859 0.0779
10.0859 0.0255 351 10.0859 0.0780
10.0938 0.0256 352 10.0781 0.0781
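
Two further facts can be read off the table:

- Inverting the epoch column gives roughly 13,750 training steps per epoch, or about 1.4 million for 100 epochs.
- Every loss value lies on a grid of 1/128 ≈ 0.0078, which is exactly the spacing of IEEE float16 numbers between 8 and 16. This explains the staircase pattern in the loss columns.

The sketch below checks both on a few rows copied from the table. The row subset and helper names are illustrative. The float16 reading is an inference consistent with `mixed_precision_training: Native AMP`; the card does not say where the losses are held in half precision.

```python
import struct

# A few (step, epoch, validation_loss) rows copied from the training results table.
ROWS = [
    (1, 0.0001, 10.8438),
    (100, 0.0073, 10.4922),
    (200, 0.0145, 10.3203),
    (352, 0.0256, 10.0781),
]
NUM_EPOCHS = 100  # num_epochs from the hyperparameter list


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (float16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def estimate_steps_per_epoch(step: int, epoch: float) -> float:
    """The epoch column is step / steps_per_epoch, so invert it.
    Epochs are rounded to 4 decimals, so the latest row gives the best estimate."""
    return step / epoch


last_step, last_epoch, _ = ROWS[-1]
steps_per_epoch = estimate_steps_per_epoch(last_step, last_epoch)
print(f"~{steps_per_epoch:,.0f} steps/epoch, ~{steps_per_epoch * NUM_EPOCHS:,.0f} planned steps")
print(f"logged fraction of the run: {last_epoch / NUM_EPOCHS:.4%}")

# float16 values in [8, 16) are spaced 2**-7 = 1/128 apart; each logged loss
# lies on that grid to within the table's 4-decimal rounding.
for step, _, loss in ROWS:
    print(f"step {step:>3}: loss {loss} -> float16 {to_fp16(loss)} ({loss * 128:.3f} / 128)")
```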

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.0a0+32f93b1
  • Datasets 2.20.0
  • Tokenizers 0.19.1