metrics at a context length of 1024 tokens:

  • valid_perplexity = 14.79
  • valid_cross_entropy = 2.69
  • train_perplexity = 13.77
  • train_cross_entropy = 2.62

metrics at a context length of 252 tokens:

  • valid_perplexity = 17.35

metrics at a context length of 378 tokens:

  • valid_perplexity = 16.4

metrics at a context length of 504 tokens:

  • valid_perplexity = 15.86
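
For reference, perplexity is the exponential of the cross entropy, and the 1024-token numbers above are consistent up to rounding: exp(2.69) ≈ 14.7 and exp(2.62) ≈ 13.7. Below is a minimal sketch of how such validation metrics could be computed with Hugging Face Transformers; the WikiText-2 validation split and the non-overlapping 1024-token blocks are assumptions, not a description of the exact evaluation behind this card.

```python
# Minimal evaluation sketch (assumptions: WikiText-2 raw validation split,
# stock GPT-2 BPE tokenizer, non-overlapping 1024-token blocks).
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumption: stock GPT-2 vocabulary
model = AutoModelForCausalLM.from_pretrained("irodkin/gpt2-wiki2").eval()

# Concatenate the validation split into one token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[0]

block = 1024
losses = []
with torch.no_grad():
    for start in range(0, ids.size(0) - block, block):
        chunk = ids[start : start + block].unsqueeze(0)
        # labels == input_ids: the model shifts them internally for next-token prediction
        losses.append(model(chunk, labels=chunk).loss.item())

cross_entropy = sum(losses) / len(losses)
print(f"valid cross entropy = {cross_entropy:.2f}, valid perplexity = {math.exp(cross_entropy):.2f}")
```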

Dependence of the cross entropy loss on the context length used for prediction (plot):

  • x-axis * 128 = context length (in tokens)
  • y-axis = cross entropy
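
A hedged sketch of how such a curve could be reproduced: repeat the evaluation above for several block sizes and plot the mean cross entropy against the context length. The block sizes below simply mirror the context lengths reported in this card; the dataset and tokenizer are the same assumptions as in the previous sketch.

```python
# Context-length sweep sketch (assumptions: WikiText-2 raw validation split,
# stock GPT-2 tokenizer; block sizes are illustrative).
import math
import torch
import matplotlib.pyplot as plt
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("irodkin/gpt2-wiki2").eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[0]

def cross_entropy_at(block: int) -> float:
    """Mean next-token cross entropy over non-overlapping blocks of `block` tokens."""
    losses = []
    with torch.no_grad():
        for start in range(0, ids.size(0) - block, block):
            chunk = ids[start : start + block].unsqueeze(0)
            losses.append(model(chunk, labels=chunk).loss.item())
    return sum(losses) / len(losses)

context_lengths = [252, 378, 504, 1024]  # lengths reported above
ce = [cross_entropy_at(b) for b in context_lengths]

plt.plot(context_lengths, ce, marker="o")
plt.xlabel("context length (tokens)")
plt.ylabel("validation cross entropy")
plt.show()

for b, c in zip(context_lengths, ce):
    print(f"context {b}: cross entropy = {c:.2f}, perplexity = {math.exp(c):.2f}")
```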
