(!) Don't forget to preprocess the dataset by substituting `<unk>` tokens with `<|endoftext|>`. Otherwise the GPT-2 tokenizer will split each `<unk>` into the '<', 'unk', and '>' tokens.
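A minimal preprocessing sketch; the `datasets` loading code below is an illustration, and any pipeline that performs the same substitution works:

```python
from datasets import load_dataset

# WikiText-103 marks out-of-vocabulary words with the literal string "<unk>".
# The GPT-2 BPE tokenizer has no such special token, so replace it with
# "<|endoftext|>" before tokenization.
dataset = load_dataset("wikitext", "wikitext-103-v1")

def replace_unk(example):
    example["text"] = example["text"].replace("<unk>", "<|endoftext|>")
    return example

dataset = dataset.map(replace_unk)
```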

  • Perplexity on the test set with the full 1024-token context: 13.68
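One way to reproduce this number is the standard full-window evaluation below. The card does not include the exact evaluation script, so the non-overlapping 1024-token windowing here is an assumption:

```python
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("irodkin/gpt2-wiki103").to(device)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Concatenate the test split and apply the <unk> substitution described above.
test = load_dataset("wikitext", "wikitext-103-v1", split="test")
text = "\n\n".join(test["text"]).replace("<unk>", "<|endoftext|>")
encodings = tokenizer(text, return_tensors="pt")

max_length = 1024  # full GPT-2 context window
nlls = []
for begin in range(0, encodings.input_ids.size(1) - max_length, max_length):
    input_ids = encodings.input_ids[:, begin : begin + max_length].to(device)
    with torch.no_grad():
        # With labels == input_ids the model shifts internally and
        # returns the mean next-token negative log-likelihood.
        outputs = model(input_ids, labels=input_ids)
    nlls.append(outputs.loss)

ppl = torch.exp(torch.stack(nlls).mean())
print(f"perplexity: {ppl.item():.2f}")
```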

Dependence of the cross-entropy loss on the context length used for prediction (a measurement sketch follows the list):

  • x-axis × 128 = context length
  • y-axis = cross entropy
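A hedged sketch of how such a curve could be measured, reusing `model`, `device`, and `encodings` from the perplexity snippet above. Scoring only the final token of each window is an assumption about the plot's methodology, not the author's confirmed procedure:

```python
import torch
import torch.nn.functional as F

context_lengths = [128 * k for k in range(1, 9)]  # 128, 256, ..., 1024

input_ids = encodings.input_ids  # tokenized test set from the snippet above
for ctx in context_lengths:
    nlls = []
    # Slide non-overlapping windows of ctx + 1 tokens over the test set and
    # score the prediction of the last token given exactly ctx context tokens.
    for begin in range(0, input_ids.size(1) - ctx - 1, ctx + 1):
        window = input_ids[:, begin : begin + ctx + 1].to(device)
        with torch.no_grad():
            logits = model(window[:, :-1]).logits
        nlls.append(F.cross_entropy(logits[:, -1, :], window[:, -1]))
    print(f"context {ctx:4d}: cross entropy {torch.stack(nlls).mean().item():.3f}")
```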

Dataset used to train irodkin/gpt2-wiki103: wikitext-103