---
datasets:
- wikitext
- wikitext-103-v1
language:
- en
metrics:
- perplexity
- cross_entropy
---

**(!) _Don't forget to preprocess unknown tokens and substitute them with `<|endoftext|>`. Otherwise the `<unk>` tokens in the dataset will be split into the separate tokens '<', 'unk' and '>'_** (see the preprocessing sketch below the plot).

**Dependence of the cross-entropy loss on the length of the context used for prediction**

- x-axis value × 128 = context length (in tokens)
- y-axis = cross-entropy

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c1ac8cc58fcfeac186bda2/oV5DpLPgK6Ui9X2um8r78.png)
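A minimal preprocessing sketch, assuming the Hugging Face `datasets` library and a GPT-2-style tokenizer whose vocabulary contains `<|endoftext|>` as a single special token (the helper function here is illustrative, not the card author's actual script):

```python
from datasets import load_dataset

# Load the raw WikiText-103 corpus.
dataset = load_dataset("wikitext", "wikitext-103-v1", split="train")

# Replace the literal "<unk>" marker with "<|endoftext|>" so the tokenizer
# treats it as one special token instead of splitting it into '<', 'unk', '>'.
def substitute_unknown(example):
    example["text"] = example["text"].replace("<unk>", "<|endoftext|>")
    return example

dataset = dataset.map(substitute_unknown)
```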
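For reference, a hedged sketch of how such a curve can be measured with the `transformers` API; the `gpt2` checkpoint is a stand-in for this card's model, and the choice of eight 128-token steps is an assumption matching the plot's x-axis scale:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in checkpoint, not necessarily this card's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Evaluation text from the validation split, with the same <unk> substitution.
val = load_dataset("wikitext", "wikitext-103-v1", split="validation")
text = "\n".join(val["text"]).replace("<unk>", "<|endoftext|>")
ids = tokenizer(text[:10000], return_tensors="pt").input_ids[0]

# Cross-entropy on the final 128-token block, given 0, 128, 256, ... tokens
# of preceding context (x-axis value * 128 = context length).
with torch.no_grad():
    for n_ctx in range(8):
        window = ids[: (n_ctx + 1) * 128].unsqueeze(0)
        labels = window.clone()
        labels[:, : n_ctx * 128] = -100  # exclude context tokens from the loss
        loss = model(window, labels=labels).loss
        print(f"context={n_ctx * 128:4d} tokens  cross-entropy={loss.item():.3f}")
```

Averaging this measurement over many windows would smooth the curve; a single window is kept here for brevity.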