..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Perplexity of fixed-length models
=======================================================================================================================

Perplexity (PPL) is one of the most common metrics for evaluating language models. Before diving in, we should note
that the metric applies specifically to classical language models (sometimes called autoregressive or causal language
models) and is not well defined for masked language models like BERT (see :doc:`summary of the models `).

Perplexity is defined as the exponentiated average negative log-likelihood of a sequence. If we have a tokenized
sequence :math:`X = (x_0, x_1, \dots, x_t)`, then the perplexity of :math:`X` is

.. math::

    \text{PPL}(X) = \exp \left\{ {-\frac{1}{t}\sum_i^t \log p_\theta (x_i|x_{<i}) } \right\}

where :math:`\log p_\theta (x_i|x_{<i})` is the log-likelihood of the i-th token conditioned on the preceding tokens
:math:`x_{<i}` according to the model.

Calculating PPL with fixed-length models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we weren't limited by a model's context size, we would evaluate the model's perplexity by autoregressively
factorizing a sequence and conditioning on the entire preceding subsequence at each step, as shown below.

.. image:: imgs/ppl_full.gif
    :width: 600
    :alt: Full decomposition of a sequence with unlimited context length

When working with approximate models, however, we typically have a constraint on the number of tokens the model can
process.
The largest version of :doc:`GPT-2 `, for example, has a fixed length of 1024 tokens, so we cannot calculate
:math:`p_\theta(x_t|x_{<t})` directly when :math:`t` is greater than 1024.
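
To make the definition and the context limit concrete, here is a minimal sketch in plain Python. It is not taken from any library: ``perplexity`` and ``visible_context`` are hypothetical helper names, the per-token log-probabilities are assumed to have been computed by some model already, and the sketch assumes a fixed-length model can condition on at most ``max_length`` preceding tokens.

```python
import math


def perplexity(token_log_probs):
    """Exponentiated average negative log-likelihood of a sequence.

    token_log_probs holds log p(x_i | x_<i) for each scored token
    (natural log), however the model produced them.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def visible_context(tokens, i, max_length=1024):
    """Preceding tokens a fixed-length model can condition on for tokens[i].

    Assumption of this sketch: at most max_length preceding tokens fit
    in the model's context; anything earlier is dropped.
    """
    start = max(0, i - max_length)
    return tokens[start:i]


# A model that assigns probability 1/4 to every token has a perplexity
# of (approximately, up to floating-point error) 4.
uniform_log_probs = [math.log(0.25)] * 10
print(perplexity(uniform_log_probs))

tokens = list(range(3000))
print(len(visible_context(tokens, 5)))     # 5: the full prefix fits
print(len(visible_context(tokens, 2000)))  # 1024: the prefix is clipped
```

The second helper shows why the full factorization is not available for long inputs: once the position exceeds ``max_length``, part of the preceding subsequence is simply not visible to the model.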