---
title: DmxPerplexity
emoji: π
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- evaluate
- metric
description: >-
  Perplexity metric implemented by d-Matrix.
  Perplexity (PPL) is one of the most common metrics for evaluating language models.
  It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
  Note that this metric is intended for Causal Language Models; the perplexity calculation is only correct if the model uses Cross Entropy Loss.
  For more information, see https://huggingface.co/docs/transformers/perplexity
---
# Metric Card for Perplexity

## Metric Description

Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
Note that this metric is intended for Causal Language Models; the perplexity calculation is only correct if the model uses Cross Entropy Loss.
For more information, see https://huggingface.co/docs/transformers/perplexity

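
Conceptually, the reported perplexity is `e` raised to the mean per-token cross-entropy loss. The snippet below is a minimal illustrative sketch of that relationship using `transformers` directly (it is not this metric's implementation; the model name and input text are just examples):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("lorem ipsum", return_tensors="pt")
with torch.no_grad():
    # With labels provided, causal LMs return their mean cross-entropy loss over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

perplexity = torch.exp(loss).item()  # PPL = e ** loss
```
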
## How to Use

At minimum, this metric requires the model and references as inputs.

```python
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model='distilgpt2', references=input_texts)
>>> print(results)
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```

### Inputs | |
- **model** (`Union`[`str`,`AutoModelForCausalLM`]): model used for calculating Perplexity | |
- **references** (`list` of `str`): input text, each separate text snippet is one list entry. | |
- **device** (`str`): device to run on, defaults to 'cuda' when available. | |
- **max_length** (`int`): maximum sequence length, defaults to 2048. | |
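
The optional arguments can be passed to `compute` alongside the required ones. A minimal sketch based on the parameter list above (the argument values are illustrative only):

```python
>>> results = perplexity.compute(
...     model="distilgpt2",
...     references=input_texts,
...     device="cpu",       # override the default of 'cuda' when available
...     max_length=1024,    # maximum sequence length to evaluate
... )
```
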
### Output Values

- **loss** (`float`): the loss of the model predictions compared to the reference.
- **perplexity** (`float`): measures the uncertainty of a model predicting text. Model performance is better when perplexity is lower.

This metric outputs a dictionary containing the loss and the perplexity score.

Output Example(s):

```python
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```

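As a quick sanity check on the example above, the perplexity is simply `e` raised to the reported loss (up to floating-point precision):

```python
>>> import math
>>> round(math.exp(4.993086338043213), 2)
147.39
```
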
### Examples

```python
>>> import evaluate
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
>>> results = perplexity.compute(model=model, references=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss'])
3.9706921577453613
>>> print(results['perplexity'])
53.021217346191406
```

## Citation(s)

https://huggingface.co/docs/transformers/perplexity