---
title: DmxPerplexity
emoji: 🌖
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- evaluate
- metric
description: >-
  Perplexity metric implemented by d-Matrix. Perplexity (PPL) is one of the most
  common metrics for evaluating language models. It is defined as the
  exponentiated average negative log-likelihood of a sequence, calculated with
  exponent base `e`. Note that this metric is intended for Causal Language
  Models; the perplexity calculation is only correct if the model uses Cross
  Entropy Loss. For more information, see
  https://huggingface.co/docs/transformers/perplexity
---

# Metric Card for Perplexity


## Metric Description

Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
Note that this metric is intended for Causal Language Models; the perplexity calculation is only correct if the model uses Cross Entropy Loss.
For more information, see https://huggingface.co/docs/transformers/perplexity
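In formula form, for a tokenized sequence $x_1, \ldots, x_N$ this corresponds to the standard definition (a generic sketch of the quantity described above, not notation taken from this implementation):

```latex
\mathrm{PPL}(x) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)
```

The `loss` value returned by the metric is the average negative log-likelihood inside the exponential, so `perplexity = exp(loss)` (this is consistent with the example outputs below).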

## How to Use
At minimum, this metric requires the model and references as inputs.
```python
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model='distilgpt2', references=input_texts)
>>> print(results)
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```

### Inputs
- **model** (`Union[str, AutoModelForCausalLM]`): model used for calculating perplexity, given either as a model identifier (`str`) or a loaded `AutoModelForCausalLM`.
- **references** (`list` of `str`): input text, where each separate text snippet is one list entry.
- **device** (`str`): device to run on; defaults to `'cuda'` when available.
- **max_length** (`int`): maximum sequence length; defaults to 2048.
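
The optional arguments can be passed alongside the required ones; a minimal sketch (the `device` and `max_length` values here are arbitrary illustrations):
```python
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> results = perplexity.compute(
...     model="distilgpt2",
...     references=["lorem ipsum", "Happy Birthday!", "Bienvenue"],
...     device="cpu",        # run on CPU instead of the default 'cuda'
...     max_length=1024,     # cap the sequence length (illustrative value)
... )
```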

### Output Values
- **loss** (`float`): the loss of the model predictions compared to the references
- **perplexity** (`float`): measures the uncertainty of a model predicting text. Model performance is better when perplexity is lower.

Output Example(s):
```python
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```
This metric outputs a dictionary containing the loss and the perplexity score.

### Examples
```python
>>> import evaluate
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
>>> results = perplexity.compute(model=model, references=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss']) 
3.9706921577453613
>>> print(results['perplexity']) 
53.021217346191406
```
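
For intuition, the values above are in line with what one gets by computing the average cross-entropy loss directly with `transformers`. The following is a rough sketch of that idea only; it ignores the metric's internal batching and `max_length` handling, so the numbers will not match exactly:
```python
>>> import math, torch
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
>>> enc = tokenizer("lorem ipsum", return_tensors="pt")
>>> with torch.no_grad():
...     out = model(**enc, labels=enc["input_ids"])  # labels trigger the built-in Cross Entropy Loss
>>> loss = out.loss.item()          # average negative log-likelihood (base e)
>>> perplexity = math.exp(loss)     # exponentiate to obtain PPL
```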

## Citation(s)
https://huggingface.co/docs/transformers/perplexity