---
title: DmxPerplexity
emoji: πŸŒ–
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- evaluate
- metric
description: >-
Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
Note that this metric is intended for causal language models; the perplexity calculation is only correct if the model uses cross-entropy loss.
For more information, see https://huggingface.co/docs/transformers/perplexity
---
# Metric Card for Perplexity
## Metric Description
Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
Note that this metric is intended for causal language models; the perplexity calculation is only correct if the model uses cross-entropy loss.
For more information, see https://huggingface.co/docs/transformers/perplexity
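Concretely, the relationship between per-token losses and perplexity can be sketched as follows (the per-token negative log-likelihood values here are made-up illustration numbers, not outputs of any real model):

```python
import math

# Hypothetical per-token negative log-likelihoods (natural log) for one sequence.
token_nlls = [2.1, 3.4, 1.8, 2.9]

# The average NLL is the cross-entropy loss; perplexity is its exponential, base e.
loss = sum(token_nlls) / len(token_nlls)
perplexity = math.exp(loss)
print(loss, perplexity)
```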
## How to Use
At minimum, this metric requires the model and references as inputs.
```python
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model='distilgpt2', references=input_texts)
>>> print(results)
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```
### Inputs
- **model** (`Union[str, AutoModelForCausalLM]`): the model used for calculating perplexity, given either as a model name string or as a loaded `AutoModelForCausalLM` instance.
- **references** (`list` of `str`): input text; each text snippet is one list entry.
- **device** (`str`): device to run on; defaults to `'cuda'` when available.
- **max_length** (`int`): maximum sequence length; defaults to 2048.
### Output Values
- **loss** (`float`): the average cross-entropy loss of the model predictions against the references.
- **perplexity** (`float`): measures the uncertainty of the model when predicting text; lower perplexity indicates better model performance.
Output Example(s):
```python
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```
This metric outputs a dictionary containing the loss and the perplexity score.
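The two values are directly related: perplexity is `exp(loss)` with base `e`. A quick sanity check on the example output above (the small residual discrepancy presumably comes from reduced-precision arithmetic inside the metric):

```python
import math

# Perplexity should equal exp(loss) up to floating-point precision.
loss = 4.993086338043213
print(math.exp(loss))  # close to the reported perplexity of 147.390625
```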
### Examples
```python
>>> import evaluate
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
>>> results = perplexity.compute(model=model, references=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss'])
3.9706921577453613
>>> print(results['perplexity'])
53.021217346191406
```
## Citation(s)
https://huggingface.co/docs/transformers/perplexity