lvwerra committed
Commit 9ea7bd1
1 Parent(s): a508230

Update Space (evaluate main: b2a25b3f)

Files changed (3)
  1. README.md +11 -9
  2. app.py +1 -1
  3. perplexity.py +3 -2
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 title: Perplexity
-emoji: 🤗
+emoji: 🤗
 colorFrom: blue
 colorTo: red
 sdk: gradio
@@ -15,11 +15,12 @@ tags:
 # Metric Card for Perplexity
 
 ## Metric Description
-Given a model and an input text sequence, perplexity measures how likely the model is to generate the input text sequence. This can be used in two main ways:
-1. to evaluate how well the model has learned the distribution of the text it was trained on
-    - In this case, the model input should be the trained model to be evaluated, and the input texts should be the text that the model was trained on.
-2. to evaluate how well a selection of text matches the distribution of text that the input model was trained on
-    - In this case, the model input should be a trained model, and the input texts should be the text to be evaluated.
+Given a model and an input text sequence, perplexity measures how likely the model is to generate the input text sequence.
+
+As a metric, it can be used to evaluate how well the model has learned the distribution of the text it was trained on
+
+
+In this case, the model input should be the trained model to be evaluated, and the input texts should be the text that the model was trained on.
 
 ## Intended Uses
 Any language generation task.
@@ -30,7 +31,7 @@ The metric takes a list of text as input, as well as the name of the model used
 
 ```python
 from evaluate import load
-perplexity = load("perplexity")
+perplexity = load("perplexity", module_type="metric")
 results = perplexity.compute(input_texts=input_texts, model_id='gpt2')
 ```
 
@@ -58,7 +59,7 @@ This metric's range is 0 and up. A lower score is better.
 ### Examples
 Calculating perplexity on input_texts defined here:
 ```python
-perplexity = evaluate.load("perplexity")
+perplexity = evaluate.load("perplexity", module_type="metric")
 input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
 results = perplexity.compute(model_id='gpt2',
                              add_start_token=False,
@@ -72,7 +73,7 @@ print(round(results["perplexities"][0], 2))
 ```
 Calculating perplexity on input_texts loaded in from a dataset:
 ```python
-perplexity = evaluate.load("perplexity")
+perplexity = evaluate.load("perplexity", module_type="metric")
 input_texts = datasets.load_dataset("wikitext",
                                     "wikitext-2-raw-v1",
                                     split="test")["text"][:50]
@@ -90,6 +91,7 @@ print(round(results["perplexities"][0], 2))
 ## Limitations and Bias
 Note that the output value is based heavily on what text the model was trained on. This means that perplexity scores are not comparable between models or datasets.
 
+See Meister and Cotterell, ["Language Model Evaluation Beyond Perplexity"](https://arxiv.org/abs/2106.00085) (2021) for more information about alternative model evaluation strategies.
 
 ## Citation
 
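Putting the updated metric card together, a minimal end-to-end sketch of the documented usage (assuming `gpt2` can be downloaded and that the module returns the `perplexities` and `mean_perplexity` keys referenced in the card):

```python
import evaluate

# Load the module with the module_type argument added in this commit.
perplexity = evaluate.load("perplexity", module_type="metric")

input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
results = perplexity.compute(model_id="gpt2",
                             add_start_token=False,
                             input_texts=input_texts)

print(sorted(results.keys()))                # expected: ['mean_perplexity', 'perplexities']
print(round(results["perplexities"][0], 2))  # per-text perplexity, as in the card's examples
```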
app.py CHANGED
@@ -2,5 +2,5 @@ import evaluate
 from evaluate.utils import launch_gradio_widget
 
 
-module = evaluate.load("perplexity")
+module = evaluate.load("perplexity", module_type="metric")
 launch_gradio_widget(module)
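The app change only threads through the same `module_type` argument. As context, a rough sketch of the distinction it encodes; the other module names below are illustrative assumptions, not part of this commit:

```python
import evaluate

# "metric" scores model output, which is what this Space exposes.
perplexity = evaluate.load("perplexity", module_type="metric")

# Assumed examples of the library's other module types, loaded the same way:
# word_length = evaluate.load("word_length", module_type="measurement")
# mcnemar = evaluate.load("mcnemar", module_type="comparison")
```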
perplexity.py CHANGED
@@ -56,7 +56,7 @@ Returns:
         max length for the perplexity computation.
 Examples:
     Example 1:
-        >>> perplexity = evaluate.load("perplexity")
+        >>> perplexity = evaluate.load("perplexity", module_type="metric")
         >>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
         >>> results = perplexity.compute(model_id='gpt2',
         ...                              add_start_token=False,
@@ -70,7 +70,7 @@ Examples:
 
     Example 2:
         >>> from datasets import load_dataset
-        >>> perplexity = evaluate.load("perplexity")
+        >>> perplexity = evaluate.load("perplexity", module_type="metric")
         >>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10] # doctest: +SKIP
         >>> input_texts = [s for s in input_texts if s!='']
         >>> results = perplexity.compute(model_id='gpt2',
@@ -88,6 +88,7 @@ Examples:
 class Perplexity(evaluate.EvaluationModule):
     def _info(self):
         return evaluate.EvaluationModuleInfo(
+            module_type="metric",
             description=_DESCRIPTION,
             citation=_CITATION,
             inputs_description=_KWARGS_DESCRIPTION,
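As background for the metric description rather than a review of this diff: perplexity here reduces to the exponential of the mean negative log-likelihood per token. A minimal sketch with `transformers` and `gpt2` (the module's actual implementation additionally batches inputs, applies attention masks, and supports `add_start_token`, which this sketch omits):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # same model used in the examples above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "lorem ipsum"
encodings = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy,
    # i.e. the average negative log-likelihood of each predicted token.
    output = model(**encodings, labels=encodings["input_ids"])

perplexity = torch.exp(output.loss).item()  # exp(mean NLL) == perplexity
print(round(perplexity, 2))
```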