Add description to card metadata

#1
by julien-c (HF staff) · opened
Files changed (1)
  1. README.md +19 -4
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  title: Google BLEU
- emoji: 🤗
+ emoji: 🤗
  colorFrom: blue
  colorTo: red
  sdk: gradio
@@ -8,10 +8,25 @@ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
  tags:
- - evaluate
- - metric
+ - evaluate
+ - metric
+ description: |-
+   The BLEU score has some undesirable properties when used for single
+   sentences, as it was designed to be a corpus measure. We therefore
+   use a slightly different score for our RL experiments which we call
+   the 'GLEU score'. For the GLEU score, we record all sub-sequences of
+   1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
+   compute a recall, which is the ratio of the number of matching n-grams
+   to the number of total n-grams in the target (ground truth) sequence,
+   and a precision, which is the ratio of the number of matching n-grams
+   to the number of total n-grams in the generated output sequence. Then
+   GLEU score is simply the minimum of recall and precision. This GLEU
+   score's range is always between 0 (no matches) and 1 (all match) and
+   it is symmetrical when switching output and target. According to
+   our experiments, GLEU score correlates quite well with the BLEU
+   metric on a corpus level but does not have its drawbacks for our per
+   sentence reward objective.
  ---
-
  # Metric Card for Google BLEU
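The description added in this diff fully specifies the sentence-level GLEU computation, so it can be sketched directly from the text. The snippet below is a minimal illustration of that definition only, not this Space's actual implementation; the function name `gleu` and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter

def gleu(output_tokens, target_tokens, max_n=4):
    """Sentence-level GLEU per the description: min(n-gram recall, n-gram precision)."""
    def ngram_counts(tokens):
        # Record all sub-sequences of 1..max_n tokens, with multiplicity.
        counts = Counter()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
        return counts

    out, tgt = ngram_counts(output_tokens), ngram_counts(target_tokens)
    matching = sum((out & tgt).values())  # multiset intersection = matching n-grams
    recall = matching / sum(tgt.values()) if tgt else 0.0
    precision = matching / sum(out.values()) if out else 0.0
    # GLEU is the minimum of recall and precision: symmetric when output
    # and target are swapped, and always in [0, 1].
    return min(recall, precision)

print(gleu("the cat sat on the mat".split(),
           "the cat sat on the mat".split()))  # 1.0 (all n-grams match)
```

In practice, the metric this Space hosts is loaded through the `evaluate` library (`evaluate.load("google_bleu")`) rather than reimplemented by hand.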