Update Space (evaluate main: 05209ece)
README.md CHANGED
@@ -10,6 +10,22 @@ pinned: false
 tags:
 - evaluate
 - metric
+description: >-
+  The BLEU score has some undesirable properties when used for single
+  sentences, as it was designed to be a corpus measure. We therefore
+  use a slightly different score for our RL experiments which we call
+  the 'GLEU score'. For the GLEU score, we record all sub-sequences of
+  1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
+  compute a recall, which is the ratio of the number of matching n-grams
+  to the number of total n-grams in the target (ground truth) sequence,
+  and a precision, which is the ratio of the number of matching n-grams
+  to the number of total n-grams in the generated output sequence. Then
+  GLEU score is simply the minimum of recall and precision. This GLEU
+  score's range is always between 0 (no matches) and 1 (all match) and
+  it is symmetrical when switching output and target. According to
+  our experiments, GLEU score correlates quite well with the BLEU
+  metric on a corpus level but does not have its drawbacks for our per
+  sentence reward objective.
 ---

 # Metric Card for Google BLEU
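
The description added in this diff spells out the GLEU recipe: collect all 1- to 4-token sub-sequences (n-grams) from both the output and the target, compute n-gram precision and recall against the matching n-grams, and take the minimum of the two. A minimal Python sketch of that recipe follows; the function names and the clipped-count matching via `Counter` are illustrative assumptions, not the reference implementation.

```python
from collections import Counter

def _ngram_counts(tokens, max_n=4):
    """Count all sub-sequences of 1..max_n tokens, as in the description above."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def sentence_gleu(output_tokens, target_tokens, max_n=4):
    """GLEU = min(precision, recall) over matching 1..max_n n-grams."""
    out_counts = _ngram_counts(output_tokens, max_n)
    tgt_counts = _ngram_counts(target_tokens, max_n)
    # Matching n-grams: element-wise minimum (clipped overlap) of the two count tables.
    matches = sum((out_counts & tgt_counts).values())
    # Guard against empty sequences to avoid division by zero.
    precision = matches / max(sum(out_counts.values()), 1)
    recall = matches / max(sum(tgt_counts.values()), 1)
    return min(precision, recall)

# Identical sequences score 1.0; fully disjoint sequences score 0.0.
print(sentence_gleu("the cat sat on the mat".split(),
                    "the cat sat on the mat".split()))  # 1.0
```

If this Space wraps the standard `google_bleu` metric from the evaluate library, the packaged score can be computed with `evaluate.load("google_bleu").compute(predictions=..., references=...)` rather than the sketch above.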