lvwerra committed
Commit f2b5e98
1 Parent(s): 3dd4381

Update Space (evaluate main: 05209ece)

Files changed (1):
  README.md +16 -0
README.md CHANGED
@@ -10,6 +10,22 @@ pinned: false
 tags:
 - evaluate
 - metric
+description: >-
+  The BLEU score has some undesirable properties when used for single
+  sentences, as it was designed to be a corpus measure. We therefore
+  use a slightly different score for our RL experiments which we call
+  the 'GLEU score'. For the GLEU score, we record all sub-sequences of
+  1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
+  compute a recall, which is the ratio of the number of matching n-grams
+  to the number of total n-grams in the target (ground truth) sequence,
+  and a precision, which is the ratio of the number of matching n-grams
+  to the number of total n-grams in the generated output sequence. Then
+  GLEU score is simply the minimum of recall and precision. This GLEU
+  score's range is always between 0 (no matches) and 1 (all match) and
+  it is symmetrical when switching output and target. According to
+  our experiments, GLEU score correlates quite well with the BLEU
+  metric on a corpus level but does not have its drawbacks for our per
+  sentence reward objective.
 ---

 # Metric Card for Google BLEU
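For reference, here is a minimal sketch of the sentence-level GLEU computation the added description walks through: count all 1- to 4-gram sub-sequences of the output and target, take the clipped number of matches, and return the minimum of recall and precision. The helper names `ngrams` and `sentence_gleu` are illustrative, not part of the `evaluate` API, and simple whitespace tokenization is assumed.

```python
from collections import Counter


def ngrams(tokens, max_n=4):
    """Count every 1- to max_n-gram sub-sequence of a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i : i + n])] += 1
    return counts


def sentence_gleu(output_tokens, target_tokens, max_n=4):
    """GLEU as described above: min(recall, precision) over matching n-grams."""
    out_counts = ngrams(output_tokens, max_n)
    tgt_counts = ngrams(target_tokens, max_n)
    # Matching n-grams: the clipped intersection of the two count multisets.
    matches = sum((out_counts & tgt_counts).values())
    total_out = sum(out_counts.values())
    total_tgt = sum(tgt_counts.values())
    if total_out == 0 or total_tgt == 0:
        return 0.0
    recall = matches / total_tgt        # matches / total n-grams in target
    precision = matches / total_out     # matches / total n-grams in output
    return min(recall, precision)       # symmetric in output and target


# Illustrative usage with whitespace tokenization (hypothetical example data).
print(sentence_gleu("the cat sat".split(), "the cat sat down".split()))
```

In practice the metric is loaded through the library itself, e.g. `evaluate.load("google_bleu")`; the sketch above only mirrors the min(recall, precision) definition quoted in the description, not the library's internals.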