dvitel committed on
Commit 08dc526 • 1 Parent(s): ea09ebe

allow references to be simple list

Files changed (3)
  1. README.md +35 -12
  2. dataflow_match.py +2 -2
  3. my_codebleu.py +1 -1
README.md CHANGED
@@ -12,25 +12,42 @@ pinned: false
 
 # Metric Card for CodeBLEU
 
- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
 ## Metric Description
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*
 
 ## How to Use
- *Give general statement of how to use the metric*
 
- *Provide simplest possible example for using the metric*
 
 ### Inputs
- *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
 
 ### Output Values
 
- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
-
- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
 
 #### Values from Popular Papers
 *Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
@@ -39,10 +56,16 @@
 *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 
 ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*
 
 ## Citation
- *Cite the source where this metric was introduced.*
 
 ## Further References
 *Add any useful further references.*
 
 
 # Metric Card for CodeBLEU
 
 ## Metric Description
+
+ An implementation of the CodeBLEU metric from [CodeXGLUE](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator),
+ as described in the paper [CodeBLEU: a Method for Automatic Evaluation of Code Synthesis](https://arxiv.org/abs/2009.10297).
+
+ NOTE: currently works on Linux machines only due to the dependency on languages.so.
 
 ## How to Use
 
+ ```python
+ src = 'class AcidicSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
+ tgt = 'class AcidSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
+ src = src.replace("§","\n")
+ tgt = tgt.replace("§","\n")
+ res = module.compute(predictions = [tgt], references = [[src]])
+ print(res)
+ #{'CodeBLEU': 0.9473264567644872, 'ngram_match_score': 0.8915993127600096, 'weighted_ngram_match_score': 0.8977065142979394, 'syntax_match_score': 1.0, 'dataflow_match_score': 1.0}
+ ```
 
 ### Inputs
+ - **predictions** (`list` of `str`s): Translations to score.
+ - **references** (`list` of `list`s of `str`s, or a simple `list` of `str`s with one reference per prediction): References for each translation.
+ - **lang** (`str`): programming language, one of ['java','js','c_sharp','php','go','python','ruby'].
+ - **tokenizer**: approach used for standardizing `predictions` and `references`.
+ The default tokenizer is `tokenizer_13a`, a relatively minimal tokenization approach that is nonetheless equivalent to `mteval-v13a`, used by WMT.
+ This can be replaced by another tokenizer from a source such as [SacreBLEU](https://github.com/mjpost/sacrebleu/tree/master/sacrebleu/tokenizers).
+ - **params** (`str`): weights for averaging the four component scores (see the CodeBLEU paper).
+ Defaults to equal weights "0.25,0.25,0.25,0.25".
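The How to Use example above assumes `module` is the already-loaded metric. As a rough sketch of how these inputs fit together (the `evaluate` path `dvitel/codebleu` is an assumption based on this repository's owner, and `lang`/`params` are assumed to be forwarded to `calc_codebleu` in my_codebleu.py below):

```python
import evaluate

# Space path is an assumption; adjust it to the actual metric location.
module = evaluate.load("dvitel/codebleu")

predictions = ["def add(a, b):\n    return a + b\n"]
references = ["def add(x, y):\n    return x + y\n"]  # a simple list of strings also works after this commit

result = module.compute(
    predictions=predictions,
    references=references,
    lang="python",                 # one of: java, js, c_sharp, php, go, python, ruby
    params="0.25,0.25,0.25,0.25",  # alpha,beta,gamma,theta weights, assumed to be forwarded to calc_codebleu
)
print(result)
```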
 
 ### Output Values
 
+ - CodeBLEU: the resulting overall score,
+ - ngram_match_score: see the CodeBLEU paper,
+ - weighted_ngram_match_score: see the CodeBLEU paper,
+ - syntax_match_score: see the CodeBLEU paper,
+ - dataflow_match_score: see the CodeBLEU paper.
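With the default weights, each of these values lies between 0 and 1, and higher is better. Per the CodeBLEU paper, the overall score is the weighted combination of the four components using the `params` weights; a minimal sketch (not repository code) that reproduces the example output above:

```python
def combine_codebleu(ngram, weighted_ngram, syntax, dataflow, params="0.25,0.25,0.25,0.25"):
    # CodeBLEU = alpha*ngram + beta*weighted_ngram + gamma*syntax + theta*dataflow
    alpha, beta, gamma, theta = [float(x) for x in params.split(",")]
    return alpha * ngram + beta * weighted_ngram + gamma * syntax + theta * dataflow

# Component scores taken from the How to Use example; the result is ~0.9473,
# matching the reported 'CodeBLEU' value.
print(combine_codebleu(0.8915993127600096, 0.8977065142979394, 1.0, 1.0))
```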
 
 #### Values from Popular Papers
 *Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
 
 *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 
 ## Limitations and Bias
+ Linux OS only. See above for the set of supported programming languages.
 
 ## Citation
+ ```bibtex
+ @article{ren2020codebleu,
+   title={CodeBLEU: a Method for Automatic Evaluation of Code Synthesis},
+   author={Ren, Shuo and Guo, Daya and Lu, Shuai and Zhou, Long and Liu, Shujie and Tang, Duyu and Sundaresan, Neel and Zhou, Ming and Blanco, Ambrosio and Ma, Shuai},
+   journal={arXiv preprint arXiv:2009.10297},
+   year={2020}
+ }
+ ```
 
 ## Further References
 *Add any useful further references.*
dataflow_match.py CHANGED
@@ -36,11 +36,11 @@ def corpus_dataflow_match(references, candidates, lang, langso_dir):
         candidate = candidates[i]
         for reference in references_sample:
             try:
-                candidate=remove_comments_and_docstrings(candidate,'java')
+                candidate=remove_comments_and_docstrings(candidate,lang)
             except:
                 pass
             try:
-                reference=remove_comments_and_docstrings(reference,'java')
+                reference=remove_comments_and_docstrings(reference,lang)
             except:
                 pass
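The fix above makes comment and docstring stripping follow the language being evaluated instead of always assuming Java. As a self-contained illustration (not repository code, only loosely mimicking what `remove_comments_and_docstrings` does for `lang='java'`), Java-style stripping leaves Python docstrings untouched, which is why the hardcoded `'java'` could skew the dataflow match for other languages:

```python
import re

def strip_java_style_comments(source: str) -> str:
    # Roughly what Java comment removal does: drop /* ... */ and // ... comments.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    return re.sub(r"//[^\n]*", "", source)

python_candidate = 'def add(a, b):\n    """Return the sum."""\n    return a + b\n'
# The docstring (and any '#' comments) survive Java-style stripping.
print(strip_java_style_comments(python_candidate))
```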
 
my_codebleu.py CHANGED
@@ -24,7 +24,7 @@ def calc_codebleu(predictions, references, lang, tokenizer=None, params='0.25,0.
     alpha, beta, gamma, theta = [float(x) for x in params.split(',')]
 
     # preprocess inputs
-    references = [[x.strip() for x in ref] for ref in references]
+    references = [[x.strip() for x in ref] if type(ref) == list else [ref.strip()] for ref in references]
     hypothesis = [x.strip() for x in predictions]
 
     if not len(references) == len(hypothesis):
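This is the change behind the commit message: a bare string reference is now wrapped into a one-element list, so `references` can be passed either as a list of reference lists or as a simple list of strings. The same normalization in isolation, as a quick check:

```python
def normalize_references(references):
    # Mirrors the updated line in calc_codebleu: wrap plain strings into
    # one-element reference lists and strip surrounding whitespace.
    return [[x.strip() for x in ref] if type(ref) == list else [ref.strip()] for ref in references]

# One reference per prediction, given as a simple list of strings...
print(normalize_references(["def add(a, b): return a + b "]))
# [['def add(a, b): return a + b']]

# ...or as a list of reference lists (several references per prediction allowed).
print(normalize_references([["def add(a, b): return a + b", "def add(x, y): return x + y"]]))
# [['def add(a, b): return a + b', 'def add(x, y): return x + y']]
```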