github-actions committed on
Commit 594067e
1 Parent(s): 48fa02d

Auto files update [main]

Files changed (3)
  1. README.md +8 -8
  2. app.py +0 -10
  3. codebleu.py +0 -3
README.md CHANGED
@@ -5,7 +5,7 @@ tags:
  - metric
  - code
  - codebleu
- description: "Unofficial `CodeBLEU` implementation with Linux and MacOS supports available with PyPI and HF HUB."
+ description: "Unofficial `CodeBLEU` implementation that supports Linux and MacOS."
  sdk: gradio
  sdk_version: 3.19.1
  app_file: app.py
@@ -14,10 +14,13 @@ pinned: false
 
  # Metric Card for codebleu
 
- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
+ This repository contains an unofficial `CodeBLEU` implementation that supports Linux and MacOS. It is available through `PyPI` and the `evaluate` library.
+
+ The code is based on the original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) and the updated version by [XLCoST/CodeBLEU](https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/evaluator/CodeBLEU). It has been refactored, tested, built for macOS, and improved for usability.
+
+ Available for: `Python`, `C`, `C#`, `C++`, `Java`, `JavaScript`, `PHP`.
 
  ## Metric Description
- Unofficial `CodeBLEU` implementation with Linux and MacOS supports available with PyPI and HF HUB.
 
  > An ideal evaluation metric should consider the grammatical correctness and the logic correctness.
  > We propose weighted n-gram match and syntactic AST match to measure grammatical correctness, and introduce semantic data-flow match to calculate logic correctness.
@@ -29,9 +32,6 @@ In a nutshell, `CodeBLEU` is a weighted combination of `n-gram match (BLEU)`, `w
  The metric has shown higher correlation with human evaluation than `BLEU` and `accuracy` metrics.
 
  ## How to Use
- *Give general statement of how to use the metric*
-
- *Provide simplest possible example for using the metric*
 
  ### Inputs
 
@@ -80,7 +80,7 @@ print(result)
  # }
  ```
 
- Or using `evaluate` library (package required):
+ Or using `evaluate` library (`codebleu` package required):
  ```python
  import evaluate
  metric = evaluate.load("k4black/codebleu")
@@ -91,7 +91,7 @@ reference = "def sum ( first , second ) :\n return second + first"
  result = metric.compute([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
  ```
 
- Note: `language` is required;
+ Note: `lang` is required;
 
 
  ## Limitations and Bias
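For context, the `evaluate` snippet spread across the README hunks above assembles into the following runnable sketch. The `prediction` string is a hypothetical model output (only `reference` appears in the diff), and keyword arguments are used for `compute`, which `evaluate` metrics accept:

```python
# Minimal usage sketch assembled from the README diff above.
# Assumes `pip install evaluate codebleu` has been run.
import evaluate

metric = evaluate.load("k4black/codebleu")

prediction = "def sum ( a , b ) :\n return a + b"  # hypothetical model output
reference = "def sum ( first , second ) :\n return second + first"  # from the diff

# `lang` is required; equal weights over the four component scores
# and `tokenizer=None` match the README example.
result = metric.compute(
    predictions=[prediction],
    references=[reference],
    lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),
    tokenizer=None,
)
print(result)  # dict with the overall codebleu score and per-component scores
```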
app.py CHANGED
@@ -1,15 +1,5 @@
- import importlib
- import subprocess
- import sys
-
  import evaluate
  from evaluate.utils import launch_gradio_widget
 
-
- # hotfix: somehow codebleu is not installed in the docker image
- subprocess.run([sys.executable, "-m", "pip", "install", "codebleu"], check=True)
- globals()["codebleu"] = importlib.import_module("codebleu")
-
-
  module = evaluate.load("k4black/codebleu")
  launch_gradio_widget(module)
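The removed hotfix installed `codebleu` with `pip` at runtime; presumably the dependency is now resolved at build time instead. On Hugging Face Spaces that is conventionally declared in a `requirements.txt` (a guess for illustration; the actual file is not part of this diff):

```text
evaluate
codebleu
```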
codebleu.py CHANGED
@@ -18,7 +18,6 @@ import datasets
  import evaluate
 
 
- # TODO: Add BibTeX citation
  _CITATION = """\
  @misc{ren2020codebleu,
      title={CodeBLEU: a Method for Automatic Evaluation of Code Synthesis},
@@ -30,7 +29,6 @@ _CITATION = """\
  }
  """
 
- # TODO: Add description of the module here
  _DESCRIPTION = """\
  Unofficial `CodeBLEU` implementation with Linux and MacOS supports available with PyPI and HF HUB.
 
@@ -38,7 +36,6 @@ Based on original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tr
  """
 
 
- # TODO: Add description of the arguments of the module here
  _KWARGS_DESCRIPTION = """
  Calculate a weighted combination of `n-gram match (BLEU)`, `weighted n-gram match (BLEU-weighted)`, `AST match` and `data-flow match` scores.
 
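The `_KWARGS_DESCRIPTION` above describes CodeBLEU as a weighted combination of four component scores. A minimal sketch of that combination, with placeholder values for illustration (the real computation lives in the `codebleu` package):

```python
# Weighted combination described in _KWARGS_DESCRIPTION (sketch only).
def combine_codebleu(ngram, weighted_ngram, ast_match, dataflow,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    a, b, c, d = weights
    return a * ngram + b * weighted_ngram + c * ast_match + d * dataflow

# Placeholder component scores, not real metric output:
print(combine_codebleu(0.60, 0.55, 0.80, 0.70))  # 0.6625
```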