---
title: NL to Bash Generation Eval
emoji: 🤗 
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 3.15.0
app_file: app.py
pinned: false
---
## Metric Description
Evaluation metrics for natural language to bash generation.
The preprocessing is customized for the [`tldr`](https://github.com/tldr-pages/tldr) dataset, where variables are first anonymized.
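
For intuition only, anonymization can be thought of as replacing the contents of each `{{...}}` placeholder with a generic token before comparison. The helper below is a hypothetical sketch, not the metric's actual preprocessing code:

```python
import re

def anonymize_variables(command: str) -> str:
    """Replace the contents of every {{...}} placeholder with a generic token.

    Hypothetical helper for illustration; the metric's real preprocessing may differ.
    """
    return re.sub(r"\{\{.*?\}\}", "{{VAR}}", command)

# "ipcrm --shmem-id {{segment_id}}" and "ipcrm --shmem-id {{shmem_id}}"
# both map to "ipcrm --shmem-id {{VAR}}", so they can count as a template match.
print(anonymize_variables("ipcrm --shmem-id {{segment_id}}"))
```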

## How to Use

This metric takes as input a list of predicted commands and a list of reference commands:

```python
predictions = ["ipcrm --shmem-id {{segment_id}}", 
    "trash-empty --keep-files {{path/to/file_or_directory}}"]
references = ["ipcrm --shmem-id {{shmem_id}}",
    "trash-empty {{10}}"]
tldr_metrics = evaluate.load("neulab/tldr_eval")
results = tldr_metrics.compute(predictions=predictions, references=references)
print(results)
>>> {'template_matching': 0.5, 'command_accuracy': 1.0, 'bleu_char': 65.67965919013294, 'token_recall': 0.9999999999583333, 'token_precision': 0.8333333333055555, 'token_f1': 0.8999999999183333}
```

### Inputs
- **predictions** (`list` of `str`s): Predictions to score.
- **references** (`list` of `str`s): Reference commands to compare the predictions against.

### Output Values
- **template_matching**: exact-match accuracy between prediction and reference
- **command_accuracy**: accuracy of predicting the correct bash command name (e.g., `ls`)
- **bleu_char**: character-level BLEU score
- **token_recall/token_precision/token_f1**: recall/precision/F1 of the predicted tokens against the reference tokens (see the sketch below)
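
The token-level scores can be read as ordinary precision/recall over the multisets of tokens in the prediction and the reference. A minimal sketch, assuming whitespace tokenization (the metric's actual tokenizer may differ):

```python
from collections import Counter

def token_prf(prediction: str, reference: str) -> dict:
    """Token-level precision/recall/F1 over whitespace tokens.

    Illustrative sketch only; the metric's real tokenization and
    aggregation may differ.
    """
    pred_tokens = Counter(prediction.split())
    ref_tokens = Counter(reference.split())
    overlap = sum((pred_tokens & ref_tokens).values())
    precision = overlap / max(sum(pred_tokens.values()), 1)
    recall = overlap / max(sum(ref_tokens.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return {"precision": precision, "recall": recall, "f1": f1}

print(token_prf("trash-empty --keep-files {{path}}", "trash-empty {{path}}"))
# Recall is 1.0 (every reference token is predicted); precision is lower
# because of the extra --keep-files flag.
```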


## Citation
```
@article{zhou2022doccoder,
  title={DocCoder: Generating Code by Retrieving and Reading Docs},
  author={Zhou, Shuyan and Alon, Uri and Xu, Frank F and Jiang, Zhengbao and Neubig, Graham},
  journal={arXiv preprint arXiv:2207.05987},
  year={2022}
}
```