---
title: NL to Bash Generation Eval
emoji: 🤗
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 3.15.0
app_file: app.py
pinned: false
---

## Metric Description

Evaluation metrics for natural-language-to-Bash generation.

The preprocessing is customized for the [`tldr`](https://github.com/tldr-pages/tldr) dataset, where we first anonymize the variables (the `{{...}}` slots) before comparing commands.

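The anonymization step can be sketched as follows. `anonymize` is a hypothetical helper shown only to illustrate the idea of normalizing `{{...}}` variable slots so that differently named slots compare equal; the Space's actual preprocessing may differ:

```python
import re

def anonymize(cmd: str) -> str:
    """Collapse every {{...}} variable slot into a generic {{var}} token,
    so commands that differ only in slot names compare equal (a sketch)."""
    return re.sub(r"\{\{.*?\}\}", "{{var}}", cmd)

print(anonymize("ipcrm --shmem-id {{segment_id}}"))  # → ipcrm --shmem-id {{var}}
```

Under this normalization, `ipcrm --shmem-id {{segment_id}}` and `ipcrm --shmem-id {{shmem_id}}` map to the same template, which is why the first pair in the example below counts as a template match.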

## How to Use

This metric takes as input a list of predicted commands and a list of reference commands:

```python
import evaluate

predictions = ["ipcrm --shmem-id {{segment_id}}",
               "trash-empty --keep-files {{path/to/file_or_directory}}"]
references = ["ipcrm --shmem-id {{shmem_id}}",
              "trash-empty {{10}}"]
tldr_metrics = evaluate.load("neulab/tldr_eval")
results = tldr_metrics.compute(predictions=predictions, references=references)
print(results)
>>> {'template_matching': 0.5, 'command_accuracy': 1.0, 'bleu_char': 65.67965919013294, 'token_recall': 0.9999999999583333, 'token_precision': 0.8333333333055555, 'token_f1': 0.8999999999183333}
```


### Inputs

- **predictions** (`list` of `str`s): Predictions to score.
- **references** (`list` of `str`s): References to score against, one per prediction.


### Output Values

- **template_matching**: exact-match accuracy between the predicted and reference command templates
- **command_accuracy**: accuracy of predicting the correct Bash command name (e.g., `ls`)
- **bleu_char**: character-level BLEU score
- **token_recall/token_precision/token_f1**: recall/precision/F1 of the predicted tokens against the reference tokens

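As an illustration of the token-level metrics, here is a minimal sketch that reproduces the example output above. It assumes per-pair precision/recall/F1 over anonymized, whitespace-split tokens, macro-averaged over pairs; the Space's actual implementation may differ in details:

```python
import re

def _anonymize_tokens(cmd: str) -> list:
    # Hypothetical helper: collapse {{...}} slots so slot names don't matter.
    return re.sub(r"\{\{.*?\}\}", "{{var}}", cmd).split()

def token_prf(preds, refs):
    """Macro-averaged token precision/recall/F1 over prediction/reference
    pairs (a sketch consistent with the example output above)."""
    ps, rs, fs = [], [], []
    for pred, ref in zip(preds, refs):
        p_tok, r_tok = _anonymize_tokens(pred), _anonymize_tokens(ref)
        overlap = len(set(p_tok) & set(r_tok))
        prec = overlap / len(p_tok) if p_tok else 0.0
        rec = overlap / len(r_tok) if r_tok else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(preds)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

preds = ["ipcrm --shmem-id {{segment_id}}",
         "trash-empty --keep-files {{path/to/file_or_directory}}"]
refs = ["ipcrm --shmem-id {{shmem_id}}",
        "trash-empty {{10}}"]
prec, rec, f1 = token_prf(preds, refs)
print(round(prec, 4), round(rec, 4), round(f1, 4))  # → 0.8333 1.0 0.9
```

The second pair drives precision below 1.0: the prediction's extra `--keep-files` token has no counterpart in the reference, while every reference token is covered, keeping recall at 1.0.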

## Citation

```bibtex
@article{zhou2022doccoder,
  title={DocCoder: Generating Code by Retrieving and Reading Docs},
  author={Zhou, Shuyan and Alon, Uri and Xu, Frank F and Jiang, Zhengbao and Neubig, Graham},
  journal={arXiv preprint arXiv:2207.05987},
  year={2022}
}
```