---
title: NL to Bash Generation Eval
emoji: 🤗
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 3.15.0
app_file: app.py
pinned: false
---
## Metric Description
Evaluation metrics for natural language to Bash generation.
The preprocessing is customized for the [`tldr`](https://github.com/tldr-pages/tldr) dataset, where variables are first anonymized.
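To see what anonymization does in practice, here is a minimal sketch: the `anonymize` helper below is hypothetical (the metric's actual preprocessing may differ), but it illustrates the idea of collapsing every `{{...}}` placeholder into one generic token so that commands differing only in variable names compare equal.
```python
import re

def anonymize(command: str) -> str:
    # Collapse every {{...}} placeholder into a single generic token so that
    # differently named variables (e.g. {{segment_id}} vs. {{shmem_id}})
    # no longer break an exact match.
    return re.sub(r"\{\{.*?\}\}", "{{var}}", command)

anonymize("ipcrm --shmem-id {{segment_id}}")  # -> 'ipcrm --shmem-id {{var}}'
anonymize("ipcrm --shmem-id {{shmem_id}}")    # -> 'ipcrm --shmem-id {{var}}'
```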
## How to Use
This metric takes as input a list of predicted commands and a list of reference commands:
```python
predictions = ["ipcrm --shmem-id {{segment_id}}",
"trash-empty --keep-files {{path/to/file_or_directory}}"]
references = ["ipcrm --shmem-id {{shmem_id}}",
"trash-empty {{10}}"]
tldr_metrics = evaluate.load("neulab/tldr_eval")
results = tldr_metrics.compute(predictions=predictions, references=references)
print(results)
>>> {'template_matching': 0.5, 'command_accuracy': 1.0, 'bleu_char': 65.67965919013294, 'token_recall': 0.9999999999583333, 'token_precision': 0.8333333333055555, 'token_f1': 0.8999999999183333}
```
### Inputs
- **predictions** (`list` of `str`s): Predictions to score.
- **references** (`list` of `str`s): References to compare against, one per prediction.
### Output Values
- **template_matching**: exact match accuracy after variable anonymization
- **command_accuracy**: accuracy of predicting the correct Bash command name (e.g., `ls`)
- **bleu_char**: character-level BLEU score
- **token_recall/precision/f1**: recall/precision/f1 of the predicted tokens against the reference tokens (see the sketch after this list)
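As a rough mental model only (not the shipped implementation, which also applies numerical smoothing, visible in the near-1.0 recall in the example output above), the token scores can be read as multiset precision/recall over whitespace-split tokens, and the command name as the first token:
```python
from collections import Counter

def token_scores(prediction: str, reference: str):
    # Hypothetical reimplementation for illustration only:
    # multiset overlap between predicted and reference tokens.
    pred_tokens = Counter(prediction.split())
    ref_tokens = Counter(reference.split())
    overlap = sum((pred_tokens & ref_tokens).values())
    precision = overlap / max(sum(pred_tokens.values()), 1)
    recall = overlap / max(sum(ref_tokens.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

def command_match(prediction: str, reference: str) -> bool:
    # The command name is the first whitespace-separated token (e.g., `ls`).
    return prediction.split()[0] == reference.split()[0]
```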
## Citation
```bibtex
@article{zhou2022doccoder,
title={DocCoder: Generating Code by Retrieving and Reading Docs},
author={Zhou, Shuyan and Alon, Uri and Xu, Frank F and Jiang, Zhengbao and Neubig, Graham},
journal={arXiv preprint arXiv:2207.05987},
year={2022}
}
```