---
title: NL to Bash Generation Eval
emoji: 🤗
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 3.15.0
app_file: app.py
pinned: false
---

## Metric Description

Evaluation metrics for natural language to bash generation. The preprocessing is customized for the [`tldr`](https://github.com/tldr-pages/tldr) dataset, where variables are first anonymized.

## How to Use

This metric takes as input a list of predicted sentences and a list of reference sentences:

```python
import evaluate

predictions = ["ipcrm --shmem-id {{segment_id}}", "trash-empty --keep-files {{path/to/file_or_directory}}"]
references = ["ipcrm --shmem-id {{shmem_id}}", "trash-empty {{10}}"]
tldr_metrics = evaluate.load("neulab/tldr_eval")
results = tldr_metrics.compute(predictions=predictions, references=references)
print(results)
>>> {'template_matching': 0.5, 'command_accuracy': 1.0, 'bleu_char': 65.67965919013294, 'token_recall': 0.9999999999583333, 'token_precision': 0.8333333333055555, 'token_f1': 0.8999999999183333}
```

### Inputs

- **predictions** (`list` of `str`): Predictions to score.
- **references** (`list` of `str`): References to score against.

### Output Values

- **template_matching**: exact-match accuracy between predictions and references
- **command_accuracy**: accuracy of predicting the correct bash command name (e.g., `ls`)
- **bleu_char**: character-level BLEU score
- **token_recall/token_precision/token_f1**: recall/precision/F1 of the predicted tokens

## Citation

```bibtex
@article{zhou2022doccoder,
  title={DocCoder: Generating Code by Retrieving and Reading Docs},
  author={Zhou, Shuyan and Alon, Uri and Xu, Frank F and Jiang, Zhengbao and Neubig, Graham},
  journal={arXiv preprint arXiv:2207.05987},
  year={2022}
}
```
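## Example: Token-Level Scoring

As a rough illustration of what the token-level metrics measure, the sketch below computes precision, recall, and F1 over whitespace-split tokens. This is a simplified stand-in, not the metric's actual tokenization or anonymization logic, and the helper name `token_prf` is hypothetical.

```python
# Minimal sketch of token-level precision/recall/F1 over whitespace tokens.
# The real metric preprocesses and anonymizes variables; this is illustrative only.
def token_prf(prediction: str, reference: str):
    pred_tokens = prediction.split()
    ref_tokens = reference.split()

    # Count tokens shared between prediction and reference (multiset overlap).
    common = 0
    ref_remaining = list(ref_tokens)
    for tok in pred_tokens:
        if tok in ref_remaining:
            ref_remaining.remove(tok)
            common += 1

    precision = common / len(pred_tokens) if pred_tokens else 0.0
    recall = common / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "ipcrm" and "--shmem-id" match; the differing placeholders do not.
p, r, f = token_prf("ipcrm --shmem-id {{segment_id}}", "ipcrm --shmem-id {{shmem_id}}")
```

In the actual metric, placeholders like `{{segment_id}}` and `{{shmem_id}}` are anonymized before comparison, which is why the reported scores above are higher than this naive string comparison would give.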