---
title: NL to Bash Generation Eval
emoji: 🤗
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 3.15.0
app_file: app.py
pinned: false
---

## Metric Description

Evaluation metrics for natural-language-to-Bash generation.

The preprocessing is customized for the [`tldr`](https://github.com/tldr-pages/tldr) dataset, where we first anonymize the variables (the `{{...}}` slots) before comparing commands.

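The anonymization step can be sketched as follows. `anonymize` is a hypothetical helper shown only to illustrate the idea of normalizing `{{...}}` variable slots so that differently named slots compare equal; the Space's actual preprocessing may differ:

```python
import re

def anonymize(cmd: str) -> str:
    """Collapse every {{...}} variable slot into a generic {{var}} token,
    so commands that differ only in slot names compare equal (a sketch)."""
    return re.sub(r"\{\{.*?\}\}", "{{var}}", cmd)

print(anonymize("ipcrm --shmem-id {{segment_id}}"))  # → ipcrm --shmem-id {{var}}
```

Under this normalization, `ipcrm --shmem-id {{segment_id}}` and `ipcrm --shmem-id {{shmem_id}}` map to the same template, which is why the first pair in the example below counts as a template match.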

## How to Use

This metric takes as input a list of predicted commands and a list of reference commands:

```python
import evaluate

predictions = ["ipcrm --shmem-id {{segment_id}}",
               "trash-empty --keep-files {{path/to/file_or_directory}}"]
references = ["ipcrm --shmem-id {{shmem_id}}",
              "trash-empty {{10}}"]
tldr_metrics = evaluate.load("neulab/tldr_eval")
results = tldr_metrics.compute(predictions=predictions, references=references)
print(results)
>>> {'template_matching': 0.5, 'command_accuracy': 1.0, 'bleu_char': 65.67965919013294, 'token_recall': 0.9999999999583333, 'token_precision': 0.8333333333055555, 'token_f1': 0.8999999999183333}
```


### Inputs

- **predictions** (`list` of `str`s): Predictions to score.
- **references** (`list` of `str`s): References to score against, one per prediction.


### Output Values

- **template_matching**: exact-match accuracy between the predicted and reference command templates
- **command_accuracy**: accuracy of predicting the correct Bash command name (e.g., `ls`)
- **bleu_char**: character-level BLEU score
- **token_recall/token_precision/token_f1**: recall/precision/F1 of the predicted tokens against the reference tokens

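As an illustration of the token-level metrics, here is a minimal sketch that reproduces the example output above. It assumes per-pair precision/recall/F1 over anonymized, whitespace-split tokens, macro-averaged over pairs; the Space's actual implementation may differ in details:

```python
import re

def _anonymize_tokens(cmd: str) -> list:
    # Hypothetical helper: collapse {{...}} slots so slot names don't matter.
    return re.sub(r"\{\{.*?\}\}", "{{var}}", cmd).split()

def token_prf(preds, refs):
    """Macro-averaged token precision/recall/F1 over prediction/reference
    pairs (a sketch consistent with the example output above)."""
    ps, rs, fs = [], [], []
    for pred, ref in zip(preds, refs):
        p_tok, r_tok = _anonymize_tokens(pred), _anonymize_tokens(ref)
        overlap = len(set(p_tok) & set(r_tok))
        prec = overlap / len(p_tok) if p_tok else 0.0
        rec = overlap / len(r_tok) if r_tok else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(preds)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

preds = ["ipcrm --shmem-id {{segment_id}}",
         "trash-empty --keep-files {{path/to/file_or_directory}}"]
refs = ["ipcrm --shmem-id {{shmem_id}}",
        "trash-empty {{10}}"]
prec, rec, f1 = token_prf(preds, refs)
print(round(prec, 4), round(rec, 4), round(f1, 4))  # → 0.8333 1.0 0.9
```

The second pair drives precision below 1.0: the prediction's extra `--keep-files` token has no counterpart in the reference, while every reference token is covered, keeping recall at 1.0.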

## Citation

```bibtex
@article{zhou2022doccoder,
  title={DocCoder: Generating Code by Retrieving and Reading Docs},
  author={Zhou, Shuyan and Alon, Uri and Xu, Frank F and Jiang, Zhengbao and Neubig, Graham},
  journal={arXiv preprint arXiv:2207.05987},
  year={2022}
}
```