---
title: TREC Eval
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  The TREC Eval metric combines a number of information retrieval metrics such
  as precision and nDCG. It is used to score rankings of retrieved documents
  with reference values.
---
# Metric Card for TREC Eval

## Metric Description

The TREC Eval metric combines a number of information retrieval metrics such as precision and normalized Discounted Cumulative Gain (nDCG). It is used to score rankings of retrieved documents with reference values.
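For intuition about one of the combined metrics, the sketch below computes nDCG@k for a single query under one common formulation (linear gains, log2 rank discount). It is a standalone NumPy sketch for illustration only, not the code this metric runs.

```python
import numpy as np

def ndcg_at_k(relevances, k):
    """nDCG@k for one ranked list of graded relevance judgments."""
    rels = np.asarray(relevances, dtype=float)[:k]
    # Discount each position by log2(rank + 1), ranks starting at 1.
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    dcg = float(np.sum(rels * discounts))
    # Ideal DCG: the same gains sorted from most to least relevant.
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[:ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0

ndcg_at_k([0, 2, 1], k=3)  # ~0.67: the most relevant doc sits at rank 2
```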
## How to Use

```python
from evaluate import load

trec_eval = load("trec_eval")
results = trec_eval.compute(predictions=[run], references=[qrel])
```
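Here `run` and `qrel` are dictionaries of equal-length lists in the run and qrel formats described under Inputs below.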
### Inputs

- **predictions** *(dict)*: A single retrieval run.
    - **query** *(int)*: Query ID.
    - **q0** *(str)*: Literal `"q0"`.
    - **docid** *(str)*: Document ID.
    - **rank** *(int)*: Rank of document.
    - **score** *(float)*: Score of document.
    - **system** *(str)*: Tag for current run.
- **references** *(dict)*: A single qrel.
    - **query** *(int)*: Query ID.
    - **q0** *(str)*: Literal `"q0"`.
    - **docid** *(str)*: Document ID.
    - **rel** *(int)*: Relevance of document.
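Both dictionaries are columnar: every key maps to a list of equal length, with one entry per retrieved (run) or judged (qrel) document. A minimal sketch of assembling them from hypothetical ranking data:

```python
# Hypothetical scored results for query 0: (docid, score), best first.
scored = [("doc_2", 1.5), ("doc_1", 1.2)]

run = {
    "query":  [0] * len(scored),
    "q0":     ["q0"] * len(scored),
    "docid":  [docid for docid, _ in scored],
    "rank":   list(range(len(scored))),  # 0-based ranks, as in the example below
    "score":  [score for _, score in scored],
    "system": ["test"] * len(scored),
}

qrel = {
    "query": [0],
    "q0":    ["q0"],
    "docid": ["doc_1"],
    "rel":   [2],  # graded relevance judgment for doc_1
}
```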
### Output Values

- **runid** *(str)*: Run name.
- **num_ret** *(int)*: Number of retrieved documents.
- **num_rel** *(int)*: Number of relevant documents.
- **num_rel_ret** *(int)*: Number of retrieved relevant documents.
- **num_q** *(int)*: Number of queries.
- **map** *(float)*: Mean average precision.
- **gm_map** *(float)*: Geometric mean average precision.
- **bpref** *(float)*: Binary preference score.
- **Rprec** *(float)*: Precision@R, where R is the number of relevant documents.
- **recip_rank** *(float)*: Reciprocal rank.
- **P@k** *(float)*: Precision@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).
- **NDCG@k** *(float)*: nDCG@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).
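All scores are returned in one flat dictionary, so individual values can be read off by key, continuing the sketch above:

```python
results = trec_eval.compute(predictions=[run], references=[qrel])
print(results["map"], results["P@10"], results["NDCG@10"])
```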
### Examples

A minimal example looks as follows:

```python
import evaluate

qrel = {
    "query": [0],
    "q0": ["q0"],
    "docid": ["doc_1"],
    "rel": [2]
}
run = {
    "query": [0, 0],
    "q0": ["q0", "q0"],
    "docid": ["doc_2", "doc_1"],
    "rank": [0, 1],
    "score": [1.5, 1.2],
    "system": ["test", "test"]
}

trec_eval = evaluate.load("trec_eval")
results = trec_eval.compute(references=[qrel], predictions=[run])
results["P@5"]
# 0.2
```
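P@5 is 0.2 here because only `doc_1` is judged relevant, so exactly one of the five precision@5 slots holds a relevant document (1/5 = 0.2).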
A more realistic use case, with example data from [`trectools`](https://github.com/joaopalotti/trectools):

```python
import evaluate
import pandas as pd

qrel = pd.read_csv("robust03_qrels.txt", sep=r"\s+", names=["query", "q0", "docid", "rel"])
qrel["q0"] = qrel["q0"].astype(str)
qrel = qrel.to_dict(orient="list")

run = pd.read_csv("input.InexpC2", sep=r"\s+", names=["query", "q0", "docid", "rank", "score", "system"])
run = run.to_dict(orient="list")

trec_eval = evaluate.load("trec_eval")
result = trec_eval.compute(predictions=[run], references=[qrel])
```
```python
result
{'runid': 'InexpC2',
 'num_ret': 100000,
 'num_rel': 6074,
 'num_rel_ret': 3198,
 'num_q': 100,
 'map': 0.22485930431817494,
 'gm_map': 0.10411523825735523,
 'bpref': 0.217511695914079,
 'Rprec': 0.2502547201167236,
 'recip_rank': 0.6646545943335417,
 'P@5': 0.44,
 'P@10': 0.37,
 'P@15': 0.34600000000000003,
 'P@20': 0.30999999999999994,
 'P@30': 0.2563333333333333,
 'P@100': 0.1428,
 'P@200': 0.09510000000000002,
 'P@500': 0.05242,
 'P@1000': 0.03198,
 'NDCG@5': 0.4101480395089769,
 'NDCG@10': 0.3806761417784469,
 'NDCG@15': 0.37819463408955706,
 'NDCG@20': 0.3686080836061317,
 'NDCG@30': 0.352474353427451,
 'NDCG@100': 0.3778329431025776,
 'NDCG@200': 0.4119129817248979,
 'NDCG@500': 0.4585354576461375,
 'NDCG@1000': 0.49092149290805653}
```
## Limitations and Bias

The `trec_eval` metric requires the predictions and references to follow the TREC run and qrel formats, respectively.
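For reference, both are whitespace-separated text formats on disk; the lines below are hypothetical, but the column order matches the input fields above:

```
# qrel file: query q0 docid rel
301 q0 doc_1 2

# run file: query q0 docid rank score system
301 q0 doc_2 0 1.5 my_system
301 q0 doc_1 1 1.2 my_system
```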
## Citation

```bibtex
@inproceedings{palotti2019,
    author    = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
    title     = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
    series    = {SIGIR'19},
    year      = {2019},
    location  = {Paris, France},
    publisher = {ACM}
}
```
## Further References

- Homepage: https://github.com/joaopalotti/trectools