metadata
title: TREC Eval
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
The TREC Eval metric combines a number of information retrieval metrics such
as precision and nDCG. It is used to score rankings of retrieved documents
with reference values.
Metric Card for TREC Eval
Metric Description
The TREC Eval metric combines a number of information retrieval metrics such as precision and normalized Discounted Cumulative Gain (nDCG). It is used to score rankings of retrieved documents with reference values.
How to Use
from evaluate import load
trec_eval = load("trec_eval")
results = trec_eval.compute(predictions=[run], references=[qrel])
Inputs
- predictions (dict): a single retrieval run.
- query (int): Query ID.
- q0 (str): Literal "q0".
- docid (str): Document ID.
- rank (int): Rank of document.
- score (float): Score of document.
- system (str): Tag for current run.
- references (dict): a single qrel.
- query (int): Query ID.
- q0 (str): Literal "q0".
- docid (str): Document ID.
- rel (int): Relevance of document.
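The fields above are stored column-wise: each key maps to a list with one entry per retrieved document. As a minimal sketch of how such a run dict could be assembled, the helper below (`build_run` is a hypothetical name, not part of `evaluate` or trectools) takes `(query, docid, score)` triples and assigns ranks by descending score. It assumes ranks start at 0, as in the minimal example in the Examples section.

```python
from collections import defaultdict


def build_run(scored_docs, system="my_system"):
    """Turn (query_id, doc_id, score) triples into a column-oriented run dict.

    Hypothetical helper for illustration; not part of the evaluate library.
    """
    # Group the scored documents by query.
    by_query = defaultdict(list)
    for query_id, doc_id, score in scored_docs:
        by_query[query_id].append((doc_id, score))

    run = {"query": [], "q0": [], "docid": [], "rank": [], "score": [], "system": []}
    for query_id in sorted(by_query):
        # Highest score first; ranks are assumed to start at 0.
        ranked = sorted(by_query[query_id], key=lambda pair: pair[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked):
            run["query"].append(query_id)
            run["q0"].append("q0")
            run["docid"].append(doc_id)
            run["rank"].append(rank)
            run["score"].append(score)
            run["system"].append(system)
    return run


run = build_run([(0, "doc_1", 1.2), (0, "doc_2", 1.5)], system="test")
print(run["docid"])  # ['doc_2', 'doc_1']
```

The resulting dict can then be passed as `predictions=[run]`.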
Output Values
- runid (str): Run name.
- num_ret (int): Number of retrieved documents.
- num_rel (int): Number of relevant documents.
- num_rel_ret (int): Number of retrieved relevant documents.
- num_q (int): Number of queries.
- map (float): Mean average precision.
- gm_map (float): Geometric mean average precision.
- bpref (float): Binary preference score.
- Rprec (float): Precision@R, where R is the number of relevant documents.
- recip_rank (float): Reciprocal rank.
- P@k (float): Precision@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).
- NDCG@k (float): nDCG@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).
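To make two of these values concrete, here is a simplified sketch of precision@k and reciprocal rank for a single query. It is an illustration of the standard definitions, not the code the metric runs internally, and the function names are hypothetical. Note that precision@k divides by k even when fewer than k documents were retrieved.

```python
def precision_at_k(ranked_docids, relevant, k):
    """Fraction of the top-k positions occupied by relevant documents."""
    hits = sum(1 for docid in ranked_docids[:k] if docid in relevant)
    # Divide by k, not by the number of documents actually retrieved.
    return hits / k


def reciprocal_rank(ranked_docids, relevant):
    """1 / position of the first relevant document (positions start at 1)."""
    for position, docid in enumerate(ranked_docids, start=1):
        if docid in relevant:
            return 1.0 / position
    return 0.0  # no relevant document was retrieved


ranked = ["doc_2", "doc_1"]  # system ranking, best first
relevant = {"doc_1"}         # documents judged relevant in the qrel

print(precision_at_k(ranked, relevant, 5))  # 0.2
print(reciprocal_rank(ranked, relevant))    # 0.5
```

With one relevant document among the top five positions, precision@5 is 1/5 = 0.2. Over several queries, such per-query values are averaged.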
Examples
A minimal example looks as follows:
import evaluate

qrel = {
"query": [0],
"q0": ["q0"],
"docid": ["doc_1"],
"rel": [2]
}
run = {
"query": [0, 0],
"q0": ["q0", "q0"],
"docid": ["doc_2", "doc_1"],
"rank": [0, 1],
"score": [1.5, 1.2],
"system": ["test", "test"]
}
trec_eval = evaluate.load("trec_eval")
results = trec_eval.compute(references=[qrel], predictions=[run])
results["P@5"]
0.2
A more realistic use case with an example from trectools:
import evaluate
import pandas as pd

# Load the qrels and cast q0 to str, as the metric expects a string there.
qrel = pd.read_csv("robust03_qrels.txt", sep=r"\s+", names=["query", "q0", "docid", "rel"])
qrel["q0"] = qrel["q0"].astype(str)
qrel = qrel.to_dict(orient="list")

# Load the run in the six-column TREC run format.
run = pd.read_csv("input.InexpC2", sep=r"\s+", names=["query", "q0", "docid", "rank", "score", "system"])
run = run.to_dict(orient="list")

trec_eval = evaluate.load("trec_eval")
result = trec_eval.compute(predictions=[run], references=[qrel])
result
{'runid': 'InexpC2',
'num_ret': 100000,
'num_rel': 6074,
'num_rel_ret': 3198,
'num_q': 100,
'map': 0.22485930431817494,
'gm_map': 0.10411523825735523,
'bpref': 0.217511695914079,
'Rprec': 0.2502547201167236,
'recip_rank': 0.6646545943335417,
'P@5': 0.44,
'P@10': 0.37,
'P@15': 0.34600000000000003,
'P@20': 0.30999999999999994,
'P@30': 0.2563333333333333,
'P@100': 0.1428,
'P@200': 0.09510000000000002,
'P@500': 0.05242,
'P@1000': 0.03198,
'NDCG@5': 0.4101480395089769,
'NDCG@10': 0.3806761417784469,
'NDCG@15': 0.37819463408955706,
'NDCG@20': 0.3686080836061317,
'NDCG@30': 0.352474353427451,
'NDCG@100': 0.3778329431025776,
'NDCG@200': 0.4119129817248979,
'NDCG@500': 0.4585354576461375,
'NDCG@1000': 0.49092149290805653}
Limitations and Bias
The trec_eval metric requires the inputs to be in the TREC run and qrel formats for predictions and references.
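Because of this format requirement, it can help to check a dict before passing it to the metric. The sketch below is one possible pre-check, assuming the column names listed under Inputs. `check_columns`, `RUN_COLUMNS` and `QREL_COLUMNS` are hypothetical names introduced here, not part of the library.

```python
# Column names taken from the Inputs section of this card.
RUN_COLUMNS = ("query", "q0", "docid", "rank", "score", "system")
QREL_COLUMNS = ("query", "q0", "docid", "rel")


def check_columns(data, required):
    """Raise ValueError if a column is missing or the columns differ in length."""
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    lengths = {len(data[name]) for name in required}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    return True


qrel = {"query": [0], "q0": ["q0"], "docid": ["doc_1"], "rel": [2]}
check_columns(qrel, QREL_COLUMNS)  # passes silently
```

This only checks the shape of the dict. It does not verify value types or that the document IDs in the run and the qrel match.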
Citation
@inproceedings{palotti2019,
author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
series = {SIGIR'19},
year = {2019},
location = {Paris, France},
publisher = {ACM}
}
Further References
- Homepage: https://github.com/joaopalotti/trectools