---
title: TREC Eval
emoji: 🤗 
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  The TREC Eval metric combines a number of information retrieval metrics such as precision and nDCG. It is used to score rankings of retrieved documents with reference values.
---

# Metric Card for TREC Eval

## Metric Description

The TREC Eval metric combines a number of information retrieval metrics such as precision and normalized Discounted Cumulative Gain (nDCG). It is used to score rankings of retrieved documents with reference values.

## How to Use
```python
from evaluate import load
trec_eval = load("trec_eval")
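# `run` and `qrel` follow the formats described under "Inputs" below.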
results = trec_eval.compute(predictions=[run], references=[qrel])
```

### Inputs
- **predictions** *(dict): A single retrieval run, given as a dictionary of parallel lists with one entry per retrieved document (see the sketch after this list).*
    - **query** *(int): Query ID.*
    - **q0** *(str): Literal `"q0"`.*
    - **docid** *(str): Document ID.*
    - **rank** *(int): Rank of document.*
    - **score** *(float): Score of document.*
    - **system** *(str): Tag for current run.*
- **references** *(dict): A single qrel (set of relevance judgments), given as a dictionary of parallel lists with one entry per judged document.*
    - **query** *(int): Query ID.*
    - **q0** *(str): Literal `"q0"`.*
    - **docid** *(str): Document ID.*
    - **rel** *(int): Relevance of document.*
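
Both arguments use a column-oriented layout: every field holds a list, and position *i* across the lists describes the *i*-th entry. As a rough sketch of how such dictionaries might be assembled (the `retrieved` and `judged` records below are hypothetical and not part of the metric's API):

```python
# Illustrative only: hypothetical per-document records for query 0.
retrieved = [("doc_2", 1.5), ("doc_1", 1.2)]   # (docid, score), best first
judged = [("doc_1", 2)]                        # (docid, rel) relevance judgments

run = {
    "query":  [0] * len(retrieved),
    "q0":     ["q0"] * len(retrieved),
    "docid":  [docid for docid, _ in retrieved],
    "rank":   list(range(len(retrieved))),
    "score":  [score for _, score in retrieved],
    "system": ["test"] * len(retrieved),
}
qrel = {
    "query": [0] * len(judged),
    "q0":    ["q0"] * len(judged),
    "docid": [docid for docid, _ in judged],
    "rel":   [rel for _, rel in judged],
}
```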

### Output Values
- **runid** *(str): Run name.*  
- **num_ret** *(int): Number of retrieved documents.*  
- **num_rel** *(int): Number of relevant documents.*  
- **num_rel_ret** *(int): Number of retrieved relevant documents.*  
- **num_q** *(int): Number of queries.*  
- **map** *(float): Mean average precision.*
- **gm_map** *(float): Geometric mean average precision.*
- **bpref** *(float): Binary preference score.*
- **Rprec** *(float): Precision@R, where R is the number of relevant documents.*
- **recip_rank** *(float): Reciprocal rank.*
- **P@k** *(float): Precision@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]); see the sketch after this list.*
- **NDCG@k** *(float): nDCG@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
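
For intuition about the P@k values: precision at cutoff k counts how many of the top k ranked documents are judged relevant and divides by k. A minimal hand computation, assuming binary relevance (any `rel > 0` counts) and a hypothetical helper function, might look like this:

```python
# Sketch only: precision@k = (# relevant docs among the top k ranks) / k.
def precision_at_k(ranked_docids, rel_by_docid, k):
    top_k = ranked_docids[:k]
    hits = sum(1 for docid in top_k if rel_by_docid.get(docid, 0) > 0)
    return hits / k

ranked = ["doc_2", "doc_1"]   # hypothetical ranking for a single query
rels = {"doc_1": 2}           # hypothetical judgments: only doc_1 is relevant
print(precision_at_k(ranked, rels, 5))  # 0.2, matching P@5 in the example below
```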

### Examples

A minimal example looks as follows:
```python
import evaluate

qrel = {
    "query": [0],
    "q0": ["q0"],
    "docid": ["doc_1"],
    "rel": [2]
}
run = {
    "query": [0, 0],
    "q0": ["q0", "q0"],
    "docid": ["doc_2", "doc_1"],
    "rank": [0, 1],
    "score": [1.5, 1.2],
    "system": ["test", "test"]
}

trec_eval = evaluate.load("trec_eval")
results = trec_eval.compute(references=[qrel], predictions=[run])
print(results["P@5"])  # 0.2: only doc_1 is relevant, so 1 of the 5 cutoff positions counts
```

A more realistic use case, with example files from [`trectools`](https://github.com/joaopalotti/trectools):

```python
import evaluate
import pandas as pd

qrel = pd.read_csv("robust03_qrels.txt", sep=r"\s+", names=["query", "q0", "docid", "rel"])
qrel["q0"] = qrel["q0"].astype(str)
qrel = qrel.to_dict(orient="list")

run = pd.read_csv("input.InexpC2", sep=r"\s+", names=["query", "q0", "docid", "rank", "score", "system"])
run = run.to_dict(orient="list")

trec_eval = evaluate.load("trec_eval")
result = trec_eval.compute(predictions=[run], references=[qrel])
```

```python
result

{'runid': 'InexpC2',
 'num_ret': 100000,
 'num_rel': 6074,
 'num_rel_ret': 3198,
 'num_q': 100,
 'map': 0.22485930431817494,
 'gm_map': 0.10411523825735523,
 'bpref': 0.217511695914079,
 'Rprec': 0.2502547201167236,
 'recip_rank': 0.6646545943335417,
 'P@5': 0.44,
 'P@10': 0.37,
 'P@15': 0.34600000000000003,
 'P@20': 0.30999999999999994,
 'P@30': 0.2563333333333333,
 'P@100': 0.1428,
 'P@200': 0.09510000000000002,
 'P@500': 0.05242,
 'P@1000': 0.03198,
 'NDCG@5': 0.4101480395089769,
 'NDCG@10': 0.3806761417784469,
 'NDCG@15': 0.37819463408955706,
 'NDCG@20': 0.3686080836061317,
 'NDCG@30': 0.352474353427451,
 'NDCG@100': 0.3778329431025776,
 'NDCG@200': 0.4119129817248979,
 'NDCG@500': 0.4585354576461375,
 'NDCG@1000': 0.49092149290805653}
```

## Limitations and Bias
The `trec_eval` metric requires predictions to be in the TREC run format and references to be in the TREC qrel format.


## Citation

```bibtex
@inproceedings{palotti2019,
 author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
 title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
 series = {SIGIR'19},
 year = {2019},
 location = {Paris, France},
 publisher = {ACM}
} 
```

## Further References

- Homepage: https://github.com/joaopalotti/trectools