---
title: nDCG
emoji: 👁
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 3.9.1
app_file: app.py
pinned: false
license: mit
tags:
- evaluate
- metric
- ranking

description: >-
        The Discounted Cumulative Gain (DCG) is a measure of ranking quality.
        It is used to evaluate information retrieval systems under the following two assumptions:
                1. Highly relevant documents/labels are more useful when they appear earlier in the results.
                2. Documents/labels are relevant to different degrees.
        It is defined as the sum of the relevances of the retrieved documents, each discounted
        logarithmically according to the position at which it was retrieved.
        The Normalized DCG (nDCG) divides this value by the best possible value, yielding a score between
        0 and 1 such that a perfect retrieval achieves an nDCG of 1.
---

# Metric Card for nDCG

## Metric Description
The Discounted Cumulative Gain (DCG) is a measure of ranking quality.
It is used to evaluate information retrieval systems under two assumptions:
1. Highly relevant documents/labels are more useful when they appear earlier in the results.
2. Documents/labels are relevant to different degrees.

It is defined as the sum of the relevances of the retrieved documents, each discounted logarithmically
according to the position at which it was retrieved.
The Normalized DCG (nDCG) divides this value by the best value that could have been achieved, yielding a score between
0 and 1 such that a perfect retrieval achieves an nDCG of 1.0.
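
In the usual formulation (and scikit-learn's default, which the citation below suggests this metric builds on), the document at rank i contributes its relevance divided by log2(i + 1). The following is a minimal, illustrative sketch of that computation for a single sample, without tie handling; it is not this metric's own implementation.

```python
import numpy as np

def dcg(relevances):
    """DCG: each relevance discounted by log2(rank + 1), with ranks starting at 1."""
    relevances = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, relevances.size + 2))  # log2(2), log2(3), ...
    return float(np.sum(relevances / discounts))

def ndcg(true_relevance, predicted_scores):
    """nDCG: DCG of the predicted ordering divided by the best achievable DCG."""
    order = np.argsort(predicted_scores)[::-1]                       # rank items by predicted score
    ranked = np.asarray(true_relevance, dtype=float)[order]          # relevances in predicted order
    ideal = np.sort(np.asarray(true_relevance, dtype=float))[::-1]   # perfect ordering
    return dcg(ranked) / dcg(ideal)

print(ndcg([10, 0, 0, 1, 5], [.1, .2, .3, 4, 70]))  # ~0.6957, matching Example 1 below
```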

## How to Use

At minimum, this metric takes as input two `list`s of `list`s, each containing `float`s: predictions and references.

```python
import evaluate
nDCG_metric = evaluate.load('JP-SystemsX/nDCG')
results = nDCG_metric.compute(references=[[0, 1]], predictions=[[0, 1]])
print(results)
["{'nDCG@2': 1.0}"]
```

### Inputs:
**references** (`list` of `list` of `float`): True relevance scores, one inner list per sample.

**predictions** (`list` of `list` of `float`): Predicted relevance, probability estimates, or confidence values, one inner list per sample.

**k** (`int`): If set, only the k highest-ranked scores are considered; otherwise all outputs are used.
        Defaults to None.

**sample_weight** (`list` of `float`): Per-sample weights; if None, all samples are weighted equally. Defaults to None.

**ignore_ties** (`boolean`): If set to True, assumes that there are no ties in the predictions (which is likely when they are continuous)
        for efficiency gains. Defaults to False.
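
To illustrate how the optional parameters combine, a call might look like the following sketch; the relevance values, predicted scores, and weights are invented for the example, and the resulting score depends on them.

```python
import evaluate

nDCG_metric = evaluate.load("JP-SystemsX/nDCG")

# Two samples: one inner list of true relevances / predicted scores per sample.
references = [[3, 2, 0, 1], [1, 0, 0, 2]]
predictions = [[0.9, 0.6, 0.1, 0.3], [0.2, 0.1, 0.4, 0.8]]

results = nDCG_metric.compute(
    references=references,
    predictions=predictions,
    k=3,                       # only the 3 highest-ranked positions are scored
    sample_weight=[0.7, 0.3],  # weight the first sample more heavily in the average
    ignore_ties=True,          # safe here: the predicted scores contain no ties
)
print(results)  # e.g. {'nDCG@3': ...}
```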

### Output:
**normalized_discounted_cumulative_gain** (`float`): The nDCG score averaged over all samples.
        The minimum possible value is 0.0 and the maximum possible value is 1.0.

Output Example(s):
```python
{'nDCG@5': 1.0}
{'nDCG': 0.876}
```
This metric outputs a dictionary containing the nDCG score under the key `nDCG`, or `nDCG@k` when `k` is set.

     
### Examples:
    Example 1-A simple example
        >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
        >>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]])
        >>> print(results)
        {'nDCG': 0.6956940443813076}
    Example 2-The same as Example 1, except with k set to 3.
        >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
        >>> results = nDCG_metric.compute(references=[[10, 0, 0, 1, 5]], predictions=[[.1, .2, .3, 4, 70]], k=3)
        >>> print(results)
        {'nDCG@3': 0.4123818817534531}
    Example 3-There is only one relevant label, but the predictions contain a tie, so the model cannot decide which candidate it is.
        >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
        >>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1)
        >>> print(results)
        {'nDCG@1': 0.5}
        >>> # That is, it evaluates both tied candidates and returns the average of their scores
    Example 4-The same as Example 3, except ignore_ties is set to True.
        >>> nDCG_metric = evaluate.load("JP-SystemsX/nDCG")
        >>> results = nDCG_metric.compute(references=[[1, 0, 0, 0, 0]], predictions=[[1, 1, 0, 0, 0]], k=1, ignore_ties=True)
        >>> print(results)
        {'nDCG@1': 0.0}
        >>> # Alternative result: {'nDCG@1': 1.0}
        >>> # That is, it picks one of the two tied candidates and computes the score only for that one,
        >>> # so the result may vary depending on which one was chosen

## Citation(s)
```bibtex
@article{scikit-learn,
  title={Scikit-learn: Machine Learning in {P}ython},
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}
```