lvwerra (HF staff) committed on
Commit a4e1447
1 Parent(s): ba22a6b

Update Space (evaluate main: 24356aad)

Files changed (2):
  1. README.md +51 -1
  2. mcnemar.py +2 -1
README.md CHANGED
@@ -17,14 +17,64 @@ tags:
 
 ## Comparison description
 
 ## How to use
 
 ## Output values
 
- ### Values from popular papers
 
 ## Examples
 
 ## Limitations and bias
 
 ## Citations
 
 ## Comparison description
 
+ McNemar's test is a non-parametric diagnostic test over a contingency table resulting from the predictions of two classifiers. The test compares the sensitivity and specificity of the two classifiers on the same group of reference labels. It can be computed with:
+
+ McNemar = (SE - SP)**2 / (SE + SP)
+
+ Where:
+ * SE: Sensitivity (Test 1 positive; Test 2 negative)
+ * SP: Specificity (Test 1 negative; Test 2 positive)
+
+ In other words, SE and SP are the off-diagonal (discordant) elements of the contingency table for the classifier predictions (`predictions1` and `predictions2`) with respect to the ground truth `references`.
+
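The formula above can be checked by hand. The sketch below is a minimal illustration, not the comparison's actual implementation (the helper name `mcnemar_by_hand` is made up): it counts the discordant pairs directly and uses the fact that, for one degree of freedom, the chi-square survival function reduces to `math.erfc`.

```python
import math

def mcnemar_by_hand(references, predictions1, predictions2):
    """Compute McNemar's statistic and p-value from paired predictions."""
    pairs = list(zip(references, predictions1, predictions2))
    # Discordant counts: samples where exactly one classifier is correct.
    se = sum(p1 == r != p2 for r, p1, p2 in pairs)  # only classifier 1 correct
    sp = sum(p2 == r != p1 for r, p1, p2 in pairs)  # only classifier 2 correct
    # Assumes at least one discordant pair; otherwise the statistic is undefined.
    stat = (se - sp) ** 2 / (se + sp)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

With `references=[1, 0, 1]`, `predictions1=[1, 1, 1]`, `predictions2=[1, 0, 1]` this gives `stat=1.0` and `p` of roughly 0.3173, matching the example output further down.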
  ## How to use
 
+ The McNemar comparison calculates the proportions of responses that exhibit disagreement between two classifiers. It is used to analyze paired nominal data.
+
+ ## Inputs
+
+ Its arguments are:
+
+ `predictions1`: a list of predictions from the first model.
+
+ `predictions2`: a list of predictions from the second model.
+
+ `references`: a list of the ground truth reference labels.
+
  ## Output values
 
+ The McNemar comparison outputs two things:
+
+ `stat`: The McNemar statistic.
+
+ `p`: The p value.
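As a usage sketch for these outputs (the dictionary literal and the 0.05 threshold are illustrative assumptions, not part of the comparison's API):

```python
# A results dictionary in the shape the comparison returns (values copied
# from the example below; illustrative only).
results = {"stat": 1.0, "p": 0.31731050786291115}

alpha = 0.05  # conventional significance level; an assumption, not an API default
if results["p"] < alpha:
    verdict = "significant difference between the two classifiers"
else:
    verdict = "no significant difference detected"
print(verdict)
```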
 
 
 ## Examples
 
+ Example comparison:
+
+ ```python
+ import evaluate
+
+ mcnemar = evaluate.load("mcnemar")
+ results = mcnemar.compute(references=[1, 0, 1], predictions1=[1, 1, 1], predictions2=[1, 0, 1])
+ print(results)
+ {'stat': 1.0, 'p': 0.31731050786291115}
+ ```
+
  ## Limitations and bias
 
+ The McNemar test is a non-parametric test, so it has relatively few assumptions (essentially only that the observations are independent). It should be used to analyze paired nominal data only.
+
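To make the paired-data requirement concrete, here is a small sketch (illustrative, reusing the toy data from the example above) that builds the 2x2 correctness table the test operates on; the test only makes sense when each sample contributes exactly one such pair:

```python
from collections import Counter

references = [1, 0, 1]
predictions1 = [1, 1, 1]
predictions2 = [1, 0, 1]

# Each sample yields one paired outcome: (classifier 1 correct?, classifier 2 correct?).
table = Counter(
    (p1 == r, p2 == r) for r, p1, p2 in zip(references, predictions1, predictions2)
)
# table[(True, True)]   -> both correct
# table[(True, False)]  -> only classifier 1 correct (discordant)
# table[(False, True)]  -> only classifier 2 correct (discordant)
# table[(False, False)] -> both wrong
```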
  ## Citations
 
+ ```bibtex
+ @article{mcnemar1947note,
+   title={Note on the sampling error of the difference between correlated proportions or percentages},
+   author={McNemar, Quinn},
+   journal={Psychometrika},
+   volume={12},
+   number={2},
+   pages={153--157},
+   year={1947},
+   publisher={Springer-Verlag}
+ }
+ ```
mcnemar.py CHANGED
@@ -35,7 +35,8 @@ Args:
     references (`list` of `int`): Ground truth labels.
 
 Returns:
-    p (`float` or `int`): McNemar test score. Minimum possible value is 0. Maximum possible value is 1.0. A lower p value means a more significant difference.
+    stat (`float`): McNemar test score.
+    p (`float`): The p value. Minimum possible value is 0. Maximum possible value is 1.0. A lower p value means a more significant difference.
 
 Examples:
     >>> mcnemar = evaluate.load("mcnemar")