lvwerra (HF staff) committed on
Commit a4e1447
1 Parent(s): ba22a6b

Update Space (evaluate main: 24356aad)

Files changed (2):
  1. README.md +51 -1
  2. mcnemar.py +2 -1
README.md CHANGED
@@ -17,14 +17,64 @@ tags:
 
 ## Comparison description
 
 ## How to use
 
 ## Output values
 
- ### Values from popular papers
 
 ## Examples
 
 ## Limitations and bias
 
 ## Citations
 
 ## Comparison description
 
+ McNemar's test is a non-parametric diagnostic test over a contingency table resulting from the predictions of two classifiers. The test compares the sensitivity and specificity of the two classifiers on the same group of reference labels. It can be computed with:
+
+ McNemar = (SE - SP)**2 / (SE + SP)
+
+ Where:
+ * SE: Sensitivity (Test 1 positive; Test 2 negative)
+ * SP: Specificity (Test 1 negative; Test 2 positive)
+
+ In other words, SE and SP are the off-diagonal (discordant) elements of the contingency table for the classifier predictions (`predictions1` and `predictions2`) with respect to the ground truth `references`.
+
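The formula above can be checked by hand. The sketch below is a minimal illustration, not the comparison's actual implementation (the helper name `mcnemar_by_hand` is made up): it counts the discordant pairs directly and uses the fact that, for one degree of freedom, the chi-square survival function reduces to `math.erfc`.

```python
import math

def mcnemar_by_hand(references, predictions1, predictions2):
    """Compute McNemar's statistic and p-value from paired predictions."""
    pairs = list(zip(references, predictions1, predictions2))
    # Discordant counts: samples where exactly one classifier is correct.
    se = sum(p1 == r != p2 for r, p1, p2 in pairs)  # only classifier 1 correct
    sp = sum(p2 == r != p1 for r, p1, p2 in pairs)  # only classifier 2 correct
    # Assumes at least one discordant pair; otherwise the statistic is undefined.
    stat = (se - sp) ** 2 / (se + sp)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

With `references=[1, 0, 1]`, `predictions1=[1, 1, 1]`, `predictions2=[1, 0, 1]` this gives `stat=1.0` and `p` of roughly 0.3173, matching the example output further down.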
  ## How to use
 
+ The McNemar comparison calculates the proportions of responses that exhibit disagreement between two classifiers. It is used to analyze paired nominal data.
+
+ ## Inputs
+
+ Its arguments are:
+
+ `predictions1`: a list of predictions from the first model.
+
+ `predictions2`: a list of predictions from the second model.
+
+ `references`: a list of the ground truth reference labels.
+
  ## Output values
 
+ The McNemar comparison outputs two things:
+
+ `stat`: The McNemar statistic.
+
+ `p`: The p value.
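As a usage sketch for these outputs (the dictionary literal and the 0.05 threshold are illustrative assumptions, not part of the comparison's API):

```python
# A results dictionary in the shape the comparison returns (values copied
# from the example below; illustrative only).
results = {"stat": 1.0, "p": 0.31731050786291115}

alpha = 0.05  # conventional significance level; an assumption, not an API default
if results["p"] < alpha:
    verdict = "significant difference between the two classifiers"
else:
    verdict = "no significant difference detected"
print(verdict)
```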
 
 
 ## Examples
 
+ Example comparison:
+
+ ```python
+ import evaluate
+
+ mcnemar = evaluate.load("mcnemar")
+ results = mcnemar.compute(references=[1, 0, 1], predictions1=[1, 1, 1], predictions2=[1, 0, 1])
+ print(results)
+ {'stat': 1.0, 'p': 0.31731050786291115}
+ ```
+
  ## Limitations and bias
 
+ The McNemar test is a non-parametric test, so it has relatively few assumptions (essentially only that the observations are independent). It should be used to analyze paired nominal data only.
+
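To make the paired-data requirement concrete, here is a small sketch (illustrative, reusing the toy data from the example above) that builds the 2x2 correctness table the test operates on; the test only makes sense when each sample contributes exactly one such pair:

```python
from collections import Counter

references = [1, 0, 1]
predictions1 = [1, 1, 1]
predictions2 = [1, 0, 1]

# Each sample yields one paired outcome: (classifier 1 correct?, classifier 2 correct?).
table = Counter(
    (p1 == r, p2 == r) for r, p1, p2 in zip(references, predictions1, predictions2)
)
# table[(True, True)]   -> both correct
# table[(True, False)]  -> only classifier 1 correct (discordant)
# table[(False, True)]  -> only classifier 2 correct (discordant)
# table[(False, False)] -> both wrong
```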
  ## Citations
 
+ ```bibtex
+ @article{mcnemar1947note,
+   title={Note on the sampling error of the difference between correlated proportions or percentages},
+   author={McNemar, Quinn},
+   journal={Psychometrika},
+   volume={12},
+   number={2},
+   pages={153--157},
+   year={1947},
+   publisher={Springer-Verlag}
+ }
+ ```
mcnemar.py CHANGED
@@ -35,7 +35,8 @@ Args:
     references (`list` of `int`): Ground truth labels.
 
 Returns:
-    p (`float` or `int`): McNemar test score. Minimum possible value is 0. Maximum possible value is 1.0. A lower p value means a more significant difference.
+    stat (`float`): McNemar test score.
+    p (`float`): The p value. Minimum possible value is 0. Maximum possible value is 1.0. A lower p value means a more significant difference.
 
 Examples:
     >>> mcnemar = evaluate.load("mcnemar")