John Graham Reynolds committed
Commit 41d497e
Parent(s): bcbab79

update predictions and add examples
app.py CHANGED

@@ -14,7 +14,7 @@ Check out the original, longstanding issue [here](https://github.com/huggingface
 `evaluate.combine()` multiple metrics related to multilabel text classification. Particularly, one cannot `combine` the `f1`, `precision`, and `recall` scores for \
 evaluation. I encountered this issue specifically while training [RoBERTa-base-DReiFT](https://huggingface.co/MarioBarbeque/RoBERTa-base-DReiFT) for multilabel \
 text classification of 805 labeled medical conditions based on drug reviews. The [following workaround](https://github.com/johngrahamreynolds/FixedMetricsForHF) was
-
+created to address this. \n

 This Space shows how one can instantiate these custom `evaluate.Metric`s, each with their own unique methodology for averaging across labels, before `combine`-ing them into a
 HF `evaluate.CombinedEvaluations` object. From here, we can easily compute each of the metrics simultaneously using `compute`.</p>
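Aside, for readers of this commit: the description edited above hinges on the fact that each of `f1`, `precision`, and `recall` wants its own `average` argument in the multiclass setting, and a module built with plain `evaluate.combine(["f1", "precision", "recall"])` cannot hand a different averaging method to each member. The custom `evaluate.Metric` subclasses in the linked workaround fix those averaging choices up front so the metrics can be `combine`d. A minimal sketch of the per-metric averaging with only the stock `evaluate` metrics (toy labels, illustrative only, not drawn from the Space):

```python
import evaluate

# Toy multiclass labels, illustrative only.
predictions = [0, 1, 2, 1, 2]
references  = [0, 1, 2, 2, 2]

# Each stock metric takes its own `average` kwarg at compute time; a combined
# module has no way to route a different value to each member, which is the
# limitation the Space's description refers to.
f1 = evaluate.load("f1")
precision = evaluate.load("precision")
recall = evaluate.load("recall")

scores = {
    "f1 (weighted)": f1.compute(predictions=predictions, references=references, average="weighted")["f1"],
    "precision (micro)": precision.compute(predictions=predictions, references=references, average="micro")["precision"],
    "recall (weighted)": recall.compute(predictions=predictions, references=references, average="weighted")["recall"],
}
print(scores)
```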
@@ -43,7 +43,7 @@ def evaluation(predictions, metrics) -> str:
     predicted = [int(num) for num in predictions["Predicted Class Label"].to_list()]
     references = [int(num) for num in predictions["Actual Class Label"].to_list()]

-    combined.
+    combined.add_batch(predictions=predicted, references=references)
     outputs = combined.compute()

     return "Your metrics are as follows: \n" + outputs
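Aside: the `add_batch`/`compute` pattern completed by the fixed line above is the standard incremental-evaluation flow in `evaluate`. A self-contained sketch of that flow on a single stock metric (`accuracy` is used here only so the snippet runs without the custom combined module); note that `compute()` returns a `dict`, so the sketch converts it with `str()` before building the message string:

```python
import evaluate

# Incremental evaluation: accumulate (prediction, reference) batches, then
# compute once at the end. The Space applies this same flow to its combined
# f1/precision/recall module.
metric = evaluate.load("accuracy")

batches = [
    ([1, 0, 2], [1, 1, 2]),   # (predicted, reference) label batches, toy values
    ([1, 2],    [1, 2]),
]
for predicted, references in batches:
    metric.add_batch(predictions=predicted, references=references)

outputs = metric.compute()   # e.g. {"accuracy": 0.8}
print("Your metrics are as follows: \n" + str(outputs))   # compute() returns a dict
```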
@@ -96,8 +96,10 @@ space = gr.Interface(
     description=description,
     article=article,
     examples=[
-        [
-
+        [
+            [[1,1], [1,0], [2,0], [1,2], [2,2]],
+            [["f1", "weighted"], ["precision", "micro"], ["recall", "weighted"]]
+        ]
     ]
     cache_examples=False
 ).launch()
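Aside: the `examples` entry added in this hunk supplies one value per Gradio input component, in order. The sketch below shows how such an entry would line up with two `gr.Dataframe` inputs; the component choices, the metrics-table headers, and the stub `evaluation` body are assumptions for illustration, not taken from app.py (only the "Predicted Class Label" / "Actual Class Label" column names appear in the diff above).

```python
import gradio as gr

def evaluation(predictions, metrics) -> str:
    # Stub standing in for the Space's real metric computation.
    return f"Received {len(predictions)} label rows and {len(metrics)} metric rows."

gr.Interface(
    fn=evaluation,
    inputs=[
        gr.Dataframe(headers=["Predicted Class Label", "Actual Class Label"], label="Class labels"),
        gr.Dataframe(headers=["Metric", "Averaging"], label="Metrics"),  # headers assumed
    ],
    outputs="text",
    # One example row; its two entries populate the two Dataframe inputs above.
    examples=[
        [
            [[1, 1], [1, 0], [2, 0], [1, 2], [2, 2]],
            [["f1", "weighted"], ["precision", "micro"], ["recall", "weighted"]],
        ]
    ],
    cache_examples=False,
).launch()
```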