John Graham Reynolds committed
Commit 8b58e10 · 1 Parent(s): ca651ac

fix minor errors
app.py CHANGED
@@ -4,6 +4,7 @@ from fixed_recall import FixedRecall
import evaluate
import gradio as gr
import pandas as pd
+import numpy as np

title = "'Combine' multiple metrics with this 🤗 Evaluate 🪲 Fix!"


@@ -14,10 +15,29 @@ Check out the original, longstanding issue [here](https://github.com/huggingface
`evaluate.combine()` multiple metrics related to multilabel text classification. Particularly, one cannot `combine` the `f1`, `precision`, and `recall` scores for \
evaluation. I encountered this issue specifically while training [RoBERTa-base-DReiFT](https://huggingface.co/MarioBarbeque/RoBERTa-base-DReiFT) for multilabel \
text classification of 805 labeled medical conditions based on drug reviews. The [following workaround](https://github.com/johngrahamreynolds/FixedMetricsForHF) was
-created to address this
+created to address this - follow the link to view the source! \n

This Space shows how one can instantiate these custom `evaluate.Metric`s, each with their own unique methodology for averaging across labels, before `combine`-ing them into a
-HF `evaluate.CombinedEvaluations` object. From here, we can easily compute each of the metrics simultaneously using `compute`
+HF `evaluate.CombinedEvaluations` object. From here, we can easily compute each of the metrics simultaneously using `compute`. \n
+
+In general, one writes the following:\n
+
+```python
+f1 = FixedF1(average=...)
+precision = FixedPrecision(average=...)
+recall = FixedRecall(average=...)
+
+combined = evaluate.combine([f1, precision, recall])
+
+combined.add_batch(predictions=..., references=...)
+combined.compute()
+```\n
+
+where the `average` parameter can be different at instantiation time for each of the metrics. Acceptable values include `[None, 'micro', 'macro', 'weighted']` (
+or `binary` if there exist only two labels). \n
+
+Try it out using the examples below! Then try picking various averaging methods yourself!
+</p>
"""


@@ -39,13 +59,13 @@ def evaluation(predictions_df: pd.DataFrame, metrics_df: pd.DataFrame) -> str:
    combined_list = []

    if "f1" in metric_set:
-        f1 = FixedF1(average=metric_map["f1"])
+        f1 = FixedF1(average=metric_map["f1"] if metric_map["f1"] != "None" else None)
        combined_list.append(f1)
    if "precision" in metric_set:
-        precision = FixedPrecision(average=metric_map["precision"])
+        precision = FixedPrecision(average=metric_map["precision"] if metric_map["precision"] != "None" else None, zero_division=np.nan)
        combined_list.append(precision)
    if "recall" in metric_set:
-        recall = FixedRecall(average=metric_map["recall"])
+        recall = FixedRecall(average=metric_map["recall"] if metric_map["recall"] != "None" else None)
        combined_list.append(recall)

    combined = evaluate.combine(combined_list)
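The end-to-end pattern documented in the new description block — instantiating each fixed metric with its own averaging strategy, `combine`-ing them, and computing everything in one call — might look roughly like the sketch below. The `fixed_f1` and `fixed_precision` module names are assumptions modeled on the `from fixed_recall import FixedRecall` import visible in the first hunk; only `evaluate.combine`, `add_batch`, and `compute` are standard `evaluate` API.

```python
# Rough end-to-end sketch of the workflow described in the Space's intro text.
# The Fixed* classes come from the linked FixedMetricsForHF repo; module names
# other than fixed_recall are assumed by analogy with the import shown above.
import evaluate
import numpy as np

from fixed_f1 import FixedF1
from fixed_precision import FixedPrecision
from fixed_recall import FixedRecall

# Each metric is bound to its own averaging strategy at instantiation time,
# which is what plain evaluate.combine() makes awkward with the stock metrics.
f1 = FixedF1(average="weighted")
precision = FixedPrecision(average="micro", zero_division=np.nan)
recall = FixedRecall(average="macro")

combined = evaluate.combine([f1, precision, recall])

# Toy multiclass predictions and references (class indices).
combined.add_batch(predictions=[0, 2, 1, 1], references=[0, 1, 1, 2])
print(combined.compute())  # one dict holding the f1, precision, and recall scores
```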
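The other change, the guard added inside `evaluation()`, suggests that the averaging choice arrives from the Gradio UI as the literal string `"None"`, which has to become Python's `None` before it reaches a metric constructor. A minimal sketch of that conversion, assuming `FixedPrecision` accepts the same `average` and `zero_division` keywords that app.py passes to it (`metric_map` and `to_average` here are illustrative stand-ins, not names from the Space):

```python
# Minimal sketch of the "None"-string guard introduced in evaluation() above.
import numpy as np

from fixed_precision import FixedPrecision  # assumed import path, mirroring
                                            # `from fixed_recall import FixedRecall`

def to_average(choice: str):
    """Map the UI string to a valid `average` argument (None -> per-label scores)."""
    return None if choice == "None" else choice

metric_map = {"precision": "None"}  # hypothetical user selection

precision = FixedPrecision(
    average=to_average(metric_map["precision"]),
    # zero_division=np.nan makes labels with no predicted samples yield NaN
    # (and be excluded from averages) instead of counting as 0.0.
    zero_division=np.nan,
)
```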