John Graham Reynolds committed
Commit • 9b29e93 • 1 Parent(s): a9c64ab
took notes from demos for another attempt
app.py
CHANGED
@@ -8,34 +8,35 @@ import pandas as pd
(Removed lines, truncated in the original side-by-side view: an earlier, shorter draft of the description and article strings, a commented-out `# outputs = predictions` line, and an earlier version of the interface definition with a `gr.Dataframe(...)` input block ending in `).launch()`.)
  title = "'Combine' multiple metrics with this 🤗 Evaluate 🪲 Fix!"

  description = """<p style='text-align: center'>
+ As I introduce myself to the entirety of the 🤗 ecosystem, I've put together this Space to show off a temporary fix for a current 🪲 in the 🤗 Evaluate library. \n

+ Check out the original, longstanding issue [here](https://github.com/huggingface/evaluate/issues/234). This details how it is currently impossible to \
  'evaluate.combine()' multiple metrics related to multilabel text classification. Particularly, one cannot 'combine()' the f1, precision, and recall scores for \
  evaluation. I encountered this issue specifically while training [RoBERTa-base-DReiFT](https://huggingface.co/MarioBarbeque/RoBERTa-base-DReiFT) for multilabel \
  text classification of 805 labeled medical conditions based on drug reviews. \n

+ This Space shows how one can instantiate these custom metrics each with their own unique methodology for averaging across labels, combine them into a single
+ HF `evaluate.EvaluationModule` (or `Metric`), and compute them.</p>
  """

+ article = "<p style='text-align: center'>Check out the [original repo](https://github.com/johngrahamreynolds/FixedMetricsForHF) housing this code, and a quickly \
  trained [multilabel text classification model](https://github.com/johngrahamreynolds/RoBERTa-base-DReiFT/tree/main) that makes use of it during evaluation.</p>"

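The `FixedF1`, `FixedPrecision`, and `FixedRecall` classes used further down are not defined in this file; they come from the repo linked in `article`. As a rough illustration of the idea the description refers to, a metric of that kind could be sketched roughly as follows, assuming the point is simply to pin the `average` strategy at construction time so that `evaluate.combine()` needs no per-metric keyword when computing. The class name `SketchedF1` and its internals are hypothetical, not the repo's actual implementation:

```python
import datasets
import evaluate
from sklearn.metrics import f1_score


class SketchedF1(evaluate.Metric):
    """F1 whose averaging strategy is fixed when the metric is instantiated."""

    def __init__(self, average="weighted", **kwargs):
        super().__init__(**kwargs)
        self.average = average  # remembered here instead of being passed to compute()

    def _info(self):
        return evaluate.MetricInfo(
            description="F1 score with a pre-set `average` argument",
            citation="",
            inputs_description="predicted and reference label ids",
            features=datasets.Features(
                {"predictions": datasets.Value("int32"), "references": datasets.Value("int32")}
            ),
        )

    def _compute(self, predictions, references):
        # sklearn does the actual work; `self.average` supplies the pinned strategy
        return {"f1": f1_score(references, predictions, average=self.average)}
```

Because the averaging choice lives on each instance, a list of such modules can be handed to `evaluate.combine()` and computed together without conflicting keyword arguments.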
+ def evaluation(predictions, metrics) -> str:

+     f1 = FixedF1(average=metrics["f1"])
+     precision = FixedPrecision(average=metrics["precision"])
+     recall = FixedRecall(average=metrics["recall"])
+     combined = evaluate.combine([f1, recall, precision])

+     df = predictions.get_dataframe()
+     predicted = df["Predicted Label"].to_list()
+     references = df["Actual Label"].to_list()

+     combined.add_batch(predictions=predicted, references=references)
+     outputs = combined.compute()

+     return "Your metrics are as follows: \n" + str(outputs)

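Note that the second `gr.Dataframe` input defined further down delivers a table with "Metric" and "Averaging Type" columns rather than a dict keyed by metric name, and with Gradio's default `type="pandas"` the first input already arrives as a `pandas.DataFrame`. A small, hypothetical adapter along these lines could bridge that gap; it is not part of this commit:

```python
import pandas as pd

def metrics_table_to_dict(metrics_df: pd.DataFrame) -> dict:
    """Turn rows such as ("f1", "weighted") into {"f1": "weighted"}."""
    return dict(zip(metrics_df["Metric"].str.lower(), metrics_df["Averaging Type"]))

# e.g. inside evaluation():  metrics = metrics_table_to_dict(metrics)
#      and, if the component passes a DataFrame directly:  df = predictions
```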

  # gr.Interface(
@@ -49,20 +50,38 @@ trained [multilabel text classification model](https://github.com/johngrahamreynolds/RoBERTa-base-DReiFT/tree/main) that makes use of it during evaluation.</p>"
  # cache_examples=False
  # ).launch()

+ # use this to create examples

+ # data = {'Name':['Tony', 'Steve', 'Bruce', 'Peter' ],
+ #         'Age': [35, 70, 45, 20] }

+ # # Creating DataFrame
+ # df = pd.DataFrame(data)

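If examples are eventually added, they would need to match the two input tables defined below. A hypothetical pair of example DataFrames (values invented purely for illustration) might look like:

```python
import pandas as pd

# One example table per input component of the interface below
example_predictions = pd.DataFrame(
    {"Predicted Label": [0, 1, 2, 1, 0], "Actual Label": [0, 1, 1, 1, 0]}
)
example_metrics = pd.DataFrame(
    {"Metric": ["f1", "precision", "recall"],
     "Averaging Type": ["weighted", "micro", "macro"]}
)
# these could then be passed to gr.Interface via examples=[[example_predictions, example_metrics]]
```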
  def filter_records(records, gender):
      return records[records["gender"] == gender]

+ space = gr.Interface(
+     fn=evaluation,
+     inputs=[
          gr.Dataframe(
+             headers=["Predicted Label", "Actual Label"],
+             datatype=["number", "number"],
              row_count=5,
+             col_count=(2, "fixed"),
          ),
+         gr.Dataframe(
+             headers=["Metric", "Averaging Type"],
+             datatype=["str", "str"],
+             row_count=3,
+             col_count=(2, "fixed"),
+         )
      ],
+     outputs="textbox",
+     title=title,
+     description=description,
+     article=article,
+     cache_examples=False
  ).launch()
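For a quick check of the metric-combining fix itself, outside of the Gradio UI, the combined module can also be exercised directly, mirroring the `add_batch`/`compute` flow used in `evaluation()` above. The label values and averaging choices below are made up for illustration:

```python
# Assumes FixedF1, FixedPrecision, FixedRecall are importable from the linked repo
f1 = FixedF1(average="weighted")
precision = FixedPrecision(average="micro")
recall = FixedRecall(average="macro")

combined = evaluate.combine([f1, precision, recall])
combined.add_batch(predictions=[0, 1, 1, 2], references=[0, 1, 2, 2])
print(combined.compute())  # a single dict holding all three scores
```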
|