
---
title: regression_evaluate
datasets:
  - GeoBenchmark
tags:
  - evaluate
  - metric
description: 'TODO: add a description here'
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
---

# Metric Card for regression_evaluate

## Metric Description

This metric evaluates regression tasks performed by language models (LMs). It expects the model to generate a list of numerical values, which is then compared against a gold list of numerical values.

## How to Use

This metric takes two mandatory arguments: `generations` (a list of strings) and `golds` (a list of lists of floats).

```python
import evaluate

metric = evaluate.load("rfr2003/regression_evaluate")
results = metric.compute(
    generations=['[150, 0]'],
    golds=[[183, 177, 146, 85, 70, 78, 55, 17, 0, -1, -1]],
)
print(results)
# {'precision': [4.0], 'recall': [344.0], 'macro-mean': [174.0], 'median macro-mean': 174.0}
```

This metric accepts one optional argument:

- `d`: a function used to compute the distance between a generated value and a gold one. By default, it computes the absolute difference between the two numbers.
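A custom distance could, for instance, replace the default absolute difference with a squared difference. The `squared_distance` function below is a hypothetical sketch; only the role of `d` as a pairwise distance is taken from the description above.

```python
# Hypothetical custom distance for the optional `d` argument:
# a squared difference instead of the default absolute difference.
def squared_distance(x, y):
    return (x - y) ** 2

# It would then be passed alongside the mandatory arguments, e.g.:
# results = metric.compute(generations=..., golds=..., d=squared_distance)
print(squared_distance(150, 146))  # a gap of 4 contributes 16
```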

## Output Values

This metric outputs a dictionary with the following values:

- `precision`: Sum of the minimum distances between each predicted value and the set of gold values, computed for each question.
- `recall`: Sum of the minimum distances between each gold value and the set of generated values, computed for each question.
- `macro-mean`: Average of precision and recall, computed for each question.
- `median macro-mean`: Median across the per-question macro-mean values.
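To make these definitions concrete, here is a minimal sketch that recomputes them in plain Python. It is based only on the descriptions above, not on the metric's source code, so the function names are invented and edge cases (duplicate gold values, empty lists) may be handled differently by the actual implementation.

```python
from statistics import median

def abs_diff(x, y):
    """Default distance from this card: absolute difference."""
    return abs(x - y)

def score_question(preds, golds, d=abs_diff):
    """Precision, recall and macro-mean for a single question."""
    # precision: each predicted value matched to its closest gold value
    precision = sum(min(d(p, g) for g in golds) for p in preds)
    # recall: each gold value matched to its closest predicted value
    recall = sum(min(d(g, p) for p in preds) for g in golds)
    return precision, recall, (precision + recall) / 2

def score_dataset(all_preds, all_golds, d=abs_diff):
    """Aggregate per-question scores into the card's output dictionary."""
    per_question = [score_question(p, g, d) for p, g in zip(all_preds, all_golds)]
    macro_means = [m for _, _, m in per_question]
    return {
        "precision": [p for p, _, _ in per_question],
        "recall": [r for _, r, _ in per_question],
        "macro-mean": macro_means,
        # median macro-mean: median across all questions
        "median macro-mean": median(macro_means),
    }

print(score_dataset([[1.0, 5.0]], [[0.0, 6.0]]))
# {'precision': [2.0], 'recall': [2.0], 'macro-mean': [2.0], 'median macro-mean': 2.0}
```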

## Values from Popular Papers

## Examples

```python
import evaluate

metric = evaluate.load("rfr2003/regression_evaluate")
results = metric.compute(
    generations=['[150, 0]'],
    golds=[[183, 177, 146, 85, 70, 78, 55, 17, 0, -1, -1]],
)
print(results)
# {'precision': [4.0], 'recall': [344.0], 'macro-mean': [174.0], 'median macro-mean': 174.0}
```

## Limitations and Bias

## Citation

## Further References