---
title: Path_Planning_evaluate
datasets:
- GeoBenchmark
tags:
- evaluate
- metric
description: 'TODO: add a description here'
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
---
# Metric Card for Path_Planning_evaluate
This metric evaluates path-planning tasks in which an LM has to generate a valid path from a starting point to one or more end points in a grid while avoiding all obstacles.
## Metric Description
This metric evaluates path-planning tasks in which an LM has to generate a valid path from a starting point to one or more end points in a grid while avoiding all obstacles.
## How to Use
This metric takes 5 mandatory arguments: `generations` (a list of strings), `golds` (a list of gold paths, each a list of coordinate tuples), `obstacles` (a list of lists of coordinate tuples giving the obstacles for each question), `ends` (a list of lists of coordinate tuples giving the end points for each question), and `n` (a list of integers giving the size of the grid for each question).
```python
import evaluate

pp_eval = evaluate.load("rfr2003/path_planning_evaluate")
results = pp_eval.compute(
    generations=['[(0,0), (0,1), (1,1)]', '[(0,0), (1,0), (1,1)]', '[(0,0), (1,0), (1,1), (0,1)]', '(0,0'],
    golds=[[(0,0), (0,1), (1,1)], [(0,0), (0,1), (1,1)], [(0,0), (0,1)], []],
    obstacles=[[(1,0)], [(1,0)], [], []],
    ends=[[(1,1)], [(1,1)], [(0,1)], [(0,1)]],
    n=[2, 2, 2, 2]
)
print(results)
```

```
{'compliance_ratio': 0.75, 'success_ratio': 0.6666666666666666, 'optimal_ratio': 0.3333333333333333, 'feasible_ratio': 0.6666666666666666, 'distance': 0, 'unreachable_acc': 1.0}
```
This metric does not take any optional arguments.
## Output Values
This metric outputs a dictionary with the following values:
- `compliance_ratio`: the ratio of generations that comply with the list format across all questions; ranges from 0.0 to 1.0.
- `feasible_ratio`: the ratio of generations that are feasible among all reachable questions; ranges from 0.0 to 1.0.
- `success_ratio`: the ratio of generations that are correct among all reachable questions; ranges from 0.0 to 1.0.
- `optimal_ratio`: the ratio of generations that are optimal among all reachable questions; ranges from 0.0 to 1.0.
- `distance`: the mean distance to the end point over feasible paths that are not correct; a non-negative real number.
- `unreachable_acc`: the ratio of detected unreachable paths among all unreachable paths; ranges from 0.0 to 1.0.
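To make the compliance and feasibility notions above concrete, here is a minimal sketch of how such checks could be implemented. This is an illustration only, not the metric's actual internals: the helper names `parse_path` and `is_feasible` are hypothetical, and the feasibility rule assumed here (cells inside the grid, no obstacles, consecutive cells 4-adjacent) is inferred from the examples in this card.

```python
import ast

def parse_path(generation):
    # Compliance check (hypothetical helper): the string must parse
    # into a Python list of (row, col) tuples, otherwise return None.
    try:
        path = ast.literal_eval(generation)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(path, list) or not all(
        isinstance(p, tuple) and len(p) == 2 for p in path
    ):
        return None
    return path

def is_feasible(path, obstacles, n):
    # Feasibility check (assumed rule): every cell lies inside the
    # n x n grid, no cell is an obstacle, and consecutive cells are
    # 4-adjacent (Manhattan distance 1).
    if not path:
        return False
    for (r, c) in path:
        if not (0 <= r < n and 0 <= c < n) or (r, c) in obstacles:
            return False
    return all(
        abs(r1 - r2) + abs(c1 - c2) == 1
        for (r1, c1), (r2, c2) in zip(path, path[1:])
    )
```

Under these assumptions, the malformed generation `'(0,0'` from the example above would fail the compliance check, while `'[(0,0), (0,1), (1,1)]'` parses and is feasible on a 2x2 grid with an obstacle at `(1,0)`.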
## Values from Popular Papers
## Examples
```python
import evaluate

pp_eval = evaluate.load("rfr2003/path_planning_evaluate")
results = pp_eval.compute(
    generations=['[(0,0), (0,1), (1,1)]', '[(0,0), (1,0), (1,1)]', '[(0,0), (1,0), (1,1), (0,1)]', '(0,0'],
    golds=[[(0,0), (0,1), (1,1)], [(0,0), (0,1), (1,1)], [(0,0), (0,1)], []],
    obstacles=[[(1,0)], [(1,0)], [], []],
    ends=[[(1,1)], [(1,1)], [(0,1)], [(0,1)]],
    n=[2, 2, 2, 2]
)
print(results)
```

```
{'compliance_ratio': 0.75, 'success_ratio': 0.6666666666666666, 'optimal_ratio': 0.3333333333333333, 'feasible_ratio': 0.6666666666666666, 'distance': 0, 'unreachable_acc': 1.0}
```