---
title: Path_Planning_evaluate
datasets:
- GeoBenchmark
tags:
- evaluate
- metric
description: 'TODO: add a description here'
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
---
# Metric Card for Path_Planning_evaluate
This metric evaluates path-planning tasks in which an LM has to generate a valid path from a starting point to one or more end points in a grid while avoiding all obstacles.
## Metric Description
This metric evaluates path-planning tasks in which an LM has to generate a valid path from a starting point to one or more end points in a grid while avoiding all obstacles.
## How to Use
This metric takes 5 mandatory arguments: `generations` (a list of strings), `golds` (a list of gold paths, each a list of coordinate tuples), `obstacles` (a list of lists of coordinate tuples giving the obstacles for each question), `ends` (a list of lists of coordinate tuples giving the end points for each question), and `n` (a list of integers giving the size of the grid for each question).
```python
import evaluate

pp_eval = evaluate.load("rfr2003/path_planning_evaluate")
results = pp_eval.compute(
    generations=['[(0,0), (0,1), (1,1)]', '[(0,0), (1,0), (1,1)]', '[(0,0), (1,0), (1,1), (0,1)]', '(0,0'],
    golds=[[(0,0), (0,1), (1,1)], [(0,0), (0,1), (1,1)], [(0,0), (0,1)], []],
    obstacles=[[(1,0)], [(1,0)], [], []],
    ends=[[(1,1)], [(1,1)], [(0,1)], [(0,1)]],
    n=[2, 2, 2, 2]
)
print(results)
```

```
{'compliance_ratio': 0.75, 'success_ratio': 0.6666666666666666, 'optimal_ratio': 0.3333333333333333, 'feasible_ratio': 0.6666666666666666, 'distance': 0, 'unreachable_acc': 1.0}
```
This metric does not take any optional arguments.
## Output Values
This metric outputs a dictionary with the following values:
- `compliance_ratio`: the ratio of generations that comply with the list format across all questions; ranges from 0.0 to 1.0.
- `feasible_ratio`: the ratio of generations that are feasible among all reachable questions; ranges from 0.0 to 1.0.
- `success_ratio`: the ratio of generations that are correct among all reachable questions; ranges from 0.0 to 1.0.
- `optimal_ratio`: the ratio of generations that are optimal among all reachable questions; ranges from 0.0 to 1.0.
- `distance`: the mean distance to the end point over feasible paths that are not correct; a non-negative real number.
- `unreachable_acc`: the ratio of detected unreachable paths among all unreachable paths; ranges from 0.0 to 1.0.
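To make the compliance and feasibility notions above concrete, here is a minimal sketch of how such checks could be implemented. This is an illustration only, not the metric's actual internals: the helper names `parse_path` and `is_feasible` are hypothetical, and the feasibility rule assumed here (cells inside the grid, no obstacles, consecutive cells 4-adjacent) is inferred from the examples in this card.

```python
import ast

def parse_path(generation):
    # Compliance check (hypothetical helper): the string must parse
    # into a Python list of (row, col) tuples, otherwise return None.
    try:
        path = ast.literal_eval(generation)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(path, list) or not all(
        isinstance(p, tuple) and len(p) == 2 for p in path
    ):
        return None
    return path

def is_feasible(path, obstacles, n):
    # Feasibility check (assumed rule): every cell lies inside the
    # n x n grid, no cell is an obstacle, and consecutive cells are
    # 4-adjacent (Manhattan distance 1).
    if not path:
        return False
    for (r, c) in path:
        if not (0 <= r < n and 0 <= c < n) or (r, c) in obstacles:
            return False
    return all(
        abs(r1 - r2) + abs(c1 - c2) == 1
        for (r1, c1), (r2, c2) in zip(path, path[1:])
    )
```

Under these assumptions, the malformed generation `'(0,0'` from the example above would fail the compliance check, while `'[(0,0), (0,1), (1,1)]'` parses and is feasible on a 2x2 grid with an obstacle at `(1,0)`.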
## Values from Popular Papers
## Examples
```python
import evaluate

pp_eval = evaluate.load("rfr2003/path_planning_evaluate")
results = pp_eval.compute(
    generations=['[(0,0), (0,1), (1,1)]', '[(0,0), (1,0), (1,1)]', '[(0,0), (1,0), (1,1), (0,1)]', '(0,0'],
    golds=[[(0,0), (0,1), (1,1)], [(0,0), (0,1), (1,1)], [(0,0), (0,1)], []],
    obstacles=[[(1,0)], [(1,0)], [], []],
    ends=[[(1,1)], [(1,1)], [(0,1)], [(0,1)]],
    n=[2, 2, 2, 2]
)
print(results)
```

```
{'compliance_ratio': 0.75, 'success_ratio': 0.6666666666666666, 'optimal_ratio': 0.3333333333333333, 'feasible_ratio': 0.6666666666666666, 'distance': 0, 'unreachable_acc': 1.0}
```