BiGGen-Bench-Leaderboard / src /content.py

scottsuk0306

Init

b1b6ed6 5 months ago


	LOGO = '<img src="https://raw.githubusercontent.com/prometheus-eval/leaderboard/main/logo.png">'

	TITLE = """<h1 align="center" id="space-title">🤗 BiGGen-Bench Leaderboard 🏋️</h1>"""

	BGB_LOGO = '<img src="https://raw.githubusercontent.com/prometheus-eval/leaderboard/main/logo.png" alt="Logo" style="width: 30%; display: block; margin: auto;">'
	BGB_TITLE = """<h1 align="center">BiGGen-Bench Leaderboard</h1>"""


	ABOUT = """
	## 📝 About
	### BiGGen-Bench Leaderboard

	Welcome to the 🌟 BiGGen-Bench Leaderboard 🚀, a benchmarking platform for evaluating the nuanced capabilities of Generative Language Models (GLMs) across a variety of complex and diverse tasks. Building on the methodology of [BiGGen-Bench](https://github.com/prometheus-eval/prometheus-eval), the leaderboard offers a comprehensive assessment framework that aims to mirror human discernment and precision when evaluating language models.

	#### Evaluation Details

	- Evaluation Scope: Covers nine key capabilities of GLMs across 77 tasks, with 765 unique instances, each tailored to test a specific aspect of model performance.
	- Scoring System: Uses a detailed 1-to-5 scoring rubric, with instance-specific criteria closely aligned with the requirements of each task.
	- Hardware and Setup: Benchmarks are run in a controlled setup to ensure consistent and fair comparison across models.
	- Transparency and Openness: All code, data, and detailed evaluation results are publicly available to foster transparency and enable community-driven improvements and verification.

	#### Benchmarking Script

	All benchmarks are run with the [code](https://github.com/prometheus-eval/prometheus-eval/blob/main/BiGGen-Bench) provided in the BiGGen-Bench directory of the prometheus-eval repository. Using the same scripts for every model means all models are evaluated under identical conditions, which supports reliable and reproducible results.

	"""


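The `ABOUT` text above describes per-instance scores on a 1-to-5 rubric, grouped under capabilities. As a rough illustration only, the sketch below shows one way such scores could be rolled up into a per-capability average. The record layout, the capability and task names, the function name `average_by_capability`, and the choice of a plain mean are all assumptions made for this example. They are not taken from the BiGGen-Bench code, and the scores are made-up placeholders, not real results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (capability, task, score). Placeholder values only,
# not actual leaderboard data.
records = [
    ("capability_a", "task_1", 4),
    ("capability_a", "task_2", 5),
    ("capability_b", "task_3", 3),
    ("capability_b", "task_4", 2),
]


def average_by_capability(records):
    """Check each score is within the 1-5 rubric, then average per capability."""
    buckets = defaultdict(list)
    for capability, _task, score in records:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} is outside the 1-5 rubric")
        buckets[capability].append(score)
    # Assumption: a simple arithmetic mean; the real leaderboard may aggregate differently.
    return {capability: mean(scores) for capability, scores in buckets.items()}


summary = average_by_capability(records)
print(summary)
```

Rejecting out-of-range scores up front keeps a malformed judge output from silently skewing an average; whether the actual pipeline does this is not stated in the source.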
	CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
	CITATION_BUTTON = r"""TBA
	"""