LOGO = ''

TITLE = """

🤗 BiGGen-Bench Leaderboard 🏋️

""" BGB_LOGO = 'Logo' BGB_TITLE = """

BiGGen-Bench Leaderboard

""" ABOUT = """ ## 📝 About ### BiGGen-Bench Leaderboard Welcome to the 🌟 BiGGen-Bench Leaderboard 🚀, a dedicated benchmarking platform designed to evaluate the nuanced capabilities of Generative Language Models (GLMs) across a variety of complex and diverse tasks. Leveraging the refined methodologies of [BiGGen-Bench](https://github.com/prometheus-eval/prometheus-eval), our leaderboard offers a comprehensive assessment framework that mirrors human-like discernment and precision in evaluating language models. #### Evaluation Details - **Evaluation Scope**: Covers nine key capabilities of GLMs across 77 tasks, with 765 unique instances tailored to test specific aspects of model performance. - **Scoring System**: Utilizes a detailed scoring rubric from 1 to 5, reflecting a range of outcomes based on instance-specific criteria closely aligned with the nuanced requirements of each task. - **Hardware and Setup**: Benchmarks are conducted using a controlled setup to ensure consistent and fair comparison across different models. - **Transparency and Openness**: All codes, data, and detailed evaluation results are publicly available to foster transparency and enable community-driven enhancements and verifications. #### Benchmarking Script All benchmarks are executed using the provided [code](https://github.com/prometheus-eval/prometheus-eval/blob/main/BiGGen-Bench) within the BiGGen-Bench repository. This script ensures that all models are evaluated under identical conditions, guaranteeing reliability and reproducibility of results. """ CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results." CITATION_BUTTON = r"""TBA """