---
title: LeaderboardFinder
emoji: 🐢
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 4.22.0
app_file: app.py
pinned: false
---
If you want your leaderboard to appear, feel free to add relevant information in its metadata, and it will be displayed here.
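Hugging Face Space cards support a `tags` field in their README metadata; assuming that is where the category tags described below are meant to go, a minimal sketch of a leaderboard Space's card might look like this (the Space name and tag choices are purely illustrative):

```yaml
---
# Minimal sketch of a leaderboard Space card (hypothetical Space name).
# Assumption: the category tags described below are listed under `tags`.
title: MyLeaderboard
emoji: 🥇
sdk: gradio
app_file: app.py
pinned: false
tags:
  - submission:automatic   # see the categories below
  - test:public
---
```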
# Categories

## Submission type
This category does not apply to arenas.
- `submission:automatic`: users can submit their models as such to the leaderboard, and evaluation is run automatically without human intervention
- `submission:semiautomatic`: the leaderboard requires the model owner to run evaluations on their side and submit the results
- `submission:manual`: the leaderboard requires the leaderboard owner to run evaluations for new submissions
- `submission:closed`: the leaderboard does not accept submissions at the moment
## Test set status
This category does not apply to arenas.
- `test:public`: all the test sets used are public; the evaluations are completely reproducible
- `test:mix`: some test sets are public and some are private
- `test:private`: all the test sets used are private; the evaluations are hard to game
- `test:rolling`: the test sets used change regularly over time, and evaluation scores are refreshed
## Judges
- `judge:auto`: evaluations are run automatically, using an evaluation suite such as `lm_eval` or `lighteval`
- `judge:model`: evaluations are run using a model-as-a-judge approach to rate answers
- `judge:humans`: evaluations are done by humans rating answers - this is an arena
- `judge:vibe_check`: evaluations are done manually by one human
## Modalities
Can be any (or several) of the following:
- `modality:text`
- `modality:image`
- `modality:video`
- `modality:audio`

A bit outside of the usual modalities:
- `modality:tools`: requires added tool usage - mostly for assistant models
- `modality:artefacts`: the leaderboard concerns itself with machine learning artefacts themselves, for example, quality evaluation of text embeddings.
## Evaluation categories
Can be any (or several) of the following:
- `eval:generation`: the evaluation looks specifically at generation capabilities (can be image generation, text generation, ...)
- `eval:math`
- `eval:code`
- `eval:performance`: model performance (speed, energy consumption, ...)
- `eval:safety`: safety, toxicity, and bias evaluations
## Language
You can indicate the languages covered by your benchmark like so: `language:mylanguage`.
At the moment, we do not support language codes; please use the language name in English.
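Putting the categories together, a leaderboard's metadata might carry one tag from each of them. This is a sketch under the same assumption as above (category tags listed under the `tags` field); the Space name and the particular tag choices are illustrative only:

```yaml
---
title: MyCodeLeaderboard          # hypothetical example
emoji: 🧮
sdk: gradio
app_file: app.py
pinned: false
tags:
  - submission:semiautomatic      # owners run evaluations themselves and submit results
  - test:mix                      # a mix of public and private test sets
  - judge:auto                    # scored with an automatic evaluation suite
  - modality:text
  - eval:code
  - language:english              # language name in English, not a language code
---
```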