---
title: LeaderboardFinder
emoji: 🐢
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 4.22.0
app_file: app.py
pinned: false
---
If you want your leaderboard to appear, feel free to add relevant information in its metadata, and it will be displayed here.
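
For instance, here is a minimal sketch of what a leaderboard Space's metadata could look like, assuming the category tags described below are listed under a `tags` field in the README front matter (the Space name and tag values here are purely illustrative):

```yaml
---
title: MyLeaderboard        # illustrative Space metadata, not a real leaderboard
emoji: 🏆
sdk: gradio
app_file: app.py
pinned: false
tags:                       # category tags picked from the lists below
  - submission:automatic
  - test:public
  - judge:auto
  - modality:text
  - eval:code
---
```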
# Categories

## Submission type

This category does not apply to arenas.

- `submission:automatic`: users can submit their models directly to the leaderboard, and evaluation runs automatically without human intervention
- `submission:semiautomatic`: the leaderboard requires model owners to run the evaluations on their side and submit the results
- `submission:manual`: the leaderboard requires the leaderboard owner to run evaluations for new submissions
- `submission:closed`: the leaderboard does not accept submissions at the moment

## Test set status

This category does not apply to arenas.

- `test:public`: all the test sets used are public, so the evaluations are fully reproducible
- `test:mix`: some test sets are public and some are private
- `test:private`: all the test sets used are private, so the evaluations are hard to game
- `test:rolling`: the test sets change regularly over time, and evaluation scores are refreshed accordingly

## Judges

- `judge:auto`: evaluations are run automatically, using an evaluation suite such as `lm_eval` or `lighteval`
- `judge:model`: evaluations are run using a model-as-a-judge approach to rate answers
- `judge:humans`: evaluations are done by humans rating answers - this is an arena
- `judge:vibe_check`: evaluations are done manually by a single human

## Modalities

Can be any (or several) of the following:

- `modality:text`
- `modality:image`
- `modality:video`
- `modality:audio`

A bit outside the usual modalities:

- `modality:tools`: requires added tool usage - mostly relevant for assistant models
- `modality:artefacts`: the leaderboard evaluates machine learning artefacts themselves, for example the quality of text embeddings.

## Evaluation categories

Can be any (or several) of the following:

- `eval:generation`: the evaluation specifically targets generation capabilities (image generation, text generation, ...)
- `eval:math`
- `eval:code`
- `eval:performance`: model performance (speed, energy consumption, ...)
- `eval:safety`: safety, toxicity, and bias evaluations

## Language

You can indicate the languages covered by your benchmark like so: `language:mylanguage`.
At the moment, we do not support language codes; please use the language name in English.
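
Putting it together, the tags for a hypothetical English- and French-language text arena could look like this (all values below are illustrative):

```yaml
tags:
  - judge:humans        # humans rate the answers, so this is an arena
  - modality:text
  - language:english    # language names written out in English, not language codes
  - language:french
```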