Hallucinations Leaderboard
View and submit LLM evaluations
Nexus Function Calling Leaderboard
Visualize model performance on function calling tasks
Tofu Leaderboard
Explore unlearning performance metrics of language models
Enterprise Scenarios Leaderboard
Clembench
Browse and compare language model leaderboards
MVBench Leaderboard
Submit model evaluation and view leaderboard
Leaderboard / SeaEval
Browse leaderboard insights across various NLP tasks
Followers Leaderboard
Display a Hugging Face follower leaderboard
MindBigData Leaderboard
Track and rank EEG brain signal models
Yet Another LLM Leaderboard
Run a Streamlit web app
LLM Safety Leaderboard
View and submit machine learning model evaluations
Japanese Chatbot Arena Leaderboard
Compare two chatbots and vote on the better one
NPHardEval Leaderboard
Explore and compare LLMs through a leaderboard
Open Ita Llm Leaderboard
Track, rank and evaluate open LLMs in the Italian language!
Open Chinese LLM Leaderboard
Display and filter LLM benchmark results
Open CoT Leaderboard
Track, rank and evaluate the chain-of-thought (CoT) quality of open LLMs
Subquadratic LLM Leaderboard
Submit and filter LLMs for evaluation
EQ Bench
View EQ-Bench Leaderboard for LLMs
Open PL LLM Leaderboard
View and filter LLM leaderboard data
Berkeley Function Calling Leaderboard
Powered By Intel Leaderboard
Evaluate and submit open-source LLMs for ranking on Intel's leaderboard
Salad Bench Leaderboard
Display a model leaderboard from Excel data
Open Multilingual Reasoning Leaderboard
Display and search a leaderboard of math models