GSMA Open-Telco LLM Benchmarks
Track, rank and evaluate Open Telecom LLMs and chatbots
Evaluate medical AI models with datasets
Leaderboard for LLVM APR Benchmark
Reasoning benchmark in linguistics
Display competition info, datasets, leaderboards, rules, and submissions
Fetch and display competition info and leaderboards
Unofficial benchmark by Fish Speech
Play Atari games with AI agents
Benchmark model load time and TTS time
View and manage competition data
Measure BERT model performance using WASM and WebGPU
Compare LLM performance across benchmarks
llm-calibration-benchmark
Generate palindromes and evaluate grammar across models
Food detection and weight prediction benchmark
Manage and annotate your datasets
Display evaluation results in a leaderboard
Display evaluation results on a leaderboard
Perform data preprocessing and benchmark different libraries
Interact with scientific literature to generate abstracts, titles, and citations