Discover amazing AI apps made by the community!
VLMEvalKit evaluation results on video understanding benchmarks
Leaderboard and arena for video generation models
A realistic benchmark with real CRM tasks for LLM agents.
Massive Multi-Task LLM Benchmark for Online Shopping
A benchmark for open-source multi-dialect Arabic ASR models
Demo of the new, massively multilingual leaderboard
Benchmark the ability of LLMs to produce secure code.
Adds Open LLM Leaderboard results to a target model card
Generative Tasks Evaluation of Arabic LLMs
Persian Text Embedding Benchmark
Companion leaderboard for the SLM survey paper
A leaderboard that demonstrates LMM reasoning capabilities