Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs 19 days ago • 10
Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes? Mar 5 • 2
NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates Feb 2 • 1
The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models Jan 29 • 4
A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard Jan 12 • 3
Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face 2 days ago • 7
Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases Jan 31 • 1
Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem Feb 20 • 1
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 52
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 562
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 55
Model Merging Collection Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in this space that will help you get started with it! • 28 items • Updated Mar 23 • 172
NLEBench+NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian Paper • 2312.01314 • Published Dec 3, 2023 • 2
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 65 items • Updated 6 days ago • 292