LLM

QwQ-32B-Preview

Open LLM Leaderboard

Space • Running on CPU Upgrade • 12k • 🏆

Track, rank and evaluate open LLMs and chatbots

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 104

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 104

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 158

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 142

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 16

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 7

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

PaLM: Scaling Language Modeling with Pathways

Paper • 2204.02311 • Published Apr 5, 2022 • 2

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 27

LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 13

GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 5

PaLM 2 Technical Report

Paper • 2305.10403 • Published May 17, 2023 • 6

Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 87

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 126

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 126

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 109

meta-llama/Llama-Guard-3-8B

Text Generation • Updated Oct 11 • 221k • 144

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 14

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17 • 57

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 61

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29 • 136

allenai/Molmo-7B-D-0924

Image-Text-to-Text • Updated Oct 10 • 230k • 461

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published 18 days ago • 118

OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • Updated 4 days ago • 3.93k • 154

InternVL

Space • Running • 366 • ⚡

Qwen/Qwen2.5-0.5B-Instruct

Text Generation • Updated Sep 25 • 460k • 148

Qwen2.5

Space • Running • 578 • 🚀

Qwen/Qwen2.5-0.5B

Text Generation • Updated Sep 25 • 286k • 146

google/gemma-1.1-7b-it

Text Generation • Updated Jun 27 • 15.5k • 267

meta-llama/Meta-Llama-3-8B-Instruct

Text Generation • Updated Sep 27 • 1.89M • 3.71k

meta-llama/Meta-Llama-3-8B

Text Generation • Updated Sep 27 • 572k • 5.92k

meta-llama/Meta-Llama-3-70B-Instruct

Text Generation • Updated 7 days ago • 88k • 1.44k

meta-llama/Meta-Llama-3-70B

Text Generation • Updated Sep 27 • 35.8k • 839

meta-llama/Meta-Llama-Guard-2-8B

Text Generation • Updated May 13 • 20.1k • 287

meta-llama/Llama-3.2-11B-Vision

Image-Text-to-Text • Updated Sep 27 • 94.8k • 412

meta-llama/Llama-3.2-1B-Instruct

Text Generation • Updated Oct 24 • 1.82M • 656

Qwen2.5 Coder Artifacts

Space • Running • 1.21k • 🐢

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 135

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 102

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 74

meta-llama/Llama-3.1-70B

Text Generation • Updated Sep 25 • 114k • 326

meta-llama/Llama-3.1-8B-Instruct

Text Generation • Updated Sep 25 • 4.42M • 3.31k

microsoft/Phi-3.5-vision-instruct

Image-Text-to-Text • Updated Sep 26 • 337k • 619

microsoft/Phi-3.5-MoE-instruct

Text Generation • Updated Oct 24 • 54k • 539

meta-llama/Llama-3.2-3B

Text Generation • Updated Oct 24 • 1.15M • 413

meta-llama/Llama-3.2-1B

Text Generation • Updated Oct 24 • 2.21M • 1.3k

microsoft/Phi-3.5-mini-instruct

Text Generation • Updated Sep 18 • 596k • 707

Qwen/Qwen2-VL-7B-Instruct

Image-Text-to-Text • Updated 16 days ago • 2.57M • 962

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 74

Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 7

liuhaotian/llava-v1.5-7b

Image-Text-to-Text • Updated May 8 • 1.42M • 391

Qwen/Qwen2-VL-2B-Instruct

Image-Text-to-Text • Updated 16 days ago • 926k • 326

openbmb/MiniCPM-Llama3-V-2_5

Image-Text-to-Text • Updated Sep 25 • 30.5k • 1.38k

Vision Arena (Testing VLMs side-by-side)

Space • Running • 532 • 🖼💬

microsoft/Florence-2-large

Image-Text-to-Text • Updated 13 days ago • 337k • 1.3k

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 86

nvidia/NVLM-D-72B

Image-Text-to-Text • Updated Oct 18 • 10.8k • 757

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 72

rhymes-ai/Aria

Image-Text-to-Text • Updated 4 days ago • 18.9k • 598

mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated 16 days ago • 550

HuggingFaceM4/idefics2-8b

Image-Text-to-Text • Updated Oct 14 • 22.5k • 597

liuhaotian/llava-v1.5-13b

Image-Text-to-Text • Updated May 9 • 118k • 488

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25 • 104

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 100

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19 • 149

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 86

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 72

ShareGPT4Video/ShareGPT4Video

Viewer • Updated Jul 8 • 40.2k • 2.79k • 183

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 58

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Paper • 2406.19741 • Published Jun 28 • 59

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 82

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 68

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published Aug 22 • 89

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18 • 138

Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11 • 84

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 89

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7 • 111

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 112

Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30 • 116

OpenGVLab/InternVL2_5-1B

Image-Text-to-Text • Updated 4 days ago • 4.97k • 32

Qwen/Qwen2-Audio-7B-Instruct

Audio-Text-to-Text • Updated Nov 20 • 425k • 276

Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15 • 55

Qwen2-VL-7B

Space • Running on Zero • 232 • 🔥

llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • Updated 30 days ago • 237k • 242

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 37

openbmb/MiniCPM-V-2

Visual Question Answering • Updated Aug 6 • 5.59k • 433

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18 • 16

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Paper • 2308.12038 • Published Aug 23, 2023 • 2

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 79

mistralai/Mixtral-8x7B-Instruct-v0.1

Text Generation • Updated Aug 19 • 2.94M • 4.23k

mistralai/Mistral-7B-v0.1

Text Generation • Updated Jul 24 • 3.02M • 3.5k

microsoft/phi-2

Text Generation • Updated Apr 29 • 190k • 3.26k

mistralai/Mistral-7B-Instruct-v0.2

Text Generation • Updated Sep 27 • 3.29M • 2.6k

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 47

nvidia/Mistral-NeMo-Minitron-8B-Base

Text Generation • Updated Aug 22 • 16.1k • 166

Anychat
