Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • arXiv:2502.11089 • Published Feb 16, 2025
Qwen2.5-1M Collection • The long-context version of Qwen2.5, supporting context lengths of up to 1M tokens • 3 items
Qwen2.5-VL Collection • Vision-language model series based on Qwen2.5 • 10 items
HIGGS Collection • Models prequantized with [HIGGS](https://arxiv.org/abs/2411.17525) zero-shot quantization; requires a recent `transformers` release to run. • 18 items
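Because the checkpoints in this collection ship already quantized, loading one is a standard `from_pretrained` call; a minimal sketch, assuming a recent `transformers` install and using a hypothetical repo id as a stand-in for any model in the collection:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for illustration; substitute any checkpoint
# from the HIGGS collection.
model_id = "ISTA-DASLab/Meta-Llama-3.1-8B-HIGGS"

# The quantization config is stored in the checkpoint itself, so no
# extra arguments are needed beyond placing weights on an accelerator.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(
    "HIGGS is a zero-shot quantization method that",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```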
Qwen2.5-Coder Collection • Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024
Qwen2.5 Collection • Qwen2.5 language models, including pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items