Kimi-K2 Collection Moonshot's MoE LLMs with 1 trillion parameters, exceptional on agentic intelligence • 2 items • Updated 6 days ago • 95
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 11 days ago • 549
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning Paper • 2301.13688 • Published Jan 31, 2023 • 9
Flan-T5 release Collection Flan-T5 covers 4 checkpoints of different sizes. It also includes upgraded versions trained using Universal sampling • 7 items • Updated 8 days ago • 27
Open LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with the best evaluations on the LLM leaderboard • 65 items • Updated Mar 20 • 619
Unsloth Dynamic 2.0 Quants Collection New 2.0 version of our Dynamic GGUF + Quants. Dynamic 2.0 achieves superior accuracy & SOTA quantization performance. • 37 items • Updated 2 days ago • 145
BitNet Collection 🔥BitNet family of large language models (1-bit LLMs). • 7 items • Updated May 1 • 47
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Paper • 2506.19697 • Published 24 days ago • 44
Jack of all Trades Models Collection Home of the Personality Engine series, models that can be molded to fit any task or purpose. • 2 items • Updated May 23 • 7
Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers Paper • 2504.18412 • Published Apr 25 • 1
Instella ✨ Collection Announcing Instella, a series of 3 billion parameter language models developed by AMD, trained from scratch on 128 Instinct MI300X GPUs. • 9 items • Updated Jun 16 • 7
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 446
How many words does ChatGPT know? The answer is ChatWords Paper • 2309.16777 • Published Sep 28, 2023 • 1
Source files for GGUF, EXL2, AWQ, GPTQ, HQQ etc etc Collection Safetensor source files (by David_AU) to use directly and/or to create different quants and/or merges. Links to the GGUFs and the full model card are on each. • 231 items • Updated 1 day ago • 10
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes Paper • 2305.02301 • Published May 3, 2023 • 5