Gated Delta Networks: Improving Mamba2 with Delta Rule Paper • 2412.06464 • Published Dec 9, 2024 • 17
ByteDance-Seed/AHN-Mamba2-for-Qwen-2.5-Instruct-14B Text Generation • 51.4M • Updated Oct 24, 2025 • 50 • 15
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 365
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct Text Generation • 16B • Updated Jul 3, 2024 • 1.02M • 611