MiniCPM-2B Collection The MiniCPM family of LLMs and VLLMs. • 18 items • Updated about 6 hours ago • 11
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 564
In deep reinforcement learning, a pruned network is a good network Paper • 2402.12479 • Published Feb 19 • 16
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 106
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26 • 62
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11 • 35
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Paper • 2401.00448 • Published Dec 31, 2023 • 25
Beyond Surface: Probing LLaMA Across Scales and Layers Paper • 2312.04333 • Published Dec 7, 2023 • 18