Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 779
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 16 days ago • 44
Article Janus Pro: DeepSeek's Revolutionary Multimodal AI Model By LLMhacker • about 1 month ago • 31
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 63
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21 • 63
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published Jan 16 • 23
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7, 2024 • 55
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 176
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 188
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published Jan 14 • 17
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 273
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 92
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 50