SSMs Collection A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers. • 5 items • Updated Jan 17 • 27
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published Jul 10, 2024 • 29
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated Jan 17 • 162
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22, 2024 • 127
view article Article Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU! By lyogavin • Apr 21, 2024 • 44