nvidia/Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct Text Generation • Updated 7 days ago • 2.58k • 96
Recurrent Models Collection These are checkpoints for recurrent LLMs developed to scale test-time compute by recurring in latent space. • 15 items • Updated 28 days ago • 7
LoRI Adapters Collection LoRI adapters for natural language understanding, code generation, mathematical reasoning, and safety alignment, based on LLaMA-3-8B and Mistral-7B. • 39 items • Updated 9 days ago • 1
Gemstone Models Collection Our 22 open-source Gemstone models for scaling laws range from 50M to 2B parameters, spanning 11 widths from 256 to 3072 and 18 depths from 3 to 80. • 59 items • Updated Feb 26 • 8