Model Overview
NavinLLM is a bilingual (English/French) Mamba2-Hybrid model that combines Mamba2 and attention layers in a hybrid architecture, with a sequence length of 4K tokens. The training methodology follows the techniques outlined in "An Empirical Study of Mamba-based Language Models". Each NavinLLM version is trained on a different amount of data, ranging from 10 billion tokens for the smallest model (200M parameters) to 800 billion tokens for the largest (7B parameters). All models are released as base models without fine-tuning, except for the instruct version.
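As an illustration, the minimal sketch below shows how a checkpoint could be loaded and prompted. It assumes the weights are published on the Hugging Face Hub under a hypothetical repo id (navin/NavinLLM-7B) and load through transformers' standard AutoModelForCausalLM interface; the actual repo names and loading path may differ.

```python
# Minimal sketch: loading a NavinLLM checkpoint with Hugging Face transformers.
# The repo id "navin/NavinLLM-7B" is a hypothetical placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "navin/NavinLLM-7B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights
    trust_remote_code=True,      # hybrid Mamba2/attention layers may require custom model code
)

# The model is trained with a 4K-token context, so keep prompts within that budget.
prompt = "La tour Eiffel se trouve à"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```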
Versions
- NavinLLM-200M: 200M parameters, Hybrid, trained on 10B tokens (bilingual).
- NavinLLM-400M: 400M parameters, Hybrid, trained on 20B tokens (bilingual).
- NavinLLM-2B: 2B parameters, pure SSM, trained on 200B tokens (French).
- NavinLLM-7B: 7B parameters, Hybrid, trained on 800B tokens (bilingual).
- NavinLLM-7B-Instruct: fine-tuned on several tasks (summarization / QA / translation...).
Tokenizer
NavinLLM was trained using a custom SentencePiece tokenizer, available in two versions: a 32k-token vocabulary for a more compact representation, and a 52k-token vocabulary designed to accommodate a broader range of tokens and linguistic variability.
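For illustration, the sketch below shows how such a SentencePiece model could be inspected and used directly with the sentencepiece library; the model file names (navinllm_32k.model, navinllm_52k.model) are hypothetical placeholders for the released tokenizer files.

```python
# Minimal sketch using the sentencepiece library directly.
# File names are hypothetical placeholders for the released tokenizer models.
import sentencepiece as spm

# Load the 32k-vocabulary variant (swap in navinllm_52k.model for the larger one).
sp = spm.SentencePieceProcessor(model_file="navinllm_32k.model")

print(sp.get_piece_size())  # vocabulary size, e.g. 32000 for the 32k variant

text = "NavinLLM est un modèle bilingue anglais/français."
pieces = sp.encode(text, out_type=str)  # subword pieces
ids = sp.encode(text, out_type=int)     # token ids
print(pieces)
print(sp.decode(ids))                   # round-trips back to the original text
```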
Datasets
NavinLLM was trained on proprietary datasets built from both publicly available data and synthetically generated content.