Access is gated: the repository is publicly visible, but you must agree to share your contact information and accept the conditions before you can access its files and content.

Dhavana-Base-150M

Decoder-only Transformer pretrained from scratch with strong Dhivehi support.

| Field | Value |
|---|---|
| Parameters | 125,264,640 (~125M) |
| Non-embedding parameters | 100,688,640 |
| Architecture | 16 layers, d_model = 768, grouped-query attention (12 query heads / 4 KV heads), SwiGLU, RMSNorm, RoPE |
| Context length | 2,048 tokens |
| Tokenizer | `Serialtechlab/dhavana-tok-v0` |
| Final training step | 12,000 |
| Total unique training tokens | 3,706,066,846 (~3.7B) |
| Training data mix | English 65% / Multilingual 13% / Dhivehi 12% / Math 7% / Code 3% |
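The parameter counts in the table can be cross-checked against the architecture row. The sketch below is a back-of-the-envelope check, not the project's code. It assumes a SwiGLU hidden size of 2048, a 32,000-token vocabulary, tied input/output embeddings, no bias terms, and a pre-norm layout with one RMSNorm before each sublayer plus a final norm. None of these values are stated in the table. They are simply choices under which both published counts come out exactly. `DhavanaConfig` and the helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DhavanaConfig:
    # Values taken from the model card table.
    n_layers: int = 16
    d_model: int = 768
    n_q_heads: int = 12
    n_kv_heads: int = 4
    context_length: int = 2048
    # ASSUMPTIONS (not stated on the card); chosen because they reproduce
    # both published parameter counts exactly.
    d_ff: int = 2048              # SwiGLU hidden size
    vocab_size: int = 32_000      # tokenizer vocabulary size
    tied_embeddings: bool = True  # LM head shares the input embedding matrix

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_q_heads


def non_embedding_params(cfg: DhavanaConfig) -> int:
    kv_dim = cfg.n_kv_heads * cfg.head_dim
    # Q and O projections: d_model x d_model each; K and V: d_model x kv_dim each.
    # Assumes no bias terms. RoPE adds no parameters.
    attn = 2 * cfg.d_model * cfg.d_model + 2 * cfg.d_model * kv_dim
    # SwiGLU: gate and up projections (d_model -> d_ff), down projection (d_ff -> d_model).
    ffn = 3 * cfg.d_model * cfg.d_ff
    # Assumed pre-norm layout: one RMSNorm weight vector before attention, one before the FFN.
    norms = 2 * cfg.d_model
    final_norm = cfg.d_model
    return cfg.n_layers * (attn + ffn + norms) + final_norm


def total_params(cfg: DhavanaConfig) -> int:
    embedding = cfg.vocab_size * cfg.d_model
    lm_head = 0 if cfg.tied_embeddings else embedding
    return non_embedding_params(cfg) + embedding + lm_head


def kv_head_for_query(cfg: DhavanaConfig) -> list[int]:
    # In GQA, consecutive groups of query heads share one key/value head.
    group = cfg.n_q_heads // cfg.n_kv_heads  # 3 query heads per KV head
    return [q // group for q in range(cfg.n_q_heads)]


def kv_cache_values(cfg: DhavanaConfig, seq_len: int) -> int:
    # Keys and values are cached for every layer, KV head, and position.
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * seq_len


if __name__ == "__main__":
    cfg = DhavanaConfig()
    print(non_embedding_params(cfg))  # 100688640
    print(total_params(cfg))          # 125264640
    print(kv_head_for_query(cfg))     # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
    print(kv_cache_values(cfg, cfg.context_length))  # 16777216
```

Under these assumptions the embedding matrix accounts for the 24,576,000 parameters separating the two counts. Sharing 4 KV heads across 12 query heads also means the cached keys and values for one full 2,048-token sequence are one third of what standard multi-head attention would need (16,777,216 values versus 50,331,648), roughly 32 MiB instead of 96 MiB at 16-bit precision.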

This is the base (pretrained) model, not an instruction-tuned one. Supervised fine-tuned (SFT) models will be released separately.
