Dhavana-Base-150M

A decoder-only Transformer language model pretrained from scratch, with dedicated Dhivehi coverage (12% of the training data mix).

| Field | Value |
| --- | --- |
| Parameters | 125,264,640 (~150M) |
| Non-embedding params | 100,688,640 |
| Architecture | 16 layers, d_model=768, GQA 12Q/4KV, SwiGLU, RMSNorm, RoPE |
| Context length | 2048 |
| Tokenizer | Serialtechlab/dhavana-tok-v0 |
| Final step | 12,000 |
| Total unique training tokens | 3,706,066,846 |
| Training data mix | English 65% / Multilingual 13% / Dhivehi 12% / Math 7% / Code 3% |
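
The parameter counts in the table can be cross-checked against the listed architecture. The sketch below is only an illustration: the layer count, `d_model`, and head counts come from the table, while the feed-forward width (`D_FF = 2048`), the vocabulary size (`VOCAB_SIZE = 32_000`), the absence of bias terms, and tied input/output embeddings are assumptions that are not stated in this card. They were chosen because they happen to reproduce both reported totals exactly; the actual configuration may differ.

```python
# Values taken from the model card table.
N_LAYERS = 16
D_MODEL = 768
N_Q_HEADS = 12
N_KV_HEADS = 4

# ASSUMPTIONS (not stated in the card): picked because they reproduce
# the reported totals. The real config may differ.
D_FF = 2048          # assumed SwiGLU hidden width
VOCAB_SIZE = 32_000  # assumed tokenizer vocabulary size


def non_embedding_params() -> int:
    """Count weights outside the embedding table, assuming no bias terms."""
    head_dim = D_MODEL // N_Q_HEADS          # 64
    kv_dim = N_KV_HEADS * head_dim           # 256 (grouped-query attention)
    attn = (
        D_MODEL * D_MODEL                    # Q projection
        + 2 * D_MODEL * kv_dim               # K and V projections
        + D_MODEL * D_MODEL                  # output projection
    )
    mlp = 3 * D_MODEL * D_FF                 # SwiGLU: gate, up, down
    norms = 2 * D_MODEL                      # two RMSNorm scales per layer
    final_norm = D_MODEL                     # RMSNorm before the output head
    return N_LAYERS * (attn + mlp + norms) + final_norm


def embedding_params() -> int:
    """Size of a single token-embedding matrix."""
    return VOCAB_SIZE * D_MODEL


def total_params(tied: bool = True) -> int:
    """Total count; with tied=False the output head is a separate matrix."""
    copies = 1 if tied else 2
    return non_embedding_params() + copies * embedding_params()


if __name__ == "__main__":
    print(f"non-embedding: {non_embedding_params():,}")
    print(f"embedding:     {embedding_params():,}")
    print(f"total (tied):  {total_params(tied=True):,}")
    print(f"total (untied): {total_params(tied=False):,}")
```

Under these assumptions the tied total matches the reported 125,264,640, and the gap between the two reported figures (24,576,000) equals one 32,000 × 768 embedding matrix. Counting a separate, untied output head would give 149,840,640, which may be where the "~150M" label comes from, although the card does not say so.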

This is the base (pretrained) model, not an instruction-tuned one. Supervised fine-tuned (SFT) models will be released separately.
