Dhavana-Base-150M

Decoder-only Transformer pretrained from scratch with strong Dhivehi support.

| Field | Value |
|---|---|
| Parameters | 125,264,640 (~125M) |
| Non-embedding params | 100,688,640 |
| Architecture | 16 layers, d_model=768, GQA (12 query heads / 4 KV heads), SwiGLU, RMSNorm, RoPE |
| Context length | 2048 tokens |
| Tokenizer | Serialtechlab/dhavana-tok-v0 |
| Final step | 12,000 |
| Total unique training tokens | 3,706,066,846 |
| Training data mix | English 65% / Multilingual 13% / Dhivehi 12% / Math 7% / Code 3% |
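
The parameter counts in the table can be cross-checked against the listed architecture. The sketch below is a rough reconstruction, not the model's actual code. It assumes values the card does not state: a SwiGLU hidden size of 2048 (`d_ff`), a vocabulary of 32,000 tokens (`vocab_size`), no bias terms, two RMSNorm layers per block plus one final norm, and input/output embeddings that share weights. These assumptions were chosen because they reproduce the listed counts exactly; the real configuration may differ.

```python
def count_params(n_layers=16, d_model=768, n_q_heads=12, n_kv_heads=4,
                 d_ff=2048, vocab_size=32_000, tie_embeddings=True):
    """Estimate parameter counts for a GQA + SwiGLU + RMSNorm decoder.

    d_ff, vocab_size, and tie_embeddings are assumptions (not from the card).
    """
    head_dim = d_model // n_q_heads          # 768 / 12 = 64
    kv_dim = n_kv_heads * head_dim           # 4 * 64 = 256

    # Attention: Q and output projections are d_model x d_model,
    # K and V projections are d_model x kv_dim (grouped-query attention).
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim

    # SwiGLU MLP: gate, up, and down projections, no biases.
    mlp = 3 * d_model * d_ff

    # Two RMSNorm weight vectors per layer.
    norms = 2 * d_model

    # All layers plus one final RMSNorm.
    non_embedding = n_layers * (attn + mlp + norms) + d_model

    # Token embedding matrix; counted twice if the output head is untied.
    embedding = vocab_size * d_model
    total = non_embedding + embedding * (1 if tie_embeddings else 2)

    return {"non_embedding": non_embedding,
            "embedding": embedding,
            "total": total}


tied = count_params()
untied = count_params(tie_embeddings=False)
print(tied)    # non_embedding=100,688,640  embedding=24,576,000  total=125,264,640
print(untied)  # total=149,840,640
```

Under these assumptions the tied count matches the table, and counting a separate output head gives roughly 150M, which may be where the figure in the model name comes from.
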

This is the base (pretrained) model, not an instruction-tuned one. Supervised fine-tuned (SFT) models will be released separately.