Til Core 0.5B
Til Core 0.5B is a 498-million-parameter Kazakh language model trained from scratch on a clean Kazakh corpus using a 256K morpheme-aware BPE tokenizer. It is a Qwen2-style decoder-only transformer built by TilQazyna as a compact, efficient foundation model for the Kazakh language.
Til means "language" in Kazakh. Til Core is the base on which task-specific Kazakh models (instruct, grammar correction, translation) can be fine-tuned.
Why a 256K morpheme-aware vocabulary?
Kazakh is highly agglutinative — a single root takes long chains of suffixes. Standard byte-level BPE fragments these into many sub-tokens, wasting context and parameters. Til Core uses a 256,000-token morpheme-aware BPE (stukenov/sozkz-morphbpe-256k-kk-v1) that aligns tokens with morphological boundaries, giving ~15–20% better compression on Kazakh text. The trade-off — a heavier embedding table — is absorbed by tying input/output embeddings and using a deeper-than-usual transformer body.
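One way to check a compression figure like this on your own text is to compare average tokens per word between two tokenizers. The sketch below is a minimal, library-free helper. `tokens_per_word`, `relative_saving`, and the two toy tokenizers are hypothetical names introduced for illustration and are not part of the model's tooling.

```python
from typing import Callable, Iterable, List


def tokens_per_word(tokenize: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Average tokens per whitespace-separated word (lower means better compression)."""
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_tokens += len(tokenize(text))
        n_words += len(text.split())
    if n_words == 0:
        raise ValueError("no words in input")
    return n_tokens / n_words


def relative_saving(baseline: float, candidate: float) -> float:
    """Fraction of tokens saved by the candidate tokenizer relative to the baseline."""
    return 1.0 - candidate / baseline


# Toy stand-ins for real tokenizers (assumptions, for illustration only).
def char_tokenizer(text: str) -> List[str]:
    # One token per non-space character: a worst-case baseline.
    return [c for c in text if not c.isspace()]


def chunk_tokenizer(text: str) -> List[str]:
    # Splits every word into chunks of up to three characters.
    return [w[i:i + 3] for w in text.split() for i in range(0, len(w), 3)]


sample = ["abcdef gh"]
baseline = tokens_per_word(char_tokenizer, sample)
candidate = tokens_per_word(chunk_tokenizer, sample)
print(baseline, candidate, relative_saving(baseline, candidate))
```

With real tokenizers, the `tokenize` method of a Hugging Face tokenizer could be passed in place of the toy functions, and the sample should be a held-out set of Kazakh sentences rather than a single string.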
Model details
| Property | Value |
|---|---|
| Architecture | Qwen2 (decoder-only, SwiGLU, RoPE, GQA) |
| Parameters | 497.8M (embedding ≈ 229M, transformer ≈ 268M) |
| Vocabulary | 256,000 (morpheme-aware BPE) |
| Hidden size | 896 |
| Layers | 18 |
| Attention heads | 14 (GQA, 2 KV heads) |
| Intermediate size | 4864 |
| Context length | 32,768 (rope_theta = 1e6) |
| Tied embeddings | yes |
| Precision | bf16 |
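The parameter split in the table can be reproduced from the listed dimensions. The sketch below assumes the standard Qwen2 layer layout (biases on the Q, K, and V projections only, a three-matrix SwiGLU MLP, two RMSNorm weights per layer, and one final norm). The function name `qwen2_style_param_count` is hypothetical, and the layout is an assumption rather than something read from the checkpoint.

```python
def qwen2_style_param_count(vocab, hidden, layers, heads, kv_heads,
                            intermediate, tied=True):
    """Count parameters for a Qwen2-style decoder (assumed layout, see lead-in)."""
    head_dim = hidden // heads          # 896 // 14 = 64
    kv_dim = kv_heads * head_dim        # GQA: K and V are narrower than Q

    embedding = vocab * hidden

    # Attention: Q, K, V projections with bias; output projection without bias.
    attn = (hidden * hidden + hidden)           # q_proj
    attn += 2 * (hidden * kv_dim + kv_dim)      # k_proj and v_proj
    attn += hidden * hidden                     # o_proj

    # SwiGLU MLP: gate, up, and down projections, no biases.
    mlp = 3 * hidden * intermediate

    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden

    body = layers * (attn + mlp + norms) + hidden   # + final norm
    lm_head = 0 if tied else vocab * hidden         # tied head adds nothing

    return {"embedding": embedding, "transformer": body,
            "total": embedding + body + lm_head}


counts = qwen2_style_param_count(
    vocab=256_000, hidden=896, layers=18, heads=14, kv_heads=2,
    intermediate=4864, tied=True,
)
for name, value in counts.items():
    print(f"{name}: {value / 1e6:.1f}M")
```

Under these assumptions the embedding comes to about 229.4M and the transformer body to about 268.4M, which agrees with the 497.8M total in the table. Passing `tied=False` shows how much an untied output head would add at this vocabulary size.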
Training
| Setting | Value |
|---|---|
| Data | stukenov/sozkz-corpus-tokenized-kk-morphbpe256k-v1 — pre-tokenized clean Kazakh (~1.44M sequences × 2048 tokens ≈ 2.94B tokens) |
| Tokens seen | ≈ 5.88B (2 epochs) |
| Steps | 11,222 |
| Global batch | 524,288 tokens/step (8 GPUs × micro-batch 8 × grad-accum 4 × 2048 tokens) |
| Optimizer | AdamW (default β values), weight decay 0.1, grad clip 1.0 |
| LR schedule | 4e-4, cosine, 500 warmup steps |
| Sequence length | 2048 |
| Hardware | 8 × NVIDIA H200 (140 GB), ~3h15m |
| Final eval loss | 2.436 (validation), perplexity ≈ 11.4 |
Token budget relative to the Chinchilla heuristic: 498M parameters trained on ≈5.9B tokens, or about 11.8 tokens per parameter (the Chinchilla-optimal ratio is roughly 20).
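The training figures above are mutually consistent, which a few lines of arithmetic can confirm. This sketch assumes the batch expression means 8 GPUs × micro-batch 8 × 4 gradient-accumulation steps × 2048 tokens. The variable names are illustrative only.

```python
import math

# Assumed reading of the "8 × 8 × grad-accum 4 × 2048" batch expression.
gpus, micro_batch, grad_accum, seq_len = 8, 8, 4, 2048
tokens_per_step = gpus * micro_batch * grad_accum * seq_len

steps = 11_222
tokens_seen = tokens_per_step * steps

params = 497.8e6                      # parameter count from the model table
tokens_per_param = tokens_seen / params

eval_loss = 2.436                     # final validation loss (nats per token)
perplexity = math.exp(eval_loss)      # perplexity = exp(cross-entropy loss)

print(f"tokens/step:  {tokens_per_step:,}")
print(f"tokens seen:  {tokens_seen / 1e9:.2f}B")
print(f"tokens/param: {tokens_per_param:.1f}")
print(f"perplexity:   {perplexity:.1f}")
```

The step count times the global batch gives roughly 5.88B tokens, matching two passes over the ≈2.94B-token corpus.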
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TilQazyna/Til-Core-0.5B"

# Load the bundled tokenizer and the model in bf16, placed automatically on available devices.
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto").eval()

# Base model: give it text to continue, not a chat instruction.
prompt = "Абай Құнанбайұлы — қазақтың"
ids = tok(prompt, return_tensors="pt").to(model.device)

# Sample a continuation; the repetition penalty discourages loops.
out = model.generate(**ids, max_new_tokens=60, do_sample=True,
                     temperature=0.8, top_p=0.9, repetition_penalty=1.2)
print(tok.decode(out[0], skip_special_tokens=True))
```
The tokenizer is bundled with this repository (tokenizer.json, tokenizer_config.json).
Sample generations
Қазақстан Республикасының астанасы
→ … Астана қаласында орналасқан, Қазақстан Республикасы Президентінің
резиденциясы. Сарайдың негізгі ғимараттары: «Ақорда» залы …
Абай Құнанбайұлы — қазақтың
→ … рухани мәдениетінің көрнекті өкілі. Ол – ақын, ағартушы, жазба
әдебиетінің негізін салушы әрі дамытушы …
Жасанды интеллект дегеніміз —
→ … ақпаратты беру мен оны өңдеудің үздіксіз және тиімді жұмыс жасауын
қамтамасыз ететін технологиялар жиынтығы.
Limitations
- Base model, not instruction-tuned — it continues text, it does not follow chat instructions out of the box. Fine-tune for downstream tasks.
- Trained on web/encyclopedic Kazakh, so it can emit corpus artifacts (URLs, site names, boilerplate).
- No safety alignment — outputs are unfiltered.
- Knowledge is limited to the training corpus.
Citation
```bibtex
@misc{tilcore05b2026,
  title  = {Til Core 0.5B: a morpheme-aware Kazakh language model},
  author = {TilQazyna},
  year   = {2026},
  url    = {https://huggingface.co/TilQazyna/Til-Core-0.5B}
}
```
Tokenizer: stukenov/sozkz-morphbpe-256k-kk-v1 · Dataset: stukenov/sozkz-corpus-tokenized-kk-morphbpe256k-v1