Til-2B-instruct
Til-2B-instruct is an instruction-tuned version of the Til-2B base model. It is a Kazakh-first model for chat, question answering, writing, summarization, and rewriting.
Model details
| Field | Value |
|---|---|
| Base | TilQazyna/Til-2B (1977M parameters, dense + MLA) |
| Format | ChatML |
| Loss | assistant tokens only |
| Data | ~345K instruction–response pairs (≈70% Kazakh; also Russian, English, code, math) |
| Epochs / LR | 3 / 1e-5 (cosine schedule), bf16 |
| Context | 4096 tokens |
| Hardware | 8×H200, DDP |
The instruction mix covers QA, summarization, rephrasing, classification, NER, keyword extraction, simplification, title generation and general chat, with Russian/English/code/math instructions for cross-lingual robustness.
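The "assistant tokens only" loss listed in the table can be illustrated with a small sketch. This is not the authors' training code; it only shows the common convention of setting the labels of non-assistant tokens to -100, the value PyTorch's cross-entropy loss ignores by default. The function name `build_labels` and the toy token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def build_labels(segments):
    """Build (input_ids, labels) from (role, token_ids) pairs in conversation order.

    Labels copy the assistant tokens and hold IGNORE_INDEX everywhere else,
    so only assistant tokens contribute to the loss.
    """
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Toy token ids, purely illustrative (not real vocabulary entries).
segments = [
    ("user", [11, 12, 13]),
    ("assistant", [21, 22]),
]
input_ids, labels = build_labels(segments)
print(input_ids)  # [11, 12, 13, 21, 22]
print(labels)     # [-100, -100, -100, 21, 22]
```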
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TilQazyna/Til-2B-instruct"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto")

msgs = [{"role": "user", "content": "Денсаулықты сақтау үшін 3 кеңес бер."}]  # Kazakh: "Give 3 tips for staying healthy."
text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)  # render the ChatML prompt
ids = tok(text, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)  # template already holds the special tokens
out = model.generate(ids, max_new_tokens=300, do_sample=True, temperature=0.7,
                     top_p=0.9, repetition_penalty=1.1, pad_token_id=0,
                     eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"))
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))  # decode only the newly generated tokens
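To make the prompt string produced by `apply_chat_template` easier to picture, here is a minimal sketch of the generic ChatML layout. It assumes the model's template follows standard ChatML without extras such as a default system message; the authoritative template is the one shipped with the tokenizer, and `render_chatml` is a hypothetical helper, not part of any library.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages in the generic ChatML layout (an assumption, see above)."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Сәлем!"}])
print(prompt)
```

Generation stops when the model emits `<|im_end|>`, which is why the usage code passes that token as `eos_token_id`.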
Sample outputs (translated from Kazakh)
Q: Who was Abai Qunanbaiuly?
A: Abai Qunanbaiuly is a great poet, thinker, and public figure of the Kazakh people. He was born in the territory of present-day Kazakhstan and spent most of his life in the city of Semey… (continues with his poems and his prose Words of Edification).
Q: Give 3 tips for staying healthy.
A: 1. Drink more water… 2. Eat well… 3. Give up smoking…
Intended use & limitations
- Intended: Kazakh-first assistant for QA, writing, summarization, rewriting.
- Reasoning/math: arithmetic and multi-step reasoning are weak (a known limit at this scale); the model may switch to Russian on math prompts.
- Factuality: can hallucinate; verify facts and numbers.
- No safety alignment / RLHF has been applied.
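Because the model may switch to Russian on some prompts, a caller might want a cheap check on the response language. The sketch below is a crude heuristic, not a language identifier: it counts letters that appear in the Kazakh Cyrillic alphabet but not in the Russian one. The function name `looks_kazakh` and the 2% threshold are illustrative assumptions, and short Kazakh replies without these letters would be misclassified.

```python
# Letters used in Kazakh Cyrillic but absent from the Russian alphabet:
# ә ғ қ ң ө ұ ү һ і (written as escapes to avoid look-alike Latin characters).
KAZAKH_ONLY = set("\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456")


def looks_kazakh(text, min_ratio=0.02):
    """Return True if enough of the letters in `text` are Kazakh-specific."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    hits = sum(ch.lower() in KAZAKH_ONLY for ch in letters)
    return hits / len(letters) >= min_ratio


print(looks_kazakh("Көбірек су ішіңіз"))   # Kazakh sample
print(looks_kazakh("Пейте больше воды"))   # Russian sample
```

A response that fails the check could be retried or flagged for review, depending on the application.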
License
Apache 2.0. Access is gated (manual approval) for usage tracking.
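Since access is gated, downloads need an access token from an account that has been approved. The sketch below assumes the token is stored in the `HF_TOKEN` environment variable; the helper `hf_token_from_env` is hypothetical, while the `token` argument of `from_pretrained` is part of Transformers.

```python
import os


def hf_token_from_env(var="HF_TOKEN"):
    """Read a Hugging Face access token from the environment, failing early if unset."""
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(
            f"{var} is not set; request access to the gated repo and export a token first."
        )
    return token


# Usage (requires network access and an approved account):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("TilQazyna/Til-2B-instruct", token=hf_token_from_env())
```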