Quick start

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Any length works; Unsloth applies RoPE scaling internally
dtype = None  # None = auto-detect; float16 for Tesla T4/V100, bfloat16 for Ampere and newer
load_in_4bit = False  # Set True to load 4-bit quantized weights and reduce memory usage
load_in_8bit = False  # Set True for 8-bit quantization instead; enable at most one of the two

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="azherali/Aqal-1.0-8B-Instruct",  # Choose ANY
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    # token = "YOUR_HF_TOKEN", # HF Token for gated models
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

messages = [
    {
        "role": "user",
        # Urdu: "Five children shared 20 chocolates equally. How many chocolates will each child get?"
        "content": "پانچ بچوں نے 20 چاکلیٹس برابر بانٹیں۔ ہر بچے کو کتنی چاکلیٹس ملیں گی؟",
    }
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Must add for generation
)

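from transformers import TextStreamer

inputs = tokenizer(text, return_tensors="pt").to(model.device)  # Same device as the model
_ = model.generate(
    **inputs,
    max_new_tokens=1024,  # Explicit cap on generated tokens instead of relying on defaults
    do_sample=True,  # Sample so that temperature/top_p/top_k take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,  # These three values match Qwen3's recommended thinking-mode settings
    streamer=TextStreamer(tokenizer, skip_prompt=True),  # Print tokens as they are generated
)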
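
With `add_generation_prompt=True`, `apply_chat_template` turns the message list into one prompt string that ends with an open assistant turn, so the model continues as the assistant. Assuming this fine-tune keeps the ChatML-style template of its Qwen3 base, a single user message renders roughly as in the simplified sketch below. `render_chatml` is a hypothetical helper for illustration only: the real template also handles system prompts, tool calls and Qwen3's thinking mode (its `enable_thinking` argument), none of which is reproduced here, so use `tokenizer.apply_chat_template` in practice.

```python
def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical, simplified ChatML rendering (assumes a Qwen3-style template).

    Ignores system prompts, tool calls and thinking-mode handling that the
    real chat template implements.
    """
    # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [{"role": "user", "content": "How many chocolates does each child get?"}]
print(render_chatml(example))
```

Without the generation prompt the string ends after the user's `<|im_end|>`, and the model may continue the user turn instead of answering, which is why the quick start marks the flag as required.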

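The `generate` call samples with `temperature=0.6`, `top_p=0.95` and `top_k=20`. The toy sketch below shows how such settings narrow a next-token distribution: temperature rescales the logits before the softmax (values below 1 sharpen it), top-k keeps the 20 most likely tokens, and top-p then keeps the smallest set of those whose probability mass reaches 0.95. `filter_distribution` and the logits are invented for illustration; the order follows, to our understanding, how `transformers` chains its temperature, top-k and top-p warpers, but this is not that library's implementation.

```python
import math


def filter_distribution(logits: dict[str, float], temperature: float = 0.6,
                        top_k: int = 20, top_p: float = 0.95) -> dict[str, float]:
    """Toy sketch: temperature -> top-k -> top-p filtering over token logits.

    Returns renormalized probabilities for the tokens that survive.
    """
    # 1. Temperature: divide logits, then softmax (subtract the max for stability).
    scaled = {tok: x / temperature for tok, x in logits.items()}
    peak = max(scaled.values())
    exp = {tok: math.exp(x - peak) for tok, x in scaled.items()}
    total = sum(exp.values())
    probs = sorted(((tok, e / total) for tok, e in exp.items()),
                   key=lambda kv: kv[1], reverse=True)

    # 2. Top-k: keep the k most likely tokens and renormalize over them.
    probs = probs[:top_k]
    k_mass = sum(p for _, p in probs)
    probs = [(tok, p / k_mass) for tok, p in probs]

    # 3. Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a distribution again.
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}


toy_logits = {"4": 5.0, "four": 3.0, "5": 1.0, "20": 0.5}  # invented numbers
print(filter_distribution(toy_logits))
```

With these toy logits the top token alone holds more than 95% of the mass after temperature scaling, so top-p leaves a single candidate; raising `top_p` to 1.0 keeps all four.
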
Training procedure

This model was trained with supervised fine-tuning (SFT).
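
SFT trains the model on prompt/response examples with the ordinary next-token cross-entropy objective. The card does not say which tokens contributed to the loss (for example, whether prompt tokens were masked out), so the formula below is only the generic form, written with $x$ as the prompt and $y = (y_1, \dots, y_T)$ as the target response tokens.

```latex
\mathcal{L}_{\text{SFT}}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(y_t \mid x, y_{<t}\right)
```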

Framework versions

  • TRL: 0.22.2
  • Transformers: 4.56.2
  • PyTorch: 2.12.0+rocm7.2
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2
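
To reproduce the training environment, it can help to compare installed packages against the versions listed above. A minimal sketch using the standard library's `importlib.metadata`; the PyPI distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are the usual ones, and `compare_versions` is a hypothetical helper. It ignores the local build tag after `+` (such as `+rocm7.2`), since a CUDA build of the same PyTorch release carries a different tag.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed under "Framework versions" (keyed by PyPI distribution name).
EXPECTED = {
    "trl": "0.22.2",
    "transformers": "4.56.2",
    "torch": "2.12.0+rocm7.2",
    "datasets": "4.3.0",
    "tokenizers": "0.22.2",
}


def compare_versions(
    expected: Dict[str, str], lookup: Callable[[str], str] = version
) -> Dict[str, Tuple[Optional[str], bool]]:
    """Return {package: (installed_version_or_None, public_version_matches)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except PackageNotFoundError:
            have = None  # package is not installed
        # Compare only the public version, dropping local tags like "+rocm7.2".
        matches = have is not None and have.split("+")[0] == want.split("+")[0]
        report[pkg] = (have, matches)
    return report


for pkg, (have, ok) in compare_versions(EXPECTED).items():
    print(f"{pkg:<12} installed={have or 'missing':<20} {'match' if ok else 'differs'}")
```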

Model details

  • Format: Safetensors
  • Model size: 8B params
  • Tensor type: BF16
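
A back-of-the-envelope estimate of weight memory follows from the listed size: BF16 stores 2 bytes per parameter, so 8B parameters need roughly 16 GB, while the quick start's 8-bit and 4-bit loading options cut that to about 8 GB and 4 GB. The sketch below does that arithmetic; `weight_memory_gb` is a hypothetical helper. The figures cover weights only: the KV cache, activations, quantization metadata and any layers left unquantized come on top, and "8B" is itself a rounded count.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # "8B params" as listed on the card (rounded)

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```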

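The quick start's `dtype` comment encodes a simple rule: bfloat16 on Ampere or newer GPUs, float16 on older cards such as the Tesla T4 and V100. The sketch below spells that rule out as a function of CUDA compute capability (Ampere starts at 8.0; the T4 is 7.5 and the V100 is 7.0). `pick_dtype` is a hypothetical helper for illustration; leaving `dtype=None` hands the choice to Unsloth's own auto-detection, which may use different criteria.

```python
def pick_dtype(compute_capability: tuple[int, int]) -> str:
    """Hypothetical helper: map a CUDA compute capability to a dtype name.

    Mirrors the quick-start comment: bfloat16 for Ampere (8.x) and newer,
    float16 for older GPUs such as Tesla T4 (7.5) and V100 (7.0).
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


# Capabilities in the form returned by torch.cuda.get_device_capability().
for name, cap in [("Tesla V100", (7, 0)), ("Tesla T4", (7, 5)), ("A100", (8, 0))]:
    print(name, "->", pick_dtype(cap))
```
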
Model tree for azherali/Aqal-1.0-8B-Instruct

  • Qwen/Qwen3-8B (fine-tuned)
  • unsloth/Qwen3-8B (fine-tuned from Qwen/Qwen3-8B)
  • this model (fine-tuned from unsloth/Qwen3-8B)