qwen-1b-pruned-th

Qwen2.5-3B depth-pruned by layer dropping, then healed with SFT on Thai text.

Spec

  • Base: Qwen2.5-3B
  • Params: 1.70B (keeps 18 of 36 decoder layers: the middle layers are dropped, the head and tail are kept)
  • Healing: SFT on SEA-PILE v2 Thai (~8k docs), bf16
  • Requires: transformers>=4.44, accelerate
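
The layer selection and the parameter count in the spec can be sanity-checked with a short sketch. This is not the author's pruning script: the exact head/tail split is not documented here, so an even 9 + 9 split is assumed, and the architecture numbers (hidden size, MLP width, vocabulary size, KV heads, tied embeddings) are assumed from the published Qwen2.5-3B config. The function names are hypothetical.

```python
# Sketch only. Assumptions (not stated in this card):
#   - the kept layers are split evenly between head and tail (9 + 9)
#   - architecture values match the published Qwen2.5-3B config

def keep_layer_indices(n_layers: int, n_keep: int) -> list[int]:
    """Indices of the decoder layers kept when the middle block is dropped."""
    if not 0 < n_keep <= n_layers:
        raise ValueError("n_keep must be in 1..n_layers")
    head = (n_keep + 1) // 2          # extra layer goes to the head if n_keep is odd
    tail = n_keep - head
    return list(range(head)) + list(range(n_layers - tail, n_layers))


def param_count(n_layers: int, hidden: int = 2048, inter: int = 11008,
                vocab: int = 151936, n_kv_heads: int = 2, head_dim: int = 128) -> int:
    """Parameter count of a Qwen2-style decoder with tied input/output embeddings."""
    kv = n_kv_heads * head_dim
    attn = hidden * hidden + hidden   # q_proj weight + bias
    attn += 2 * (hidden * kv + kv)    # k_proj and v_proj, weight + bias each
    attn += hidden * hidden           # o_proj, no bias
    mlp = 3 * hidden * inter          # gate, up and down projections
    norms = 2 * hidden                # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    # all layers + final norm + (tied) embedding matrix
    return n_layers * per_layer + hidden + vocab * hidden


kept = keep_layer_indices(36, 18)
print(kept)                                    # layers 0..8 and 27..35
print(f"{param_count(len(kept)) / 1e9:.2f}B")  # about 1.70B under these assumptions
```

Under these assumptions the count lands on the 1.70B figure quoted above. Applying the selection to a loaded transformers model would typically mean rebuilding `model.model.layers` from the kept indices and updating `config.num_hidden_layers`; that step is omitted here because the original procedure is not described.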

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

m = "Chokun00032/qwen-1b-pruned-th"
tok = AutoTokenizer.from_pretrained(m)
# Load in bf16 (the dtype used for healing); device_map needs accelerate installed.
model = AutoModelForCausalLM.from_pretrained(m, torch_dtype=torch.bfloat16, device_map="cuda")

# Thai prompt, roughly "Artificial intelligence is".
ids = tok("ปัญญาประดิษฐ์ คือ", return_tensors="pt").to(model.device)
# Sampling with a repetition penalty; see Notes for why the penalty matters.
out = model.generate(**ids, max_new_tokens=120, do_sample=True,
                     temperature=0.7, top_p=0.9, repetition_penalty=1.3)
print(tok.decode(out[0], skip_special_tokens=True))

Notes

  • This is a pruned base model healed on raw text only: its Thai is grammatically fluent, but factual recall and arithmetic are weak.
  • Set repetition_penalty>=1.2 when generating to avoid repetitive loops.
  • Best used as a base for further instruction fine-tuning.
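
To illustrate why a repetition penalty breaks loops, here is a minimal sketch of the usual rule: logits of tokens that already appeared are divided by the penalty when positive and multiplied by it when negative, so repeated tokens become less likely either way. This is a plain-Python illustration under that assumption, not the library's code, and `apply_repetition_penalty` is a hypothetical helper name.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.3):
    """Return a copy of `logits` with already-generated tokens down-weighted."""
    out = list(logits)
    for t in set(seen_token_ids):     # each seen token is penalized once
        if out[t] < 0:
            out[t] = out[t] * penalty  # negative logit: push further down
        else:
            out[t] = out[t] / penalty  # positive logit: shrink toward zero
    return out


logits = [2.6, -1.3, 0.5]              # toy scores for a 3-token vocabulary
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 1, 0])
print(penalized)                       # tokens 0 and 1 are lowered, token 2 is unchanged
```

A penalty of 1.0 leaves the logits untouched; values well above the 1.2 to 1.3 range suggested here may start to suppress legitimate repeats such as common function words.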