The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant
A Bengali-friendly assistant LLM fine-tuned from Qwen2.5-0.5B on a Bengali book about Claude AI ("ক্লড এআই মাস্টারি" by Sajid Ahmed).
Model Details
- Base Model: Qwen/Qwen2.5-0.5B (494M parameters)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Config: rank=16, alpha=32, dropout=0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training Data: 2,486 examples generated from a 118-page Bengali book on Claude AI
- Training Examples: 2,129 of the 2,486 generated examples (kept after filtering out examples that contain no assistant tokens)
- Trainable Tokens: 409,505 (54.2% of total)
- Optimizer Steps: 266
- Final Loss: ~0.39
- Max Sequence Length: 384
- Learning Rate: 5e-5 (cosine schedule with 20-step warmup)
- Epochs: 1
- Compute: CPU-only (4 cores, 8GB RAM)
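The learning-rate settings listed above (peak 5e-5, 20 warmup steps, 266 optimizer steps) can be pictured with a small sketch. It assumes the usual shape for this kind of schedule, a linear warmup followed by a cosine decay to zero; the card does not state the exact scheduler implementation, and the function name `lr_at` is made up for illustration.

```python
import math

PEAK_LR = 5e-5       # from the model card
WARMUP_STEPS = 20    # from the model card
TOTAL_STEPS = 266    # optimizer steps reported in the model card


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumed shape: linear warmup to PEAK_LR, then cosine decay to 0.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at the start, mid-warmup, peak, mid-decay, and end.
    for step in (0, 10, 20, 143, 266):
        print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate reaches its peak at step 20 and is back to half the peak at step 143, the midpoint of the decay phase.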
Training Approach
Data Generation
The training data was generated from the Bengali book "ক্লড এআই মাস্টারি" (Claude AI Mastery) using template-based generation covering:
- QA pairs (factual + conceptual)
- Topic-based questions
- Summary instructions
- Explain-like-I'm-5 instructions
- Casual chat turns
- Multi-turn conversations
- Analogy requests
- Comparison questions
- Step-by-step explanations
- Common mistakes
- Generic conversation
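One way template-based generation of this kind could look is sketched below. The card does not publish the actual templates, so the template strings, the names `TEMPLATES` and `build_examples`, and the placeholder answers are all hypothetical; the real templates would be in Bengali and cover all of the categories listed above.

```python
from typing import Dict, List

# System prompt quoted from the Style section of this card.
SYSTEM_PROMPT = (
    "তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট। "
    "Claude AI বিষয়ে সহজ ভাষায় উদাহরণসহ উত্তর দাও।"
)

# Hypothetical English stand-ins for three of the listed categories.
TEMPLATES: Dict[str, str] = {
    "qa": "What is {topic}?",
    "eli5": "Explain {topic} like I'm five.",
    "steps": "Explain {topic} step by step.",
}


def build_examples(topic: str, answers: Dict[str, str]) -> List[List[Dict[str, str]]]:
    """Turn one topic into chat-format examples, one per template with an answer."""
    examples = []
    for kind, template in TEMPLATES.items():
        if kind not in answers:
            continue  # no answer text was prepared for this template
        examples.append([
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": template.format(topic=topic)},
            {"role": "assistant", "content": answers[kind]},
        ])
    return examples


if __name__ == "__main__":
    # Placeholder answers; in practice these would come from the book text.
    demo = build_examples("Claude", {"qa": "<answer text>", "eli5": "<simple answer>"})
    print(len(demo), "examples;", demo[0][1]["content"])
```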
Masking Strategy
Loss is computed only on assistant tokens. System and user tokens have their labels set to -100, so the loss ignores them and training focuses on response generation rather than prompt memorization.
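A minimal sketch of this masking, in plain Python, is shown below. The helper name `build_labels` and the boolean `assistant_mask` input are hypothetical, and the actual training script is not part of this card. The sketch also assumes that the filter for assistant-token presence drops examples whose assistant tokens are all lost when truncating to the maximum sequence length of 384; that reading is an assumption.

```python
from typing import List, Optional

IGNORE_INDEX = -100   # label value skipped by PyTorch's cross-entropy loss by default
MAX_SEQ_LEN = 384     # max sequence length from the model card


def build_labels(
    input_ids: List[int],
    assistant_mask: List[bool],
    max_len: int = MAX_SEQ_LEN,
) -> Optional[List[int]]:
    """Return labels with non-assistant positions masked, or None to drop the example."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    # Truncate both sequences to the maximum length.
    ids = input_ids[:max_len]
    mask = assistant_mask[:max_len]
    # Keep the token id as the label only where the token belongs to the assistant.
    labels = [tok if is_asst else IGNORE_INDEX for tok, is_asst in zip(ids, mask)]
    # Assumed filter: an example with no trainable token left is dropped.
    if all(label == IGNORE_INDEX for label in labels):
        return None
    return labels


if __name__ == "__main__":
    print(build_labels([11, 12, 13, 14], [False, False, True, True]))
```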
Style
The assistant is trained to be friendly, patient, and example-driven — inspired by Claude's teaching style. The system prompt instructs:
"তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট। Claude AI বিষয়ে সহজ ভাষায় উদাহরণসহ উত্তর দাও।"
(You are a friendly Bengali assistant. Answer about Claude AI in simple language with examples.)
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the merged bf16 model. device_map="auto" requires the accelerate package.
# On older transformers releases, pass torch_dtype= instead of dtype=.
model = AutoModelForCausalLM.from_pretrained(
    "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant",
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant",
    trust_remote_code=True,
)

# Use the same system prompt the model was trained with.
messages = [
    {"role": "system", "content": "তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট। Claude AI বিষয়ে সহজ ভাষায় উদাহরণসহ উত্তর দাও।"},
    {"role": "user", "content": "Claude কী?"},
]

# Render the chat into a prompt string, then tokenize it and move it to the model's device.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.6,
    top_p=0.85,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
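To see roughly what the prompt string passed to the model looks like, the sketch below renders messages in the ChatML-style layout that Qwen2.5 tokenizers use. It assumes this repository's tokenizer keeps that template unchanged; `render_chatml` is a hypothetical helper, not a library function, and it does not reproduce the template's handling of a missing system message. For real use, rely on `tokenizer.apply_chat_template`.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages as ChatML-style text (assumed Qwen2.5 layout)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট।"},
        {"role": "user", "content": "Claude কী?"},
    ]
    print(render_chatml(demo))
```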
Limitations
- Base Model Size: Qwen2.5-0.5B is small (494M params). Output quality is limited.
- PDF Extraction Issues: The source PDF had Bengali font-encoding issues, so some compound characters were garbled in the training data.
- Single-Epoch Training: To avoid overfitting and save compute, only 1 epoch was used.
- CPU-Only Training: Training was done on CPU, which constrained model size and example count.
Training Metadata
See training_metadata.json for full training hyperparameters and statistics.
Files
- model.safetensors: Full merged model (bf16, 943MB)
- adapter_model.safetensors: LoRA adapter only (35MB); use this with the base model for faster loading
- adapter_config.json: LoRA configuration
- tokenizer.json: Tokenizer
- training_metadata.json: Training hyperparameters
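The 35MB adapter size can be sanity-checked from the LoRA settings under Model Details (rank 16 on all seven projection modules). The sketch below assumes the layer dimensions from the published Qwen2.5-0.5B configuration (24 layers, hidden size 896, key/value width 128, MLP width 4864) and assumes the adapter weights are stored as 32-bit floats; neither is stated in this card.

```python
RANK = 16             # LoRA rank from the model card

# Assumed Qwen2.5-0.5B dimensions (not stated in this card).
NUM_LAYERS = 24
HIDDEN = 896
KV_DIM = 128          # 2 key/value heads x 64 dims per head
INTERMEDIATE = 4864

# (in_features, out_features) for each targeted module in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """LoRA adds A (rank x in) and B (out x rank), so rank * (in + out) weights."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGET_SHAPES.values())
total_params = per_layer * NUM_LAYERS
size_bytes_fp32 = total_params * 4  # assumed 4 bytes per weight

if __name__ == "__main__":
    print(f"LoRA parameters: {total_params:,}")
    print(f"Approx. size at fp32: {size_bytes_fp32 / 1e6:.1f} MB")
```

Under these assumptions the adapter holds about 8.8 million weights, or about 35MB at 4 bytes each, which agrees with the adapter file size listed above.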
License
Apache 2.0 (inherited from Qwen2.5)
Acknowledgments
- Base model: Qwen Team (Qwen2.5-0.5B)
- Training data: "ক্লড এআই মাস্টারি" book by Sajid Ahmed, published by Shehzeen Publications
- LoRA fine-tuning: PEFT library by HuggingFace