Instructions for using The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant with libraries and local apps.

Libraries

How to use The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# This is a text-only causal LM, so AutoModelForCausalLM is the matching auto class
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant")
model = AutoModelForCausalLM.from_pretrained("The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
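
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Claude কী?"))` prints the model's reply. Passing a different `base_url` points the same helper at another OpenAI-compatible server.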

Use Docker

docker model run hf.co/The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant

SGLang

How to use The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant with Docker Model Runner:
```
docker model run hf.co/The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant
```

The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant

A Bengali-friendly assistant LLM fine-tuned from Qwen2.5-0.5B on a Bengali book about Claude AI ("ক্লড এআই মাস্টারি" by Sajid Ahmed).

Model Details

Base Model: Qwen/Qwen2.5-0.5B (494M parameters)
Fine-tuning Method: LoRA (Low-Rank Adaptation)
LoRA Config: rank=16, alpha=32, dropout=0.05
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Data: 2,486 examples generated from a 118-page Bengali book on Claude AI
Training Examples: 2,129 (after filtering for assistant-token presence)
Trainable Tokens: 409,505 (54.2% of total)
Optimizer Steps: 266
Final Loss: ~0.39
Max Sequence Length: 384
Learning Rate: 5e-5 (cosine schedule with 20-step warmup)
Epochs: 1
Compute: CPU-only (4 cores, 8GB RAM)
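
The learning-rate entry above can be made concrete with a small sketch. It assumes the common shape of linear warmup followed by cosine decay to zero; the card does not state the decay floor or the exact scheduler implementation, so treat the function `lr_at` as an illustration rather than the training code.

```python
import math

PEAK_LR = 5e-5       # from the card
WARMUP_STEPS = 20    # from the card
TOTAL_STEPS = 266    # optimizer steps, from the card


def lr_at(step):
    """Learning rate at a given optimizer step (assumed: linear warmup, cosine decay to 0)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 10, 20, 143, 266):
        print(step, lr_at(step))
```

Under these assumptions the rate peaks at step 20 and falls to half the peak at step 143, the midpoint of the decay phase.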

Training Approach

Data Generation

The training data was generated from the Bengali book "ক্লড এআই মাস্টারি" (Claude AI Mastery) using template-based generation covering:

QA pairs (factual + conceptual)
Topic-based questions
Summary instructions
Explain-like-I'm-5 instructions
Casual chat turns
Multi-turn conversations
Analogy requests
Comparison questions
Step-by-step explanations
Common mistakes
Generic conversation
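
Template-based generation of this kind can be sketched as follows. The templates, the `make_examples` helper, and the reuse of the passage as the answer are all hypothetical; the card does not publish the actual templates or how answers were derived from the book text.

```python
SYSTEM_PROMPT = (
    "তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট। "
    "Claude AI বিষয়ে সহজ ভাষায় উদাহরণসহ উত্তর দাও।"
)

# Hypothetical question templates, one per example type
TEMPLATES = {
    "qa": "{topic} কী?",                          # "What is {topic}?"
    "eli5": "{topic} খুব সহজ ভাষায় ব্যাখ্যা করো।",   # "Explain {topic} very simply."
    "summary": "{topic} সম্পর্কে সংক্ষেপে বলো।",     # "Tell me briefly about {topic}."
}


def make_examples(topic, passage):
    """Turn one (topic, passage) pair into several chat-format training examples."""
    examples = []
    for kind, template in TEMPLATES.items():
        examples.append({
            "kind": kind,
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": template.format(topic=topic)},
                # Placeholder: the passage itself stands in for the answer
                {"role": "assistant", "content": passage},
            ],
        })
    return examples
```

Each book passage then yields one example per template, which is how a 118-page source can expand into a few thousand examples.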

Masking Strategy

Loss is computed only on assistant tokens. The labels for system and user tokens are set to -100, which excludes them from the loss, so training focuses on response generation rather than prompt memorization.
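
A minimal sketch of this masking, assuming each token already carries a flag saying whether it belongs to an assistant turn. The helper names are hypothetical, and reading the "assistant-token presence" filter as "drop examples with no unmasked label after truncation" is an interpretation, not something the card spells out.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default
MAX_LEN = 384        # max sequence length, from the card


def build_labels(input_ids, is_assistant, max_len=MAX_LEN):
    """Copy token ids into labels for assistant tokens; mask everything else."""
    pairs = list(zip(input_ids, is_assistant))[:max_len]  # truncate to max length
    return [tok if keep else IGNORE_INDEX for tok, keep in pairs]


def has_supervised_token(labels):
    """True if at least one position contributes to the loss."""
    return any(label != IGNORE_INDEX for label in labels)


if __name__ == "__main__":
    # Toy ids: three prompt tokens followed by two assistant tokens
    labels = build_labels([11, 12, 13, 21, 22], [False, False, False, True, True])
    print(labels)                        # [-100, -100, -100, 21, 22]
    print(has_supervised_token(labels))  # True
```

An example whose prompt alone fills the 384-token window would end up with every label masked; a filter like `has_supervised_token` would drop it.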

Style

The assistant is trained to be friendly, patient, and example-driven — inspired by Claude's teaching style. The system prompt instructs:

"তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট। Claude AI বিষয়ে সহজ ভাষায় উদাহরণসহ উত্তর দাও।"

(You are a friendly Bengali assistant. Answer about Claude AI in simple language with examples.)
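
Qwen2.5 tokenizers use a ChatML-style chat template, so the system prompt ends up in the model input roughly as sketched below. The function `render_chatml` is an illustrative approximation; the shipped template has additional branches (for example for tool calls), so use `tokenizer.apply_chat_template` in practice.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate ChatML rendering of a list of {"role", "content"} messages."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট। Claude AI বিষয়ে সহজ ভাষায় উদাহরণসহ উত্তর দাও।"},
        {"role": "user", "content": "Claude কী?"},
    ]
    print(render_chatml(messages))
```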

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant",
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "The-JDdev/Qwen2.5-0.5B-Bengali-Claude-Assistant",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "তুমি একজন বন্ধুত্বপূর্ণ বাংলা অ্যাসিস্ট্যান্ট। Claude AI বিষয়ে সহজ ভাষায় উদাহরণসহ উত্তর দাও।"},
    {"role": "user", "content": "Claude কী?"},
]
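# Render the conversation as a prompt string that ends with an open assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move the inputs to the model's device (device_map="auto" may place the model on a GPU)
inputs = tokenizer(text, return_tensors="pt").to(model.device)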
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.6,
    top_p=0.85,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Limitations

Base Model Size: Qwen2.5-0.5B is small (494M params). Output quality is limited.
PDF Extraction Issues: The source PDF had Bengali font-encoding issues, so some compound characters were garbled in the training data.
Single-Epoch Training: To avoid overfitting and save compute, only 1 epoch was used.
CPU-Only Training: Training was done on CPU, which constrained model size and example count.

Training Metadata

See training_metadata.json for full training hyperparameters and statistics.

Files

model.safetensors — Full merged model (bf16, 943MB)
adapter_model.safetensors — LoRA adapter only (35MB); apply it on top of Qwen/Qwen2.5-0.5B for a much smaller download when the base model is already available locally
adapter_config.json — LoRA configuration
tokenizer.json — Tokenizer
training_metadata.json — Training hyperparameters
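
The listed file sizes can be sanity-checked against the LoRA configuration. The sketch below assumes the layer dimensions of the published Qwen2.5-0.5B config (hidden size 896, MLP size 4864, 24 layers, key/value projection width 128) and assumes the adapter is stored in fp32; neither is stated in this card.

```python
RANK = 16     # LoRA rank, from the card
LAYERS = 24   # assumed from the base model config

# (in_features, out_features) of each targeted projection, assumed from the base config
SHAPES = {
    "q_proj": (896, 896),
    "k_proj": (896, 128),
    "v_proj": (896, 128),
    "o_proj": (896, 896),
    "gate_proj": (896, 4864),
    "up_proj": (896, 4864),
    "down_proj": (4864, 896),
}


def lora_param_count():
    """Each adapted layer adds A (rank x in) and B (out x rank): rank * (in + out) weights."""
    per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in SHAPES.values())
    return per_layer * LAYERS


if __name__ == "__main__":
    adapter_params = lora_param_count()
    print(adapter_params)                # 8798208
    print(adapter_params * 4 / 1e6)      # adapter size in MB if stored as fp32
    print(494e6 * 2 / 2**20)             # merged bf16 model size in MiB, using 494M params
```

Under these assumptions the adapter comes to about 8.8M parameters, or roughly 35 MB in fp32, and the merged bf16 model to roughly 942 MiB, both in line with the sizes listed above.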

License

Apache 2.0 (inherited from Qwen2.5)

Acknowledgments

Base model: Qwen Team (Qwen2.5-0.5B)
Training data: "ক্লড এআই মাস্টারি" book by Sajid Ahmed, published by Shehzeen Publications
LoRA fine-tuning: PEFT library by Hugging Face
