This repository contains LoRA adapter weights only.
Base model required: Qwen/Qwen2.5-7B
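Because the repository ships only adapter weights, it can help to confirm which base checkpoint the adapter was saved against before loading it. PEFT writes an adapter_config.json next to the adapter weights and records the base model under the base_model_name_or_path key. The sketch below reads that field from a local copy of the adapter; the helper name declared_base_model and the local directory argument are illustrative assumptions, not part of this repository.

```python
import json
from pathlib import Path


def declared_base_model(adapter_dir):
    """Return the base model recorded in a local PEFT adapter directory.

    Hypothetical helper: reads adapter_config.json, which PEFT saves
    alongside the adapter weights, and returns base_model_name_or_path
    (or None if the key is missing).
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("base_model_name_or_path")
```

Comparing the returned value with the base model you plan to load is a quick sanity check before attaching the adapter.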
APJ Abdul Kalam Instruct v2
A LoRA fine-tuned conversational model designed to emulate the wisdom, humility, scientific thinking, and inspirational communication style of Dr. APJ Abdul Kalam.
This v2 release improves conversational stopping behavior using explicit end-token supervised fine-tuning.
This project is intended for educational and research purposes. Responses are stylistically inspired by APJ Abdul Kalam and are not factual reproductions of his real statements.
Base Model
- Qwen/Qwen2.5-7B
Training Pipeline
The model was trained using a multi-stage pipeline:
- Continued Pretraining (CPT)
- CPT merge into base model
- Supervised Fine-Tuning (SFT)
- Explicit <END> token training for conversational stopping
Pipeline:
Qwen2.5-7B
↓
Continued Pretraining (CPT)
↓
Merge CPT Weights
↓
Supervised Fine-Tuning (SFT)
↓
Kalam Persona LoRA v2
Personality & Style
The model is designed to:
- Speak with humility and simplicity
- Inspire students and young people
- Discuss science, education, leadership, and life philosophy
- Answer in first-person style as Dr. APJ Abdul Kalam
- Generate concise and reflective conversational responses
Example
User
Who are you?
Assistant
I am Dr. Abdul Kalam, former President of India, born in Rameswaram, Tamil Nadu. My journey began in a humble environment, but through education, discipline, and dreams, I dedicated my life to science and the development of our nation.
Training Details
Continued Pretraining (CPT)
The model first underwent domain adaptation on Kalam-style writings, speeches, and philosophical content.
Supervised Fine-Tuning (SFT)
The model was then instruction-tuned using conversational datasets in chat format.
End Token Training
v2 introduces explicit <END> token supervision to improve conversational stopping behavior and reduce continuation artifacts.
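As a rough illustration of what explicit end-token supervision can look like, the sketch below appends an <END> marker to each assistant turn before a conversation is rendered for SFT, so that the marker becomes part of the supervised target. The function name, the message layout, and the exact placement of the marker are assumptions for illustration; the actual dataset format is not described in this card.

```python
END_MARKER = "<END>"


def add_end_marker(messages, marker=END_MARKER):
    """Return a copy of a chat in which every assistant turn ends with the marker.

    Hypothetical preprocessing step; the input list is left unmodified.
    """
    marked = []
    for message in messages:
        content = message["content"]
        if message["role"] == "assistant" and not content.rstrip().endswith(marker):
            content = content.rstrip() + " " + marker
        marked.append({"role": message["role"], "content": content})
    return marked


example = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am Dr. Abdul Kalam."},
]
print(add_end_marker(example)[-1]["content"])
```

Only assistant turns are changed, and a turn that already ends with the marker is left as it is, so the step can be applied more than once without duplicating the marker.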
Training Summary
- Base Model: Qwen2.5-7B
- Training Method: CPT + SFT
- Quantization: QLoRA (4-bit)
- Final Eval Accuracy: ~83%
- Optimized for consumer GPUs
Improvements Over v1
- Improved response stopping behavior
- Reduced continuation artifacts
- Cleaner conversational outputs
- Better response termination consistency
Known Limitations
Current limitations:
- Weaker on factual QA than on philosophical and inspirational prompts
- Response quality depends on generation settings
- Occasional variability in response depth
Future versions may improve:
- long-form reasoning
- conversational depth
- multilingual support
- factual grounding
Recommended Inference Settings
For best response quality:
max_new_tokens=60
do_sample=False
repetition_penalty=1.1
Greedy decoding is recommended for cleaner conversational stopping behavior.
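To make the interaction between these settings concrete, here is a toy sketch of one greedy decoding step with a repetition penalty. It follows the rule commonly used for this penalty (and, to my understanding, by the Transformers repetition-penalty logits processor): the logit of any token that already appeared is divided by the penalty if it is positive and multiplied by it if it is negative. The four-token vocabulary and the logit values are made up for illustration.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.1):
    """Lower the scores of tokens that already appeared in the sequence.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so a penalty above 1.0 never raises a score.
    """
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def greedy_pick(logits):
    """Greedy decoding step: return the index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy logits for a four-token vocabulary (illustrative values only).
toy_logits = [2.0, 2.1, -1.0, 0.5]
seen = [1]  # token 1 has already been generated

print(greedy_pick(toy_logits))                                  # token 1 wins without the penalty
print(greedy_pick(apply_repetition_penalty(toy_logits, seen)))  # token 0 wins once token 1 is penalized
```

With do_sample=False the highest-scoring token is always chosen, so the penalty is the only thing nudging the model away from repeating itself.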
Inference Example
import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

base_model = "Qwen/Qwen2.5-7B"
adapter = "K-saif/apj-kalam-instruct-v2"

# 4-bit NF4 quantization with double quantization to reduce memory use
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# The tokenizer is loaded from the adapter repository
tokenizer = AutoTokenizer.from_pretrained(adapter)

# Load the quantized base model, then attach the LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

messages = [
    {
        "role": "system",
        "content": (
            "You are APJ Abdul Kalam, former President of India, "
            "known as the Missile Man. Speak with humility, wisdom, "
            "inspiration, and deep love for science, education, and "
            "the youth of India. Use simple, heartfelt, and profound "
            "language. Always answer in first person as if you are "
            "Kalam himself."
        ),
    },
    {
        "role": "user",
        "content": "What is the purpose of life?",
    },
]

# Render the chat into a single prompt string ending with the assistant prefix
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(
    text,
    return_tensors="pt",
).to(model.device)

# Greedy decoding with the recommended settings
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=False,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)

# Drop the <END> marker and anything generated after it
if "<END>" in response:
    response = response.split("<END>")[0]

print(response)
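If the <END> marker spans several tokens, a response capped at max_new_tokens=60 could be cut off partway through it, in which case a plain split would leave a fragment such as "<EN" at the end of the text. The helper below is a small sketch that trims both a complete marker and a trailing partial one. The function name and the partial-marker handling are assumptions added for illustration; they are not part of the model card's own example.

```python
def strip_end_marker(text, marker="<END>"):
    """Cut text at the first full marker, or drop a trailing partial marker."""
    head, found, _ = text.partition(marker)
    if found:
        return head.rstrip()
    # No full marker: check whether the text ends with a prefix of it,
    # trying the longest prefix first (e.g. "<EN" before "<").
    for size in range(len(marker) - 1, 0, -1):
        if text.endswith(marker[:size]):
            return text[:-size].rstrip()
    return text.rstrip()


print(strip_end_marker("Dream, dream, dream. <EN"))
```

One trade-off of this approach is that a response legitimately ending in "<" would also lose that character, which seems acceptable for short conversational replies.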
Intended Use
This model is intended for:
- educational demos
- conversational AI research
- personality modeling experiments
- inspirational chat applications
Not intended for:
- applications that require factual historical accuracy
- legal/medical advice
- sensitive decision making
Related Resources
- Dataset v2: K-saif/apj-kalam-instruct-dataset-v2
- Model v1: K-saif/apj-kalam-instruct
Author
Developed by Saif Khan.