This repository contains LoRA adapter weights only.
Base model required: Qwen/Qwen2.5-7B

APJ Abdul Kalam Instruct v2

A LoRA fine-tuned conversational model designed to emulate the wisdom, humility, scientific thinking, and inspirational communication style of Dr. APJ Abdul Kalam.

This v2 release improves conversational stopping behavior through supervised fine-tuning with an explicit end token.

This project is intended for educational and research purposes. Responses are stylistically inspired by APJ Abdul Kalam and are not factual reproductions of his real statements.

Base Model

Qwen/Qwen2.5-7B

Training Pipeline

The model was trained using a multi-stage pipeline:

Continued Pretraining (CPT)
CPT merge into base model
Supervised Fine-Tuning (SFT)
Conversational stopping training with an explicit <END> token

Pipeline:

Qwen2.5-7B
    ↓
Continued Pretraining (CPT)
    ↓
Merge CPT Weights
    ↓
Supervised Fine-Tuning (SFT)
    ↓
Kalam Persona LoRA v2
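
The "Merge CPT Weights" step can be pictured with a small sketch. It assumes the CPT stage produced a LoRA adapter (the card does not say so explicitly), in which case merging folds the low-rank update into each frozen weight matrix as W + (alpha / r) * B·A. The function names and toy matrices are illustrative only; in practice a library such as PEFT performs this step.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    Shapes: weight is d_out x d_in, lora_b is d_out x rank, lora_a is rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example (assumed values): a 2x2 weight with a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # rank x d_in
B = [[0.5], [-0.5]]   # d_out x rank
merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [-1.0, -1.0]]
```

After merging, the result is an ordinary dense weight matrix, which is why a later stage can train a fresh adapter on top of it.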

Personality & Style

The model is designed to:

Speak with humility and simplicity
Inspire students and young people
Discuss science, education, leadership, and life philosophy
Answer in first-person style as Dr. APJ Abdul Kalam
Generate concise and reflective conversational responses

Example

User

Who are you?

Assistant

I am Dr. Abdul Kalam, former President of India, born in Rameswaram, Tamil Nadu. My journey began in a humble environment, but through education, discipline, and dreams, I dedicated my life to science and the development of our nation.

Training Details

Continued Pretraining (CPT)

The model first underwent domain adaptation on Kalam-style writings, speeches, and philosophical content.

Supervised Fine-Tuning (SFT)

The model was then instruction-tuned using conversational datasets in chat format.

End Token Training

v2 introduces explicit <END> token supervision to improve conversational stopping behavior and reduce continuation artifacts.
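
As a rough illustration of what explicit end-token supervision could look like on the data side, the sketch below appends a literal <END> marker to every assistant turn so the model sees where a reply should stop. The helper name, the record layout, and the choice to treat <END> as plain text rather than a registered special token are assumptions; the card does not describe the actual preprocessing.

```python
END_MARKER = "<END>"  # marker named in the card; how it was tokenized is not stated


def add_end_marker(messages, marker=END_MARKER):
    """Return a copy of a chat where each assistant turn ends with the marker.

    Hypothetical preprocessing helper: user and system turns are left alone,
    and a turn that already ends with the marker is not marked twice.
    """
    marked = []
    for message in messages:
        content = message["content"]
        if message["role"] == "assistant":
            content = content.rstrip()
            if not content.endswith(marker):
                content = f"{content} {marker}"
        marked.append({"role": message["role"], "content": content})
    return marked


example = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am Dr. Abdul Kalam, former President of India."},
]
for turn in add_end_marker(example):
    print(turn["role"], "->", turn["content"])
```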

Training Summary

Base Model: Qwen2.5-7B
Training Method: CPT + SFT
Quantization: QLoRA (4-bit)
Final Eval Accuracy: ~83%
Optimized for consumer GPUs
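
To make the consumer-GPU point concrete, here is a back-of-the-envelope estimate of weight memory at different precisions. The parameter count (about 7.6 billion for Qwen2.5-7B) is an approximation, and the figures ignore quantization constants, activations, the KV cache, and the LoRA adapter itself, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


APPROX_PARAMS = 7.6e9  # assumed rough parameter count for Qwen2.5-7B

for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(APPROX_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights need a quarter of the bf16 footprint.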

Improvements Over v1

Improved response stopping behavior
Reduced continuation artifacts
Cleaner conversational outputs
Better response termination consistency

Known Limitations

Current limitations:

Weaker on factual QA than on philosophical and inspirational prompts
Response quality depends on generation settings
Occasional variability in response depth

Future versions may improve:

long-form reasoning
conversational depth
multilingual support
factual grounding

Recommended Inference Settings

For best response quality:

max_new_tokens=60
do_sample=False
repetition_penalty=1.1

Greedy decoding is recommended for cleaner conversational stopping behavior.
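
The interaction between greedy decoding and repetition_penalty can be sketched in plain Python. With do_sample=False the highest-scoring token is always chosen; the repetition penalty, as implemented in Transformers, first rescales the scores of tokens that already appear in the sequence (positive scores are divided by the penalty, negative scores are multiplied by it). The function names and toy scores are illustrative and not part of any library.

```python
def apply_repetition_penalty(scores, seen_token_ids, penalty):
    """Make already-seen tokens less likely: divide positive scores, multiply negative ones."""
    adjusted = list(scores)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


def greedy_pick(scores):
    """Greedy decoding step: index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy vocabulary of four tokens (assumed scores); token 2 was already generated.
scores = [1.0, 2.1, 2.2, -1.0]
print(greedy_pick(scores))                                      # 2
print(greedy_pick(apply_repetition_penalty(scores, [2], 1.1)))  # 1
```

A mild penalty such as 1.1 only changes the choice when a repeated token leads by a small margin, which keeps the output deterministic while discouraging loops.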

Inference Example

import torch

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

from peft import PeftModel

base_model = "Qwen/Qwen2.5-7B"
adapter = "K-saif/apj-kalam-instruct-v2"

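# 4-bit NF4 quantization so the 7B base model fits in limited GPU memory.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter)

# Load the quantized base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()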

messages = [
    {
        "role": "system",
        "content": (
            "You are APJ Abdul Kalam, former President of India, "
            "known as the Missile Man. Speak with humility, wisdom, "
            "inspiration, and deep love for science, education, and "
            "the youth of India. Use simple, heartfelt, and profound "
            "language. Always answer in first person as if you are "
            "Kalam himself."
        )
    },
    {
        "role": "user",
        "content": "What is the purpose of life?"
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(
    text,
    return_tensors="pt"
).to(model.device)

# Greedy decoding with the recommended settings; gradients are not needed for inference.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=False,
        repetition_penalty=1.1,
    )

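# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)

# Drop the <END> marker and anything generated after it.
response = response.split("<END>")[0].strip()

print(response)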

Intended Use

This model is intended for:

educational demos
conversational AI research
personality modeling experiments
inspirational chat applications

Not intended for:

factual historical accuracy
legal/medical advice
sensitive decision making

Related Resources

Dataset v2: K-saif/apj-kalam-instruct-dataset-v2
Model v1: K-saif/apj-kalam-instruct

Author

Developed by Saif Khan.
