Instructions for using KATHIR2006/Zenthi-AI with libraries, inference providers, notebooks, and local apps.

Libraries

How to use KATHIR2006/Zenthi-AI with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="KATHIR2006/Zenthi-AI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("KATHIR2006/Zenthi-AI")
model = AutoModelForCausalLM.from_pretrained("KATHIR2006/Zenthi-AI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use KATHIR2006/Zenthi-AI with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "KATHIR2006/Zenthi-AI"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KATHIR2006/Zenthi-AI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
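
The same request can be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `http://localhost:8000`. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_text: str, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first completion choice."""
    request = build_chat_request(base_url, model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires the server to be running):
# print(chat("http://localhost:8000", "KATHIR2006/Zenthi-AI", "What is the capital of France?"))
```

Because the endpoint follows the OpenAI chat format, pointing `base_url` at another compatible server (for example the SGLang server on port 30000 described later on this page) should work without other changes.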

Use Docker

docker model run hf.co/KATHIR2006/Zenthi-AI

SGLang

How to use KATHIR2006/Zenthi-AI with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "KATHIR2006/Zenthi-AI" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KATHIR2006/Zenthi-AI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "KATHIR2006/Zenthi-AI" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KATHIR2006/Zenthi-AI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use KATHIR2006/Zenthi-AI with Docker Model Runner:
```
docker model run hf.co/KATHIR2006/Zenthi-AI
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Zenthi-AI OS: Agentic Multi-Model Small Language Model (SLM)

Zenthi-AI is a production-grade, custom fine-tuned Small Language Model (SLM) that works as a conversational assistant. It is optimized for fast, local-first execution and serves as the intent-routing brain and synthesis engine of the Zenthi-AI Multi-Model Operating System.

This repository hosts the merged, full-precision model weights.

🚀 Key Features

Base Foundation: Built on Qwen/Qwen2.5-0.5B-Instruct.
Parameter-Efficient Fine-Tuning: Trained with QLoRA (4-bit quantization) on a merged, cleaned dataset drawn from Alpaca, Dolly 15K, OpenHermes, UltraChat, and ShareGPT.
Agentic Orchestrator Routing: Tuned to act as a Router and Planner Agent that classifies each query into one of six intents (CODE, VISION, RAG, SEARCH, KNOWLEDGE, COMPLEX). Measured accuracy varies by intent, as reported under Evaluation & Routing Performance.
Quantization-Ready: Quantized to GGUF format for local deployment (quantized size under 500 MB).
Local RAG Integration: Built to work in tandem with local ChromaDB embedding vector stores.
Web Search Coordination: Designed to synthesize real-time context fetched from local SearXNG search clients.
Memory Management: Keeps a windowed session history for conversational continuity.
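
A windowed session history can be sketched with a bounded queue. The example below is a minimal illustration, not the Zenthi-AI OS implementation: the class name `WindowedHistory`, the `max_turns` parameter, and the choice to count one turn as a user message plus an assistant message are all assumptions made for the example.

```python
from collections import deque
from typing import Dict, List, Optional


class WindowedHistory:
    """Keep only the most recent messages of a chat session."""

    def __init__(self, max_turns: int = 4, system_prompt: Optional[str] = None):
        self._system_prompt = system_prompt
        # Assumption: one turn is a user message plus an assistant reply,
        # so the window holds 2 * max_turns messages.
        self._messages: deque = deque(maxlen=2 * max_turns)

    def add(self, role: str, content: str) -> None:
        """Append a message; the oldest one is dropped when the window is full."""
        self._messages.append({"role": role, "content": content})

    def as_messages(self) -> List[Dict[str, str]]:
        """Return the system prompt (if any) followed by the windowed messages."""
        prefix = []
        if self._system_prompt:
            prefix.append({"role": "system", "content": self._system_prompt})
        return prefix + list(self._messages)


history = WindowedHistory(max_turns=1, system_prompt="You are Zenthi-AI OS.")
history.add("user", "Hello")
history.add("assistant", "Hi, how can I help?")
history.add("user", "Explain photosynthesis simply.")
# The first user message has been evicted; the system prompt is always kept.
print(history.as_messages())
```

The list returned by `as_messages()` has the same shape as the `messages` lists used in the Transformers examples on this page, so it could be passed to `apply_chat_template` directly.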

📊 Evaluation & Routing Performance

The model's semantic routing accuracy was benchmarked on 500 unique evaluation queries (100 per intent category, across the five categories listed below), running on a local GPU:

Overall Routing Accuracy: 72.60%
Average Latency: 651.54 ms per query

| Intent Category | Accuracy | Target Expert Model |
|---|---|---|
| CODE | 100.00% | `qwen2.5-coder:3b` |
| VISION | 100.00% | `riven/smolvlm:latest` |
| SEARCH | 99.00% | `qwen2.5:1.5b-instruct` |
| RAG | 43.00% | `qwen2.5:1.5b-instruct` |
| KNOWLEDGE | 21.00% | `qwen2.5:1.5b-instruct` |
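
The table can be read both as a dispatch map and as the source of the overall figure. The sketch below copies the per-category values from the table, looks up the target expert for a predicted intent, and recomputes the overall accuracy. Since every category has the same number of queries, the overall accuracy is the plain mean of the per-category accuracies. The function names and the fallback to KNOWLEDGE for labels missing from the table (such as COMPLEX) are assumptions for illustration only.

```python
# Target expert and accuracy (%) per intent, copied from the table above.
ROUTING_TABLE = {
    "CODE": ("qwen2.5-coder:3b", 100.0),
    "VISION": ("riven/smolvlm:latest", 100.0),
    "SEARCH": ("qwen2.5:1.5b-instruct", 99.0),
    "RAG": ("qwen2.5:1.5b-instruct", 43.0),
    "KNOWLEDGE": ("qwen2.5:1.5b-instruct", 21.0),
}

# Assumption: labels that are not in the table fall back to this intent.
FALLBACK_INTENT = "KNOWLEDGE"


def expert_for(intent_label: str) -> str:
    """Return the target expert model for a predicted intent label."""
    label = intent_label.strip().upper()
    model, _accuracy = ROUTING_TABLE.get(label, ROUTING_TABLE[FALLBACK_INTENT])
    return model


def overall_accuracy() -> float:
    """Mean of per-category accuracies (valid because categories are equally sized)."""
    accuracies = [accuracy for _model, accuracy in ROUTING_TABLE.values()]
    return sum(accuracies) / len(accuracies)


print(expert_for("code"))
print(f"{overall_accuracy():.2f}%")  # (100 + 100 + 99 + 43 + 21) / 5
```

The recomputed mean matches the reported overall routing accuracy of 72.60%.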

💻 Local Usage & Integration

1. Ollama Deployment (GGUF)

To run Zenthi-AI locally in Ollama:

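Create a Modelfile with the system prompt. The FROM line must reference a model already available to Ollama (as below) or the path to a local GGUF file: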

FROM zenthi-ai:latest
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM """I am Zenthi-AI OS, a production-grade Agentic Multi-Model AI Operating System. I deliver accurate, secure, maintainable, and production-ready solutions by coordinating specialized AI capabilities."""

Build and run:

ollama create Zenthi-AI -f Modelfile
ollama run Zenthi-AI
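
Once the model has been created, it can also be queried through Ollama's local HTTP API. This is a minimal sketch that assumes Ollama is listening on its default address `http://localhost:11434` and that the model was created under the name `Zenthi-AI` as shown above. `build_ollama_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_ollama_request(prompt: str, model: str = "Zenthi-AI") -> urllib.request.Request:
    """Build a non-streaming chat request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        url=OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "Zenthi-AI", timeout: float = 120.0) -> str:
    """Send the prompt and return the assistant's reply text."""
    request = build_ollama_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["message"]["content"]


# Example call (requires a running Ollama instance with the model created):
# print(ask("Explain photosynthesis simply."))
```

The system prompt and sampling parameters come from the Modelfile, so the request only needs to carry the user message.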

2. Python Transformers API

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "KATHIR2006/Zenthi-AI"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Start conversation
messages = [
    {"role": "system", "content": "You are Zenthi-AI OS, a production-grade Agentic Multi-Model AI Operating System."},
    {"role": "user", "content": "Explain photosynthesis simply."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move inputs to wherever device_map="auto" placed the model (works without CUDA too)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.batch_decode(generated_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)

🛠️ Fine-Tuned Expert Adapters

This repository also hosts the fine-tuned LoRA adapters for the specialized expert models of the Zenthi-AI OS:

1. Code Expert Adapters (`code-adapters/`)

Base Model: Qwen/Qwen2.5-Coder-3B-Instruct
Dataset: Custom programming and instruction dataset
Training: 1,200 steps
Final Loss: 0.1843
Usage: Optimized for React, Node.js, Python, and MERN stack development, as well as code review and refactoring.

2. Vision Expert Adapters (`vision-adapters/`)

Base Model: HuggingFaceTB/SmolVLM-Instruct
Dataset: Synthetic VQA shape and color recognition dataset
Training: 100 steps
Final Loss: 0.9077
Usage: Fine-tuned for OCR, visual question answering, and image analysis.
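
An adapter stored in a repository subfolder can typically be attached to its base model with the PEFT library. The sketch below does this for the code expert. It assumes the adapter files (`adapter_config.json` and weights) sit directly inside `code-adapters/`, that `peft`, `transformers`, and `accelerate` are installed, and that access to the gated repository has already been granted. The function name `load_code_expert` is hypothetical.

```python
REPO_ID = "KATHIR2006/Zenthi-AI"
CODE_BASE_MODEL = "Qwen/Qwen2.5-Coder-3B-Instruct"
# Assumption: the adapter files live directly in this subfolder of the repo.
CODE_ADAPTER_SUBFOLDER = "code-adapters"


def load_code_expert():
    """Load the base coder model and attach the code-expert LoRA adapter."""
    # Imported lazily so the heavy dependencies load only when the function is called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CODE_BASE_MODEL)
    base_model = AutoModelForCausalLM.from_pretrained(CODE_BASE_MODEL, device_map="auto")
    # `subfolder` tells PEFT where to find the adapter inside the repository.
    model = PeftModel.from_pretrained(
        base_model, REPO_ID, subfolder=CODE_ADAPTER_SUBFOLDER
    )
    return tokenizer, model


# Example call (downloads several GB of weights):
# tokenizer, model = load_code_expert()
```

The vision adapters would follow the same pattern with `HuggingFaceTB/SmolVLM-Instruct` as the base, although a vision-language base model needs its own model and processor classes rather than `AutoModelForCausalLM`.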

⚖️ Licenses & Compliance

This project uses two licenses, split by component:

LLM Model Weights & Adaptations: Licensed under the Apache License 2.0 (in compliance with the base Qwen2.5 license).
RAG Engine, Multi-Agent Framework, & Backend Codebase: Licensed under the MIT License.

Downloads last month: 77

Safetensors
Model size: 0.5B params
Tensor type: F16

Model tree for KATHIR2006/Zenthi-AI

Base model: HuggingFaceTB/SmolLM2-1.7B
Quantized: HuggingFaceTB/SmolLM2-1.7B-Instruct
Quantized: HuggingFaceTB/SmolVLM-Instruct
Finetuned (153): this model
