Instructions to use Abhinav-Tyagi/synapse-slm with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Abhinav-Tyagi/synapse-slm with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Abhinav-Tyagi/synapse-slm",
    filename="synapse-trained.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Abhinav-Tyagi/synapse-slm with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abhinav-Tyagi/synapse-slm

# Run inference directly in the terminal:
llama-cli -hf Abhinav-Tyagi/synapse-slm
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abhinav-Tyagi/synapse-slm

# Run inference directly in the terminal:
llama-cli -hf Abhinav-Tyagi/synapse-slm
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Abhinav-Tyagi/synapse-slm

# Run inference directly in the terminal:
./llama-cli -hf Abhinav-Tyagi/synapse-slm
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Abhinav-Tyagi/synapse-slm

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Abhinav-Tyagi/synapse-slm
```
Use Docker
```sh
docker model run hf.co/Abhinav-Tyagi/synapse-slm
```
- LM Studio
- Jan
- vLLM
How to use Abhinav-Tyagi/synapse-slm with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Abhinav-Tyagi/synapse-slm"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Abhinav-Tyagi/synapse-slm",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```sh
docker model run hf.co/Abhinav-Tyagi/synapse-slm
```
- Ollama
How to use Abhinav-Tyagi/synapse-slm with Ollama:
```sh
ollama run hf.co/Abhinav-Tyagi/synapse-slm
```
- Unsloth Studio
How to use Abhinav-Tyagi/synapse-slm with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Abhinav-Tyagi/synapse-slm to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Abhinav-Tyagi/synapse-slm to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Abhinav-Tyagi/synapse-slm to start chatting
```
- Pi
How to use Abhinav-Tyagi/synapse-slm with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Abhinav-Tyagi/synapse-slm
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Abhinav-Tyagi/synapse-slm" }
      ]
    }
  }
}
```
Run Pi
```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Abhinav-Tyagi/synapse-slm with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Abhinav-Tyagi/synapse-slm
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Abhinav-Tyagi/synapse-slm
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use Abhinav-Tyagi/synapse-slm with Docker Model Runner:
```sh
docker model run hf.co/Abhinav-Tyagi/synapse-slm
```
- Lemonade
How to use Abhinav-Tyagi/synapse-slm with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Abhinav-Tyagi/synapse-slm
```
Run and chat with the model
```sh
lemonade run user.synapse-slm-{{QUANT_TAG}}
```
List all available models
```sh
lemonade list
```
Synapse SLM: Personalized Offline AI Assistant
Built by Abhinav Tyagi
GitHub • LinkedIn • Live Demo
What is Synapse SLM?
Synapse SLM is a QLoRA fine-tuned Llama-3.2-3B model optimized for:
- Hinglish (Hindi-English code-switching) conversations
- Offline, CPU-only inference via 4-bit GGUF quantization
- Context-aware responses via an offline RAG pipeline
- Persona-consistent, instruction-tuned behavior
This is not a wrapper around an API; it runs fully locally, inside ~4GB of RAM, at ~45 tokens/sec on CPU.
Model Details
| Property | Value |
|---|---|
| Base Model | Llama-3.2-3B-Instruct |
| Fine-tuning Method | QLoRA (rank=16, alpha=32) |
| Quantization | 4-bit GGUF via llama.cpp |
| Inference Speed | ~45 tokens/sec (CPU-only) |
| RAM Footprint | ~4GB |
| Training Data | 3,500+ Hinglish instruction samples |
| Languages | English, Hindi, Hinglish |
| Deployment | Docker containerized |
Key Innovations
1. QLoRA Fine-Tuning for Behavioral Shaping
Fine-tuned with rank=16, alpha=32 on 3,500+ Hinglish instruction samples. The training objective wasn't just language; it was behavioral engineering: teaching the model when to explain, when to commit, and how to handle Hindi-English code-switching naturally.
Fine-tuning doesn't just improve answers; it rewires behavior. The training signal determines what feels "safe" to the model: explain vs. hedge, commit vs. qualify, answer vs. avoid.
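For reference, a QLoRA setup with these hyperparameters looks roughly like the sketch below, using transformers, peft, and bitsandbytes. The target modules, dropout, and loading details are illustrative assumptions, not the exact training script:

```python
# QLoRA sketch: 4-bit quantized base model plus LoRA adapters (rank=16, alpha=32).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model with 4-bit NF4 quantization (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach LoRA adapters with the hyperparameters quoted above
lora_config = LoraConfig(
    r=16,                    # LoRA rank
    lora_alpha=32,           # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,       # illustrative value
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```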
2. 4-bit GGUF Quantization + CPU Inference
Converted to GGUF format using llama.cpp. Achieves ~45 tokens/sec on CPU-only hardware within a ~4GB RAM footprint, making it deployable on any laptop without a GPU.
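If you want to sanity-check the throughput figure on your own hardware, a quick timing run with llama-cpp-python is enough (the model path is a placeholder for your local GGUF file):

```python
# Rough tokens/sec measurement for the 4-bit GGUF on CPU.
import time
from llama_cpp import Llama

llm = Llama(model_path="synapse-slm-q4.gguf", n_ctx=2048, n_threads=8)

start = time.time()
out = llm("Explain quantization in one paragraph.", max_tokens=256)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```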
3. Offline RAG Pipeline
Implements embedding-based retrieval for local PDF/TXT ingestion. Supports context-aware responses without any cloud API dependency; fully air-gapped.
4. Hinglish Code-Switching
Trained specifically on Hindi-English mixed language patterns. Handles natural Hinglish input without requiring language detection or preprocessing.
Behavioral Study: How Fine-Tuning Changes Model Behavior
During development, Abhinav Tyagi trained two variants from the same base model to study behavioral drift:
- Model A (Synapse): optimized for clarity, explanation, and usefulness
- Model B (Reflection-Heavy): trained to emphasize uncertainty, limits, and caution
Key finding: Same architecture. Same tokenizer. Same base weights. Only the training signal differed, yet the behavioral output was completely different.
Model B wasn't hallucinating. It was over-aligned. And still useless.
Alignment without usability collapses into abstraction. Reasoning without explanation helps no one.
This study shaped Synapse's training philosophy: reasoning must serve explanation, not replace it.
Usage
With llama.cpp (Recommended for CPU)
```sh
# Install llama.cpp
pip install llama-cpp-python
```
```python
# Run inference
from llama_cpp import Llama

llm = Llama(
    model_path="synapse-slm-q4.gguf",
    n_ctx=2048,
    n_threads=8
)

response = llm(
    "Bhai, explain karo gradient descent kya hota hai",
    max_tokens=512,
    temperature=0.7
)
print(response['choices'][0]['text'])
```
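Because the model is instruction-tuned, the chat-completion API (which applies the chat template stored in the GGUF metadata) usually gives better results than raw text completion. A minimal variant of the example above:

```python
from llama_cpp import Llama

llm = Llama(model_path="synapse-slm-q4.gguf", n_ctx=2048, n_threads=8)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Bhai, explain karo gradient descent kya hota hai"}
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```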
With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("Abhinav-Tyagi/synapse-slm")
model = AutoModelForCausalLM.from_pretrained(
    "Abhinav-Tyagi/synapse-slm",
    torch_dtype=torch.float16,
    device_map="auto"
)

inputs = tokenizer("Explain neural networks simply", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
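Note that the snippet above feeds a raw string to the model. For an instruction-tuned Llama checkpoint, applying the chat template first generally produces cleaner answers; continuing from the tokenizer and model defined above (and assuming the tokenizer ships a chat template):

```python
messages = [{"role": "user", "content": "Explain neural networks simply"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```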
Docker (Full Offline Setup)
```sh
git clone https://github.com/abhinavtyagi466/synapse-slm
cd synapse-slm
docker compose up

# Access at http://localhost:7860
```
Training Details
```
Base Model   : meta-llama/Llama-3.2-3B-Instruct
Method       : QLoRA
LoRA Rank    : 16
LoRA Alpha   : 32
Dataset Size : 3,500+ instruction pairs
Languages    : English + Hindi + Hinglish
Quantization : 4-bit GGUF (llama.cpp)
Inference    : ~45 tokens/sec on CPU
RAM          : ~4GB footprint
Deployment   : Docker containerized
RAG          : Offline PDF/TXT ingestion via dense embeddings
```
Offline RAG Pipeline
Synapse includes a fully offline RAG system:
- Ingestion: drop any PDF or TXT file into the `/docs` folder
- Embedding: documents are chunked and embedded locally (no API calls)
- Retrieval: at query time, the top-k relevant chunks are retrieved via FAISS
- Generation: retrieved context is injected into the prompt before generation
No internet required. No API keys. Fully private.
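A minimal sketch of this flow, assuming sentence-transformers for local embeddings and faiss-cpu for retrieval (the embedding model, chunk size, and paths are illustrative, and PDF parsing is omitted for brevity):

```python
# Offline RAG sketch: chunk local docs, embed, retrieve top-k, build a prompt.
from pathlib import Path

import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally, no API calls

# 1. Ingestion: naive fixed-size chunking of every .txt file in /docs
chunks = []
for path in Path("docs").glob("*.txt"):
    text = path.read_text(encoding="utf-8")
    chunks += [text[i:i + 500] for i in range(0, len(text), 500)]

# 2. Embedding: dense vectors stored in a FAISS index
embeddings = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine here
index.add(embeddings)

# 3. Retrieval: top-k relevant chunks for the query
query = "What does the document say about gradient descent?"
query_emb = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(query_emb, 3)

# 4. Generation: inject the retrieved context into the prompt
context = "\n\n".join(chunks[i] for i in ids[0])
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```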
Performance
| Metric | Value |
|---|---|
| Inference Speed (CPU) | ~45 tokens/sec |
| RAM Usage | ~4GB |
| Quantization | 4-bit GGUF |
| Hinglish Fluency | Improved via targeted instruction tuning |
| Context Window | 2048 tokens |
About the Author
Abhinav Tyagi is an LLM Engineer specializing in fine-tuning, quantization, and deployment of production-ready AI systems. He also built:
- Synapse-124M: a 124M-parameter transformer built from scratch with GQA, MoE, Sliding Window Attention, NTK-RoPE, SwiGLU, and a custom BPE tokenizer
- Synapse Wingman: a full agentic AI desktop assistant controlled via Telegram, with vision, WhatsApp automation, and multi-step task execution
- Smart Contextual RAG Chatbot: hybrid RAG with CoVe (Chain of Verification), multi-query generation, and FAISS, reducing cloud API costs by ~40%
- Psywarp: published research on a multimodal cognitive AI framework for emotion and behavior modeling (DOI: 10.5281/zenodo.18182199)
Email: abhinavtyagi5418@gmail.com
GitHub
LinkedIn
License
MIT License: free to use, modify, and distribute with attribution.
"Building AI that works offline, works fast, and actually works."
โ Abhinav Tyagi