JiRack Ternary Pro 1.5B (Final Release + JiRack Turbo Quant)
High-Performance Ternary-Quantized Transformer on CPU | [PATENT PENDING]
Inventor: Konstantin Vladimirovich Grabko
Organization: CMS Manhattan
Official Site: www.cmsmanhattan.com
JiRack Ternary Pro 1.5B: Efficient Cloud & Edge Expert Model
JiRackTernary-Pro-1.5B is a lightweight, high-performance cloud-optimized model designed to deliver excellent results while significantly reducing cloud inference costs. It serves as an ideal expert model in RAG systems and can be deployed efficiently using the ONNX JiRack Java Server as a powerful alternative to heavier solutions.
Key Features & Resources:
- Full Model Assets: Complete model code, architecture details, training steps, and weights are provided in the repository folder.
- Live Support: Get real-time assistance and chat with the model via email at support@cmsmanhattan.com.
- JiRack Android Chat Client (with voice input and Ollama API support): Download APK
- JiRack Windows 11 Desktop Chat Client (with Ollama API support): Download ZIP
Recommended Training Approach:
- Best & Easiest Way: Use LoRA for training, then convert the packed model + adapter to ONNX format for optimal deployment and performance.
- Standard SFT Note: Full Supervised Fine-Tuning (SFT) on small models carries a higher risk of overfitting.
Training Process
The model is trained using Quantization-Aware Training (QAT). During training, the weights learn to live in ternary format while being stored in FP16. After training, the pack.py script converts them into real ternary weights.
The resulting model can then be converted to ONNX for production use, or to GGUF for Llama.cpp. However, the best runtime performance is currently achieved with BitNet.cpp, which offers the strongest support for the JiRack architecture.
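As a rough illustration of what a packing step like the one described above might do, the sketch below quantizes a list of weights to {-1, 0, 1} and stores four codes per byte. It is not the repository's pack.py: the absmean scaling rule (the one used in BitNet b1.58-style schemes), the 2-bit storage layout, and every function name here are assumptions made for the example.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Absmean quantization: divide by mean |w|, round, clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def pack_ternary(codes: List[int]) -> bytes:
    """Store four ternary codes per byte, 2 bits each (-1 -> 0, 0 -> 1, 1 -> 2)."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            byte |= (code + 1) << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_ternary(packed: bytes, count: int) -> List[int]:
    """Invert pack_ternary; `count` drops the padding slots of the last byte."""
    codes = []
    for byte in packed:
        for j in range(4):
            codes.append(((byte >> (2 * j)) & 0b11) - 1)
    return codes[:count]


# Hypothetical FP16-style latent weights, as they might look after QAT.
weights = [0.9, -0.05, -1.2, 0.4, 0.02]
codes, scale = ternary_quantize(weights)
packed = pack_ternary(codes)
restored = unpack_ternary(packed, len(codes))
```

At inference time each code would be multiplied by `scale` to approximate the original weight. A 2-bit layout is simple to decode but spends more than the theoretical log2(3) ≈ 1.58 bits per weight; denser layouts are possible.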
JiRack Enterprise price:
- About $36 per user per year. It includes the CMS Manhattan Corp Marketplace solution along with Webmail, the Apache James Web Admin tool, and the JiRack RAG System with email reply features.
JiRack private price:
- About $12 per user per year.
JiRack Ternary Pro with Default CPU
- Simple run with the JiRack UI in Docker:
docker run -d --name jirack_ternaty_1b -p 7869:7869 --restart unless-stopped cmsmanhattan/jirack_ternaty_pro_1b:latest
services:
  jirack:  # the service key is missing in the original; this name is a placeholder
    image: cmsmanhattan/jirack_ternaty_pro_1b:latest
    container_name: jirack_onnx_service
    ports:
      - "7869:7869"
    volumes:
      - .:/app
      - ./web:/app/web
    environment:
      - MAX_TOKENS=1024
      - TEMPERATURE=0.7
      - TOP_P=0.9
      - DEFAULT_STREAM=False
      - INTRA_THREADS=4
      - USE_ENV_ALLOCATOR=1
    deploy:
      resources:
        limits:
          memory: 16g
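How the container reads the environment variables listed in the Compose file is not documented here. The following is a hypothetical sketch of how a Python service could parse and validate such settings; the `ServerSettings` class, the `load_settings` helper, and the validation rules are assumptions, and `USE_ENV_ALLOCATOR` is left out because its meaning is not described.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    max_tokens: int
    temperature: float
    top_p: float
    default_stream: bool
    intra_threads: int


def parse_bool(text: str) -> bool:
    """Treat '1', 'true', 'yes', 'on' (any case) as True; anything else as False."""
    return text.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read settings from `env`, falling back to the values shown in the Compose file."""
    env = os.environ if env is None else env
    settings = ServerSettings(
        max_tokens=int(env.get("MAX_TOKENS", "1024")),
        temperature=float(env.get("TEMPERATURE", "0.7")),
        top_p=float(env.get("TOP_P", "0.9")),
        default_stream=parse_bool(env.get("DEFAULT_STREAM", "False")),
        intra_threads=int(env.get("INTRA_THREADS", "4")),
    )
    # Assumed sanity checks; the real service may accept other ranges.
    if not 0.0 < settings.top_p <= 1.0:
        raise ValueError("TOP_P must be in (0, 1]")
    if settings.temperature < 0.0:
        raise ValueError("TEMPERATURE must be non-negative")
    return settings


# Override two values and keep the remaining defaults.
settings = load_settings({"MAX_TOKENS": "256", "DEFAULT_STREAM": "true"})
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the process environment.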
Access the UI
- Once the container is running, open your browser and navigate to:
JiRack Coder 7B is free
- Free coding model from JiRack: https://huggingface.co/CMSManhattan/JiRackCoderQwen_7b
JiRack 10B is available exclusively under a commercial enterprise license.
JiRack Coder 32B is available exclusively under a commercial enterprise license.
JiRack Coder 32B https://huggingface.co/CMSManhattan/JiRackCoderReasoing_32b
JiRack Coder 14B https://huggingface.co/CMSManhattan/JiRackCoderReasoing_14b
JiRack Coder 8B https://huggingface.co/CMSManhattan/JiRackCoderReasoing_8b
⚠️ IMPORTANT NOTICE: PROPRIETARY TECHNOLOGY
This model and all accompanying code, algorithms, and documentation are proprietary technology owned by Konstantin Vladimirovich Grabko.
© 2025 Konstantin Vladimirovich Grabko. All Rights Reserved. Patent Pending.
Allowed:
- Personal and non-commercial research use only
Strictly Prohibited without a written commercial license:
- Any commercial use (SaaS, mobile apps, edge devices, paid services, etc.)
- Creating and distributing derivative models for profit
- Removing or modifying any copyright or legal notices
- Patenting any part of this technology
Any unauthorized commercial use will be pursued legally under New York law.
Contact for a commercial license: grabko@cmsmanhattan.com. There is a fixed price for FinTech.
⚠️ FinTech AI solution
Custom AI Solutions with JiRack
- Deploy your own secure, high-performance model from scratch. I specialize in delivering the JiRack modern architecture on NVIDIA Clusters, fully optimized for your private datasets.
- Let's build your sovereign AI today. DM for inquiries.
- Please contact CMS Manhattan for the solution.
⚠️ Intellectual Property Notice
The architecture, weights, and methods (BRE, SWA Fusion, and HD-FFN) contained herein are the proprietary intellectual property of Konstantin Vladimirovich Grabko.
- Status: Patent Pending (U.S. & International Claims Filed).
- Usage: Commercial use requires a signed execution of the CMS Manhattan JiRack License V.1.4.
Project Overview: The 1.5B "Wide-FFN" Advantage
JiRack-1.5B is a redesigned architecture powered by the JiRack BitNet v2.0 Specification. It is specifically optimized for high-throughput inference on non-NVIDIA hardware (AMD ROCm/HIP) and cloud-native environments (AWS Lightsail, GCP, Azure) without heavy CPU reliance.
Key Innovations:
- High-Density FFN (HD-FFN): Unlike standard 1B models, JiRack-1.5B utilizes an 8192-dimension intermediate layer within ternary constraints, offering 3B-class semantic quality.
- SWA Fusion (SwiGLU-Attention): A novel compute kernel that fuses the FFN and Attention passes, keeping temperatures below 80°C even under heavy load.
- Buffered Routing Embedding (BRE): Minimizes data movement between HBM and compute units, specifically optimized for ROCm environments.
⚠️ Changes
Alpha v1.3 includes a RoPE fix, so response quality is now higher.
The old model has been renamed to model_tag.safetensors. It is intended for RAG in a style with many counter-questions.
The model was trained with supervised fine-tuning techniques.
It is easy to fine-tune with LoRA, in the same way as a Llama 3.2 1B LoRA adapter.
Triple compression of the model for production is planned soon.
Technical Specifications
| Feature | JiRack-1.5B Specification |
|---|---|
| Base Model | Meta-Llama-3.2-1B (Redesigned) |
| Hidden Size | 2048 |
| Intermediate Size | 8192 (Ultra-Wide for 1B class) |
| Layer Count | 16 Decoder Layers |
| Quantization | 1.58-bit (Ternary {-1, 0, 1}) |
| VRAM Requirement | ~2.5 GB (including KV-Cache & LoRA) |
| Size in RAM | Model will get double or triple compression for production |
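To get a feel for the numbers in the table, the sketch below counts only the feed-forward weights implied by the hidden size, intermediate size, and layer count, and compares their storage at FP16 with ternary storage. It assumes a Llama-style FFN with three projection matrices per layer (gate, up, down), which the table does not state; attention, embedding, and normalization weights are ignored, so this is not an estimate of the full model size or of the ~2.5 GB VRAM figure.

```python
import math

# Values taken from the specification table.
HIDDEN = 2048
INTERMEDIATE = 8192
LAYERS = 16


def ffn_params(hidden: int, intermediate: int, layers: int, projections: int = 3) -> int:
    """FFN weight count; projections=3 assumes gate/up/down matrices (an assumption)."""
    return projections * hidden * intermediate * layers


def size_mib(n_weights: int, bits_per_weight: float) -> float:
    """Storage in MiB for n_weights at the given bit width."""
    return n_weights * bits_per_weight / 8 / 2**20


n_ffn = ffn_params(HIDDEN, INTERMEDIATE, LAYERS)
fp16_mib = size_mib(n_ffn, 16)               # FP16 storage used during training
packed_mib = size_mib(n_ffn, 2)              # simple 2-bit packing of ternary codes
ideal_mib = size_mib(n_ffn, math.log2(3))    # information-theoretic ternary bound

print(f"FFN weights: {n_ffn:,}")
print(f"FP16: {fp16_mib:.0f} MiB, 2-bit packed: {packed_mib:.0f} MiB, ideal: {ideal_mib:.1f} MiB")
```

Under these assumptions the FFN weights shrink by a factor of eight when moving from FP16 to 2-bit packed codes, which is where most of the memory saving in a wide-FFN design would come from.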
Training & Fine-Tuning Efficiency
- Consumer GPU Ready: Fine-tuning via LoRA (r=8) requires only ~8 MB of additional weights.
- 70% VRAM Reduction: Fits on RTX 3060, GTX 1660 Ti, or high-end integrated GPUs.
- Easy Training: Run regular fine-tuning (SFT or LoRA), then use the pack script to convert the weights to ternary. The current weights are stored in FP16, but during training the model learns to operate in ternary format.
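The size of a LoRA adapter follows from simple arithmetic: a rank-r adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters. The sketch below applies that formula; which layers actually carry adapters is not stated above, so the list of target shapes is purely illustrative and the result is not meant to reproduce the ~8 MB figure.

```python
from typing import List, Tuple


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_size(
    shapes: List[Tuple[int, int]], rank: int, layers: int, bytes_per_param: int = 2
) -> Tuple[int, float]:
    """Total adapter parameters and size in MiB across all layers."""
    total = layers * sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes)
    return total, total * bytes_per_param / 2**20


# Hypothetical targets: two 2048 x 2048 projections per layer, 16 layers, r=8.
target_shapes = [(2048, 2048), (2048, 2048)]
total_params, fp16_mib = adapter_size(target_shapes, rank=8, layers=16)
_, fp32_mib = adapter_size(target_shapes, rank=8, layers=16, bytes_per_param=4)

print(f"{total_params:,} adapter parameters, {fp16_mib} MiB at FP16, {fp32_mib} MiB at FP32")
```

Adding more target modules or storing the adapter at higher precision scales the size linearly, so the adapter stays in the megabyte range for any reasonable choice of targets at this rank.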
💬 Chat Demo (Release + Google Turbo Quant)
Current Checkpoint: Feb 26, 2026 (Logic Verification Mode)
(ai_env) root@jirack1:~/JiRackTernary1/new# python chat_jirack_ternary1b_pt_turbo_packed_v1.py
--- 🖥️ Device: cpu (JiRack Turbo Engine) ---
--- Loading JiRack 1B (Auto-Quantizing Weights) ---
Weights loaded. System will quantize them on the first pass.
User: Hello
JiRack: Hello! How are you?
(Time: 0.97s on CPU)
User: How are you dong ?
JiRack: I'm doing fine, I'm not sure what you're asking. What do you mean?
(Time: 0.65s on CPU)
User: What is capital of France ?
JiRack: It's a tiny city with a population of about 35,000 people. It's in the center of France, and is a economic center for the French government. It's also the capital of French-speaking France, and is home to the National Museum of Art and Natural Heritage.
(Time: 1.88s on CPU)
User: What is capital of France name ?
JiRack: The capital of Paris is Paris.
(Time: 0.34s on CPU)
User: What is Sun ?
JiRack: Sun is the largest planet in our solar system, which we call the Sun. It is 400 billion years old. It has 1.4 billion years in diameter, and 6.5 billion miles in diameter.
(Time: 1.34s on CPU)
Note: The current .pt file is ~6 GB due to heavy embedded metadata for research tracking. The final production version will be cleaned and compressed to roughly 1.5-2.0 GB.
Repository Contents
- JiRackTernaryPyTorch_1b.py: Core architecture file.
- invention_description.md: Detailed technical breakdown for patent examiners.
- performance_data.md: Benchmarks on ROCm/AMD hardware.
- NDA.md: Confidentiality agreement for commercial evaluators.
📧 Contact & Licensing
For joint venture opportunities, hardware integration, or licensing inquiries:
- Email: grabko@cmsmanhattan.com
- Phone: +1 (516) 777-0945
- Location: New York, USA