---
title: README
emoji: π
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
---
```
   ___ _ ____            _
  |_ | |_|  _ \ __ _  ___| | __
   | | | | |_) / _` |/ __| |/ /
/\__/ / | |  _ < (_| | (__|   <
\____/ |_|_| \_\__,_|\___|_|\_\
```
JiRack is a Java Intelligent Rack System with PyTorch models, ML scripts, and chatbots.

JiRack LLM by CMS Manhattan, with open-source model code.

Web services run on ONNX Runtime. Inference images with OpenAI-compatible and Ollama REST API support are published on Docker Hub. ONNX Runtime supports running models on many GPU cards and in data centers.

Docker repo: https://hub.docker.com/u/cmsmanhattan
# CMSManhattan - Frontier Ternary Neural Networks

*Creating the world's first 405B-parameter ternary model.*

## Mission

Democratizing access to massive language models through extreme efficiency: training state-of-the-art LLMs on accessible hardware using 1.58-bit (ternary) precision.
## JiRackTernary Series

A family of efficient large language models based on the BitNet architecture, with ternary weights {-1, 0, 1}.
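The project's own quantizer is not published here; as a minimal illustration of how BitNet-style ternary weights {-1, 0, 1} are typically produced (per-tensor absmean quantization), here is a stdlib-only sketch. The function names and example values are hypothetical, not the project's code:

```python
def ternarize(weights, eps=1e-8):
    """Quantize float weights to {-1, 0, +1} with a per-tensor scale
    (BitNet-style absmean quantization sketch; not the project's code)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction: w ~ q * scale
    return [v * scale for v in q]

w = [0.31, -0.08, 0.95, -0.42, 0.02, -0.77]
q, s = ternarize(w)
print(q)  # every entry is -1, 0, or +1
```

Small weights round to 0 and large ones saturate at +/-1, so the tensor stores only a sign pattern plus one float scale.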
### Public Models

| Model | Parameters | Size | Status | Link |
|---|---|---|---|---|
| JiRackTernary_1b | 1B | ~350MB | Released | Download |
| JiRackTernary_8b | 8B | ~3GB | Released | Download |
### Private Models (In Training)

| Model | Parameters | Size | Status | ETA |
|---|---|---|---|---|
| JiRackTernary_70b | 70B | ~25GB | Training (step 15,600+) | Q2 2026 |
| JiRackTernary_405b | 405B | ~115GB | World's first 405B ternary | Q3 2026 |
## Key Innovations

### Extreme Compression

- Traditional LLaMA-3 70B: ~140 GB (FP16)
- JiRackTernary 70B: ~25 GB (1.58-bit)
- Roughly 5.6x smaller than the FP16 baseline
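The size arithmetic can be checked directly. With 4-in-1 packing each weight costs 2 bits; the published ~25 GB is larger than that packed-weights minimum, presumably because embeddings and norms stay at higher precision (an assumption, not a published detail):

```python
def packed_size_gb(n_params: int, bits_per_weight: float) -> float:
    """Gigabytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8 / 1e9

fp16 = packed_size_gb(70_000_000_000, 16)  # FP16 baseline: 140.0 GB
tern = packed_size_gb(70_000_000_000, 2)   # 2 bits/weight, 4 per byte: 17.5 GB
print(fp16, tern, fp16 / tern)
```

So the ternary weights alone compress 8x; the end-to-end 140 GB vs ~25 GB checkpoint ratio works out to ~5.6x.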
### Accessible Training

- 70B trained on a single A100 80GB (Colab Pro+, ~$50/month)
- 405B trained on a single H200 141GB (Colab Enterprise)
- Novel layer-by-layer training approach
- No supercomputer clusters required
### Production-Ready Architecture

- LLaMA-based with BitLinear layers
- Ultra-lean memory offloading
- 4-in-1 weight packing
- Optimized for inference speed
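"4-in-1 weight packing" plausibly means four 2-bit ternary codes per byte; the exact format used by JiRackBitLinear is not published, so this is a hypothetical sketch of that scheme:

```python
# Hypothetical "4-in-1" packing: four ternary weights, 2 bits each, per byte.
# Code mapping (assumed, not the project's spec): -1 -> 0, 0 -> 1, +1 -> 2.
def pack4(w):
    """Pack a list of ternary weights (length a multiple of 4) into bytes."""
    assert len(w) % 4 == 0
    out = bytearray()
    for i in range(0, len(w), 4):
        b = 0
        for j, v in enumerate(w[i:i + 4]):
            b |= (v + 1) << (2 * j)      # map {-1,0,1} -> {0,1,2}
        out.append(b)
    return bytes(out)

def unpack4(data, n):
    """Recover the first n ternary weights from packed bytes."""
    return [((byte >> (2 * j)) & 0b11) - 1
            for byte in data for j in range(4)][:n]

w = [1, -1, 0, 1, 0, 0, -1, 1]
packed = pack4(w)
assert unpack4(packed, len(w)) == w      # lossless round trip
print(len(w), "weights ->", len(packed), "bytes")
```

Eight weights fit in two bytes, which is where the 2-bits-per-weight storage figure comes from.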
## Technical Highlights

### Architecture Details

- Base: LLaMA-3 architecture
- Precision: 1.58-bit ternary weights {-1, 0, 1}
- Layers: custom JiRackBitLinear with weight packing
- Normalization: RMSNorm
- Training: layer-by-layer with gradient accumulation
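RMSNorm, named above, scales a vector by the reciprocal of its root-mean-square. A minimal pure-Python sketch of the standard formulation (the project's implementation may differ):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm sketch: y_i = weight_i * x_i / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

x = [2.0, -2.0, 2.0, -2.0]            # RMS of x is 2
print(rms_norm(x, [1.0] * 4))          # entries normalized to about +/-1.0
```

Unlike LayerNorm it subtracts no mean, which saves a pass over the activations.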
### Training Infrastructure

**70B Model:**
- Hardware: A100 80GB (Colab Pro+)
- Method: layer-by-layer training
- Batch size: 1 (micro)
- Sequence length: 768 tokens
- Cost: ~$50/month

**405B Model:**
- Hardware: H200 141GB (Colab Enterprise)
- Method: advanced layer-by-layer
- Optimized for massive scale
- World's first 405B ternary model
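The layer-by-layer method itself is not published. As a toy illustration of the idea (optimize one layer at a time while the rest stay frozen, applying gradients accumulated across micro-batches of size 1), here is a stdlib-only sketch on a two-"layer" scalar model; every detail below is illustrative, not the project's code:

```python
# Toy layer-by-layer training with gradient accumulation.
# Model: pred = a * b * x, "layers" a and b trained one at a time.
data = [(x, 6.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]  # target: a * b == 6
a, b = 1.0, 2.0
lr, accum = 0.01, 4                     # accumulate 4 micro-batch gradients

for layer in ("a", "b"):                # train layers sequentially, others frozen
    for _ in range(500):
        grad = 0.0
        for x, y in data:               # micro-batch size 1
            err = a * b * x - y         # dLoss/dPred for loss = err**2 / 2
            grad += err * (b * x if layer == "a" else a * x)
        grad /= accum                   # one optimizer step per accumulation window
        if layer == "a":
            a -= lr * grad
        else:
            b -= lr * grad

print(round(a * b, 3))                  # converges close to 6.0
```

Only one layer's gradients (and optimizer state) exist at a time, which is what lets a model far larger than GPU memory be trained on a single card.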
## Performance

### Model Comparison

| Model | Size | Precision | Training Hardware | Training Cost |
|---|---|---|---|---|
| LLaMA-3 70B | 140GB | FP16 | Massive cluster | $$$$$$ |
| LLaMA-3 70B (4-bit) | 35GB | 4-bit | 2-4x A100 | N/A (PTQ) |
| JiRackTernary 70B | 25GB | 1.58-bit | 1x A100 | ~$150-200 total |
### Current Training Status (updated 2026-02-09)

**70B Model:**
- Step: 15,600+
- Loss: ~7-9
- Perplexity: ~3,000-5,000
- Status: early training phase (target: 100k+ steps)
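The loss and perplexity figures above are consistent if perplexity is computed as the exponential of the mean cross-entropy loss in nats; a PPL of ~3,000-5,000 corresponds to a loss of roughly 8.0-8.5, inside the reported ~7-9 band:

```python
import math

# Perplexity = exp(cross-entropy loss in nats).
for loss in (7.0, 8.0, 8.5, 9.0):
    print(f"loss {loss} -> perplexity {math.exp(loss):,.0f}")
```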
**405B Model:**
- Status: active training on H200
- Target: world's first converged 405B ternary model
- Timeline: estimated completion Q3 2026
## Research & Publications

### Upcoming

**Technical paper (in preparation):**
- Title: "JiRackTernary-405B: Scaling Ternary Neural Networks to 405 Billion Parameters"
- Target venue: NeurIPS 2026 / ICML 2027

**Benchmark suite:**
- MMLU, HellaSwag, HumanEval
- Comparison with LLaMA-3, Mixtral, DeepSeek
- Efficiency metrics (inference speed, memory)

**Open-source release:**
- Training code & documentation
- Layer-by-layer methodology
- Reproducibility guidelines
## Why This Matters

### For Researchers
- Train massive models without supercomputers
- Reproduce frontier research on Colab
- Enable new compression research directions

### For Industry
- Deploy 405B-class models on fewer GPUs
- Faster inference with ternary operations
- Lower hosting costs (~5.6x smaller checkpoints)

### For the Community
- Democratization of large language models
- Accessible AI for everyone
- Open research methodology
## Learn More

### Blog Posts (Coming Soon)
- "Training 70B on a Single A100: Our Layer-by-Layer Approach"
- "Ternary Weights at Scale: Lessons from 15,000 Steps"
- "Road to 405B: The Journey to the World's First Ternary Mega-Model"

### Technical Documentation
- Architecture deep dive
- Training methodology
- Code examples & tutorials
## Community

### Get Involved
- Star our models on HuggingFace
- Join discussions in the model repos
- Report issues or suggestions
- Contact: [Your contact method]
### Citation

```bibtex
@misc{jirackternary2026,
  author    = {CMSManhattan (kgrabko)},
  title     = {JiRackTernary: Scaling Ternary Neural Networks to 405 Billion Parameters},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/CMSManhattan}
}
```
## Recognition

**World's First 405B Ternary Model**

*Proving that massive language models can be trained efficiently on accessible hardware.*
## Follow Progress

Track our journey:
- Regular updates in the model repos
- Training metrics & visualizations
- Milestone announcements
- Research publications

*Making AI accessible, one ternary weight at a time.*

Last updated: 2026-02-09

Phone: 516-777-0945
## Demo: JiRack LLM and the CMS Manhattan RAG System

Download the RAG system:

```shell
git clone https://grabko1@bitbucket.org/cmsmanhattan/rag.git
```

- Deployment kit for Docker or Kubernetes, with an API gateway and service discovery in a microservice architecture: https://www.youtube.com/watch?v=M4Q8_Dr35Cc
- Deployment scripts for the RAG system: https://bitbucket.org/cmsmanhattan/rag
- Trademark information: https://uspto.report/TM/90579072 (serial number 90579072)