SYNAXIM Format Engine

 ███████╗██╗   ██╗███╗   ██╗ █████╗ ██╗  ██╗██╗███╗   ███╗
 ██╔════╝╚██╗ ██╔╝████╗  ██║██╔══██╗╚██╗██╔╝██║████╗ ████║
 ███████╗ ╚████╔╝ ██╔██╗ ██║███████║ ╚███╔╝ ██║██╔████╔██║
 ╚════██║  ╚██╔╝  ██║╚██╗██║██╔══██║ ██╔██╗ ██║██║╚██╔╝██║
 ███████║   ██║   ██║ ╚████║██║  ██║██╔╝ ██╗██║██║ ╚═╝ ██║
 ╚══════╝   ╚═╝   ╚═╝  ╚═══╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝╚═╝     ╚═╝

Llama 3.2 1B: SYNAXIM .symb Format (INT4)

The first model converted to the SYNAXIM proprietary .symb inference format.

This is Meta's Llama-3.2-1B converted to SYNAXIM's framework-free .symb binary format with INT4 per-group quantization. It runs entirely through the SYNAXIM Symbiotic State Engine: no PyTorch, no Transformers library, no KV-Cache.


Quick Start

1. Install SYNAXIM

pip install grrn-inference

Or install from source:

git clone https://github.com/GRRN-MAKER/SYNAXIM.git
cd SYNAXIM
pip install -e .

2. Download This Model

# Using huggingface-cli
huggingface-cli download GRRNNOB/SYNAXIM --local-dir ./llama-1b-symb

# Or using Python
from huggingface_hub import snapshot_download
snapshot_download("GRRNNOB/SYNAXIM", local_dir="./llama-1b-symb")

3. Run Inference

from grrn_inference import GRRNModel

# Load the model
model = GRRNModel.from_pretrained("./llama-1b-symb")

# Generate text
result = model.generate("The meaning of life is", max_tokens=50, temperature=0.7)
print(result.text)
print(f"Speed: {result.tokens_per_second} tok/s")
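The `temperature=0.7` argument is conventionally a softmax temperature. The card doesn't show how grrn_inference samples, so this is only a generic sketch of temperature sampling in NumPy (the helper `sample_with_temperature` is hypothetical): logits are divided by T before the softmax, so T < 1 sharpens the distribution and T near 0 approaches greedy decoding.

```python
import numpy as np


def sample_with_temperature(logits, temperature, rng):
    """Hypothetical helper: draw one token id from raw logits at a given temperature."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return int(np.argmax(logits))
    scaled = logits / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability before exp
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.1]
print(sample_with_temperature(logits, 0.7, rng))  # random, biased toward index 0
print(sample_with_temperature(logits, 0.0, rng))  # greedy: always index 0
```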

4. Chat (OpenAI-Style)

result = model.chat([
    {"role": "user", "content": "Explain quantum computing simply."}
], max_tokens=200)

print(result.choices[0].message["content"])

5. Streaming

for chunk in model.stream("Once upon a time", max_tokens=100):
    print(chunk.text, end="", flush=True)

6. Serve as OpenAI API

from grrn_inference import serve
serve("./llama-1b-symb", port=8000, api_key="my-secret-key")

Then connect with any OpenAI client:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="my-secret-key")
response = client.chat.completions.create(
    model="llama-1b-symb",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
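Because the server is described as OpenAI-compatible, the same call can be made without the `openai` package. This standard-library sketch assumes the usual OpenAI wire format (a JSON `POST` to `/v1/chat/completions` with a `Bearer` API key); `build_chat_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8000/v1",
    "my-secret-key",
    "llama-1b-symb",
    [{"role": "user", "content": "Hello!"}],
)

# With the server from step 6 running, send it and read the reply:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```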

Model Details

Property             Value
Base Model           meta-llama/Llama-3.2-1B
Architecture         LlamaForCausalLM (Dense)
Parameters           1.24B
Hidden Size          2048
Layers               16
Attention Heads      32 Q / 8 KV (GQA 4:1)
Head Dim             64
Vocabulary           128,256 tokens
Intermediate Size    8,192
Activation           SiLU
RoPE θ               500,000
Tied Embeddings      Yes (lm_head = embed_tokens.T)
Format               .symb (SYNAXIM proprietary binary)
Quantization         INT4, group_size=128
Compression          3.8× vs FP16
Total Size           ~674 MB
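As a back-of-envelope check on the parameter count and compression figures, the shapes in the table can be multiplied out. This sketch assumes 4 bits per quantized weight plus one FP16 scale per 128-weight group, with RMSNorm weights kept in FP16; the actual .symb layout (zero points, headers, padding) is not documented here, so treat the result as an estimate.

```python
# Llama-3.2-1B shapes from the Model Details table.
HIDDEN = 2048
LAYERS = 16
VOCAB = 128_256
INTERMEDIATE = 8_192
N_KV_HEADS = 8
HEAD_DIM = 64
GROUP_SIZE = 128

KV_DIM = N_KV_HEADS * HEAD_DIM  # 512: output width of attn_k / attn_v under GQA

# INT4-quantized tensors: the embedding plus seven projections per layer.
per_layer = (
    2 * HIDDEN * HIDDEN            # attn_q, attn_o
    + 2 * HIDDEN * KV_DIM          # attn_k, attn_v
    + 3 * HIDDEN * INTERMEDIATE    # mlp_gate, mlp_up, mlp_down
)
quantized_params = VOCAB * HIDDEN + LAYERS * per_layer

# RMSNorm weights stay FP16: two per layer plus final_norm.
norm_params = (2 * LAYERS + 1) * HIDDEN
total_params = quantized_params + norm_params

# Assumed storage: 4 bits per weight + one FP16 scale per group of 128.
int4_bytes = quantized_params // 2 + (quantized_params // GROUP_SIZE) * 2 + norm_params * 2
fp16_bytes = total_params * 2

print(f"parameters:    {total_params:,}")
print(f"INT4 estimate: {int4_bytes / 1e6:.0f} MB")
print(f"FP16 size:     {fp16_bytes / 1e6:.0f} MB")
print(f"compression:   {fp16_bytes / int4_bytes:.2f}x")
```

This gives 1,235,814,400 parameters, about 637 MB for the INT4 estimate, and roughly 3.9× versus FP16. Storing an FP16 zero point per group as well would add another `quantized_params // GROUP_SIZE * 2` bytes (about 19 MB).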

.symb File Structure

llama-1b-symb/
├── config.symb.json              # Architecture + quantization config
├── embeddings.symb               # Token embeddings (INT4, 66 MB)
├── final_norm.symb               # Final RMSNorm (FP16, 4 KB)
├── tokenizer/                    # Tokenizer files
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── special_tokens_map.json
└── layers/
    ├── layer_00/
    │   ├── attn_q.symb           # Q projection (INT4)
    │   ├── attn_k.symb           # K projection (INT4)
    │   ├── attn_v.symb           # V projection (INT4)
    │   ├── attn_o.symb           # Output projection (INT4)
    │   ├── mlp_gate.symb         # SwiGLU gate (INT4)
    │   ├── mlp_up.symb           # SwiGLU up (INT4)
    │   ├── mlp_down.symb         # SwiGLU down (INT4)
    │   ├── norm_attn.symb        # Pre-attention RMSNorm (FP16)
    │   └── norm_mlp.symb         # Pre-MLP RMSNorm (FP16)
    ├── layer_01/
    │   └── ...
    └── layer_15/
        └── ...

How SYNAXIM Works

SYNAXIM replaces the standard Transformer inference paradigm:

                  Standard Transformer                         SYNAXIM
Memory Model      KV-Cache (grows with context)                O(1) M matrix (fixed size)
Attention         softmax(Q·K^T/√d)·V over stored K,V pairs    Sigmoid-gated associative memory update
Runtime           PyTorch + CUDA                               NumPy only (zero framework)
Weight Format     safetensors (open)                           .symb (proprietary INT4 bitpacked)
Install Size      ~2 GB (PyTorch + deps)                       < 5 MB
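To make the memory row concrete: with Llama-3.2-1B's shapes (16 layers, 8 KV heads, head dim 64), a standard FP16 KV cache costs 2 × 16 × 8 × 64 × 2 bytes = 32 KiB per token. The card does not specify D or how many state matrices SYNAXIM keeps per layer, so `fixed_state_bytes` and the D=2048 value below are illustrative assumptions only.

```python
# Standard Llama-3.2-1B attention shapes (from the Model Details table).
LAYERS = 16
N_KV_HEADS = 8
HEAD_DIM = 64
BYTES_FP16 = 2


def kv_cache_bytes(num_tokens):
    """FP16 KV cache for one sequence: K and V for every layer, KV head and token."""
    return 2 * LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_FP16 * num_tokens


def fixed_state_bytes(state_dim, matrices_per_layer=1):
    """Hypothetical fixed state: D x D FP16 matrices per layer, independent of length."""
    return LAYERS * matrices_per_layer * state_dim * state_dim * BYTES_FP16


for n in (1_024, 8_192, 131_072):
    print(f"{n:>7} tokens: KV cache {kv_cache_bytes(n) / 2**20:8.0f} MiB")

# Assumption for illustration only: one 2048 x 2048 state matrix per layer.
print(f"fixed state (D=2048): {fixed_state_bytes(2048) / 2**20:.0f} MiB at any length")
```

The KV cache grows linearly with the token count, while the fixed state is paid once regardless of sequence length; where the two curves cross depends entirely on the assumed D.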

The Symbiotic Gate

Instead of growing a KV cache, SYNAXIM maintains a fixed-size state matrix M:

M_{t+1} = σ(gate_score) · M_t + (1 - σ(gate_score)) · key ⊗ value
output  = query · M_{t+1}
  • M is (D × D): fixed size, never grows, regardless of sequence length
  • The gate score, computed from Q·K similarity, sets the balance between retaining M_t (weight σ) and imprinting the new key ⊗ value pair (weight 1 - σ)
  • Memory: O(D²) fixed vs O(n·d) growing KV-cache
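The update rule can be written out directly in NumPy. This is a minimal sketch of the two equations above, not the engine's code: it assumes a single D-dimensional query/key/value per step (no heads) and takes gate_score = (q · k) / √D, since the card only says the gate is computed from Q·K similarity.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def symbiotic_step(M, q, k, v):
    """One gated associative-memory update followed by a read-out.

    Assumption: gate_score = (q . k) / sqrt(D).
    """
    D = M.shape[0]
    gate_score = float(q @ k) / np.sqrt(D)
    retain = sigmoid(gate_score)                           # weight on the old memory M_t
    M_next = retain * M + (1.0 - retain) * np.outer(k, v)  # imprint key ⊗ value
    output = q @ M_next                                    # read-out: query · M_{t+1}
    return M_next, output


D = 64
rng = np.random.default_rng(0)
M = np.zeros((D, D))
for _ in range(1_000):                 # the state never grows with sequence length
    q, k, v = rng.standard_normal((3, D))
    M, out = symbiotic_step(M, q, k, v)
print(M.shape, out.shape)              # (64, 64) (64,)
```

Because M = key ⊗ value after a single imprint, reading it with a query returns the stored value scaled by query · key, which is the associative-memory behavior the bullets describe.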

Device Selection

# Auto-detect (uses Numba if available, else pure NumPy)
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cpu")

# Force Numba-accelerated CPU (requires: pip install grrn-inference[cpu])
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cpu-accelerated")

# Force pure NumPy (no dependencies beyond numpy)
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cpu-numpy")

# Triton GPU (requires: pip install grrn-inference[gpu])
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cuda")
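The first comment describes an availability check for Numba. A minimal sketch of that rule, assuming it amounts to "is the `numba` package importable" (the engine's real detection logic is not shown in this card); `pick_cpu_backend` is hypothetical and only returns the device strings listed above.

```python
import importlib.util


def pick_cpu_backend():
    """Hypothetical mirror of the auto-detect rule: Numba if importable, else NumPy."""
    if importlib.util.find_spec("numba") is not None:
        return "cpu-accelerated"
    return "cpu-numpy"


backend = pick_cpu_backend()
print(backend)
# model = GRRNModel.from_pretrained("./llama-1b-symb", device=backend)
```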

System Requirements

Requirement    Minimum
Python         3.9+
RAM            4 GB
Disk           1 GB
OS             macOS, Linux, Windows
GPU            Not required (runs on CPU; optional CUDA backend via Triton)

Core dependencies: numpy, safetensors, tqdm. That's it.


Convert Your Own Model

pip install grrn-inference
grrn-convert meta-llama/Llama-3.2-1B ./my-llama-symb --quantize int4

Or in Python:

from grrn_inference import SymbioticConverter

converter = SymbioticConverter()
converter.convert(
    source="meta-llama/Llama-3.2-1B",
    output_dir="./my-llama-symb",
    quantize="int4"
)

Supports: LLaMA, Qwen, Mistral, Phi, Gemma, Mixtral, DeepSeek, DBRX.
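The Model Details table lists INT4 with group_size=128, and the comparison table calls the format bitpacked. The card doesn't document the exact scheme (symmetric vs. asymmetric, nibble order, scale dtype), so this is a generic sketch that assumes symmetric per-group quantization with one FP16 scale per group and two 4-bit values per byte, low nibble first; `quantize_int4` and `dequantize_int4` are hypothetical names.

```python
import numpy as np


def quantize_int4(weights, group_size=128):
    """Symmetric per-group INT4 with one FP16 scale per group (assumed scheme).

    The number of weights must be a multiple of group_size.
    """
    w = np.asarray(weights, dtype=np.float32).reshape(-1, group_size)
    # One scale per group so that the largest |w| maps to 7.
    scales = (np.abs(w).max(axis=1, keepdims=True) / 7.0).astype(np.float16)
    scales[scales == 0] = 1.0  # all-zero groups: any nonzero scale works
    q = np.clip(np.round(w / scales.astype(np.float32)), -8, 7).astype(np.int8)
    # Bitpack: shift to 0..15 and store two values per byte, low nibble first.
    u = (q + 8).astype(np.uint8).reshape(-1, 2)
    packed = (u[:, 0] | (u[:, 1] << 4)).astype(np.uint8)
    return packed, scales


def dequantize_int4(packed, scales, group_size=128):
    """Unpack nibbles back to signed ints and rescale each group."""
    low = (packed & 0x0F).astype(np.int8) - 8
    high = (packed >> 4).astype(np.int8) - 8
    q = np.stack([low, high], axis=1).reshape(-1, group_size)
    return (q * scales.astype(np.float32)).reshape(-1)


rng = np.random.default_rng(0)
w = rng.standard_normal(4 * 128).astype(np.float32)
packed, scales = quantize_int4(w)
w_hat = dequantize_int4(packed, scales)
print(packed.nbytes, scales.nbytes)  # 256 8
print(float(np.abs(w - w_hat).max()))
```

Each group of 128 weights costs 64 bytes of nibbles plus a 2-byte scale, versus 256 bytes in FP16, which is where a roughly 4× reduction comes from.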


Important Notes

⚠️ This is a test release of the SYNAXIM engine.

This model was converted from standard Transformer weights (trained with softmax self-attention, which is normally served with a KV cache at inference time). The SYNAXIM Symbiotic State Engine uses a fundamentally different inference paradigm (O(1) associative memory). Output quality from standard models running through the Symbiotic Gate will differ from their original behavior; this is by design.

This release demonstrates the complete pipeline: install → download → convert → load → generate → serve. Future releases will include models specifically trained for the Symbiotic Gate paradigm.



Citation

@software{synaxim,
  title={SYNAXIM: Symbiotic Native Axiom Inference Machine},
  author={GRRNMAKER},
  year={2026},
  url={https://github.com/GRRN-MAKER/SYNAXIM}
}

SYNAXIM: because inference should be a machine, not a framework. Built by GRRNMAKER
