 ███████╗██╗   ██╗███╗   ██╗ █████╗ ██╗  ██╗██╗███╗   ███╗
 ██╔════╝╚██╗ ██╔╝████╗  ██║██╔══██╗╚██╗██╔╝██║████╗ ████║
 ███████╗ ╚████╔╝ ██╔██╗ ██║███████║ ╚███╔╝ ██║██╔████╔██║
 ╚════██║  ╚██╔╝  ██║╚██╗██║██╔══██║ ██╔██╗ ██║██║╚██╔╝██║
 ███████║   ██║   ██║ ╚████║██║  ██║██╔╝ ██╗██║██║ ╚═╝ ██║
 ╚══════╝   ╚═╝   ╚═╝  ╚═══╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝╚═╝     ╚═╝

Llama 3.2 1B — SYNAXIM `.symb` Format (INT4)

The first model converted to the SYNAXIM proprietary .symb inference format.

This is Meta's Llama-3.2-1B converted to SYNAXIM's framework-free .symb binary format with INT4 per-group quantization. It runs entirely through the SYNAXIM Symbiotic State Engine — no PyTorch, no Transformers library, no KV-Cache.

Quick Start

1. Install SYNAXIM

pip install grrn-inference

Or install from source:

git clone https://github.com/GRRN-MAKER/SYNAXIM.git
cd SYNAXIM
pip install -e .

2. Download This Model

# Using huggingface-cli
huggingface-cli download GRRNNOB/SYNAXIM --local-dir ./llama-1b-symb

# Or using Python
from huggingface_hub import snapshot_download
snapshot_download("GRRNNOB/SYNAXIM", local_dir="./llama-1b-symb")

3. Run Inference

from grrn_inference import GRRNModel

# Load the model
model = GRRNModel.from_pretrained("./llama-1b-symb")

# Generate text
result = model.generate("The meaning of life is", max_tokens=50, temperature=0.7)
print(result.text)
print(f"Speed: {result.tokens_per_second} tok/s")

4. Chat (OpenAI-Style)

result = model.chat([
    {"role": "user", "content": "Explain quantum computing simply."}
], max_tokens=200)

print(result.choices[0].message["content"])

5. Streaming

for chunk in model.stream("Once upon a time", max_tokens=100):
    print(chunk.text, end="", flush=True)

6. Serve as an OpenAI-Compatible API

from grrn_inference import serve
serve("./llama-1b-symb", port=8000, api_key="my-secret-key")

Then connect with any OpenAI client:

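from openai import OpenAI

# Point the client at the local SYNAXIM server; the key must match the one passed to serve()
client = OpenAI(base_url="http://localhost:8000/v1", api_key="my-secret-key")
response = client.chat.completions.create(
    model="llama-1b-symb",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)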

Model Details

Property	Value
Base Model	meta-llama/Llama-3.2-1B
Architecture	LlamaForCausalLM (Dense)
Parameters	1.24B
Hidden Size	2048
Layers	16
Attention Heads	32 Q / 8 KV (GQA 4:1)
Head Dim	64
Vocabulary	128,256 tokens
Intermediate Size	8,192
Activation	SiLU
RoPE θ	500,000
Tied Embeddings	Yes (lm_head = embed_tokens.T)
Format	`.symb` (SYNAXIM proprietary binary)
Quantization	INT4, group_size=128
Compression	3.8× vs FP16
Total Size	~674 MB
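
The parameter count and the compression figure in the table can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes bias-free linear layers (as in standard Llama checkpoints) and, for the compression ratio, one FP16 scale per group of 128 weights with no zero-point, which this card does not state. The function names are hypothetical and not part of grrn-inference.

```python
# Shapes taken from the Model Details table.
HIDDEN = 2048
LAYERS = 16
Q_HEADS, KV_HEADS, HEAD_DIM = 32, 8, 64
VOCAB = 128_256
INTERMEDIATE = 8_192


def llama_param_count():
    """Count parameters, assuming bias-free linear layers and tied embeddings."""
    embed = VOCAB * HIDDEN                      # shared with lm_head, counted once
    attn = (HIDDEN * Q_HEADS * HEAD_DIM         # Q projection
            + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # K and V projections (GQA: 8 heads)
            + Q_HEADS * HEAD_DIM * HIDDEN)      # output projection
    mlp = 3 * HIDDEN * INTERMEDIATE             # gate, up, down
    norms = 2 * HIDDEN                          # pre-attention and pre-MLP RMSNorm
    final_norm = HIDDEN
    return embed + LAYERS * (attn + mlp + norms) + final_norm


def int4_compression_ratio(group_size=128, scale_bits=16):
    """FP16 bits per weight divided by INT4 bits per weight.

    Assumption: each group stores one scale of `scale_bits` bits and nothing else.
    """
    bits_per_weight = 4 + scale_bits / group_size
    return 16 / bits_per_weight


print(f"{llama_param_count():,}")          # 1,235,814,400
print(round(int4_compression_ratio(), 2))  # 3.88
```

Under these assumptions the count rounds to the 1.24B listed above, and the per-weight ratio is close to the stated 3.8×.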

`.symb` File Structure

llama-1b-symb/
├── config.symb.json              # Architecture + quantization config
├── embeddings.symb               # Token embeddings (INT4, 66 MB)
├── final_norm.symb               # Final RMSNorm (FP16, 4 KB)
├── tokenizer/                    # Tokenizer files
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── special_tokens_map.json
└── layers/
    ├── layer_00/
    │   ├── attn_q.symb           # Q projection (INT4)
    │   ├── attn_k.symb           # K projection (INT4)
    │   ├── attn_v.symb           # V projection (INT4)
    │   ├── attn_o.symb           # Output projection (INT4)
    │   ├── mlp_gate.symb         # SwiGLU gate (INT4)
    │   ├── mlp_up.symb           # SwiGLU up (INT4)
    │   ├── mlp_down.symb         # SwiGLU down (INT4)
    │   ├── norm_attn.symb        # Pre-attention RMSNorm (FP16)
    │   └── norm_mlp.symb         # Pre-MLP RMSNorm (FP16)
    ├── layer_01/
    │   └── ...
    └── layer_15/
        └── ...
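
After downloading, it can be useful to confirm that the directory matches this layout. The following is a minimal sketch using only the standard library. It assumes that layer_01 through layer_15 contain the same nine files as layer_00 (the tree above elides them), and the helper names are hypothetical, not part of grrn-inference.

```python
from pathlib import Path

# File names copied from the tree above.
TOP_LEVEL = ["config.symb.json", "embeddings.symb", "final_norm.symb"]
TOKENIZER = ["tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"]
# Assumption: every layer directory holds the same files as layer_00.
LAYER_FILES = [
    "attn_q.symb", "attn_k.symb", "attn_v.symb", "attn_o.symb",
    "mlp_gate.symb", "mlp_up.symb", "mlp_down.symb",
    "norm_attn.symb", "norm_mlp.symb",
]


def expected_files(root, num_layers=16):
    """Return every path the layout above implies for a model directory."""
    root = Path(root)
    paths = [root / name for name in TOP_LEVEL]
    paths += [root / "tokenizer" / name for name in TOKENIZER]
    for i in range(num_layers):
        layer_dir = root / "layers" / f"layer_{i:02d}"
        paths += [layer_dir / name for name in LAYER_FILES]
    return paths


def missing_files(root, num_layers=16):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_files(root, num_layers) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_files("./llama-1b-symb")
    print(f"{len(missing)} of {len(expected_files('./llama-1b-symb'))} files missing")
```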

How SYNAXIM Works

SYNAXIM replaces the standard Transformer inference paradigm:

	Standard Transformer	SYNAXIM
Memory Model	KV-Cache (grows with context)	O(1) M matrix (fixed size)
Attention	softmax(Q·Kᵀ/√d)·V over stored K,V pairs	Sigmoid-gated associative memory update
Runtime	PyTorch + CUDA	NumPy only (zero framework)
Weight Format	safetensors (open)	`.symb` (proprietary INT4 bitpacked)
Install Size	~2 GB (PyTorch + deps)	< 5 MB
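
To make "INT4 bitpacked" concrete, here is a generic sketch of per-group INT4 quantization with two 4-bit values packed into each byte. It is not the `.symb` on-disk layout, which is proprietary and not described here. The symmetric scheme, the FP16 per-group scale, the nibble order, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

GROUP_SIZE = 128  # matches group_size=128 from the model details


def quantize_int4(weights, group_size=GROUP_SIZE):
    """Symmetric per-group INT4 quantization (illustrative, not the .symb layout).

    `weights.size` must be a multiple of `group_size`.
    Returns (packed uint8 array, float16 scale per group).
    """
    w = np.asarray(weights, dtype=np.float32).reshape(-1, group_size)
    # One scale per group, chosen so the largest magnitude maps to 7.
    scales = (np.abs(w).max(axis=1, keepdims=True) / 7.0).astype(np.float16)
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.rint(w / scales.astype(np.float32)), -8, 7).astype(np.int8)
    nibbles = (q + 8).astype(np.uint8).reshape(-1)   # shift to 0..15
    packed = nibbles[0::2] | (nibbles[1::2] << 4)    # low nibble first
    return packed, scales.reshape(-1)


def dequantize_int4(packed, scales, group_size=GROUP_SIZE):
    """Invert quantize_int4 and return a flat float32 array."""
    nibbles = np.empty(packed.size * 2, dtype=np.uint8)
    nibbles[0::2] = packed & 0x0F
    nibbles[1::2] = packed >> 4
    q = nibbles.astype(np.float32) - 8.0
    w = q.reshape(-1, group_size) * scales.astype(np.float32).reshape(-1, 1)
    return w.reshape(-1)


rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
packed, scales = quantize_int4(w)
w_hat = dequantize_int4(packed, scales)
print(packed.nbytes, scales.nbytes)        # 128 bytes of weights, 4 bytes of scales
print(float(np.max(np.abs(w - w_hat))))    # rounding error, at most half a scale step
```

In this scheme each weight costs 4 bits plus its share of the group scale, which is where the storage saving over FP16 comes from.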

The Symbiotic Gate

Instead of growing a KV cache, SYNAXIM maintains a fixed-size state matrix M:

M_{t+1} = σ(gate_score) · M_t + (1 - σ(gate_score)) · key ⊗ value
output  = query · M_{t+1}

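M is (D × D): its size is fixed and never grows, regardless of sequence length
The gate score, computed from the Q·K similarity, controls how much of the old state is retained versus how strongly the new key-value pair is imprinted
Memory: O(D²) fixed, versus an O(n·d) KV-cache that grows with sequence length n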
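
The update rule above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the engine's implementation: the gate score is taken to be the raw dot product of query and key (the exact form is not specified here), a single state matrix is used with no heads or layers, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def symbiotic_step(M, query, key, value):
    """One gated update of the fixed-size state matrix M.

    Assumption: gate_score is the plain dot product of query and key.
    """
    gate_score = float(query @ key)
    g = sigmoid(gate_score)
    # Retain a fraction g of the old state, imprint the rest from key ⊗ value.
    M_next = g * M + (1.0 - g) * np.outer(key, value)
    output = query @ M_next
    return M_next, output


D = 8
rng = np.random.default_rng(0)
M = np.zeros((D, D))
for _ in range(1000):
    q, k, v = rng.standard_normal((3, D))
    M, out = symbiotic_step(M, q, k, v)

# The state has the same shape after 1000 tokens as after one.
print(M.shape, out.shape)  # (8, 8) (8,)
```

A KV-cache over the same 1000 steps would hold 1000 key vectors and 1000 value vectors, whereas M stays at D × D entries throughout.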

Device Selection

# Auto-detect (uses Numba if available, else pure NumPy)
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cpu")

# Force Numba-accelerated CPU (requires: pip install grrn-inference[cpu])
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cpu-accelerated")

# Force pure NumPy (no dependencies beyond numpy)
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cpu-numpy")

# Triton GPU (requires: pip install grrn-inference[gpu])
model = GRRNModel.from_pretrained("./llama-1b-symb", device="cuda")

System Requirements

Requirement	Minimum
Python	3.9+
RAM	4 GB
Disk	1 GB
OS	macOS, Linux, Windows
GPU	Not required (CPU-only)

Core dependencies: numpy, safetensors, tqdm — that's it.

Convert Your Own Model

pip install grrn-inference
grrn-convert meta-llama/Llama-3.2-1B ./my-llama-symb --quantize int4

Or in Python:

from grrn_inference import SymbioticConverter

converter = SymbioticConverter()
converter.convert(
    source="meta-llama/Llama-3.2-1B",
    output_dir="./my-llama-symb",
    quantize="int4"
)

Supports: LLaMA, Qwen, Mistral, Phi, Gemma, Mixtral, DeepSeek, DBRX.

Important Notes

⚠️ This is a test release of the SYNAXIM engine.

This model was converted from standard Transformer weights, which were trained with full softmax self-attention (the mechanism a KV-cache accelerates at inference time). The SYNAXIM Symbiotic State Engine uses a fundamentally different inference paradigm (O(1) associative memory), so output quality from standard models running through the Symbiotic Gate will differ from their original behavior. This is by design.

This release demonstrates the complete pipeline: install → download → convert → load → generate → serve. Future releases will include models specifically trained for the Symbiotic Gate paradigm.

Citation

@software{synaxim,
  title={SYNAXIM: Symbiotic Native Axiom Inference Machine},
  author={GRRNMAKER},
  year={2026},
  url={https://github.com/GRRN-MAKER/SYNAXIM}
}

SYNAXIM — Because inference should be a machine, not a framework. Built by GRRNMAKER
