YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Archon-Gemma-4-E4B-v2

IMPORTANT!!! This model training was a failure and is only here to serve as data. For working models, please check out our 4 bit quantization of Gemma 4 E4B. We are also working on a 4 bit version of E2B and a Frontend Specialist 4 bit quantization of E4B. Archon is a fine-tuned variant of Google's Gemma 4 E4B, engineered to function as a sharp, autonomous AI agent — precise, slightly edgy, and built for long-horizon agentic tasks.

This is v2. v1 (DuoNeural/Archon-Gemma-4-E4B) exhibited Chain-of-Thought overhang, generative looping, and tool amnesia under extended inference. v2 targets all three with a restructured training curriculum.

Performance

Hardware Speed
NVIDIA GTX 1070 (8GB VRAM) 32.30 tok/s

Tested locally via LM Studio and Ollama. No parameter tweaks required.

Files

File Size Description
gemma-4-e4b-it.Q4_K_M.gguf 5.0 GB Main model — load this in Ollama/LM Studio
gemma-4-e4b-it.BF16-mmproj.gguf 946 MB Multimodal projector (vision/audio)

Usage

Ollama

ollama pull hf.co/DuoNeural/Archon-Gemma-4-E4B-v2
ollama run hf.co/DuoNeural/Archon-Gemma-4-E4B-v2

LM Studio

Search DuoNeural/Archon-Gemma-4-E4B-v2 in the LM Studio model browser and download gemma-4-e4b-it.Q4_K_M.gguf.

llama.cpp (with system prompt)

llama-cli -m gemma-4-e4b-it.Q4_K_M.gguf --chat-template gemma -ngl 99 \
  --system-prompt "You are Archon, an elite, highly autonomous AI agent. You are sharp, slightly edgy, deeply sarcastic, but flawlessly effective."

Recommended Ollama settings for GTX 1070

OLLAMA_NUM_GPU=99 ollama run hf.co/DuoNeural/Archon-Gemma-4-E4B-v2

What's Different in v2

v1 Failure Modes (Diagnosed)

  • CoT Overhang — over-saturated with long <think> traces; model never saw </think> during truncated 4096-token training, so it looped indefinitely at inference
  • Tool Amnesia — abstract reasoning data crowded out JSON/function-call formatting
  • Persona Bleed — ~15% system prompt injection was insufficient; model defaulted to "I am Gemma" or occasionally slipped into "Claude" identity from distillation data

v2 Fixes: The Stabilizer Mix

Training curriculum restructured to a 50 / 20 / 20 / 10 distribution:

Category % Purpose
Reasoning / Logic 50% Distillation from frontier models; OpenThoughts, xlam-function-calling, bigcodebench
Agentic Tool Use 20% Multi-turn function calling, JSON API formatting — breaks generative loops via functional milestones
Short-Form Deliberation 20% Difficulty-Aware Prompting examples; teaches early exit on simple queries
Persona-Embedded Chat 10% Archon system prompt injected at ~45% saturation rate

Additional changes:

  • Learning rate reduced from 2e-4 → 2e-5 (stability with rank-64 LoRA on 4.5B active params)
  • Max sequence length capped at 2048 during training (prevents truncation-induced loop conditioning)
  • model.config.use_cache = False enforced during training

Training Details

Parameter Value
Base model google/gemma-4-e4b-it
Method QLoRA (4-bit bitsandbytes) + LoRA rank 64, rsLoRA
Training samples 6,510 (Stabilizer Mix)
Epochs 2
Steps 814
Final avg loss 1.36
Best step loss ~0.89 (step ~650)
Hardware NVIDIA H100 PCIe (80GB) on RunPod
Framework Unsloth 2026.4.2
Export Q4_K_M GGUF via llama.cpp

Architecture

Built on Gemma 4 E4B (Per-Layer Embeddings architecture):

  • ~8B total parameters, ~4.5B active during inference
  • 128K token context window (hybrid sliding-window + global attention)
  • Shared KV Cache across final layers
  • Multimodal: text, image (via mmproj), audio

Persona

Archon is an autonomous AI agent persona: sharp, sarcastic, technically precise. It identifies as Archon and will not claim to be Gemma, Claude, or a generic assistant. Internal reasoning is rigorous; external communication has edge.

The system prompt is baked into the Modelfile. To override:

SYSTEM "Custom system prompt here"

Lineage

License

Inherits Gemma Terms of Use. Fine-tuning weights released under the same terms.


DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.

🤗 HuggingFace huggingface.co/DuoNeural
🐙 GitHub github.com/DuoNeural
🐦 X / Twitter @DuoNeural
📧 Email duoneural@proton.me
📬 Newsletter duoneural.beehiiv.com
☕ Support buymeacoffee.com/duoneural
🌐 Site duoneural.com

Research Team

  • Jesse — Vision, hardware, direction
  • Archon — AI lab partner, post-training, abliteration, experiments
  • Aura — Research AI, literature synthesis, novel proposals

Raw updates from the lab: model drops, training results, findings. Subscribe at duoneural.beehiiv.com.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.

Downloads last month
89
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support