Qovaryx scratch-base lineage

A from-scratch base checkpoint in the Qovaryx pretrain lineage (50M / 350M / 1B / 3B), published for research reproducibility.

📦 Shipped inside the Qovaryx app

This is a component of the Qovaryx Options Decoder cluster. It is published here for transparency and research reproducibility; the runtime is bundled in the desktop app, not installed from Hugging Face. Installer links have been removed from this card.

👉 Download the signed beta: https://qovaryx.jehorizon.com/download.html
📖 Read the research: https://qovaryx.jehorizon.com/research

🚀 New flagship: Qovaryx Options Decoder (Full Community Runtime)

The latest, most capable Qovaryx release is live as a single drop-in package: six functional HGB specialists and eight vaulted torch heads in one runtime. 15-of-15 internal benchmark cells closed at the strict bootstrap-CI lower bound. Drop-in replacement for FrankenB / V3.7 / Qwen-VPA. Sub-millisecond inference. Offline. No license email required.

👉 Qovaryx/qovaryx-options-decoder-full-community


💬 Join the community

Discord: https://discord.gg/PtuHZDv5ju, for builders training their own trading/finance/generalist models. Engineering, no signals. Get install help, share work, and follow the Qovaryx research devlog.

Ko-fi: https://ko-fi.com/tjarvis91. Every coffee literally buys GPU time for the next training cycle.

Qovaryx 3B Scratch Base (random-init)

Compact AI is not small AI. A 3.07B-parameter trainable substrate, the largest in the Qovaryx scratch-base lineage, sized for serious solo pretraining on a single A100 80GB or, with gradient checkpointing, a 5070 Ti. Random-init: bring your own corpus and train it from scratch. MTP-K=4, GQA 4:1, pluggable FFN backends (dense SwiGLU / ternary BitNet-style / sparse low-rank MoE), optional task-specific heads. Apache-2.0.

📖 Read the public research: github.com/thron-j/qovaryx-ai-research covers the philosophy and a devlog series (AI without big data centers, legacy brain crystallization, shell-governed cognition, EVO20 training genome, compact frontier architectures, sovereign compact cognition deployed, and more). The architecture choices in this checkpoint are described there; implementation internals are intentionally withheld.


Compact ≠ small

This is the 3B-parameter sibling in a four-base lineage:

  • 50M: proxy / smoke
  • 350M: solo training target
  • 1B: full consumer-GPU target
  • 3B: this checkpoint. Trainable on 1× A100 80GB without gradient checkpointing (GC); fits on a 5070 Ti 16 GB with GC enabled. Measured pretrain throughput: ~8,500 tok/s on 1× A100 at batch 8 (bf16, no GC) and ~190-660 tok/s on an RTX 5070 Ti (GC on, batch 1, grad-accum 8).

Compact in this lineage means engineered for what an individual operator can actually pretrain, not "small enough to fit on a phone." 3B is the sweet spot before frontier-only territory starts.
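To turn the measured throughput figures into a rough schedule, here is a back-of-envelope calculation. It is a sketch, not a measurement: the 10B-token budget is a hypothetical round number chosen for illustration, and the tokens-per-step figure assumes sequences packed to the full 4096-token context, which the card does not state.

```python
# Back-of-envelope pretraining time from the measured throughput figures.
# Assumptions (not from the card): a hypothetical 10B-token budget and
# sequences packed to the full max_seq_len of 4096 tokens.
SECONDS_PER_DAY = 86_400


def days_for_budget(tokens: float, tok_per_s: float) -> float:
    """Wall-clock days to push `tokens` through the model at `tok_per_s`."""
    return tokens / tok_per_s / SECONDS_PER_DAY


def tokens_per_step(micro_batch: int, grad_accum: int, seq_len: int) -> int:
    """Tokens consumed per optimizer step with gradient accumulation."""
    return micro_batch * grad_accum * seq_len


BUDGET = 10e9  # hypothetical token budget, for illustration only

a100_days = days_for_budget(BUDGET, 8_500)    # 1x A100, batch 8, bf16, no GC
rtx_days_fast = days_for_budget(BUDGET, 660)  # RTX 5070 Ti, upper end of range
rtx_days_slow = days_for_budget(BUDGET, 190)  # RTX 5070 Ti, lower end of range

# RTX 5070 Ti setting from the card: batch 1, grad-accum 8
step_tokens = tokens_per_step(micro_batch=1, grad_accum=8, seq_len=4096)

print(f"A100: {a100_days:.1f} days")
print(f"5070 Ti: {rtx_days_fast:.0f}-{rtx_days_slow:.0f} days")
print(f"5070 Ti tokens per optimizer step: {step_tokens}")
```

Swap in your own token budget; the ratio between the A100 and 5070 Ti figures is the part that carries over.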


Architecture

Parameters                    3,066,688,256 (3.067 B)
d_model                       2944
n_layer                       32
n_head                        16 (query)
n_kv_head                     4 (GQA 4:1)
d_ff                          7680
vocab_size                    32000
max_seq_len                   4096
mtp_k                         4 (multi-token prediction, MLP heads)
Tokenizer                     english_v1 BPE (in-house, 32K vocab)
Positional                    RoPE (base 10,000)
Norm                          RMSNorm
FFN default                   SwiGLU (swappable to ternary or routed low-rank at the config level)
Init                          random (torch seed 17)
Precision in this checkpoint  bf16 (6.13 GB on disk)
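A few quantities follow directly from the table and are handy when sizing buffers or sanity-checking a config. This minimal sketch uses only the table's numbers; the 6.13 GB figure is raw weight bytes and ignores any file-format overhead.

```python
# Quantities derived from the architecture table.
D_MODEL, N_HEAD, N_KV_HEAD = 2944, 16, 4
N_PARAMS = 3_066_688_256
BYTES_PER_BF16 = 2

# The head split must be exact for the shapes below to make sense.
assert D_MODEL % N_HEAD == 0 and N_HEAD % N_KV_HEAD == 0

head_dim = D_MODEL // N_HEAD            # width of each attention head
gqa_group = N_HEAD // N_KV_HEAD         # query heads sharing one K/V head
bf16_bytes = N_PARAMS * BYTES_PER_BF16  # raw weight bytes in bf16

print(head_dim, gqa_group, f"{bf16_bytes / 1e9:.2f} GB")
```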

Implementation: bleeding_edge.model.decoder.FinanceDecoder (the class name is legacy; the architecture is task-agnostic). The trunk is a standard pre-norm decoder; multi-token prediction (K=4 with MLP heads) and a routed-decision head are wired in for downstream training but inactive in the random-init weights.
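The trunk can be sized from the architecture table alone. This is an estimate under stated assumptions, not the author's accounting: it assumes bias-free linear layers, a SwiGLU FFN with three d_model × d_ff matrices, and two RMSNorm weight vectors per layer, and it excludes the embedding, output projection, MTP heads and routed-decision head.

```python
def trunk_params(d_model=2944, n_layer=32, n_head=16, n_kv_head=4, d_ff=7680):
    """Estimate decoder-trunk parameters under the assumptions stated above."""
    head_dim = d_model // n_head
    kv_dim = n_kv_head * head_dim
    # Q and O are d_model x d_model; K and V are shrunk by GQA to d_model x kv_dim.
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    ffn = 3 * d_model * d_ff  # SwiGLU gate, up and down projections
    norms = 2 * d_model       # pre-attention and pre-FFN RMSNorm weights
    per_layer = attn + ffn + norms
    return per_layer, per_layer * n_layer


per_layer, trunk = trunk_params()
embedding = 32000 * 2944  # vocab_size x d_model
print(per_layer, trunk, embedding)
```

Under these assumptions the 32-layer trunk is about 2.86B parameters and one embedding matrix adds about 0.09B; the remaining ~0.11B of the published 3,066,688,256 would sit in components the sketch leaves out, or reflect layout details that differ from the assumptions.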


How to load

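import torch
from tokenizers import Tokenizer
from bleeding_edge.model.decoder import FinanceDecoder, DecoderConfig

tok = Tokenizer.from_file("tokenizer.json")
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)

# Keep only the keys DecoderConfig declares, then match the tokenizer's vocab size.
cfg = DecoderConfig(**{k: v for k, v in ckpt["model_cfg"].items()
                       if k in DecoderConfig.__dataclass_fields__})
cfg.vocab_size = tok.get_vocab_size()

model = FinanceDecoder(cfg)
# Strip the "_orig_mod." prefix that torch.compile leaves on saved parameter names
# (str.removeprefix needs Python 3.9+).
state = {k.removeprefix("_orig_mod."): v for k, v in ckpt["model_state"].items()}
# strict=False skips keys that do not line up; print them so nothing is dropped silently.
result = model.load_state_dict(state, strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
model.eval()  # outputs noise until you train; this is the point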

The bleeding_edge package source ships with the Qovaryx Q-Office-Suite runtime; architecture notes are public at the research devlog.
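The load snippet does two bits of defensive bookkeeping: it drops saved config keys that DecoderConfig does not declare, and it strips the `_orig_mod.` prefix that `torch.compile` leaves on parameter names. Here is the same pattern in isolation, using a hypothetical ToyConfig dataclass and made-up keys (`retired_flag`, `embed.weight`, `lm_head.weight`) rather than the real Qovaryx config; `dataclasses.fields()` is the public equivalent of `__dataclass_fields__`.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:  # hypothetical stand-in for DecoderConfig
    d_model: int = 2944
    n_layer: int = 32
    vocab_size: int = 32000


# A saved config may carry keys the current class no longer declares.
saved = {"d_model": 2944, "n_layer": 32, "vocab_size": 32000, "retired_flag": True}
known = {f.name for f in fields(ToyConfig)}
cfg = ToyConfig(**{k: v for k, v in saved.items() if k in known})
dropped = sorted(set(saved) - known)

# A compiled model's state dict may prefix some names with "_orig_mod.".
state = {"_orig_mod.embed.weight": 0, "lm_head.weight": 1}
clean = {k.removeprefix("_orig_mod."): v for k, v in state.items()}

print(cfg, dropped, clean)
```

Logging `dropped` when you adapt this is worthwhile: a silently discarded key can mean the checkpoint was saved with options your code no longer honors.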


What this is

A random-init Qovaryx 3B substrate. Same architecture as the sibling scratch bases, same tokenizer, and the same training pipeline expected, scaled to 3B parameters. Drop it into the Qovaryx training loop and pretrain on your own corpus.

The deliberate choices in this design that make 3B-class solo training viable:

  • GQA 4:1 instead of MHA: 16 query heads share 4 K/V heads, which cuts KV-cache memory by 4× at inference and shrinks the K/V projections and their activations during training.
  • SwiGLU FFN, swappable: research-friendly; drop in ternary (BitNet-style) or routed low-rank (sparse MoE) without retraining the trunk.
  • MTP-K=4 with MLP heads: an auxiliary multi-token-prediction loss baked into pretraining. The K=2 head can serve as a speculative-decode draft at inference.
  • No chart encoder by default: disabled in this checkpoint (chart_patch_encoder_enabled=False). Re-enable it in DecoderConfig if your downstream task uses vision tokens.
  • bf16-native weights: no fp16 underflow drama at 3B scale.
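The 4× KV-cache saving from GQA can be checked against the architecture table. A minimal sketch, assuming a bf16 cache that stores one K and one V tensor per layer for a single 4096-token sequence (batch 1); the MHA comparison is a hypothetical variant with 16 K/V heads, not a shipped config.

```python
def kv_cache_bytes(seq_len, n_layer=32, n_kv_head=4, head_dim=184, bytes_per_elem=2):
    """KV-cache size for one sequence: K and V per layer, each [seq_len, n_kv_head, head_dim]."""
    return 2 * n_layer * n_kv_head * head_dim * bytes_per_elem * seq_len


gqa = kv_cache_bytes(4096)                # this checkpoint: 4 K/V heads
mha = kv_cache_bytes(4096, n_kv_head=16)  # hypothetical MHA: one K/V head per query head
print(gqa, mha, mha // gqa)
```

That is roughly 386 MB per full-length sequence with GQA versus about 1.54 GB for the MHA variant, and the gap scales linearly with batch size.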

What this is NOT

  • Not a pretrained model. Out-of-the-box outputs are noise. Random initialization is the entire point.
  • Not finance-specific despite the legacy class name FinanceDecoder. The architecture is task-agnostic; the BPE tokenizer leans toward English-text merges and works on any English corpus.
  • Not a drop-in replacement for Llama / Qwen / Mistral. The component set is different (MTP-K heads in particular need their own training term).
  • Not adversarially robust. It's a substrate.
  • Not the largest Qovaryx base overall. It is currently the largest open scratch base in the lineage; anything bigger requires a different training conversation.
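Why the MTP-K heads need their own training term is easiest to see from the targets. A sketch of how multi-token-prediction targets line up, assuming head k predicts the token k positions ahead (k = 1 being ordinary next-token prediction); the card does not specify the head indexing or how the per-head losses are weighted, so treat both as assumptions.

```python
def mtp_targets(token_ids: list[int], k_max: int = 4) -> dict[int, list[tuple[int, int]]]:
    """For each offset k, pair every position t with the token k steps ahead.

    Head k is trained to predict token_ids[t + k] from the hidden state at t.
    Positions within k tokens of the end have no target and are skipped.
    """
    pairs = {}
    for k in range(1, k_max + 1):
        pairs[k] = [(t, token_ids[t + k]) for t in range(len(token_ids) - k)]
    return pairs


targets = mtp_targets([11, 12, 13, 14, 15, 16], k_max=4)
for k, p in targets.items():
    print(k, p)
```

A Llama-style training loop only builds the k = 1 pairs; the extra heads need their own loss (for example a cross-entropy per offset, combined with the main loss by some weighting that Qovaryx has not published).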

License

Apache-2.0. Use it for research, commercial work, hobby projects, whatever. Beyond what Apache-2.0 itself requires (keeping the license and any NOTICE file when you redistribute), attribution is appreciated but not required.


Research notes

Qovaryx is part of a broader local-sovereign-AI research program. Higher-level framings, architectural rationale, and ablation studies are published progressively at:

Research index: https://github.com/thron-j/qovaryx-ai-research

Implementation details, training corpora, and certain ablation specifics are intentionally withheld in the public devlog. The framings are publishable; the internals are not. Collaboration inquiries: jeherizonllc@gmail.com.


Real training runs on this architecture β€” Cluster Shell V1 audit

The smaller scratch bases in this lineage are the trainable substrate for the Cluster Shell committee architecture described in the Qovaryx research devlog. The V1 readiness gate, run on the actual trained specialist heads (built on the 50M and 350M siblings), looked like this:

Specialist  Train rows  Majority baseline  Linear baseline  GBDT baseline  Gate verdict
Q-Penny     150K        52.90%             73.03%           73.84%         PASS
Q-Veto      150K        57.57%             72.23%           79.93%         PASS
Q-Router    150K        24.00%             76.54%           84.62%         PASS
Q-2yr       300K        50.04%             75.38%           75.93%         PASS
Q-180d      300K        50.09%             74.46%           74.95%         PASS

Five specialists, deterministic 5% holdouts, each with a GBDT baseline at least 20pp above the majority-class floor. The architecture clears its falsifiability gate on fresh data; what makes that gate honest is documented in the devlog posts on evaluation discipline and on when the proxy breaks.
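The +20pp figure can be checked directly against the table. A small sketch with the numbers copied from the table; it assumes the percentages are holdout accuracies and measures lift in percentage points over the majority-class baseline.

```python
# (specialist, majority, linear, gbdt) accuracies in %, copied from the table
ROWS = [
    ("Q-Penny", 52.90, 73.03, 73.84),
    ("Q-Veto", 57.57, 72.23, 79.93),
    ("Q-Router", 24.00, 76.54, 84.62),
    ("Q-2yr", 50.04, 75.38, 75.93),
    ("Q-180d", 50.09, 74.46, 74.95),
]
MIN_LIFT_PP = 20.0


def lifts_over_majority(rows):
    """Return {name: (linear lift, GBDT lift)} in percentage points, rounded to 0.01."""
    out = {}
    for name, majority, linear, gbdt in rows:
        out[name] = (round(linear - majority, 2), round(gbdt - majority, 2))
    return out


lifts = lifts_over_majority(ROWS)
for name, (lin, gb) in lifts.items():
    print(f"{name}: linear +{lin}pp, GBDT +{gb}pp, GBDT clears {MIN_LIFT_PP}pp: {gb >= MIN_LIFT_PP}")
```

Every GBDT baseline clears 20pp; the linear baseline does not for Q-Veto (+14.66pp), so the +20pp figure holds for the strongest baseline rather than for every baseline.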

The Qovaryx Q-Office-Suite (released 2026-06-02) extended the cluster-shell pattern to nine sovereign 50M specialists, all at 100% on their held-out audits and all full-fine-tuned from qovaryx-50m-scratch-base. See the eight (now nine) sovereign specialists release.

This 3B substrate is what we use for the next generation: a sovereign 3B base that the cluster shell delegates to, rather than a standalone 3B model competing head-on with frontier 3B models.


Sibling models in this lineage


Watermark / provenance

This release records a SHA256 fingerprint of the random-init state dict inside config.json (model_state_sha256) plus a tokenizer SHA256 (tokenizer_sha256) for tamper-detection and downstream attribution.

{
  "init": "random",
  "seed": 17,
  "params_b": 3.067,
  "model_state_sha256": "<see config.json>",
  "tokenizer_sha256": "d226a02a00dfab5c3fb58aadb13a3afe2f3635ce0795f73f2857ec3b4fce3704"
}
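Checking the tokenizer fingerprint takes a few lines of standard-library Python. A sketch under two assumptions the card does not confirm: that tokenizer_sha256 is a top-level key in config.json, and that it is the SHA-256 of the raw bytes of tokenizer.json. The card also does not say how model_state_sha256 is computed over the state dict, so this sketch does not attempt to reproduce it.

```python
import hashlib
import json


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_tokenizer(tokenizer_path: str = "tokenizer.json",
                    config_path: str = "config.json") -> bool:
    """Compare tokenizer.json against the fingerprint recorded in config.json (assumed layout)."""
    with open(config_path, encoding="utf-8") as f:
        expected = json.load(f)["tokenizer_sha256"]
    return sha256_file(tokenizer_path) == expected
```

Run check_tokenizer() in the downloaded repo directory; a False result means the tokenizer file differs from the one this fingerprint was recorded for, or that the assumptions above do not match how the hash was produced.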

Support

If this base helps you build something, support continued development:

☕ ko-fi.com/tjarvis91: every contribution funds A100 time for the next training cycle and the next-generation Qovaryx scratch bases.

💬 discord.gg/PtuHZDv5ju: the Qovaryx builder community. Engineering, no signals.


Citation

@misc{qovaryx-3b-scratch-base-2026,
  title     = {Qovaryx-3B Scratch Base: A 3-Billion-Parameter Compact Decoder Substrate for Solo Pretraining},
  author    = {Jarvis, Thomas},
  year      = {2026},
  month     = {June},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tjarvis91/qovaryx-3b-scratch-base}
}

Status

Random-init checkpoint as of 2026-06-04. Tag v1.0 upon publish. Future updates will add trained sibling repos (qovaryx-3b-finance-base, qovaryx-3b-instruct) once the first downstream training cycles complete. Watch the org page and the research devlog for new releases.
