Q-SheetExtract-50M-Sovereign: Spreadsheet field extractor (SKU, qty, rows, cols, name, price)
Pull the right field out of the table. JSON, every time.
What this model does, in one sentence
Given a small CSV / TSV / spreadsheet excerpt and a target schema, returns a JSON object with the requested fields extracted exactly as they appear (SKU, qty, rows, cols, name, price, etc.). Strict shape; no extra keys. Designed for clean field extraction; aggregate computations (mean, stdev) are intentionally out of scope for this head.
Honest performance
- Task: spreadsheet field extraction
- Metric: json_content (extracted JSON object equals gold (canonicalized))
- Holdout: n=37 rows, never seen in training, scored row-by-row
- Score: 100.0% mean
- Bootstrap CI 95% lower bound: 1.000
- Gate threshold: 0.95
- Verdict: PASS at point estimate AND at bootstrap CI lower bound
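The scoring script is not published with this card, so the following is only a minimal sketch of how a canonicalized exact-match metric and a percentile-bootstrap lower bound could be computed. The canonicalization rule (sorted keys, compact separators), the function names, and the resample count are all assumptions, not the Qovaryx eval gate. It also shows why a holdout where every row scores 1 gives a lower bound of exactly 1.000: every resample of all-ones data has mean 1.

```python
import json
import random

def canonical(text):
    """Parse JSON text and re-serialize with sorted keys (assumed canonical form)."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))

def json_content_score(predicted, gold):
    """Return 1.0 if both strings hold the same JSON value, else 0.0."""
    try:
        return 1.0 if canonical(predicted) == canonical(gold) else 0.0
    except json.JSONDecodeError:
        return 0.0  # unparseable text counts as a miss

def bootstrap_lower_bound(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap lower bound (alpha/2 quantile) of the mean score."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    return means[int((alpha / 2) * n_resamples)]

# 37 holdout rows, all correct, mirroring the numbers reported above.
# Key order differs between prediction and gold; canonicalization absorbs it.
row_score = json_content_score('{"cols": 16, "rows": 72}', '{"rows": 72, "cols": 16}')
scores = [row_score] * 37
mean_score = sum(scores) / len(scores)
lower = bootstrap_lower_bound(scores)
print(mean_score, lower)
```

With n=37 and no failures, the bootstrap cannot see any variance, so the lower bound says little about how often the model would miss on a larger or different holdout.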
What it's used for: real workflows
- CSV / TSV field extraction: Feed a CSV excerpt and a target schema ({sku, qty} or {rows, cols} or {name, price}); get back canonical JSON. No header-row regex maintenance.
- Dataset-shape probe: Quick {rows, cols} extraction across a folder of datasets to populate a data catalog.
- Inventory record pickup: Pull SKU + qty from spreadsheets that vary by vendor; feed downstream into your inventory system.
- Pricing table parsing: Name + price across product feeds in slightly different formats. Q-SheetExtract handles the shape variance; you handle the business logic.
What problem this actually solves
Spreadsheet ingestion is a thousand bespoke parsers, each two years behind the current vendor format. Q-SheetExtract handles small-table field extraction by intent, not by regex. Aggregate computations (mean, stdev) are intentionally out of scope; that's a tool-use job, and the model card says so.
Integration paths
- Pre-ingest cleanup step: After CSV upload and before the database write, Q-SheetExtract canonicalizes field positions.
- Q-Office-Suite runtime: POST /run/q-sheetextract with the table excerpt.
- Pair with Q-Coder for the math: Need the mean too? Use Q-SheetExtract for fields, then call Q-Coder to emit the aggregate expression.
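For the runtime path, this card names only the route (POST /run/q-sheetextract); the host, port, and request body schema are not documented here. The sketch below therefore builds, but does not send, a request under loudly hypothetical assumptions: a local base URL and a JSON body with a single "input" key. Check the runtime's own documentation before relying on either.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"   # ASSUMPTION: host and port are not given in the card
ROUTE = "/run/q-sheetextract"        # route named in the card

def build_request(instruction, table_excerpt):
    """Build (but do not send) a JSON POST request for the extraction route."""
    # ASSUMPTION: the "input" key is hypothetical; the real body schema is undocumented.
    body = json.dumps({"input": f"{instruction}\n{table_excerpt}"}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ROUTE,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(
    "Extract row + col counts. JSON {rows, cols}.",
    "[72 rows x 16 columns dataset]",
)
# To send against a running runtime: urllib.request.urlopen(req, timeout=10)
```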
Example
Input:
Extract row + col counts. JSON {rows, cols}.
[72 rows x 16 columns dataset]
Output:
{"rows": 72, "cols": 16}
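Because the contract is a strict shape with no extra keys, a caller can cheaply check each response before accepting it. The following is a minimal sketch of such a check for the example above; the function name `parse_strict` and its error messages are illustrative, not part of any Qovaryx API.

```python
import json

def parse_strict(output_text, expected_keys):
    """Parse model output and require exactly the expected top-level keys."""
    obj = json.loads(output_text)
    if not isinstance(obj, dict):
        raise ValueError("output is not a JSON object")
    missing = set(expected_keys) - set(obj)
    extra = set(obj) - set(expected_keys)
    if missing or extra:
        raise ValueError(f"shape mismatch: missing={sorted(missing)} extra={sorted(extra)}")
    return obj

result = parse_strict('{"rows": 72, "cols": 16}', ["rows", "cols"])
print(result["rows"], result["cols"])
```

A check like this only validates shape. Whether the extracted values are right still has to be decided by whatever verifier wraps the head.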
What this is NOT
- Not a general-purpose chatbot. This head does one job and does it consistently. Free-text generation outside the trained task surface will degrade.
- Not a replacement for a verifier. This is one component in the Qovaryx cluster-shell architecture. The decision-acceptance discipline lives in the wrapper, not in the head.
- Not reproducible from this card. Weights and audit are public; the crystal corpus, eval gate constants, and training hyperparameters are not.
Proprietary Qovaryx technology: built on our own scratch base
This is a 53.5M-parameter sovereign specialist in the Qovaryx Compact Specialist Suite. It is full-fine-tuned from tjarvis91/qovaryx-50m-scratch-base β our own scratch-trained base, not a borrowed foundation model.
- Base: Qovaryx 50M scratch base. Pretrained from random initialization on 491.5M tokens. Not SmolLM2. Not Qwen. Not Llama. Not Mistral. Not Phi. No HuggingFace foundation. No closed-source weights. Every parameter traces back to a Qovaryx training run on Qovaryx hardware.
- Tokenizer: Qovaryx english_v1 BPE (vocab 32000), built in-house against our own pretraining corpus.
- Architecture: Qovaryx FinanceDecoder (12 decoder blocks, GQA, RoPE, SwiGLU FFN, RMSNorm, MTP heads, decision head).
- Recipe: Qovaryx crystallization discipline: train the law before replaying the noise.
- Runs on CPU. No GPU required at inference.
Architecture (Qovaryx proprietary)
- 53.5M parameters
- 12 decoder blocks, d_model=512, n_head=8, GQA n_kv_head=2
- SwiGLU FFN, RoPE positional, RMSNorm
- Multi-token prediction (MTP) auxiliary heads
- Decision head for routed-decision tasks
- Tokenizer: Qovaryx english_v1 BPE, vocab 32000 (in-house build)
- Pretrained from qovaryx-50m-scratch-base step 60000 (491.5M tokens)
- Full fine-tune (no LoRA, no QLoRA, no adapter): every parameter was updated on the Qovaryx crystal corpus for this specialist
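A few quantities follow directly from the numbers in this list. The sketch below works them out, assuming the conventional relation head_dim = d_model / n_head (the card does not state head_dim) and counting only a single token-embedding matrix (whether input and output embeddings are tied is not stated).

```python
# Values stated in the architecture list above.
d_model, n_head, n_kv_head = 512, 8, 2
vocab_size = 32000
pretrain_tokens, pretrain_steps = 491.5e6, 60000

head_dim = d_model // n_head              # ASSUMPTION: standard split of d_model across heads
queries_per_kv = n_head // n_kv_head      # query heads that share one K/V head under GQA
kv_width = n_kv_head * head_dim           # K (or V) projection width per token
kv_cache_ratio = n_head / n_kv_head       # KV-cache size relative to full multi-head attention
embedding_params = vocab_size * d_model   # one token-embedding matrix
tokens_per_step = pretrain_tokens / pretrain_steps

print(head_dim, queries_per_kv, kv_width, kv_cache_ratio)
print(embedding_params, round(tokens_per_step))
```

Under these assumptions each K/V head serves four query heads, the KV cache is a quarter of what eight full heads would need, a single embedding matrix accounts for roughly 16.4M of the 53.5M parameters, and pretraining averaged about 8192 tokens per step.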
How to load it (Python)
import torch
from tokenizers import Tokenizer
from bleeding_edge.model.decoder import FinanceDecoder, DecoderConfig

# Load the tokenizer and the checkpoint on CPU.
tok = Tokenizer.from_file("tokenizer.json")
# weights_only=False unpickles arbitrary objects; only load checkpoints you trust.
ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)

# Keep only the config keys that DecoderConfig actually declares.
cfg = DecoderConfig(**{k: v for k, v in ckpt["model_cfg"].items() if k in DecoderConfig.__dataclass_fields__})
cfg.vocab_size = tok.get_vocab_size()
model = FinanceDecoder(cfg).eval()

# Strip the "_orig_mod." prefix that torch.compile adds to parameter names.
state = {k.removeprefix("_orig_mod."): v for k, v in ckpt["model_state"].items()}
model.load_state_dict(state, strict=False)

prompt = "Extract row + col counts. JSON {rows, cols}.\n[72 rows x 16 columns dataset]"
ids = tok.encode(prompt).ids
cur = torch.tensor([ids], dtype=torch.long)

# Greedy decoding for at most 120 new tokens; the loop stops on token id 0.
with torch.no_grad():
    for _ in range(120):
        nxt = int(torch.argmax(model(cur, return_decision=False).logits[:, -1, :], dim=-1))
        if nxt == 0:
            break
        cur = torch.cat([cur, torch.tensor([[nxt]])], dim=1)

# Decode only the newly generated tokens, not the prompt.
print(tok.decode(cur[0].tolist()[len(ids):]))
License & posture
Apache 2.0 for the published weights, model card, and example code.
The Qovaryx scratch base build pipeline, the crystallization corpus, the eval gate constants, the cluster routing policy, and the protected runtime entrypoint are Qovaryx proprietary technology and are not included in this release. Same posture as every previous Qovaryx public release: ship the weights and the audit, not the recipe.
Sibling specialists in the Qovaryx Q-Office-Suite
All nine specialists share the qovaryx-50m-scratch-base and the same audit discipline. Use one directly; use all nine through the cluster shell.
- Q-Triage: ticket routing
- Q-DocCite: document citation
- Q-Invoice: invoice extraction
- Q-ToolCall: agent tool-calls
- Q-Meeting: meeting structuring
- Q-FinCite: 10-K/10-Q citation
- Q-CmdSafe: command safety triage
- Q-SheetExtract: spreadsheet extraction
- Q-Coder: Python code skeletons
Watermark
This release carries a SHA256 issue fingerprint inside release.json for tamper-detection and attribution.
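This card does not say which bytes the fingerprint covers or which key in release.json holds it, so the sketch below shows only the generic step any such check needs: computing a SHA-256 digest of a file with Python's standard hashlib. The helper name `sha256_of_file` is illustrative. Comparing its result against release.json would require knowing the fingerprint's actual definition, which is an open assumption here.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Self-check on a throwaway file containing the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_of_file(tmp.name)
os.remove(tmp.name)
print(demo_digest)
```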
Community & support
- Research devlog: https://github.com/thron-j/qovaryx-ai-research
- Discord (Qovaryx community): https://discord.gg/PtuHZDv5ju
- Ko-fi (we cover GPU bills): https://ko-fi.com/tjarvis91
- Qovaryx options decoder runtime: https://huggingface.co/Qovaryx/qovaryx-options-decoder-full-community
If you find a failure mode this card doesn't cover, open a discussion on this repo or come to the Discord; that's how the next crystal corpus gets written.