Aletheia 1.0 — the decision-memory gate

Version note: This is Aletheia 1.0, the first Cortex gate, a 1.5-billion-parameter Qwen2.5 decoder. It is superseded by Aletheia 1.5, a fine-tuned DeBERTa-v3-large encoder that matches this model's precision at ~⅓ the RAM. (1.0 was previously published as aletheia-1.5b, named for its 1.5B parameters; it was renamed to a plain version number to avoid colliding with the 1.5 release.)

ἀλήθεια — "un-forgetting". Aletheia decides what is worth remembering: given a short candidate utterance from a coding session (a commit subject or a turn from an AI-coding conversation), it answers one question — is this a real, substantive engineering decision, or is it noise? It is the write-gate of Memtrace's Cortex decision memory: only what Aletheia confidently judges a decision is kept.

Task: binary sequence classification — decision vs noise.
Base: Qwen/Qwen2.5-1.5B-Instruct, LoRA-fine-tuned as a sequence classifier.
Format: INT8-quantized ONNX (~1.4 GB), runs on-device via ONNX Runtime — no GPU, no network, no per-call cost.
Output: a single logit → P(decision) = sigmoid(logit / T) with calibration temperature T = 0.698.
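As a minimal sketch of the output mapping above, the following turns a raw logit into P(decision) using the T = 0.698 stated in this card. The function name `logit_to_probability` is illustrative and not part of the shipped code; the branch on the sign is only there to keep the exponential from overflowing.

```python
import math

T = 0.698  # calibration temperature stated in this card


def logit_to_probability(logit: float, temperature: float = T) -> float:
    """Map a raw logit to P(decision) = sigmoid(logit / temperature)."""
    z = logit / temperature
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

A logit of 0 always maps to 0.5. Because T is below 1, dividing by it moves every other probability further from 0.5 than the unscaled sigmoid would.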

Why a small local model

Decisions must be gated continuously, on every commit and every agent turn. That rules out a cloud LLM (cost, latency, and your code would leave the machine) and rules out heuristics (they cap at ~78% precision and can't read intent). A ~1.4 GB on-device classifier is the only option that is private, free, offline, always-on, and accurate at once.

Results

Evaluated on held-out, leakage-guarded test sets, apples-to-apples against the prior baseline.

Test set	Metric	Aletheia 1.0
Conversational (in-register, held-out, n=1,589)	ROC-AUC	0.933
Conversational	accuracy	0.850
Cross-register benchmark (hand-labeled, n=195)	ROC-AUC	0.844

Precision is a dial: at a balanced threshold, 80% of what it stores is a genuine decision; in "clean mode" (P ≥ 0.85) that rises to **90–92%**, at the cost of storing fewer candidates. The model emits calibrated (temperature-scaled) probabilities, so the threshold means what it says.
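The dial can be sketched as a small gating function. The clean-mode cutoff of 0.85 comes from the text above; the balanced cutoff of 0.5 is a placeholder assumption, since this card does not state it (the shipped defaults live in cortex_serving.json). All names here are illustrative, and the two scores are the approximate values from the usage example in this card.

```python
CLEAN_THRESHOLD = 0.85     # "clean mode" cutoff stated in this card
BALANCED_THRESHOLD = 0.5   # ASSUMPTION: placeholder, the card does not give this value


def should_store(p_decision: float, clean_mode: bool = False) -> bool:
    """Return True if a candidate with probability p_decision passes the write-gate."""
    threshold = CLEAN_THRESHOLD if clean_mode else BALANCED_THRESHOLD
    return p_decision >= threshold


# Approximate probabilities quoted in this card's usage example.
scores = {
    "Switch auth to JWT instead of sessions": 0.91,
    "let me check the file rather than re-reading": 0.06,
}
kept = [text for text, p in scores.items() if should_store(p, clean_mode=True)]
```

Raising the threshold trades recall for precision: fewer candidates are stored, but a larger share of them are genuine decisions.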

Intended use

The decision write-gate / proposer for a code-decision memory system. It is register-robust: trained on both git-commit subjects and conversational turns, so the same model scores both streams. Downstream, a deterministic check (a code edit binding to the turn, or a human) promotes a proposed decision to a durable fact.
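One way to picture the propose-then-promote split is the sketch below. It is an illustration under stated assumptions, not Memtrace's implementation: the `Proposal` type, its field names, and the `promote` function are all hypothetical, and the default cutoff simply reuses the clean-mode value of 0.85 from this card.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    text: str
    p_decision: float                 # probability emitted by the gate
    bound_to_code_edit: bool = False  # hypothetical signal: a code edit is linked to this turn
    human_confirmed: bool = False     # hypothetical signal: a person approved the proposal


def promote(proposal: Proposal, threshold: float = 0.85) -> bool:
    """Promote to a durable fact only if the gate passes AND a deterministic check confirms."""
    passed_gate = proposal.p_decision >= threshold
    confirmed = proposal.bound_to_code_edit or proposal.human_confirmed
    return passed_gate and confirmed
```

The model's score alone never creates a durable fact in this sketch; it only decides which candidates are worth checking.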

Out of scope: it is not a retrieval/search model, not a code generator, and not a general chat classifier. It judges decision-worthiness, nothing else.

How to use

ONNX Runtime (the shipping path — Python)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("memtrace/aletheia-1.0")
sess = ort.InferenceSession("model_int8.onnx", providers=["CPUExecutionProvider"])
T = 0.698  # calibration temperature

def p_decision(text):
    """Return P(decision) for one candidate utterance."""
    e = tok(text, truncation=True, max_length=64, return_tensors="np")
    # The model emits a single logit per input; ONNX Runtime expects int64 ids.
    logit = sess.run(None, {"input_ids": e["input_ids"].astype(np.int64),
                            "attention_mask": e["attention_mask"].astype(np.int64)})[0].reshape(-1)[0]
    # Temperature-scaled sigmoid
    return 1 / (1 + np.exp(-logit / T))

p_decision("Switch auth to JWT instead of sessions")        # ~0.91  → decision
p_decision("let me check the file rather than re-reading")   # ~0.06 → noise

Optimum (transformers-compatible)

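import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
m = ORTModelForSequenceClassification.from_pretrained("memtrace/aletheia-1.0", file_name="model_int8.onnx")
tok = AutoTokenizer.from_pretrained("memtrace/aletheia-1.0")
# Score one utterance with the same temperature-scaled sigmoid (T = 0.698).
enc = tok("Switch auth to JWT instead of sessions", truncation=True, max_length=64, return_tensors="pt")
p = torch.sigmoid(m(**enc).logits / 0.698).item()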

Rust (`ort`, in-product)

Memtrace loads model_int8.onnx + tokenizer.json via the ort crate; see cortex_serving.json for the temperature and default thresholds. Input names: input_ids, attention_mask; output: logits.

Training

Data: 24,895 multi-judge-labeled examples — 14,305 git-commit subjects + 10,590 turns mined from real AI-coding sessions. Labels are LLM multi-judge consensus (2 judges for commits; 3 diverse-lens judges + majority for conversation), ~95% inter-judge agreement; CROWDLAB confirmed the consensus labels were already near-optimal.
Commit sources (license-clean): CommitPackFT (MIT, 74 languages), CommitChronicle, and tangled-ccs. CommitBench (CC-BY-NC) was excluded so the shipped model is commercial-clean.
Recipe: LoRA sequence-classifier with soft-label training (vote-fraction BCEWithLogits, so judge disagreement is modeled rather than forced to 0/1), rsLoRA, LoRA+, MLP target modules, best-checkpoint-by-AUC, post-hoc temperature scaling. The decisive lever was soft labels: the ceiling was label noise, not data or model size.
Compute: trained locally on Apple Silicon (no rented GPU).
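The soft-label idea in the recipe can be illustrated with a standalone sketch. This is not the training code: the function names are illustrative, and plain Python stands in for a framework loss such as PyTorch's BCEWithLogitsLoss. The target is the fraction of judges that voted "decision", and the loss is ordinary binary cross-entropy on the raw logit.

```python
import math


def vote_fraction(votes: list[int]) -> float:
    """Soft label: share of judges that voted 'decision' (1) rather than 'noise' (0)."""
    return sum(votes) / len(votes)


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit with a soft target in [0, 1].

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


soft = vote_fraction([1, 1, 0])                    # two of three judges say "decision"
loss_at_match = bce_with_logits(math.log(2), soft)  # sigmoid(log 2) = 2/3, matching the votes
loss_overconfident = bce_with_logits(3.0, soft)     # sigmoid(3) is about 0.95
```

With a 2-of-3 vote the loss is smallest when the predicted probability equals 2/3, so the model is not pushed toward full confidence on examples the judges disagreed about. A hard 0/1 label would instead reward an ever larger logit.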

Limitations & honest notes

Label-noise ceiling: you cannot score above the noise in the test labels themselves (~0.88–0.90 practical max on the cross-register set). The cross-register number (0.844) is lower than the in-register one (0.933) partly because that benchmark's hand labels are a different, noisier standard.
Footprint: ~1.4 GB on disk, ~2.8–3.5 GB resident (ONNX Runtime dequantizes the INT8 weights to fp32 on CPU). This is the motivation for Aletheia 1.5.
English-centric conversational phrasing; commit data spans 74 languages but conversational decision-detection is English-tuned.
It only proposes; it should be paired with a deterministic confirmation/promotion step.

Version history

Version	Base	RAM	Notes
1.0	Qwen2.5-1.5B (decoder, LoRA)	~3 GB	first gate (this model)
1.5	DeBERTa-v3-large (encoder, full-FT)	~1.2 GB	same precision, ⅓ the RAM

License

Apache-2.0 (inherited from the Qwen2.5-1.5B-Instruct base; the classifier head and weights are released under the same license). Training data is license-clean for commercial use.

Citation

@software{aletheia2026,
  title  = {Aletheia: an on-device decision-memory gate for code},
  author = {Syncable / Memtrace},
  year   = {2026},
  url    = {https://huggingface.co/memtrace/aletheia-1.0}
}
