Aletheia 1.0 — the decision-memory gate
Version note: This is Aletheia 1.0, the first Cortex gate: a 1.5-billion-parameter Qwen2.5 decoder. It is superseded by Aletheia 1.5, a fine-tuned DeBERTa-v3-large encoder that matches this model's precision at ~⅓ the RAM. (1.0 was previously published as aletheia-1.5b, named for its 1.5B parameters; it was renamed to a clean version number to avoid colliding with the 1.5 release.)
ἀλήθεια — "un-forgetting". Aletheia decides what is worth remembering: given a short candidate utterance from a coding session (a commit subject or a turn from an AI-coding conversation), it answers one question — is this a real, substantive engineering decision, or is it noise? It is the write-gate of Memtrace's Cortex decision memory: only what Aletheia confidently judges a decision is kept.
- Task: binary sequence classification, decision vs noise.
- Base: Qwen/Qwen2.5-1.5B-Instruct, LoRA-fine-tuned as a sequence classifier.
- Format: INT8-quantized ONNX (~1.4 GB), runs on-device via ONNX Runtime: no GPU, no network, no per-call cost.
- Output: a single logit → P(decision) = sigmoid(logit / T) with calibration temperature T = 0.698.
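The output formula above can be made concrete with a small sketch. It relies only on the formula and temperature stated in this card; the function names `p_decision_from_logit` and `logit_for_probability` are illustrative and are not part of the shipped package.

```python
import math

T = 0.698  # calibration temperature stated in this card


def p_decision_from_logit(logit: float, temperature: float = T) -> float:
    """Map a raw logit to a calibrated probability: sigmoid(logit / T)."""
    z = logit / temperature
    # Branch on the sign so math.exp never overflows for extreme logits.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit_for_probability(p: float, temperature: float = T) -> float:
    """Inverse mapping: the raw logit at which P(decision) equals p."""
    return temperature * math.log(p / (1.0 - p))


# A raw logit of 0 sits exactly on the fence.
print(p_decision_from_logit(0.0))  # 0.5

# Raw-logit cutoff equivalent to a probability threshold of 0.85.
print(logit_for_probability(0.85))
```

Because T is below 1, dividing by it stretches the logit, so the calibrated probabilities are pushed further from 0.5 than a plain sigmoid would give.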
Why a small local model
Decisions must be gated continuously, on every commit and every agent turn. That rules out a cloud LLM (cost, latency, and your code would leave the machine) and rules out heuristics (they cap at ~78% precision and can't read intent). A ~1.4 GB on-device classifier is what lets the gate be private, free, offline, always-on, and accurate at once.
Results
Evaluated on held-out, leakage-guarded test sets, apples-to-apples against the prior baseline.
| Test set | Metric | Aletheia 1.0 |
|---|---|---|
| Conversational (in-register, held-out, n=1,589) | ROC-AUC | 0.933 |
| Conversational | accuracy | 0.850 |
| Cross-register benchmark (hand-labeled, n=195) | ROC-AUC | 0.844 |
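For readers less familiar with the metrics in the table, here is a minimal sketch of how ROC-AUC and accuracy are computed from scores and labels. The toy data, the function names, and the 0.5 accuracy threshold are assumptions for illustration; this card does not state which threshold its accuracy figure uses.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is
    fine for a sketch but not for large evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Share of examples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


# Made-up scores and labels, not taken from the evaluation sets.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(roc_auc(toy_scores, toy_labels))   # 0.75
print(accuracy(toy_scores, toy_labels))  # 0.5
```

ROC-AUC is threshold-free, which is why it is reported alongside a thresholded metric such as accuracy.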
Precision is a dial: at a balanced threshold 80% of what it stores is a genuine decision; in "clean mode" (P ≥ 0.85) that rises to 90–92%, at the cost of storing fewer. The model emits calibrated probabilities (temperature-scaled), so the threshold means what it says.
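The precision/volume trade-off can be illustrated with a short sketch. Only the clean-mode cutoff of 0.85 comes from this card; the balanced cutoff of 0.5, the helper name `precision_and_kept`, and the toy data are assumptions made for the example.

```python
from typing import List, Tuple

CLEAN_THRESHOLD = 0.85    # "clean mode" cutoff stated in this card
BALANCED_THRESHOLD = 0.5  # ASSUMPTION: placeholder, the card gives no value


def precision_and_kept(scored: List[Tuple[float, bool]],
                       threshold: float) -> Tuple[float, int]:
    """Return (precision, number kept) when storing items with p >= threshold."""
    kept = [is_decision for p, is_decision in scored if p >= threshold]
    if not kept:
        return 0.0, 0
    return sum(kept) / len(kept), len(kept)


# Hypothetical (probability, true label) pairs, invented for illustration.
toy = [(0.95, True), (0.90, True), (0.88, True), (0.80, False),
       (0.70, True), (0.60, True), (0.55, False), (0.30, False), (0.10, False)]

for name, thr in [("balanced", BALANCED_THRESHOLD), ("clean", CLEAN_THRESHOLD)]:
    precision, kept = precision_and_kept(toy, thr)
    print(f"{name}: threshold={thr} kept={kept} precision={precision:.2f}")
```

On this toy set the higher cutoff keeps fewer items but a larger share of them are true decisions, which is the behaviour the paragraph describes.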
Intended use
The decision write-gate / proposer for a code-decision memory system. It is register-robust: trained on both git-commit subjects and conversational turns, so the same model scores both streams. Downstream, a deterministic check (a code edit binding to the turn, or a human) promotes a proposed decision to a durable fact.
Out of scope: it is not a retrieval/search model, not a code generator, and not a general chat classifier. It judges decision-worthiness, nothing else.
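The propose-then-promote flow described under Intended use can be sketched as follows. This is not Memtrace's implementation: the `Candidate` class, its field names, and the choice of 0.85 as the proposal cutoff are all hypothetical, chosen only to show the gate proposing and a deterministic check promoting.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    p_decision: float              # calibrated probability from the gate
    bound_code_edit: bool = False  # hypothetical flag: a code edit is tied to this turn
    human_confirmed: bool = False  # hypothetical flag: a person approved it


def is_proposed(c: Candidate, threshold: float = 0.85) -> bool:
    """The gate only proposes: the score must clear the threshold."""
    return c.p_decision >= threshold


def is_durable(c: Candidate, threshold: float = 0.85) -> bool:
    """Promote a proposal only when a deterministic check backs it."""
    return is_proposed(c, threshold) and (c.bound_code_edit or c.human_confirmed)


turn = Candidate("Switch auth to JWT instead of sessions", 0.91, bound_code_edit=True)
chatter = Candidate("let me check the file rather than re-reading", 0.06)
unconfirmed = Candidate("Switch auth to JWT instead of sessions", 0.91)

print(is_durable(turn), is_durable(chatter), is_durable(unconfirmed))  # True False False
```

Keeping the two predicates separate means a high score alone never creates a durable fact.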
How to use
ONNX Runtime (the shipping path — Python)
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("memtrace/aletheia-1.0")
# Assumes model_int8.onnx has already been downloaded to the working directory.
sess = ort.InferenceSession("model_int8.onnx", providers=["CPUExecutionProvider"])
T = 0.698  # calibration temperature

def p_decision(text):
    # Tokenize to NumPy arrays; the inputs are passed as int64.
    e = tok(text, truncation=True, max_length=64, return_tensors="np")
    logit = sess.run(None, {"input_ids": e["input_ids"].astype(np.int64),
                            "attention_mask": e["attention_mask"].astype(np.int64)})[0].reshape(-1)[0]
    # Temperature-scaled sigmoid gives the calibrated P(decision).
    return 1 / (1 + np.exp(-logit / T))

p_decision("Switch auth to JWT instead of sessions")  # ~0.91 → decision
p_decision("let me check the file rather than re-reading")  # ~0.06 → noise
Optimum (transformers-compatible)
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
m = ORTModelForSequenceClassification.from_pretrained("memtrace/aletheia-1.0", file_name="model_int8.onnx")
tok = AutoTokenizer.from_pretrained("memtrace/aletheia-1.0")
Rust (ort, in-product)
Memtrace loads model_int8.onnx + tokenizer.json via the ort crate; see cortex_serving.json for the temperature and default thresholds. Input names: input_ids, attention_mask; output: logits.
Training
- Data: 24,895 multi-judge-labeled examples — 14,305 git-commit subjects + 10,590 turns mined from real AI-coding sessions. Labels are LLM multi-judge consensus (2 judges for commits; 3 diverse-lens judges + majority for conversation), ~95% inter-judge agreement; CROWDLAB confirmed the consensus labels were already near-optimal.
- Commit sources (license-clean): CommitPackFT (MIT, 74 languages), CommitChronicle, and tangled-ccs. CommitBench (CC-BY-NC) was excluded so the shipped model is commercial-clean.
- Recipe: LoRA sequence-classifier with soft-label training (vote-fraction BCEWithLogits, so judge disagreement is modeled rather than forced to 0/1), rsLoRA, LoRA+, MLP target modules, best-checkpoint-by-AUC, post-hoc temperature scaling. The decisive lever was soft labels: the ceiling was label noise, not data or model size.
- Compute: trained locally on Apple Silicon (no rented GPU).
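Two ideas from the recipe, vote-fraction soft labels and post-hoc temperature scaling, can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: the function names are made up, and the grid search in `fit_temperature` is a simple stand-in because the card does not say which optimizer was used to fit T.

```python
import math
from typing import Optional, Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a raw logit.

    Equals -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))] and accepts
    any target in [0, 1], which is what makes soft labels possible.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def vote_fraction(votes: Sequence[int]) -> float:
    """Soft label: share of judges that voted 'decision' (1)."""
    return sum(votes) / len(votes)


def fit_temperature(logits: Sequence[float], labels: Sequence[float],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Pick the T on a grid that minimises mean BCE of logit / T (ASSUMED method)."""
    grid = grid or [0.05 * k for k in range(4, 61)]  # 0.20 .. 3.00

    def mean_loss(t: float) -> float:
        return sum(bce_with_logits(x / t, y) for x, y in zip(logits, labels)) / len(logits)

    return min(grid, key=mean_loss)


# Three judges, two say "decision": the target is 2/3 rather than a forced 1.
soft = vote_fraction([1, 1, 0])
print(soft)
# With a soft target, a very confident logit is penalised more than a moderate one.
print(bce_with_logits(2.0, soft) > bce_with_logits(math.log(2), soft))  # True
```

With a target of 2/3 the loss is smallest when sigmoid(logit) is 2/3, so the model is rewarded for expressing the judges' uncertainty instead of being pushed toward 0 or 1.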
Limitations & honest notes
- Label-noise ceiling: you cannot score above the noise in the test labels themselves (~0.88–0.90 practical max on the cross-register set). The cross-register number (0.844) is lower than the in-register one (0.933) partly because that benchmark's hand labels are a different, noisier standard.
- Footprint: ~1.4 GB on disk, ~2.8–3.5 GB resident (ONNX Runtime dequantizes the INT8 weights to fp32 on CPU). This is the motivation for Aletheia 1.5.
- English-centric conversational phrasing; commit data spans 74 languages but conversational decision-detection is English-tuned.
- It only proposes; it should be paired with a deterministic confirmation/promotion step.
Version history
| Version | Base | RAM | Notes |
|---|---|---|---|
| 1.0 | Qwen2.5-1.5B (decoder, LoRA) | ~3 GB | first gate (this model) |
| 1.5 | DeBERTa-v3-large (encoder, full-FT) | ~1.2 GB | same precision, ⅓ the RAM |
License
Apache-2.0 (inherited from the Qwen2.5-1.5B-Instruct base; the classifier head and weights are released under the same license). Training data is license-clean for commercial use.
Citation
@software{aletheia2026,
  title = {Aletheia: an on-device decision-memory gate for code},
  author = {Syncable / Memtrace},
  year = {2026},
  url = {https://huggingface.co/memtrace/aletheia-1.0}
}