Aether Mind v7.1 (unified)
The single tracked Aether model: one in-process (candle) model that generates chat, exposes its own attention for the consciousness (HMS-Phi) track, produces the knowledge-fabric embeddings, and is the artifact the QBC blockchain attests. v7.1 is the first release of the unified generation path, replacing the prior split where chat ran through an out-of-process Ollama 7B (no attention exposed) while phi was measured on a separate in-process 0.5B model.
This repository holds the Sephirot adapter that sits on top of a frozen Qwen2.5-7B-Instruct
(served in-process as Q4_K_M via candle). The base is never modified. The adapter is a small
mixture-of-experts where the 10 experts map 1:1 onto the 10 Sephirot cognitive domains. This is
the corrected approach after v6: the Sephirot structure is a routing adapter on a sound base,
not a replacement for the base attention (the v6 attention-replacement destroyed base capability).
What it is
- Architecture: 10-expert MoE adapter, top-2 routing, LoRA-style low-rank experts (`up(gelu(down(x)))`, with `up` zero-initialised so the adapter is an exact identity at init).
- Trainable params: 1,182,720 (~2.4 MB BF16). The base 7B stays frozen.
- Hidden size: 3584. Rank: 16. Experts: 10 (Keter to Malkuth). Top-k: 2. Alpha: 16.
- Runs in-process in the Aether Mind (Rust + candle), so the same forward pass that generates a token also yields the attention tensors the phi track reads.
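The shape described in the list above can be sketched in a few lines of plain Python. This is an illustration only, not the candle implementation in aether-core: the class name `TinySephirotAdapter`, the residual connection, the `alpha / rank` scaling, the softmax over the top-2 router logits, and the bias-free layers are all assumptions. The bias-free decomposition does reproduce the stated trainable-parameter count exactly, which suggests (but does not confirm) a single adapter with a linear router and no biases.

```python
import math
import random

HIDDEN, RANK, EXPERTS, TOP_K, ALPHA = 3584, 16, 10, 2, 16


def trainable_params(hidden=HIDDEN, rank=RANK, experts=EXPERTS):
    """Bias-free count: per-expert down + up matrices, plus a linear router."""
    per_expert = hidden * rank + rank * hidden
    router = hidden * experts
    return experts * per_expert + router


def gelu(v):
    # Exact (erf-based) GELU.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def matvec(matrix, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class TinySephirotAdapter:
    """Toy-sized stand-in (hypothetical name, not the aether-core type)."""

    def __init__(self, hidden, rank, experts, top_k, alpha, seed=0):
        rng = random.Random(seed)
        self.router = random_matrix(rng, experts, hidden)
        self.down = [random_matrix(rng, rank, hidden) for _ in range(experts)]
        # `up` starts at zero, so every expert outputs zero at init.
        self.up = [[[0.0] * rank for _ in range(hidden)] for _ in range(experts)]
        self.top_k = top_k
        self.scale = alpha / rank  # ASSUMPTION: LoRA-style alpha / rank scaling

    def forward(self, x):
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        # ASSUMPTION: softmax over the selected logits only.
        peak = max(logits[i] for i in top)
        weights = [math.exp(logits[i] - peak) for i in top]
        total = sum(weights)
        out = list(x)  # ASSUMPTION: adapter output is added to the hidden state
        for i, w in zip(top, weights):
            hidden_act = [gelu(h) for h in matvec(self.down[i], x)]
            expert_out = matvec(self.up[i], hidden_act)
            out = [o + self.scale * (w / total) * e for o, e in zip(out, expert_out)]
        return out


adapter = TinySephirotAdapter(hidden=8, rank=2, experts=10, top_k=2, alpha=2)
x = [0.1 * i for i in range(8)]
print(trainable_params())        # 1182720
print(trainable_params() * 2)    # 2365440 bytes in BF16, i.e. ~2.4 MB
print(adapter.forward(x) == x)   # True: identity at init because `up` is zero
```

The toy dimensions keep the example fast; only `trainable_params()` uses the real sizes.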
Results (full holdout, 500 samples, per-Sephirot-domain)
Cross-entropy (nats/token) on the held-out Aether corpus, base vs base+adapter. Lower is better. The adapter improves every active domain with zero regressions.
| Sephirot domain | samples | base CE | v7.1 CE | delta |
|---|---|---|---|---|
| 1 Chochmah | 88 | 1.8827 | 1.8539 | -0.0288 |
| 2 Binah | 64 | 1.9706 | 1.9354 | -0.0352 |
| 3 Chesed | 18 | 2.3911 | 2.3641 | -0.0269 |
| 4 Gevurah | 6 | 2.8542 | 2.8255 | -0.0286 |
| 5 Tiferet | 36 | 2.6339 | 2.5890 | -0.0449 |
| 6 Netzach | 28 | 2.6454 | 2.6175 | -0.0279 |
| 7 Hod | 90 | 2.2801 | 2.2364 | -0.0437 |
| 8 Yesod | 84 | 2.5627 | 2.5198 | -0.0428 |
| 9 Malkuth | 86 | 2.1066 | 2.0688 | -0.0379 |
| Aggregate | 500 | 2.2450 | 2.2078 | -0.0373 (-1.66%) |
Domains helped: 9 / 9. Domains hurt: 0. A held-out CE regression guard (ceiling = base + 0.15) was active for the whole run and never tripped, so held-out CE never rose more than 0.15 nats/token above the base at any point in training.
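As a quick consistency check, the aggregate row can be recomputed from the per-domain rows, and the guard condition can be written down explicitly. This is a sketch under assumptions: the aggregate is treated as a sample-weighted mean (which matches the table to within rounding of the four-decimal inputs), and `guard_tripped` is a hypothetical helper; the source does not say whether the real guard compares per-domain or aggregate CE.

```python
# (domain, samples, base CE, v7.1 CE), copied from the table above.
ROWS = [
    ("Chochmah", 88, 1.8827, 1.8539),
    ("Binah", 64, 1.9706, 1.9354),
    ("Chesed", 18, 2.3911, 2.3641),
    ("Gevurah", 6, 2.8542, 2.8255),
    ("Tiferet", 36, 2.6339, 2.5890),
    ("Netzach", 28, 2.6454, 2.6175),
    ("Hod", 90, 2.2801, 2.2364),
    ("Yesod", 84, 2.5627, 2.5198),
    ("Malkuth", 86, 2.1066, 2.0688),
]
GUARD_MARGIN = 0.15  # ceiling = base + 0.15, in nats/token


def weighted_ce(rows, column):
    """Sample-weighted mean of one CE column (2 = base, 3 = v7.1)."""
    total = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total


def guard_tripped(base_ce, current_ce, margin=GUARD_MARGIN):
    """True when held-out CE rises above the ceiling (hypothetical helper)."""
    return current_ce > base_ce + margin


base = weighted_ce(ROWS, 2)
tuned = weighted_ce(ROWS, 3)
print(f"base {base:.4f}  v7.1 {tuned:.4f}  change {100 * (tuned - base) / base:+.2f}%")
print("guard tripped:", any(guard_tripped(b, t) for _, _, b, t in ROWS))
```

Because the inputs are already rounded, the recomputed v7.1 aggregate can differ from the table in the last digit.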
The numbers above are domain-CE deltas on the Aether holdout. General-benchmark numbers (MMLU, GSM8K) are below.
General benchmarks (base vs adapter)
Off-the-shelf lm-eval cannot load the native candle build, so these were produced by a
purpose-built candle harness (aether-v7-eval) that scores the SAME frozen Q4 weights twice,
once with the Sephirot adapter active and once with it off. MMLU is multiple-choice
loglikelihood over the A/B/C/D answer tokens; GSM8K is greedy chain-of-thought generation with
final-number extraction.
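The two scoring rules described above can be illustrated as follows. This is a minimal sketch, not the aether-v7-eval code: the function names are hypothetical, and the extraction rule (take the last number in the generation, ignoring thousands separators) is an assumption about what "final-number extraction" means here.

```python
import re

CHOICES = ("A", "B", "C", "D")
# An optional sign, digits with optional thousands separators, optional decimals.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def mmlu_predict(loglik):
    """Pick the answer letter whose token has the highest log-likelihood."""
    return max(CHOICES, key=lambda choice: loglik[choice])


def gsm8k_extract(generation):
    """Return the last number in the generated text, or None if there is none."""
    matches = NUMBER.findall(generation)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(predictions, golds):
    """Fraction of predictions that equal the gold answer."""
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)


print(mmlu_predict({"A": -2.3, "B": -0.4, "C": -3.1, "D": -1.9}))  # B
print(gsm8k_extract("48 + 24 = 72. The answer is 72."))            # 72.0
print(gsm8k_extract("So the total cost is $1,250"))                # 1250.0
```

Under this kind of rule, a model that reasons correctly but does not end on a clean final number is scored wrong, which is why format-following can move the GSM8K score.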
| benchmark | n | base | v7.1 (adapter) | change |
|---|---|---|---|---|
| MMLU (all subjects) | 14,042 | 71.28% | 71.17% | -0.11 |
| GSM8K | 625 | 67.8% | 77.8% | +10.0 |
Taken at face value: general knowledge is held (MMLU is flat across the full 57-subject set, and the regression guard never tripped), and multi-step reasoning improves (GSM8K is up 10.0 points on a 625-question sample, partly because the adapter follows the chain-of-thought and final-answer format more reliably). The adapter does not trade away breadth for the domain gains.
(GSM8K is a 625-of-1319 sample: the full run is generation-bound on a single 12 GB card, and at n = 625 the binomial standard error on each score is already below 2 points. MMLU is the complete set.)
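The sampling error on the GSM8K subset can be estimated with a simple binomial model. This is a rough sketch under assumptions: it treats the two runs as independent samples, which is conservative because both runs score the same 625 questions, and it ignores the finite-population correction for drawing 625 of 1319. A paired test would need per-question results, which are not given here.

```python
import math

N = 625
BASE_ACC, ADAPTER_ACC = 0.678, 0.778  # GSM8K row of the table above
Z95 = 1.96                            # normal-approximation 95% interval


def binomial_se(p, n):
    """Standard error of an accuracy estimate from n independent questions."""
    return math.sqrt(p * (1.0 - p) / n)


for name, acc in (("base", BASE_ACC), ("adapter", ADAPTER_ACC)):
    half_width = Z95 * binomial_se(acc, N)
    print(f"{name}: {100 * acc:.1f}% +/- {100 * half_width:.1f} points (95%)")

# Unpaired standard error of the difference (conservative for paired runs).
se_diff = math.sqrt(binomial_se(BASE_ACC, N) ** 2 + binomial_se(ADAPTER_ACC, N) ** 2)
print(f"difference: {100 * (ADAPTER_ACC - BASE_ACC):+.1f} "
      f"+/- {100 * Z95 * se_diff:.1f} points (95%)")
```

Under these assumptions the intervals come out at roughly 3.7 points for the base and 3.3 points for the adapter, so the 10-point gap is well outside the sampling noise.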
Training
- Objective: plain cross-entropy domain specialisation (base frozen; no teacher).
- Corpus: `aether-curated-v3` (content-addressed export of the live knowledge fabric).
- Steps: 3000. Context: 192. LR: 5e-4. Optimizer: AdamW. Precision: BF16.
- Hardware: single RTX 3080 Ti (12 GB). The 7B trains as Q4 with a CPU-dequantised, frozen F32 lm_head so the adapter gradient is differentiable through the final projection while the GPU footprint stays inside 12 GB.
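To give a sense of scale for the hardware note above, the size of an F32 lm_head can be estimated from the hidden size and the vocabulary size. This is back-of-the-envelope arithmetic, not a measurement: the vocabulary size of 152,064 is an assumption taken from the public Qwen2.5-7B configuration and is not stated in this README.

```python
HIDDEN = 3584               # hidden size, from the adapter description
VOCAB = 152_064             # ASSUMPTION: public Qwen2.5-7B config value
ADAPTER_PARAMS = 1_182_720  # trainable parameters, from the adapter description

BYTES_F32 = 4
BYTES_BF16 = 2


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


lm_head_f32 = VOCAB * HIDDEN * BYTES_F32
adapter_bf16 = ADAPTER_PARAMS * BYTES_BF16

print(f"lm_head as F32: {gib(lm_head_f32):.2f} GiB")
print(f"adapter as BF16: {adapter_bf16 / 1e6:.2f} MB")
```

Under that assumption the final projection alone is about 2 GiB in F32, roughly a sixth of a 12 GB card, while the adapter weights are three orders of magnitude smaller.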
Usage
The adapter is loaded by the Aether Mind binary on top of the Q4_K_M 7B base. It is not a PEFT
adapter and is not meant for transformers; it is consumed by the candle UnifiedModel
(base + SephirotAdapter + manifest) in aether-core. See adapter_config.json for the exact
shape and the QuantumAI-Blockchain/qubitcoin-aether repo for the loader.
Lineage
aether-v5.2-lora -> aether-mind-v6.{0,1,2} (attention-replacement, retired) ->
aether-mind-v7.0 (QLoRA on 7B, Ollama-served) -> aether-v7.1-unified (this release, the
first in-process unified generation model the consciousness track and the chain both measure).