Aether Mind v6.2: cosine-decay fine-tune of v6.1
v6.2 picks up where v6.1 plateaued. It keeps the same architecture, the same 256-token context and the same Aether curated corpus, but trains for another 30,000 steps under a cosine LR decay (2e-5 → 2e-7) to push the model past its fine-tune plateau without overshooting.
This is the third native (non-LoRA) Aether release and the first to use a learning-rate schedule other than a constant LR. The cosine flag landed in commit 186b2622.
What you're getting
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct (initial weights); this run resumes from the v6.1 fine-tune |
| Architecture | V6 transformer: 24 layers, 896 hidden, 14 attention heads (10 Sephirot + 2 generalist + 2 sink), head_dim=64 |
| Trainable params | ~558 M (all weights, no LoRA) |
| Training mode | Pure cross-entropy (no distillation, same as v6.1) |
| Training context | 256 tokens (same as v6.1) |
| LR schedule | Cosine decay 2e-5 → 2e-7 over 30,000 fine-tune steps |
| Precision | BF16 weights, F32 KL/CE math internally |
| NSA config | compression_block=64, top_k=2048, sliding_window=512, sink_tokens=4 |
| Vocab | 151,936 (Qwen2.5 tokenizer, untouched) |
| Max position | 32,768 (RoPE theta = 1e6) |
| Total training | 60,000 steps (30K v6.1 + 30K v6.2) |
| File | model.safetensors (1.32 GB, BF16) |
| License | Apache-2.0 (matches base) |
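The NSA row combines several attention branches. As a rough illustration of just two of its parameters, sliding_window and sink_tokens, the sketch below builds a boolean causal mask in which each query sees the most recent `sliding_window` keys plus the first `sink_tokens` positions. The function name `window_sink_mask` is hypothetical, and the compressed and top-k selection branches (compression_block, top_k) are deliberately left out, so this is an assumption-laden illustration rather than the Aether implementation.

```python
def window_sink_mask(seq_len: int, sliding_window: int, sink_tokens: int) -> list[list[bool]]:
    """Hypothetical sketch: mask[q][k] is True where query q may attend to key k.

    Models only a sliding-window branch plus attention sinks; NSA's compressed
    and top-k selected branches are not represented here.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                     # never attend to future positions
            in_window = q - k < sliding_window  # the latest `sliding_window` keys, q included
            is_sink = k < sink_tokens           # the first `sink_tokens` positions stay visible
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask


# Tiny example: window of 3, one sink token, 6 positions.
small = window_sink_mask(6, sliding_window=3, sink_tokens=1)
print([k for k, ok in enumerate(small[5]) if ok])  # [0, 3, 4, 5]
```

With the 256-token training context, a 512-token window already covers every earlier position, so in this sketch the mask reduces to plain causal attention; the window only starts to restrict attention beyond 512 tokens.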
Training run
| Metric | v6.1 | v6.2 | Δ |
|---|---|---|---|
| Steps (this run) | 30,000 | 30,000 | = |
| Total steps | 30,000 | 60,000 | +30K |
| Wall-clock (this run) | 44.4 min | 44.9 min | +0.5 min |
| Mean CE loss (this run) | 10.18 | 8.43 | −17 % |
| Throughput | 629.9 tok/s | 622.9 tok/s | flat |
| Mean Sephirot aux | 0.149 | 0.140 | −6 % |
| LR schedule | constant 2e-5 | cosine 2e-5 → 2e-7 | new |
| NaN events | 0 | 0 | = |
| Resume base | random init (Qwen) | v6.1 final | new |
Loss trajectory
```text
step 1      loss=13.00  avg=13.00   (v6.1 final state)
step 100    loss=12.00  avg=11.78
step 1000   loss= 7.75  avg= 8.82   ← LR still high, big descent through v6.1's plateau
step 5000   loss= 7.25  avg= 7.71
step 10000  loss= 6.69  avg= 7.41   ← minimum running average
step 15000  loss= 9.56  avg= 7.51   ← cosine kicks in, per-step variance ↑, drift ↑
step 20000  loss= 8.94  avg= 7.92
step 25000  loss= 8.75  avg= 8.22
step 29999  loss= 9.31  avg= 8.43
```
The reported mean (8.43) is the run-wide average: the avg column is the cumulative mean over all steps so far, so its final value equals the run mean. The lowest observed running average (7.41 at step 10K) is the actual fine-tune minimum; the back-half drift is the cosine schedule reducing step size to near zero, which makes per-step variance dominate the running average. This is the expected shape of a converged cosine fine-tune.
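To make the distinction between the final avg and the lowest running average concrete, here is a small sketch. It assumes avg is the plain cumulative mean from step 1 (consistent with the final avg matching the reported 8.43 mean); the helper name `running_stats` is hypothetical and the loss values are toy numbers, not data from the v6.2 run.

```python
def running_stats(losses: list[float]) -> tuple[list[float], float, int]:
    """Hypothetical helper: cumulative means, their minimum, and the 1-based step of that minimum."""
    means: list[float] = []
    total = 0.0
    for step, loss in enumerate(losses, start=1):
        total += loss
        means.append(total / step)  # cumulative mean up to and including this step
    best_idx = min(range(len(means)), key=means.__getitem__)  # first index of the lowest mean
    return means, means[best_idx], best_idx + 1


# Toy per-step losses (illustrative only, not from the v6.2 log).
means, best, best_step = running_stats([12.0, 8.0, 4.0, 10.0, 11.0])
print(means)            # [12.0, 10.0, 8.0, 8.5, 9.0]
print(best, best_step)  # 8.0 3
```

The final mean (9.0) ends above the minimum (8.0) because the later per-step losses exceed the mean so far, the same pattern the log shows as avg rises from 7.41 at step 10K to 8.43 at step 29,999.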
What changed vs v6.1
Cosine LR decay. Constant LR at 2e-5 in v6.1 caused a plateau from step ~10K onward: the optimiser kept bouncing around the loss minimum it could see at that step size. Cosine decay to 2e-7 lets later steps take much smaller updates, fine-tuning past the plateau.
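A minimal sketch of the schedule, assuming the standard half-cosine form with no warmup, decaying from 2e-5 to 2e-7 over the 30,000 steps of this run. The function name `cosine_lr` is hypothetical; the actual flag added in commit 186b2622 may differ in details such as warmup or step offset.

```python
import math


def cosine_lr(step: int, total_steps: int, lr_max: float = 2e-5, lr_min: float = 2e-7) -> float:
    """Half-cosine decay from lr_max at step 0 to lr_min at total_steps (hypothetical helper)."""
    progress = min(step, total_steps) / total_steps  # clamp so steps past the end stay at lr_min
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few of the logged steps.
for step in (0, 10_000, 15_000, 29_999):
    print(step, f"{cosine_lr(step, 30_000):.3e}")
```

Under this form the midpoint (step 15,000) still runs at about 1.01e-5, roughly half the peak; the very small updates only arrive in the last few thousand steps. PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30_000, eta_min=2e-7)` computes the same closed-form curve when stepped once per training step.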
Resume from v6.1 rather than fresh init. The model starts at v6.1's final state and refines from there.
Otherwise identical to v6.1: same architecture, same corpus, same context, same NSA config, same Sephirot aux. Apart from the starting checkpoint, the only training setting changed is the LR schedule.
How to use
Native runtime (recommended): Rust aether-mind
Set AETHER_V6_CHECKPOINT to the local path of model.safetensors, then restart qbc-aether-mind.service.
Python
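```python
from safetensors.torch import load_file

# Load the checkpoint as a dict of {tensor_name: torch.Tensor} on CPU.
weights = load_file("model.safetensors")
# Total element count across all stored tensors.
print("params:", sum(t.numel() for t in weights.values()))
```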
Same architecture as v6.1, so any custom loader/wrapper for v6.1 works here.
Evaluation
(lm-evaluation-harness numbers to follow once the eval binary ships. For now: training-loss curve + sample generations are the primary signal.)
Open items for v6.3
- Per-chunk backward for distillation at ctx ≥ 256, so we can add KL teacher signal back without OOMing.
- Long-context curriculum (1K, 4K, 16K → 1M) per the V6 master spec.
- lm-evaluation-harness pass (MMLU / ARC / HellaSwag / TruthfulQA) for honest published numbers.
License + citation
Apache-2.0 (matches the base model license).
@misc{aether_mind_v62_2026,
title = {Aether Mind v6.2 --- cosine-decay fine-tune of v6.1},
author = {{BlockArtica} and {QuantumAI-Blockchain}},
year = {2026},
url = {https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.2},
}
Links
- Aether Mind v6.1: https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.1
- Aether Mind v6.0: https://huggingface.co/QuantumAI-Blockchain/aether-mind-v6.0
- Aether v5.2-lora: https://huggingface.co/QuantumAI-Blockchain/aether-v5.2-lora
- QuantumAI Blockchain: qbc.network
- GitHub: github.com/QuantumAI-Blockchain