Antahkarana-v2 (36.5M)

The accuracy-recovering version of the Antaḥkaraṇa continual-learning architecture: on Split-CIFAR-100 (10 tasks, 5 seeds) it matches DER++ on accuracy (0.799 vs 0.804) while forgetting about 3× less (0.023 vs 0.067).

Author: Deepak Soni · License: MIT · Trained from scratch (WideResNet-28-10, 36.5M params — not a fine-tune of any pretrained model; entirely original work).

This is the research-grade vision model that validates the architecture, and the direct ancestor of the language model deepakdsoni/antahkarana-7B.

📦 Model family

Model	What
antahkarana-v1	the original architecture + v1 vision models — the most stable continual learner (only positive backward transfer)
antahkarana-v2	accuracy-recovering v2 (36.5M) — matches SOTA accuracy at ~3× less forgetting
antahkarana-7B	the architecture scaled to a 7B language model

Why we built v2 (the accuracy problem)

The original Antaḥkaraṇa-v1 was the most stable continual learner in our benchmark — the lowest forgetting of any method and the only one with positive backward transfer (learning new tasks slightly improves old ones). But that stability came at a cost: its raw accuracy (0.643) trailed the SOTA, DER++ (0.804). It sat at the ultra-stable end of the stability–plasticity frontier.

v2 was built to recover that accuracy without giving up the stability — by adding two more faculties from the Vedic model of mind:

Added faculty (Vedic)	Mechanism (ML)	Why it helps
vijñāna-smṛti	dark-knowledge / logit replay	rehearses past tasks' output distributions, transferring understanding (not just labels) → big accuracy lift
viveka	selective consolidation (keep top-k% of importance Ω)	discerns the essential from the inessential → protects what matters without over-rigidity

(plus a guṇa controller that adapts the consolidation strength to how much the model is forgetting.)
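As a rough illustration of viveka, the sketch below keeps only the top `keep` fraction of importance values Ω and applies an EWC-style quadratic penalty through the resulting mask. It is a minimal pure-Python sketch under stated assumptions, not the released implementation: the names `viveka_mask` and `consolidation_penalty`, the global (rather than per-layer) ranking, and the scaling of the penalty are hypothetical choices made for this example.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of values


def viveka_mask(omega: Params, keep: float = 0.2) -> Params:
    """Zero every importance value outside the top `keep` fraction.

    Assumption: ranking is global across all parameters. Ties at the
    threshold are all kept, so slightly more than `keep` may survive.
    """
    if not 0.0 < keep <= 1.0:
        raise ValueError("keep must be in (0, 1]")
    flat = sorted((v for vals in omega.values() for v in vals), reverse=True)
    n_keep = max(1, round(keep * len(flat)))
    threshold = flat[n_keep - 1]  # smallest importance that is still kept
    return {
        name: [v if v >= threshold else 0.0 for v in vals]
        for name, vals in omega.items()
    }


def consolidation_penalty(
    theta: Params, theta_star: Params, omega: Params, lam: float = 10.0
) -> float:
    """EWC-style penalty: lam * sum_i omega_i * (theta_i - theta_star_i)^2.

    Assumption: no 1/2 factor; conventions differ between implementations.
    """
    total = 0.0
    for name, vals in theta.items():
        for t, t0, w in zip(vals, theta_star[name], omega[name]):
            total += w * (t - t0) ** 2
    return lam * total


# Toy example: ten importance values, keep the top 20% (two values).
omega = {"w": [0.9, 0.1, 0.05, 0.02, 0.01], "b": [0.5, 0.03, 0.0, 0.0, 0.0]}
theta_star = {"w": [1.0, 1.0, 1.0, 1.0, 1.0], "b": [0.0, 0.0, 0.0, 0.0, 0.0]}
theta = {"w": [1.5, 2.0, 1.0, 1.0, 1.0], "b": [0.2, 1.0, 0.0, 0.0, 0.0]}

masked = viveka_mask(omega, keep=0.2)
penalty_selective = consolidation_penalty(theta, theta_star, masked)
penalty_full = consolidation_penalty(theta, theta_star, omega)
```

In the toy example only the two largest importances survive the mask, so drift in the other eight parameters is no longer penalized. That is the sense in which selective consolidation protects what matters without making the whole network rigid.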

What v2 achieved

With the headline configuration (buffer=5120, viveka_keep=0.2, α=0.5), v2 brings accuracy to within the reported ± range of DER++ (0.799 vs 0.804) while forgetting about 2.9× less (0.023 vs 0.067). It gives up almost no accuracy relative to DER++ and keeps most of v1's stability:

Method	Accuracy ↑	Forgetting ↓	BWT
DER++ (SOTA)	0.804 ± .014	0.067 ± .017	−0.064
Antaḥkaraṇa-v2 (this model)	0.799 ± .008	0.023 ± .002	−0.015
Antaḥkaraṇa-v1 (stable variant)	0.643	0.017	+0.008
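The three columns above are standard continual-learning metrics. The card does not spell out the exact formulas used, so the sketch below assumes the common definitions: average final accuracy, forgetting as the drop from each task's best earlier accuracy, and backward transfer (BWT) as the change from the accuracy measured right after a task was learned. The function name `cl_metrics` and the toy accuracy matrix are hypothetical.

```python
from typing import List, Tuple


def cl_metrics(acc: List[List[float]]) -> Tuple[float, float, float]:
    """Return (average accuracy, forgetting, BWT).

    acc[i][j] is the accuracy on task j measured after training on task i.
    Only entries with j <= i are read.
    """
    n_tasks = len(acc)
    if n_tasks < 2:
        raise ValueError("need at least two tasks")
    final = acc[n_tasks - 1]

    # Mean accuracy over all tasks after the last task has been learned.
    avg_acc = sum(final[:n_tasks]) / n_tasks

    # Forgetting: best accuracy a task ever had before the end, minus its
    # final accuracy, averaged over all tasks except the last one.
    forgetting = sum(
        max(acc[i][j] for i in range(j, n_tasks - 1)) - final[j]
        for j in range(n_tasks - 1)
    ) / (n_tasks - 1)

    # BWT: final accuracy minus accuracy right after learning the task.
    # Positive values mean later tasks improved earlier ones.
    bwt = sum(final[j] - acc[j][j] for j in range(n_tasks - 1)) / (n_tasks - 1)
    return avg_acc, forgetting, bwt


# Toy 3-task run (made-up numbers, not results from this model).
toy = [
    [0.90, 0.00, 0.00],
    [0.85, 0.80, 0.00],
    [0.88, 0.74, 0.82],
]
avg_acc, forgetting, bwt = cl_metrics(toy)
```

Under these definitions forgetting is never below zero by much and BWT is usually close to its negative, which matches the DER++ and v2 rows; a method can still show small positive BWT alongside non-zero forgetting, as in the v1 row.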

Of the methods in our benchmark, v2 is the only one that is both accurate and stable: on an accuracy-versus-forgetting plot it lands in the "ideal corner" (high accuracy, low forgetting).

What we tested (it generalizes)

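The result is not a single-dataset artifact. Across datasets, stream lengths, and model sizes (each a separate, fair, multi-seed run), v2 stays within 0.005 to 0.045 accuracy of DER++ and forgets less in every setting, although the forgetting margin narrows on the 20-task stream (0.054 vs 0.060):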

Setting	DER++ (acc / forget)	v2 (acc / forget)	v1 (acc / forget)
Split-CIFAR-100, 10 tasks (headline)	0.804 / 0.067	0.799 / 0.023	0.643 / 0.017
Split-CIFAR-100, 20-task lifelong	0.827 / 0.060	0.782 / 0.054	0.638 / 0.040
Split-Tiny-ImageNet, 200-class	0.470 / 0.177	0.456 / 0.108	0.380 / 0.013
Bigger backbone WRN-28-12 (52.6M)	0.790 / 0.070	0.753 / 0.037	0.603 / 0.018

On the harder Tiny-ImageNet benchmark, DER++ forgetting rises to 0.177 while v2 stays at 0.108 (v1 remains at 0.013, at lower accuracy). Together, v1 (ultra-stable) and v2 (high-accuracy) form a tunable stability↔accuracy family with two knobs: alpha (replay distillation weight) and viveka_keep (selective-consolidation fraction).
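To make the alpha knob concrete, here is a minimal sketch of logit replay: a fixed-size buffer stores each sample's logits at insertion time, and a replay term later pulls the current logits back toward them. Several details are assumptions for illustration only: reservoir sampling as the buffer policy, mean squared error as the distance (a temperature-scaled KL divergence is another common choice), and the names `LogitReplayBuffer` and `logit_replay_loss`.

```python
import random
from typing import Any, List, Tuple

Item = Tuple[Any, int, List[float]]  # (input, label, logits when stored)


class LogitReplayBuffer:
    """Fixed-size buffer filled by reservoir sampling (an assumption)."""

    def __init__(self, capacity: int = 5120, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Item] = []
        self.seen = 0  # total number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, x: Any, label: int, logits: List[float]) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((x, label, list(logits)))
            return
        # Each sample seen so far ends up stored with equal probability.
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = (x, label, list(logits))

    def sample(self, k: int) -> List[Item]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def logit_replay_loss(
    current: List[List[float]], stored: List[List[float]], alpha: float = 0.5
) -> float:
    """alpha * mean squared error between current and stored logits."""
    count = sum(len(row) for row in stored)
    squared = sum(
        (c - s) ** 2
        for cur_row, old_row in zip(current, stored)
        for c, s in zip(cur_row, old_row)
    )
    return alpha * squared / count


# Toy usage: a 3-slot buffer offered 10 samples with 2-dimensional logits.
buffer = LogitReplayBuffer(capacity=3, seed=0)
for i in range(10):
    buffer.add(x=f"image-{i}", label=i % 2, logits=[float(i), 0.0])
batch = buffer.sample(2)
loss = logit_replay_loss([[1.0, 2.0]], [[1.0, 0.0]], alpha=0.5)
```

In a training loop this term would be added to the loss on the current task, so a larger alpha trades plasticity for retention of earlier tasks' output distributions.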

Model details


Architecture	WideResNet-28-10 trunk + per-task linear heads (task-incremental)
Params	36.5M (a 52.6M WRN-28-12 variant was also validated)
Method	Antaḥkaraṇa-v2: saṃskāra (EWC + decay) · guṇa · vijñāna-smṛti (logit replay) · viveka · pramāṇa · turīya
Config	λ=10 · decay=0.7 · buffer=5120 · α=0.5 · viveka_keep=0.2 · 25 epochs/task
Benchmark	Split-CIFAR-100 (10 tasks × 10 classes), trained from scratch, 5 seeds
Formats	`.pt` (weights + saṃskāra Ω/θ*) · `model.safetensors` · `config.json`
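The Method and Config rows list saṃskāra as "EWC + decay" with decay=0.7 but do not say how the decay is applied. One plausible reading, sketched below purely as an assumption, is an online-EWC-style running importance in which the accumulated Ω is scaled by the decay factor before each new task's importance is added. The function name `accumulate_importance` is hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of values


def accumulate_importance(
    omega_running: Params, omega_new: Params, decay: float = 0.7
) -> Params:
    """Return decay * omega_running + omega_new, element by element.

    Assumption: both dicts describe the same parameters; names missing
    from omega_running are treated as having zero prior importance.
    """
    merged: Params = {}
    for name, new_vals in omega_new.items():
        old_vals = omega_running.get(name, [0.0] * len(new_vals))
        merged[name] = [
            decay * old + new for old, new in zip(old_vals, new_vals)
        ]
    return merged


# Toy example: the same importance (1.0) is measured after each of 3 tasks.
omega: Params = {}
for _ in range(3):
    omega = accumulate_importance(omega, {"w": [1.0]}, decay=0.7)
```

With a decay below 1 the running importance stays bounded instead of growing with every task (it approaches 1 / (1 - decay) in this toy case), which keeps older constraints from making the network progressively more rigid.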

Usage

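# load_akn.py is included in this repository: self-contained, no other code needed
import torch
from load_akn import load

model, ckpt = load("antahkarana-v2-36.5M-cifar100-wrn28-10.pt")
model.eval()                       # assumes load() returns a torch.nn.Module
x = torch.randn(4, 3, 32, 32)      # placeholder batch; pass real CIFAR-100 images [N,3,32,32]
task = 0                           # task id in [0..9]; selects the per-task head
with torch.no_grad():
    logits = model(x, task)
print(ckpt["config"]["metrics"])   # per-task metrics stored with the checkpoint
print(ckpt["omega"].keys())        # the saṃskāra importance Ω it chose to protect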

The bigger picture — where v2 leads

v2's core idea — dark-knowledge (logit) replay — is especially powerful for large models, where knowledge transfers through output distributions. That is exactly why it became the base for the language model: the same mechanisms, ported onto a frozen backbone, produced Antahkarana-7B (continual learning on a 7B LLM, ~3.8× less forgetting than naive LoRA). The scaling path runs 36.5M (this model) → 7B → toward 13B–70B.

License & citation

Released under the MIT License — fully original work, trained from scratch (no pretrained base model). Evaluated on CIFAR-100 (Krizhevsky, 2009) and Tiny-ImageNet.

@misc{antahkaranav2_2026,
  title  = {Antahkarana-v2: Recovering Accuracy in Vedic-Derived Continual Learning},
  author = {Deepak Soni},
  year   = {2026},
  url    = {https://huggingface.co/deepakdsoni/antahkarana-v2}
}

Built on the Upaniṣads, Sāṃkhya, Yoga, Nyāya, and modern ML (PyTorch).
