Antahkarana-v2 (36.5M)

The accuracy-recovering version of the Antaḥkaraṇa continual-learning architecture. On a fair, multi-seed Split-CIFAR-100 benchmark it matches the state of the art (DER++) on accuracy while forgetting ~3× less.

Author: Deepak Soni · License: MIT · Trained from scratch (WideResNet-28-10, 36.5M params — not a fine-tune of any pretrained model; entirely original work).

This is the research-grade vision model that proves the architecture, and the direct ancestor of the language model deepakdsoni/antahkarana-7B.


📦 Model family

| Model | What |
|---|---|
| antahkarana-v1 | the original architecture + v1 vision models; the most stable continual learner (only positive backward transfer) |
| antahkarana-v2 | accuracy-recovering v2 (36.5M); matches SOTA accuracy at ~3× less forgetting |
| antahkarana-7B | the architecture scaled to a 7B language model |

Why we built v2 (the accuracy problem)

The original Antaḥkaraṇa-v1 was the most stable continual learner in our benchmark — the lowest forgetting of any method and the only one with positive backward transfer (learning new tasks slightly improves old ones). But that stability came at a cost: its raw accuracy (0.643) trailed the SOTA, DER++ (0.804). It sat at the ultra-stable end of the stability–plasticity frontier.

v2 was built to recover that accuracy without giving up the stability — by adding two more faculties from the Vedic model of mind:

| Added faculty (Vedic) | Mechanism (ML) | Why it helps |
|---|---|---|
| vijñāna-smṛti | dark-knowledge / logit replay | rehearses past tasks' output distributions, transferring understanding (not just labels) → big accuracy lift |
| viveka | selective consolidation (keep top-k% of importance Ω) | discerns the essential from the inessential → protects what matters without over-rigidity |

In addition, a guṇa controller adapts the consolidation strength to how much the model is currently forgetting.
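The two consolidation ideas above can be illustrated with a minimal NumPy sketch. It is not the released implementation: `viveka_mask` and `guna_lambda` are hypothetical helper names, the global (rather than per-layer) threshold is an assumption, and the controller's `target`, `gain`, and clamp values are invented purely for illustration.

```python
import numpy as np

def viveka_mask(omega, keep=0.2):
    """Zero all but the top `keep` fraction of importance values.

    omega: dict of parameter name -> non-negative importance array (assumed layout).
    The threshold is computed globally over all arrays, which is an assumption.
    Ties at the threshold are all kept, so slightly more than `keep` may survive.
    """
    flat = np.concatenate([w.ravel() for w in omega.values()])
    k = max(1, int(round(keep * flat.size)))          # number of entries to keep
    idx = flat.size - k
    threshold = np.partition(flat, idx)[idx]          # k-th largest value
    return {name: np.where(w >= threshold, w, 0.0) for name, w in omega.items()}

def guna_lambda(base_lambda, forgetting, target=0.02, gain=1.0,
                min_scale=0.5, max_scale=4.0):
    """Raise the consolidation strength when measured forgetting exceeds a target.

    Every argument except base_lambda is a hypothetical knob, not a documented one.
    """
    scale = 1.0 + gain * (forgetting - target) / target
    scale = min(max_scale, max(min_scale, scale))     # clamp to a bounded range
    return base_lambda * scale

omega = {"conv.weight": np.array([0.9, 0.1, 0.4, 0.05, 0.3]),
         "fc.weight": np.array([0.8, 0.02, 0.2, 0.6, 0.01])}
masked = viveka_mask(omega, keep=0.2)
kept = sum(int(np.count_nonzero(w)) for w in masked.values())
lam = guna_lambda(10.0, forgetting=0.04)
print(kept, lam)
```

With `keep=0.2`, only the two largest of the ten toy importance values survive, so the penalty leaves the other eight weights free to adapt.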


What v2 achieved

With the best configuration found (buffer=5120, viveka_keep=0.2, α=0.5), v2 recovers accuracy to within noise of the SOTA (0.799 ± .008 vs 0.804 ± .014) while forgetting about 2.9× less (0.023 vs 0.067). It improves on the trade-off rather than moving along it:

| Method | Accuracy ↑ | Forgetting ↓ | BWT |
|---|---|---|---|
| DER++ (SOTA) | 0.804 ± .014 | 0.067 ± .017 | −0.064 |
| Antaḥkaraṇa-v2 (this model) | 0.799 ± .008 | 0.023 ± .002 | −0.015 |
| Antaḥkaraṇa-v1 (stable variant) | 0.643 | 0.017 | +0.008 |
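The card does not spell out how the three columns are computed. A sketch using the definitions commonly used in the continual-learning literature, assuming an accuracy matrix `R` where `R[i][j]` is the accuracy on task `j` after training on task `i` (the function name and the toy numbers are illustrative, not results from this model):

```python
def continual_metrics(R):
    """Return (average accuracy, forgetting, backward transfer) from matrix R.

    R[i][j]: accuracy on task j after finishing training on task i.
    Entries with j > i (tasks not yet seen) are ignored.
    """
    T = len(R)
    final = R[T - 1]
    avg_acc = sum(final) / T
    # Forgetting: best accuracy ever reached on a task minus its final accuracy.
    forgetting = sum(
        max(R[i][j] for i in range(j, T - 1)) - final[j] for j in range(T - 1)
    ) / (T - 1)
    # Backward transfer: final accuracy minus accuracy right after learning the task.
    bwt = sum(final[j] - R[j][j] for j in range(T - 1)) / (T - 1)
    return avg_acc, forgetting, bwt

# Toy 3-task example (made-up numbers, for illustration only).
R = [
    [0.90, 0.00, 0.00],
    [0.85, 0.80, 0.00],
    [0.88, 0.76, 0.82],
]
avg_acc, forgetting, bwt = continual_metrics(R)
print(f"acc={avg_acc:.3f} forgetting={forgetting:.3f} bwt={bwt:.3f}")
```

Under these definitions a method can show small positive forgetting together with positive BWT, as in the v1 row: forgetting compares against the best accuracy ever reached, while BWT compares against the accuracy right after the task was learned.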

Among the methods we benchmarked, v2 is the only one that is both accurate and stable; it lands in the "ideal corner" of the accuracy-forgetting plot:

[Figure: accuracy vs forgetting frontier]


What we tested (it generalizes)

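The result is not a single-dataset artifact. Across datasets, stream lengths, and model sizes (each a separate, fair, multi-seed run), v2 forgets less than DER++ in every setting while staying within about 0.5 to 4.5 accuracy points of it. The advantage is largest on the headline benchmark and smallest on the 20-task stream, where v2 gives up 4.5 accuracy points for a small forgetting gain (0.054 vs 0.060):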

| Setting | DER++ (acc / forget) | v2 (acc / forget) | v1 (acc / forget) |
|---|---|---|---|
| Split-CIFAR-100, 10 tasks (headline) | 0.804 / 0.067 | 0.799 / 0.023 | 0.643 / 0.017 |
| Split-CIFAR-100, 20-task lifelong | 0.827 / 0.060 | 0.782 / 0.054 | 0.638 / 0.040 |
| Split-Tiny-ImageNet, 200-class | 0.470 / 0.177 | 0.456 / 0.108 | 0.380 / 0.013 |
| Bigger backbone WRN-28-12 (52.6M) | 0.790 / 0.070 | 0.753 / 0.037 | 0.603 / 0.018 |

On the harder Tiny-ImageNet benchmark, DER++ forgetting rises to 0.177 while v2 stays at 0.108, about 1.6× lower. Together, v1 (ultra-stable) and v2 (high-accuracy) form a tunable stability↔accuracy family with two knobs: alpha (replay distillation weight) and viveka_keep (selective-consolidation fraction).
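To make the per-setting trade-off explicit, the forgetting ratio and accuracy gap between DER++ and v2 can be computed directly from the table above. The snippet uses only the reported means; the variable names are illustrative.

```python
# (DER++ acc, DER++ forgetting, v2 acc, v2 forgetting), copied from the table above
results = {
    "Split-CIFAR-100, 10 tasks": (0.804, 0.067, 0.799, 0.023),
    "Split-CIFAR-100, 20 tasks": (0.827, 0.060, 0.782, 0.054),
    "Split-Tiny-ImageNet": (0.470, 0.177, 0.456, 0.108),
    "WRN-28-12 (52.6M)": (0.790, 0.070, 0.753, 0.037),
}

summary = {}
for setting, (d_acc, d_forget, v2_acc, v2_forget) in results.items():
    forgetting_ratio = d_forget / v2_forget   # > 1 means v2 forgets less
    accuracy_gap = d_acc - v2_acc             # > 0 means DER++ is more accurate
    summary[setting] = (round(forgetting_ratio, 2), round(accuracy_gap, 3))
    print(f"{setting}: {forgetting_ratio:.2f}x less forgetting, "
          f"{accuracy_gap * 100:.1f} accuracy points behind DER++")
```

The ratios come out to roughly 2.9×, 1.1×, 1.6×, and 1.9×, so the "~3×" figure applies to the headline 10-task setting specifically.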


Model details

Architecture: WideResNet-28-10 trunk + per-task linear heads (task-incremental)
Params: 36.5M (a 52.6M WRN-28-12 variant was also validated)
Method: Antaḥkaraṇa-v2: saṃskāra (EWC + decay) · guṇa · vijñāna-smṛti (logit replay) · viveka · pramāṇa · turīya
Config: λ=10 · decay=0.7 · buffer=5120 · α=0.5 · viveka_keep=0.2 · 25 epochs/task
Benchmark: Split-CIFAR-100 (10 tasks × 10 classes), trained from scratch, 5 seeds
Formats: .pt (weights + saṃskāra Ω/θ*) · model.safetensors · config.json
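As a rough illustration of what "EWC + decay" with λ=10 and decay=0.7 could look like, the sketch below accumulates importance Ω across tasks with exponential decay and applies the standard EWC quadratic penalty around the anchor weights θ*. The exact update rule used by the model is not given in this card, so the form `Ω ← decay · Ω_old + F_new` is an assumption, and both function names are hypothetical.

```python
import numpy as np

def update_omega(omega_old, fisher_new, decay=0.7):
    """Decay previously accumulated importance, then add the new task's estimate.

    Assumed update rule: omega = decay * omega_old + fisher_new.
    """
    return {k: decay * omega_old[k] + fisher_new[k] for k in fisher_new}

def ewc_penalty(theta, theta_star, omega, lam=10.0):
    """Standard EWC penalty: (lam / 2) * sum_i omega_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(
        float(np.sum(omega[k] * (theta[k] - theta_star[k]) ** 2)) for k in theta
    )

# Toy example with a single two-element parameter (made-up numbers).
omega_old = {"w": np.array([1.0, 0.0])}
fisher_new = {"w": np.array([0.5, 0.2])}
omega = update_omega(omega_old, fisher_new, decay=0.7)

theta_star = {"w": np.array([0.0, 2.0])}   # weights anchored after the previous task
theta = {"w": np.array([1.0, 3.0])}        # current weights during the new task
penalty = ewc_penalty(theta, theta_star, omega, lam=10.0)
print(omega["w"], penalty)
```

The decay lets importance from older tasks fade gradually, which trades a little stability for plasticity compared with summing Fisher estimates without decay.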

Usage

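# load_akn.py is included; it is self-contained, so no repo checkout is needed
import torch
from load_akn import load

model, ckpt = load("antahkarana-v2-36.5M-cifar100-wrn28-10.pt")
model.eval()                       # inference mode (assumes load() returns a torch.nn.Module)
x = torch.randn(4, 3, 32, 32)      # placeholder batch; use real CIFAR-100 images in practice
task = 0                           # task id in [0..9] selects the per-task head
with torch.no_grad():
    logits = model(x, task)        # x: [N,3,32,32] CIFAR-100 tensor
print(ckpt["config"]["metrics"])   # honest per-task metrics
print(ckpt["omega"].keys())        # the saṃskāra importance Ω it chose to protect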

The bigger picture — where v2 leads

v2's core idea — dark-knowledge (logit) replay — is especially powerful for large models, where knowledge transfers through output distributions. That is exactly why it became the base for the language model: the same mechanisms, ported onto a frozen backbone, produced Antahkarana-7B (continual learning on a 7B LLM, ~3.8× less forgetting than naive LoRA). The scaling path runs 36.5M (this model) → 7B → toward 13B–70B.
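For readers unfamiliar with logit replay, here is a minimal NumPy sketch of the idea: alongside the usual cross-entropy on the current task, the model is pulled toward the logits it produced earlier on buffered samples, weighted by α. The mean-squared-error form follows DER-style replay and is an assumption here; this card does not state whether the model uses MSE, KL divergence, or another distance, and `logit_replay_loss` is a hypothetical name.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def logit_replay_loss(new_logits, new_labels, replay_logits, stored_logits, alpha=0.5):
    """Current-task cross-entropy plus alpha times the logit-matching term.

    replay_logits: current model outputs on buffered samples.
    stored_logits: outputs recorded when those samples were first seen.
    """
    distill = float(np.mean((replay_logits - stored_logits) ** 2))
    return cross_entropy(new_logits, new_labels) + alpha * distill

# Toy check: uniform predictions over 10 classes, replay logits off by exactly 1.
new_logits = np.zeros((2, 10))
new_labels = np.array([0, 1])
stored_logits = np.array([[2.0, -1.0, 0.5]])
replay_logits = stored_logits + 1.0
loss = logit_replay_loss(new_logits, new_labels, replay_logits, stored_logits, alpha=0.5)
print(loss)
```

Matching full logit vectors rather than hard labels preserves the relative scores the old model assigned to every class, which is the "dark knowledge" the paragraph above refers to.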

License & citation

Released under the MIT License — fully original work, trained from scratch (no pretrained base model). Evaluated on CIFAR-100 (Krizhevsky, 2009) and Tiny-ImageNet.

@misc{antahkaranav2_2026,
  title  = {Antahkarana-v2: Recovering Accuracy in Vedic-Derived Continual Learning},
  author = {Deepak Soni},
  year   = {2026},
  url    = {https://huggingface.co/deepakdsoni/antahkarana-v2}
}

Built on the Upaniṣads, Sāṃkhya, Yoga, Nyāya, and modern ML (PyTorch).
