Antahkarana-7B

A lifelong-learning language model that learns new domains with very little forgetting of old ones, and abstains rather than guessing when its confidence is low. It is built on Mistral-7B with an architecture derived from the 2,500-year-old Vedic model of mind (the antaḥkaraṇa, "inner instrument").

Author: Deepak Soni · License: Apache-2.0 · Base model: Mistral-7B-v0.1

This is a standalone, full-weights 7B model: load it with a plain from_pretrained — no adapter, no PEFT. The continual-learning architecture was trained in via LoRA and then merged into the base weights.
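Merging a LoRA adapter folds its low-rank update into each adapted weight matrix, W' = W + (α/r)·B·A, which is why the result loads like any dense checkpoint. The NumPy sketch below checks that identity on toy shapes. The matrix sizes are arbitrary; only r=16 and α=32 come from the training details of this card. PEFT's `merge_and_unload` performs this fold on real models; the sketch only demonstrates the arithmetic and is not the author's code.

```python
import numpy as np

def lora_forward(W, A, B, x, alpha, r):
    """Adapter-style forward pass: frozen W plus the scaled low-rank update B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))

def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into one dense matrix (what a merge step does)."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 48, 16, 32        # toy sizes; r and alpha match the card
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01     # LoRA "down" projection
B = rng.standard_normal((d_out, r)) * 0.01    # LoRA "up" projection (zero-initialized in real training)
x = rng.standard_normal(d_in)

W_merged = merge_lora(W, A, B, alpha, r)
assert np.allclose(W_merged @ x, lora_forward(W, A, B, x, alpha, r))
# After merging, only W_merged is stored: A and B no longer exist as separate tensors.
print(W_merged.shape)  # (64, 48)
```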


📦 Model family

| Model | Description |
|---|---|
| antahkarana-v1 | The original architecture + v1 vision models; the most stable continual learner (the only one with positive backward transfer) |
| antahkarana-v2 | Accuracy-recovering v2 (36.5M params); matches SOTA accuracy with ~3× less forgetting |
| antahkarana-7B | The architecture scaled to a 7B language model (this model) |

At a glance — what makes this different

Standard fine-tuning suffers catastrophic forgetting: teach a model a new task and it loses the old one. Antahkarana-7B is trained with a small set of cognitive "faculties," each derived from a Vedic concept and implemented as a concrete mechanism:

| Faculty (Vedic) | Mechanism (ML) | What it does |
|---|---|---|
| saṃskāra | Fisher-importance consolidation + decay over LoRA | Protects what mattered for old domains → don't forget |
| vijñāna-smṛti | Dark-knowledge / exemplar replay | Rehearses past domains while learning new ones |
| pramāṇa | Calibrated-confidence gate | Abstains ("I'm not sure") instead of hallucinating |
| manas / buddhi | Two decorrelated views, cross-teaching | Safe self-learning from unlabeled data (research track) |
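The card does not spell out the exact saṃskāra update, so what follows is a minimal EWC-style sketch under stated assumptions: after each domain, a diagonal Fisher importance Ω is estimated from squared gradients, the accumulated Ω is decayed by a factor γ before the new domain's Fisher is added, and training on the next domain adds the penalty λ·Σᵢ Ωᵢ(θᵢ − θ*ᵢ)² over the LoRA parameters. The names `gamma` and `lam`, the decay-then-add form, and the toy values are hypothetical, not the author's hyperparameters.

```python
from typing import Dict, List

Params = Dict[str, float]  # parameter name -> value (stand-in for LoRA tensors)

def fisher_diag(per_example_grads: List[Params]) -> Params:
    """Diagonal empirical Fisher: mean squared gradient per parameter."""
    n = len(per_example_grads)
    return {k: sum(g[k] ** 2 for g in per_example_grads) / n
            for k in per_example_grads[0]}

def consolidate(omega: Params, new_fisher: Params, gamma: float) -> Params:
    """Decay accumulated importance by gamma (hypothetical), then add the new domain's Fisher."""
    keys = set(omega) | set(new_fisher)
    return {k: gamma * omega.get(k, 0.0) + new_fisher.get(k, 0.0) for k in keys}

def samskara_penalty(theta: Params, anchor: Params, omega: Params, lam: float) -> float:
    """lam * sum_i omega_i * (theta_i - anchor_i)^2, added to the new domain's task loss."""
    return lam * sum(w * (theta[k] - anchor[k]) ** 2 for k, w in omega.items())

# Toy usage: only parameter "a" mattered for domain 1.
omega = consolidate({}, fisher_diag([{"a": 1.0, "b": 0.0}, {"a": 3.0, "b": 0.0}]), gamma=0.9)
anchor = {"a": 0.5, "b": -0.2}   # LoRA values after finishing domain 1
theta = {"a": 0.7, "b": 1.0}     # values while training domain 2
print(samskara_penalty(theta, anchor, omega, lam=1.0))  # ~0.2: moving "b" costs nothing
```

The quadratic pull is strong only along parameters the old domains relied on, which is how this family of methods trades a little plasticity for much less forgetting.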

How it works

The borrowed mind (Mistral-7B) stays frozen as the stable core (śruti); a small trainable instrument (chitta = LoRA, ~0.2% of params) learns new domains, guided by the faculties — and the pramāṇa gate decides whether to answer or abstain:

[Figure: Antahkarana-7B architecture]

Measured outcome (continual instruction-tuning, 4 domains, 3 seeds): forgetting drops from 0.053 to 0.014 (~3.8× less than naive LoRA), while final accuracy is higher (0.882 vs 0.849) and far more stable across seeds (± 0.003 vs ± 0.029).

[Figure: Antahkarana-7B vs naive LoRA]


The journey: from a 2,500-year-old architecture to a 7B model

This model is the production endpoint of a multi-stage research-to-engineering program.

1. The architecture. The Vedic tradition describes the mind as an antaḥkaraṇa — an "inner instrument" of distinct faculties (chitta/memory, manas/perception, buddhi/discernment, ahaṃkāra/identity, plus pramāṇa/valid knowledge and the guṇa dynamics). Each faculty was mapped to a concrete, testable ML mechanism.

2. Research validation (vision, 36–52M params). The mechanisms were first proven on continual-learning image benchmarks (Split-CIFAR-100, Split-Tiny-ImageNet) against the field's standard methods (EWC, ER, DER++): the architecture was the most stable learner tested and the only one with positive backward transfer, with a clean ablation showing each Vedic-derived component adds value.

3. Scaling on a frozen modern backbone (E1–E2). On a frozen ViT-B/16, the consolidation works in adapter space, matching the SOTA (DER++) on accuracy while forgetting less, and extends to the harder class-incremental setting with label-free novelty detection (avidyā).

4. Self-learning and memory (E-S, śruti/smṛti/nidrā). The model learns from unlabeled data via decorrelated co-training and reaches near-supervised accuracy from ~2% labels; a complementary study showed an external "smṛti" memory + periodic "sleep" consolidation retains knowledge ~2.4× better than holding it in weights.

5. The 7B model (E7). The architecture was ported to language: frozen Mistral-7B + LoRA + saṃskāra + vijñāna-smṛti + pramāṇa, continually instruction-tuned across four domains with checkpointing, then merged into the standalone 7B model published here.
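Exemplar replay (the vijñāna-smṛti faculty in step 5) can be sketched as a fixed-size, reservoir-sampled buffer whose stored examples are mixed into each new domain's batches. The class name `ReplayBuffer`, the buffer size, and the 50% replay fraction are assumptions for illustration; the card does not say how exemplars are selected, and a dark-knowledge (DER-style) variant would additionally store the model's logits with each exemplar.

```python
import random
from typing import Any, List

class ReplayBuffer:
    """Fixed-capacity exemplar memory filled by reservoir sampling (hypothetical helper)."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: List[Any] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: Any) -> None:
        # Reservoir sampling: every example seen so far is kept with equal probability.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[Any]:
        return self.rng.sample(self.items, min(k, len(self.items)))

def mixed_batch(new_examples: List[Any], buffer: ReplayBuffer, replay_frac: float = 0.5) -> List[Any]:
    """Combine current-domain examples with rehearsed past-domain exemplars."""
    k = int(len(new_examples) * replay_frac)
    return list(new_examples) + buffer.sample(k)

# Domains arrive in sequence, as in the AG News -> DBpedia -> Emotion -> SST-2 curriculum.
buf = ReplayBuffer(capacity=100)
for domain in ["ag_news", "dbpedia", "emotion", "sst2"]:
    stream = [(domain, i) for i in range(500)]       # stand-in for tokenized examples
    for start in range(0, len(stream), 50):
        chunk = stream[start:start + 50]
        batch = mixed_batch(chunk, buf)
        # ... a training step on `batch` would go here ...
        for ex in chunk:
            buf.add(ex)
```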


Results

Continual instruction-tuning — naive LoRA vs Antaḥkaraṇa-LoRA (3-seed mean ± std)

Four text-classification domains learned in sequence (AG News → DBpedia → Emotion → SST-2), each with its own label space, so forgetting is meaningful.

| Metric | naive LoRA | Antaḥkaraṇa (this model) |
|---|---|---|
| Final accuracy ↑ | 0.849 ± 0.029 | 0.882 ± 0.003 |
| Forgetting ↓ | 0.053 ± 0.032 | 0.014 ± 0.009 (~3.8× less) |
| Confidence on known domains | 0.841 | 0.954 |
| Known − unknown confidence gap ↑ | 0.467 | 0.494 |
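The card does not define its forgetting metric. A common continual-learning definition, assumed here, is average forgetting: for each earlier task, the drop from its best accuracy before the final task to its accuracy after the final task, averaged over all but the last task. The accuracy matrix below is invented purely to exercise the function; the last line only re-checks the ~3.8× ratio from the table.

```python
from typing import List

def average_forgetting(acc: List[List[float]]) -> float:
    """acc[t][j] = accuracy on task j after training on task t (defined for j <= t).

    Forgetting on task j = max over j <= t < T-1 of acc[t][j], minus acc[T-1][j];
    averaged over the first T-1 tasks (the last task cannot be forgotten yet).
    """
    T = len(acc)
    drops = []
    for j in range(T - 1):
        best_before_end = max(acc[t][j] for t in range(j, T - 1))
        drops.append(best_before_end - acc[T - 1][j])
    return sum(drops) / len(drops)

# Hypothetical 4-task accuracy matrix (row t: after training task t; column j: task j).
acc = [
    [0.90],
    [0.86, 0.98],
    [0.85, 0.97, 0.80],
    [0.84, 0.96, 0.78, 0.92],
]
print(round(average_forgetting(acc), 3))  # drops 0.06, 0.02, 0.02 -> 0.033
print(round(0.053 / 0.014, 2))            # ratio of the table's forgetting means: 3.79
```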

Live deployment test (this merged model)

  • General language preserved — correct world-knowledge answers (e.g. capital of Japan → Tokyo; a fluent one-sentence definition of photosynthesis).
  • Continual retention: 8/8 spot-check prompts answered correctly across all four domains, including the first one learned; no sign of catastrophic forgetting in this live test.
  • pramāṇa abstention — on a factually neutral input (no sentiment to extract), confidence drops to 0.53 and the model abstains rather than guessing; on clear inputs it stays 0.97–0.99 and answers.
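A confidence gate of this kind can be sketched as follows, assuming confidence is the softmax probability of the top label restricted to the allowed label words, and assuming a threshold of 0.7. The threshold is hypothetical: the card reports the observed confidences (0.53 vs 0.97–0.99) but not the cut-off actually used. The logits are invented inputs; with a real model they would be the next-token logits of each label's first token after "Answer:".

```python
import math
from typing import Dict, Optional, Tuple

def label_confidence(label_logits: Dict[str, float]) -> Tuple[str, float]:
    """Softmax restricted to the candidate labels; returns (top label, its probability)."""
    m = max(label_logits.values())                       # subtract max for numerical stability
    exps = {k: math.exp(v - m) for k, v in label_logits.items()}
    z = sum(exps.values())
    top = max(exps, key=exps.get)
    return top, exps[top] / z

def pramana_gate(label_logits: Dict[str, float], threshold: float = 0.7) -> Optional[str]:
    """Answer with the top label if confident enough; otherwise abstain (return None)."""
    label, conf = label_confidence(label_logits)
    return label if conf >= threshold else None

clear = {"positive": 4.0, "negative": 0.0}     # p(positive) ~ 0.98
neutral = {"positive": 0.12, "negative": 0.0}  # p(positive) ~ 0.53
print(pramana_gate(clear))    # positive
print(pramana_gate(neutral))  # None -> "I'm not sure"
```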

Why this is an innovation in today's AI

Most of modern AI is static: a model is trained once, frozen, and shipped. Teaching it something new means expensive retraining, and naive fine-tuning overwrites old knowledge (catastrophic forgetting). The field's strongest continual-learning methods buy stability only by trading away accuracy, or vice versa.

Antaḥkaraṇa breaks that trade-off. In the vision continual-learning benchmarks against the standard methods (EWC, LwF, ER, DER++), it is the only method that lands in the "ideal corner" of high accuracy and very low forgetting, matching the accuracy of the strongest baseline (DER++) while forgetting ~3× less:

[Figure: Accuracy vs forgetting frontier]

That combination is what makes a model genuinely lifelong: it can keep learning in deployment without expensive retraining and without losing what it already knew, while the pramāṇa gate lets it say "I don't know" instead of guessing. A static model that is sometimes confidently wrong becomes one that keeps learning and signals when it is unsure. That is the shift the architecture is reaching for.

Potential — and where it needs to adapt

What this architecture could unlock:

  • Lifelong enterprise models — absorb new products, policies, and data continuously, without retraining the base or forgetting prior knowledge.
  • Trustworthy / high-stakes AI — calibrated abstention (pramāṇa) for medical, legal, and financial settings where "I'm not sure" is safer than a confident guess.
  • Label-efficient & self-learning — learns from unlabeled data (co-training), reaching near-supervised accuracy from as little as ~2% labels — cutting annotation cost dramatically.
  • Personal / on-device AI — a tiny adapter (~160 MB) + external memory personalizes a frozen base to a user, privacy-preserving, with no full retraining.
  • Agentic memory — the śruti (stable core) / smṛti (external memory) / nidrā (sleep-consolidation) design gives agents that accumulate experience over time.
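The co-training idea behind the label-efficient, self-learning item above (two decorrelated views that teach each other) can be sketched as one round in which each view pseudo-labels the unlabeled examples it is confident about, and those pseudo-labels go into the *other* view's training data. This is generic co-training in the Blum–Mitchell style, not the author's E-S implementation; the threshold, the two toy views, and the data are hypothetical.

```python
from typing import Any, Callable, List, Tuple

Predictor = Callable[[Any], Tuple[int, float]]  # returns (label, confidence)

def cross_teach(unlabeled: List[Any], view_a: Predictor, view_b: Predictor,
                threshold: float = 0.9) -> Tuple[List[Tuple[Any, int]], List[Tuple[Any, int]]]:
    """One co-training round. Returns (new pseudo-labeled data for A, for B)."""
    for_a, for_b = [], []
    for x in unlabeled:
        label_a, conf_a = view_a(x)
        label_b, conf_b = view_b(x)
        if conf_a >= threshold:
            for_b.append((x, label_a))   # A teaches B
        if conf_b >= threshold:
            for_a.append((x, label_b))   # B teaches A
    return for_a, for_b

def view_a(x: Tuple[float, float]) -> Tuple[int, float]:
    """Hypothetical view that looks only at the first feature."""
    return int(x[0] > 0), min(1.0, 0.5 + abs(x[0]))

def view_b(x: Tuple[float, float]) -> Tuple[int, float]:
    """Hypothetical view that looks only at the second feature."""
    return int(x[1] > 0), min(1.0, 0.5 + abs(x[1]))

data = [(0.8, 0.1), (0.05, -0.9), (0.6, 0.7)]
for_a, for_b = cross_teach(data, view_a, view_b)
print(for_a)  # [((0.05, -0.9), 0), ((0.6, 0.7), 1)]
print(for_b)  # [((0.8, 0.1), 1), ((0.6, 0.7), 1)]
```

Because each view is confident on different examples, each supplies labels the other could not produce alone; decorrelating the views is what keeps them from simply reinforcing each other's mistakes.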

Where it still needs to adapt (honest roadmap):

  • Beyond classification — the LLM evaluation here is classification framed as generation; it needs extension to open-ended instruction-following and longer, more realistic domain streams.
  • Sharper pramāṇa — the abstention gate works but is over-confident on adversarial nonsense; it needs stronger calibration (e.g. conformal / ensemble methods) at scale.
  • Scale & breadth — validated on 4 domains and 7B; next is longer continual streams, established continual-LLM benchmarks, and larger models (13B–70B).
  • Self-learning + memory at LLM scale — co-training and the smṛti/nidrā memory are proven in vision and small setups; integrating them into the LLM continual loop is the next build.
  • Conditional compute — a guṇa-driven mixture-of-experts / early-exit layer (efficiency) is designed but not yet implemented.
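One of the calibration options the roadmap names, split conformal prediction, can be sketched for a classification gate. On held-out calibration data, take nonconformity scores s = 1 − p(true label) and the finite-sample quantile q̂; at test time, the prediction set is every label with 1 − p ≤ q̂, which contains the true label with probability at least 1 − α under exchangeability. Answering only when that set holds exactly one label gives an abstention rule. The calibration probabilities and α are invented, and this is the standard split-conformal recipe, not something the card reports implementing.

```python
import math
from typing import Dict, List, Optional

def conformal_quantile(true_label_probs: List[float], alpha: float) -> float:
    """Split-conformal threshold q_hat from held-out calibration examples.

    Score s = 1 - p(true label); q_hat is the ceil((n+1)(1-alpha))-th smallest score,
    or 1.0 (include every label) when that rank exceeds n.
    """
    scores = sorted(1.0 - p for p in true_label_probs)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return 1.0 if k > n else scores[k - 1]

def conformal_gate(label_probs: Dict[str, float], q_hat: float) -> Optional[str]:
    """Build the conformal prediction set; answer only if it holds exactly one label."""
    pred_set = [lab for lab, p in label_probs.items() if 1.0 - p <= q_hat]
    return pred_set[0] if len(pred_set) == 1 else None

calib = [0.95, 0.90, 0.85, 0.97, 0.60, 0.92, 0.88, 0.99, 0.75, 0.93]  # invented
q_hat = conformal_quantile(calib, alpha=0.2)
print(q_hat)                                                        # 0.25
print(conformal_gate({"positive": 0.98, "negative": 0.02}, q_hat))  # positive
print(conformal_gate({"positive": 0.53, "negative": 0.47}, q_hat))  # None (abstain)
```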

Usage

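```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepakdsoni/antahkarana-7B"
tok = AutoTokenizer.from_pretrained(model_id)
# Standalone merged weights: no PEFT or adapter needed. `dtype` is the current
# argument name; older transformers releases call it `torch_dtype`.
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto")

# Classification is framed as generation: the model completes "Answer:" with a label.
prompt = ("Classify the sentiment of this movie review (negative, positive).\n"
          "Text: a heartfelt, beautifully acted triumph.\nAnswer:")
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=4, do_sample=False,
                     pad_token_id=tok.eos_token_id)
# Decode only the newly generated tokens (the predicted label).
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```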

Unquantized bf16 inference needs a GPU with roughly 15 GB of memory; with 4-bit quantization (bitsandbytes) it runs in about 5 GB.
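These memory figures follow from the parameter count. A back-of-the-envelope sketch, assuming the public Mistral-7B-v0.1 configuration (hidden size 4096, 32 layers, 8 key/value heads of dimension 128, MLP size 14336, vocabulary 32000, untied embeddings) and counting weights only; activations, the KV cache, and CUDA overhead come on top. The 4-bit line further assumes embeddings, the LM head, and norms stay in bf16 and that quantization scales add roughly 0.5 bit per weight, which is an approximation.

```python
# Mistral-7B-v0.1 configuration values (from its public config.json).
HIDDEN, LAYERS, INTER, VOCAB = 4096, 32, 14336, 32000
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)

attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM   # q, o  +  k, v projections
mlp = 3 * HIDDEN * INTER                           # gate, up, down projections
linear_per_layer = attn + mlp
norms = LAYERS * 2 * HIDDEN + HIDDEN               # RMSNorm weights
embed_and_head = 2 * VOCAB * HIDDEN                # untied input/output embeddings

total = LAYERS * linear_per_layer + norms + embed_and_head
print(f"{total:,} parameters")                     # 7,241,732,096 parameters

GB = 1e9
bf16_gb = total * 2 / GB                           # 2 bytes per weight
# 4-bit (approximate): linear weights at ~4.5 bits each, everything else in bf16.
q4_gb = (LAYERS * linear_per_layer * 4.5 / 8 + (embed_and_head + norms) * 2) / GB
print(f"bf16 weights: {bf16_gb:.1f} GB, 4-bit weights: {q4_gb:.1f} GB")
# bf16 weights: 14.5 GB, 4-bit weights: 4.5 GB
```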


Training details

| Setting | Value |
|---|---|
| Base model | mistralai/Mistral-7B-v0.1 (frozen) |
| Adapter | LoRA (r=16, α=32) on q/k/v/o_proj; ~13.6M trainable (0.19%) |
| Method | saṃskāra (Fisher Ω + decay) on LoRA · vijñāna-smṛti exemplar replay · pramāṇa confidence gate |
| Curriculum | 4 classification domains in sequence, per-task checkpointing (resumable) |
| Merge | LoRA folded into base via merge_and_unload → standalone full-weights 7B |
| Precision | bfloat16 |
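The ~13.6M trainable-parameter figure can be checked by hand: a rank-r LoRA on a d_in → d_out projection adds r·(d_in + d_out) parameters (A is r × d_in, B is d_out × r). The sketch assumes the public Mistral-7B-v0.1 shapes (hidden size 4096, 32 layers, k/v projections of width 1024 because of grouped-query attention) and a base total of 7,241,732,096 parameters derived from that configuration.

```python
# Mistral-7B-v0.1 shapes (public config): k/v projections output 8 heads x 128 = 1024.
HIDDEN, LAYERS, KV_DIM = 4096, 32, 1024
R = 16                              # LoRA rank from the training details
BASE_PARAMS = 7_241_732_096         # total Mistral-7B-v0.1 parameters

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters of one LoRA pair: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

per_layer = (lora_params(HIDDEN, HIDDEN, R)     # q_proj
             + lora_params(HIDDEN, KV_DIM, R)   # k_proj
             + lora_params(HIDDEN, KV_DIM, R)   # v_proj
             + lora_params(HIDDEN, HIDDEN, R))  # o_proj
trainable = LAYERS * per_layer
print(f"{trainable:,} trainable ({100 * trainable / BASE_PARAMS:.2f}% of the base)")
# 13,631,488 trainable (0.19% of the base)
```

The result agrees with the ~13.6M (0.19%) in the table; α=32 does not change the count, it only sets the α/r = 2 scaling applied to B·A.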

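To continue lifelong learning (adding new domains under saṃskāra protection), use the LoRA adapter + resume workflow rather than this merged checkpoint: merging folds the low-rank update into the dense base weights, so the separate LoRA factors that saṃskāra consolidates are no longer stored.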


Limitations & honest notes

  • Continual evaluation is on classification framed as generation (clean, measurable), not open-ended instruction following — a natural next extension.
  • The pramāṇa gate is not perfect: it abstains well on genuinely under-determined input but can still be over-confident on adversarial nonsense; the robust evidence is the calibration AUROC and the in-distribution-vs-unfamiliar confidence gap across many examples.
  • The model inherits the capabilities, biases, and knowledge cutoff of Mistral-7B-v0.1.
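The two aggregate signals mentioned in the pramāṇa note, the known-vs-unfamiliar confidence gap and an AUROC, can be computed from per-example confidences. This sketch assumes one common reading of the AUROC: how well the confidence score ranks in-distribution inputs above unfamiliar ones (the card does not define it precisely). The confidence lists are invented.

```python
from typing import List

def confidence_gap(known: List[float], unknown: List[float]) -> float:
    """Mean confidence on in-distribution inputs minus mean on unfamiliar inputs."""
    return sum(known) / len(known) - sum(unknown) / len(unknown)

def auroc(pos: List[float], neg: List[float]) -> float:
    """Rank-based AUROC (Mann-Whitney U / (n_pos * n_neg)); ties count as 1/2."""
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

known = [0.97, 0.99, 0.95, 0.91, 0.98]   # hypothetical in-distribution confidences
unknown = [0.53, 0.60, 0.93, 0.48]       # hypothetical unfamiliar-input confidences
print(round(confidence_gap(known, unknown), 3))  # 0.325
print(auroc(known, unknown))                     # 0.95
```

A single over-confident unfamiliar input (0.93 here) barely moves the gap but shows up directly as lost AUROC pairs, which is why the two are best reported together.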

License & attribution

Released under the Apache License 2.0. This is a derivative work of Mistral-7B-v0.1 (© Mistral AI, Apache-2.0) — see the NOTICE file. The base 7B weights were used as a frozen foundation and were not trained from scratch. The Antaḥkaraṇa architecture, continual training, and merging are the contribution of the author.

Built on the Upaniṣads, Sāṃkhya, Yoga, Nyāya, and modern ML (PyTorch · Transformers · PEFT).

Citation

```bibtex
@misc{antahkarana7b2026,
  title  = {Antahkarana-7B: Lifelong Learning with a Vedic-Derived Cognitive Architecture},
  author = {Deepak Soni},
  year   = {2026},
  note   = {Built on Mistral-7B-v0.1 (Apache-2.0)},
  url    = {https://huggingface.co/deepakdsoni/antahkarana-7B}
}
```