NCA 3D Brain: Neural Cellular Automata for Language

A radically different neural architecture: a 3D grid of "mini-neurons" that learns language through local wave propagation, inspired by protein folding and biological neural communication.

Figure: Wave propagation through the 3D brain. Information propagates as waves through the 3D grid, from input (z=0) to output (z=15).

What is this?

A cube of 4,096 "mini-neurons" (16×16×16 grid) where each cell only communicates with its immediate neighbors. Information travels as waves through the cube. There is no attention mechanism (unlike Transformers) and there are no sequential layers: it's a mathematical organism where language understanding emerges from local signal propagation.

35.4M parameters. Generates coherent English phrases of 6+ words.

This is not a Transformer. This is not an RNN. This is a 3D cellular automaton that learned to speak.

Key Results

Metric                    Value
────────────────────────────────────────────────────────────────────────
Eval word accuracy        10.7% (over 30K vocabulary)
Train word accuracy       16.9%
Train/eval gap            1.6x (best generalization of all versions)
Longest coherent output   "she started to play together again" (6 words)
Parameters                35.4M
Model size                68 MB
Grid                      16×16×16 = 4,096 cells
Cell dimension            256
Training                  16 epochs, ~10.7h on NVIDIA B200

Grammar emergence

The model learned grammatical categories without explicit supervision:

"the" + _        → nouns/adjectives (correct category)
"the big" + _    → nouns ("dog", "house", "girl")
"the dog" + _    → verbs ("was", "ran", "had")
"she wanted" + _ → "to" (infinitive structure)

Best generations

"she started to play together again"
"the little girl wanted to play with her parents"
"he said that he was very happy"
"in the morning she went to the garden"

Architecture

INPUT (face z=0)          THINKING (interior)         OUTPUT (face z=15)
┌─────────────┐          ┌─────────────┐             ┌─────────────┐
│ Tokens are  │   →→→    │ 3D waves    │    →→→      │ Prediction  │
│ injected    │  waves   │ propagate   │   waves     │ is read     │
│ into skin   │          │ N steps     │             │ from pole   │
└─────────────┘          └─────────────┘             └─────────────┘

How it works

  1. Injection: Input tokens are embedded and injected into the z=0 face of the cube
  2. Propagation: For N steps, each cell updates based on its 26 neighbors via Conv3d
  3. Reading: The opposite face (z=15) is average-pooled and projected to vocabulary logits
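
The three stages above can be illustrated with a toy that needs no trained weights. This is a sketch under loud assumptions, not the model's code: cell states are scalars, a fixed rule (each cell takes the maximum over its 3×3×3 neighbourhood) stands in for the learned Conv3d, and the grid is 6×6×6 instead of 16×16×16. The names `make_grid`, `inject`, `step`, `read`, and `run` are hypothetical. It only shows how a signal written on the z=0 face reaches the opposite face through neighbour-to-neighbour updates.

```python
from itertools import product

SIZE = 6  # toy grid edge (assumption); the model uses 16


def make_grid(size):
    """Return a size^3 grid of zeros, indexed as grid[x][y][z]."""
    return [[[0.0] * size for _ in range(size)] for _ in range(size)]


def inject(grid, value=1.0):
    """Stage 1: write the input signal onto the z=0 face."""
    for x, y in product(range(len(grid)), repeat=2):
        grid[x][y][0] = value


def step(grid):
    """Stage 2: each cell reads only its 3x3x3 neighbourhood (26 neighbours plus itself).

    A fixed max rule replaces the learned transition (toy assumption).
    """
    size = len(grid)
    new = make_grid(size)
    for x, y, z in product(range(size), repeat=3):
        best = 0.0
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            nx, ny, nz = x + dx, y + dy, z + dz
            if 0 <= nx < size and 0 <= ny < size and 0 <= nz < size:
                best = max(best, grid[nx][ny][nz])
        new[x][y][z] = best
    return new


def read(grid):
    """Stage 3: average-pool the opposite face (z = size - 1)."""
    size = len(grid)
    face = [grid[x][y][size - 1] for x, y in product(range(size), repeat=2)]
    return sum(face) / len(face)


def run(n_steps, size=SIZE):
    """Inject, propagate for n_steps, then read the output face."""
    grid = make_grid(size)
    inject(grid)
    for _ in range(n_steps):
        grid = step(grid)
    return read(grid)


for n in (SIZE - 2, SIZE - 1):
    print(n, run(n))
```

In this toy the wavefront advances one cell per step, so the read-out stays at 0.0 until the number of steps reaches the grid edge minus one.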

Key innovations

  • Dilated convolutions with cycle [1, 2, 4, 8]: after one 4-step cycle each cell "sees" the entire grid (reach 1+2+4+8 = 15 cells in each direction along a 16-cell axis)
  • Synaptic fatigue: 2 dedicated channels that inhibit over-firing cells, preventing repetition
  • Dual chemical pathways: Standard Conv3d + depthwise-separable Conv3d (diversity in transition rules)
  • Gated residual updates: Each cell decides how much to change per step via sigmoid gate
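
The claim about the dilation cycle can be checked with a few lines of arithmetic. The sketch below assumes one k=3 convolution per step with the dilations applied in the listed order, and it ignores padding and everything the learned weights do; `reachable_offsets` is a hypothetical helper, not part of the repository.

```python
def reachable_offsets(dilations, kernel=3):
    """Offsets along one axis that can influence a cell after one conv per dilation."""
    half = kernel // 2
    offsets = {0}
    for d in dilations:
        taps = [k * d for k in range(-half, half + 1)]  # e.g. (-d, 0, +d) for k=3
        offsets = {o + t for o in offsets for t in taps}
    return offsets


GRID = 16
CYCLE = [1, 2, 4, 8]

dilated = reachable_offsets(CYCLE)
plain = reachable_offsets([1, 1, 1, 1])  # same number of steps, no dilation

# The largest distance between two cells on a 16-cell axis is 15.
covers_grid = all(delta in dilated for delta in range(-(GRID - 1), GRID))

print(max(dilated), covers_grid)  # 15 True
print(max(plain))                 # 4
```

Four undilated steps reach only 4 cells away, while the [1, 2, 4, 8] cycle reaches every offset up to 15, which is the full extent of the grid along each axis.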

Emergent phenomena

Figure: 3D brain activation map showing where "good" sentences activate more (red) vs. "bad" sentences (blue).

  • Functional hemispheres: x=12 region produces better language than x=6
  • 3 thinking phases: chaos (steps 1-5) → eureka (6-7) → decision (8-15)
  • Grammar in center, semantics in periphery of the grid
  • Semantic clustering: animals, family, nature, objects form distinct spatial clusters
  • Emotion highway: emotional content activates a specific depth layer (z=12)

Figure: Channel specialization, grammar channels (red) vs. semantic channels (blue). The model spontaneously separated syntax from meaning.

Quick Start

import json

import torch

from inference import generate
from model import NCA3D_Fatigue

# Load the word <-> id dictionary (30K vocabulary)
with open("word_dictionary_30k.json", encoding="utf-8") as f:
    word2num = {k: int(v) for k, v in json.load(f).items()}
num2word = {v: k for k, v in word2num.items()}

# Load the model weights on CPU and switch to inference mode
model = NCA3D_Fatigue()
model.load_state_dict(torch.load("model_phase4c_v5_fatigue_best.pt", map_location="cpu"))
model.eval()

# Predict the next word (every context word must be in the dictionary)
context = ["the", "little", "girl"]
ids = [word2num[w] for w in context]
with torch.no_grad():
    logits = model(torch.tensor([ids]), n_steps=15)
    pred_id = logits.argmax(-1).item()
    print(f"'{' '.join(context)}' → '{num2word.get(pred_id, '?')}'")

# Generate a sequence
print(generate(model, word2num, num2word, ["she", "wanted", "to"], max_words=8))
Model Architecture Details

Component                    Shape                          Params
────────────────────────────────────────────────────────────────────
word_embed                   Embedding(30006, 384)          11.5M
embed_proj                   Linear(384, 256)               98K
pos_embed                    Embedding(52, 256)             13K
init_state                   (1, 256, 16, 16, 16)           1.05M
trans1.conv1 (dilated)       Conv3d(256→512, k=3³)          3.5M
trans1.conv2 (dilated)       Conv3d(512→256, k=3³)          3.5M
trans2.dw_conv (dilated)     Conv3d(256→256, k=3³, groups)  6.9K
trans2.pw_conv               Conv3d(256→256, k=1)           65K
gate_conv                    Conv3d(256→256, k=1)           65K
norm (GroupNorm)             32 groups, 256 ch              512
out_proj                     256→512→30006                  15.6M
────────────────────────────────────────────────────────────────────
TOTAL                                                       ~35.4M
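
The total in the table can be cross-checked from the listed shapes alone. The sketch below does not read the checkpoint. It assumes a bias on every Linear and Conv3d layer, an affine GroupNorm, a depthwise convolution with groups equal to the channel count, and an `out_proj` made of two Linear layers; the helper names `linear` and `conv3d` are hypothetical.

```python
def linear(n_in, n_out, bias=True):
    """Parameter count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def conv3d(c_in, c_out, k, groups=1, bias=True):
    """Parameter count of a 3D convolution with a k*k*k kernel."""
    return (c_in // groups) * c_out * k ** 3 + (c_out if bias else 0)


VOCAB, EMBED, CELL, HIDDEN, GRID = 30006, 384, 256, 512, 16

params = {
    "word_embed": VOCAB * EMBED,
    "embed_proj": linear(EMBED, CELL),
    "pos_embed": 52 * CELL,
    "init_state": CELL * GRID ** 3,
    "trans1.conv1": conv3d(CELL, HIDDEN, 3),
    "trans1.conv2": conv3d(HIDDEN, CELL, 3),
    "trans2.dw_conv": conv3d(CELL, CELL, 3, groups=CELL),  # depthwise (assumed)
    "trans2.pw_conv": conv3d(CELL, CELL, 1),
    "gate_conv": conv3d(CELL, CELL, 1),
    "norm": 2 * CELL,  # GroupNorm scale and shift
    "out_proj": linear(CELL, HIDDEN) + linear(HIDDEN, VOCAB),
}

total = sum(params.values())
for name, count in params.items():
    print(f"{name:<16}{count:>12,}")
print(f"{'TOTAL':<16}{total:>12,}")
```

Under these assumptions the total comes out at about 35.4M. Individual rows can differ slightly from the table depending on whether biases are counted.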

Training Details

  • Base model: Continued from v4 (dilated Conv3d + 30K vocab)
  • Datasets: WikiText-103, TinyStories, BookCorpus, IMDB, ROCStories, CNN/DailyMail, TriviaQA, Natural Questions, ELI5 (10 datasets total)
  • Schedule: 3 phases: 8ep×750K (aggressive) + 5ep×500K (consolidation) + 3ep×350K (refinement)
  • Learning rates (per phase): 5e-4→3e-4 | 2e-4→1e-4 | 8e-5→3e-5
  • Hardware: NVIDIA B200, 178GB VRAM peak
  • Training time: ~10.7 hours
  • Loss: Cross-entropy on 30K word vocabulary, multi-step loss (steps 7-16)
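
One plausible reading of the multi-step loss is a cross-entropy averaged over the read-outs taken at steps 7 to 16, so that the model is pushed to give a usable prediction over a range of propagation depths. The sketch below follows that reading. Equal weighting of the steps is an assumption, and `cross_entropy` and `multi_step_loss` are hypothetical names rather than the training script's functions.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one read-out: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[target]


def multi_step_loss(logits_per_step, target, first_step=7, last_step=16):
    """Mean cross-entropy over read-outs at steps first_step..last_step (1-indexed)."""
    chosen = logits_per_step[first_step - 1:last_step]
    return sum(cross_entropy(logits, target) for logits in chosen) / len(chosen)


# Toy data: 4-word vocabulary, 16 propagation steps, correct word id = 2.
# The correct word's logit grows with the step, mimicking a sharpening prediction.
steps = [[0.0, 0.0, 0.25 * s, 0.0] for s in range(1, 17)]

print(round(multi_step_loss(steps, target=2), 4))
print(round(multi_step_loss(steps, target=2, first_step=16, last_step=16), 4))
```

The second call scores only the final step and gives a lower value on this toy data, because the earlier, less confident read-outs no longer contribute to the average.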

Project History

This model is the result of an extensive research journey:

Phase      What                                        Result
──────────────────────────────────────────────────────────────────────────────────────────
1          Arithmetic (8³ grid, 499K params)           98.4% on unseen data
2A         15 semantic relations                       98.2% test, 87.5% generalization
2B         100 semantic relations                      73.4% test (85.5% without "similar")
2B-v3      184 relations (grammar + semantics)         93.5% overall
3B         Q&A from relations                          85% direct, 75% novel
3C         Transitive reasoning                        52.5% holdout, 83.3% novel chains
4          Language as arithmetic                      50.2% char accuracy, grammar emerges
4B         Multi-step loss                             55.4% char accuracy
4C-v1→v4   Word embeddings, dilated conv, 30K vocab    Incremental improvements
4C-v5      Synaptic fatigue + intensive training       10.7% eval, 6+ word coherence

113 documented discoveries across all phases.

Why this matters

Transformers dominate NLP, but they have fundamental constraints:

  • O(n²) attention complexity in the sequence length n
  • Fixed depth (always N layers regardless of problem difficulty)
  • No spatial locality between neurons
  • Billions of parameters in the largest models

NCA 3D Brain shows that local communication + iterative propagation can produce language-like behavior with:

  • O(n) cost per propagation step in the number of cells n (each cell only sees its neighbors)
  • Variable thinking depth (more steps = more reasoning)
  • Spatial structure with emergent functional zones
  • Orders of magnitude fewer parameters

This is early-stage research. The model doesn't compete with Transformers on quality, but it demonstrates that a fundamentally different computational paradigm can learn language structure.

Limitations

  • Accuracy is low compared to any Transformer (10.7% next-word prediction on 30K vocab)
  • Autoregressive generation accumulates errors: quality degrades after 6-8 words
  • Embeddings are partially disorganized (Zipf's law: rare words get few updates)
  • No extrapolation to longer contexts than trained on
  • CPU inference only (no optimized CUDA kernels)

Citation

@misc{quintela2026nca3d,
  title={NCA 3D Brain: Neural Cellular Automata for Language Processing},
  author={Cristian Quintela},
  year={2026},
  url={https://huggingface.co/killking69/nca3d-brain-v5}
}

Author

Cristian Quintela

License

MIT


Evaluation results

  • Eval Word Accuracy (30K vocab) on Multi-dataset (TinyStories, WikiText, BookCorpus, IMDB, Q&A): 10.7% (self-reported)
  • Train Word Accuracy (30K vocab) on Multi-dataset (TinyStories, WikiText, BookCorpus, IMDB, Q&A): 16.9% (self-reported)