Neuromorphic Wikipedia Domain Classifier
This repository hosts the pre-trained vocabulary and synaptic weights for NeuronGuard, a proof of concept built on a cache-aligned Spiking Neural Network (SNN) and a neuromorphic event engine, written in Rust and exposed to Python.
Note: This was a proof of concept to see whether an event-driven, neuron-inspired model could work.
The model is trained on the 44.4 GB 'wikimedia/structured-wikipedia' dataset (10.4M articles total) to classify articles into 5 high-level domains:
- 0: Science & Technology
- 1: Geography & Places
- 2: Biography & People
- 3: History & Events
- 4: Arts & Culture
Labels come from community-curated Infobox template names, and articles are fed through a streaming dataset pipeline (zero disk overhead). The model was trained on exactly 1,000,000 valid articles (samples) and reached 93.14% accuracy on a 10,000-article test split. Training took 15.85 seconds on a standard Apple Silicon CPU, using local pre-filtered data.
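As a rough illustration of Infobox-based labeling, the sketch below maps template names to the five domain IDs. The `INFOBOX_TO_DOMAIN` table and the `label_article` and `labeled_stream` helpers are hypothetical names chosen for this example, not the mapping used to train the published weights. It also assumes that an article without a recognized infobox is skipped, which is one plausible reading of "valid articles".

```python
from typing import Optional

# Hypothetical mapping from normalized Infobox template names to domain IDs.
# The real training run may use a different and much larger table.
INFOBOX_TO_DOMAIN = {
    "infobox software": 0,           # Science & Technology
    "infobox settlement": 1,         # Geography & Places
    "infobox person": 2,             # Biography & People
    "infobox military conflict": 3,  # History & Events
    "infobox film": 4,               # Arts & Culture
}


def label_article(infobox_names: list[str]) -> Optional[int]:
    """Return the domain ID of the first recognized infobox, or None."""
    for name in infobox_names:
        # Normalize case, underscores, and repeated whitespace.
        key = " ".join(name.lower().replace("_", " ").split())
        if key in INFOBOX_TO_DOMAIN:
            return INFOBOX_TO_DOMAIN[key]
    return None


def labeled_stream(articles):
    """Lazily yield (text, domain_id) pairs, dropping unlabeled articles."""
    for text, infobox_names in articles:
        domain = label_article(infobox_names)
        if domain is not None:
            yield text, domain
```

Because `labeled_stream` is a generator, labeled samples can be consumed one at a time without materializing the dataset in memory.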
Performance Metrics (Standard Apple Silicon CPU)
- Training Set Size: 1,000,000 valid articles (samples)
- Model Size: 320 KB (synaptic weights and vocabulary file)
- Training Time: 15.85 seconds (using local pre-filtered data)
- Inference Latency: ~5–15 µs per article
- Memory Footprint: < 50 MB
- Accuracy: 93.14% (evaluated on a 10,000-article test split)
Comparison: NeuronGuard vs. PyTorch (1,000,000 Samples)
To validate the efficiency of NeuronGuard's neuromorphic architecture, we ran a head-to-head comparison against a standard PyTorch Multi-Layer Perceptron (MLP) on the same Wikipedia subset (1,000,000 training samples, 10,000 test samples, and a 10,000-word vocabulary).
Head-to-Head Results (Apple Silicon M-Series CPU/GPU)
| Metric | NeuronGuard (SNN) | PyTorch (MLP, 1 Epoch) | PyTorch (MLP, 10 Epochs) | Trade-Off / Insight |
|---|---|---|---|---|
| Hardware Used | Single CPU Core | Apple Silicon MPS GPU | Apple Silicon MPS GPU | NeuronGuard reaches comparable training speed without GPU acceleration. |
| Training Time | 15.85 seconds | 16.56 seconds | 163.84 seconds | NeuronGuard is 10.3x faster than PyTorch (10 epochs) on a single CPU core. |
| Data Loading Overhead | 0.00 seconds (on-the-fly) | 16.17 seconds | 16.17 seconds | NeuronGuard trains directly on the stream, bypassing memory loading overhead. |
| Total Pipeline Time | 15.85 seconds | 32.73 seconds | 180.01 seconds | NeuronGuard is about 2.1x to 11.4x faster overall from cold start to fully trained. |
| Model Size | ~320 KB | 2.44 MB | 2.44 MB | NeuronGuard's model size is 7.6x smaller, making it ideal for edge devices. |
| Overall Accuracy | 93.14% | 97.98% | 98.15% | PyTorch's global backpropagation yields higher peak accuracy; NeuronGuard trails by 4.84 to 5.01 percentage points. |
Because NeuronGuard trains on the fly via Hebbian plasticity, it completely bypasses the massive dataset loading and memory overhead required by traditional deep learning frameworks.
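The ratios in the table can be rechecked from its raw figures. The snippet below is a small sketch that does this arithmetic; the `compare` helper and the dictionary layout are illustrative names, not part of NeuronGuard.

```python
# Timings in seconds and accuracies in percent, copied from the table above.
SNN = {"train_s": 15.85, "load_s": 0.00, "accuracy": 93.14}
MLP = {
    "1 epoch": {"train_s": 16.56, "load_s": 16.17, "accuracy": 97.98},
    "10 epochs": {"train_s": 163.84, "load_s": 16.17, "accuracy": 98.15},
}


def compare(baseline, other):
    """Return ratios of `other` to `baseline` and the accuracy gap in points."""
    base_total = baseline["train_s"] + baseline["load_s"]
    other_total = other["train_s"] + other["load_s"]
    return {
        "train_ratio": other["train_s"] / baseline["train_s"],
        "total_ratio": other_total / base_total,
        "accuracy_gap_points": other["accuracy"] - baseline["accuracy"],
    }


for name, run in MLP.items():
    result = compare(SNN, run)
    print(
        f"{name}: training {result['train_ratio']:.1f}x, "
        f"end to end {result['total_ratio']:.1f}x, "
        f"accuracy gap {result['accuracy_gap_points']:.2f} points"
    )
```

Note that the accuracy gap is expressed in percentage points (a difference of two percentages), not as a relative percentage.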
Architectural Trade-Offs
- NeuronGuard (Neuromorphic SNN):
  - Pros: Fast training (single-pass online learning), a small model (~320 KB) with a low memory footprint (< 50 MB), runs entirely on inexpensive CPU hardware, no heavy deep learning dependencies, and accuracy within about 5 percentage points of the PyTorch MLP baseline.
  - Cons: Lower peak accuracy due to the lack of iterative global optimization (backpropagation).
- PyTorch (Traditional Deep Learning):
  - Pros: High peak accuracy (about 98%) due to multi-pass gradient descent and non-linear optimization.
  - Cons: Relies on GPU acceleration for fast training, produces larger model files, and carries significantly higher memory and dependency overhead.
Technology Overview
NeuronGuard operates on a hardware-conscious, matrix-free neuromorphic design. Rather than relying on traditional deep learning architectures (like Transformers or Feedforward networks), it implements the following core technologies:
- Spiking Neural Network (SNN) Core: Models neural computation using discrete event spikes rather than continuous floating-point activations. Stimuli are processed as temporal events that propagate through synaptic pathways.
- Hebbian-Style Plasticity: Synaptic weights are updated on the fly using simple transactional increments and decrements based on co-activation, completely bypassing backpropagation and gradient storage.
- Cache-Aligned Memory Layout: All neural structures are spatially aligned to exactly 64-byte boundaries (matching standard CPU cache lines). This maximizes L1/L2 cache hit rates, prevents cache thrashing, and eliminates false sharing during parallel execution.
- GIL-Free Parallelism: Drops the Python Global Interpreter Lock (GIL) during stream processing to execute concurrent, lock-free evaluations across background worker threads.
- Guard/Lease Transactional Pattern: Implements transactional, lock-free leases on specific memory addresses using atomic compare-and-swap (CAS) operations for safe concurrent weight mutations.
- Flat, Pointerless Serialization: Synaptic weights are stored as flat, contiguous binary arrays, enabling sub-millisecond serialization and deserialization directly to and from disk.
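To make the spike accumulation, Hebbian-style update, and flat weight layout more concrete, here is a deliberately simplified pure-Python sketch. It is not the NeuronGuard implementation: `TinyHebbianField` and all of its methods are hypothetical, the update rule is only one plausible reading of "increments and decrements based on co-activation", and it leaves out cache alignment, threading, and CAS-based leases entirely.

```python
from array import array


class TinyHebbianField:
    """Toy event-driven classifier; NOT the NeuronGuard implementation."""

    def __init__(self, sensory_count, motor_count):
        self.motor_count = motor_count
        # One flat, contiguous integer array:
        # index = sensory * motor_count + motor.
        self.weights = array("i", [0] * (sensory_count * motor_count))
        self.potentials = [0] * motor_count

    def reset_potentials(self):
        self.potentials = [0] * self.motor_count

    def process(self, spikes):
        """Each input spike adds its outgoing weights to the motor potentials."""
        for s in spikes:
            base = s * self.motor_count
            for m in range(self.motor_count):
                self.potentials[m] += self.weights[base + m]

    def predict(self, spikes):
        self.reset_potentials()
        self.process(spikes)
        return self.potentials.index(max(self.potentials))

    def learn(self, spikes, target):
        """Strengthen active-input/target pairs; weaken a wrong winner."""
        winner = self.predict(spikes)
        for s in set(spikes):
            base = s * self.motor_count
            self.weights[base + target] += 1
            if winner != target:
                self.weights[base + winner] -= 1

    def to_bytes(self):
        """Serialize the weights as one raw buffer (no pointers, no framing)."""
        return self.weights.tobytes()

    def load_bytes(self, data):
        restored = array("i")
        restored.frombytes(data)
        self.weights = restored
```

A single pass of `learn` calls over labeled token-index lists is enough to train this toy model, and `to_bytes` / `load_bytes` round-trip the weights without any per-object parsing, which mirrors the idea behind flat serialization.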
How to Load and Use in Python
To load and run inference using these pre-trained weights, make sure you have neuronguard installed:
pip install neuronguard
Then, run the following Python script:
import base64
import os
import re

import neuronguard as ng

DOMAINS = [
    "Science & Technology",
    "Geography & Places",
    "Biography & People",
    "History & Events",
    "Arts & Culture",
]


def tokenize(text):
    # Lowercase the text and split it into word tokens.
    return re.findall(r"\b\w+\b", text.lower())


def load_weights_from_txt(field, path):
    # Decode the base64 text into a temporary binary file, load it,
    # and always clean the temporary file up afterwards.
    with open(path, "r", encoding="utf-8") as f_txt:
        b64_data = f_txt.read().encode("utf-8")
    data = base64.b64decode(b64_data)
    temp_bin = path + ".tmp"
    try:
        with open(temp_bin, "wb") as f_bin:
            f_bin.write(data)
        field.load_weights(temp_bin)
    finally:
        if os.path.exists(temp_bin):
            os.remove(temp_bin)


# Note: the weights and vocabulary are stored as `.txt` files
# because Hugging Face does not allow `.bin` files.

# Load vocabulary (one word per line; the line number is the word index)
vocab_map = {}
vocab_list = []
with open("wikipedia_vocab.txt", "r", encoding="utf-8") as f:
    for idx, line in enumerate(f):
        word = line.strip()
        vocab_map[word] = idx
        vocab_list.append(word)

# Initialize and load NeuronGuardField
vocab_size = len(vocab_list)
num_experts = len(DOMAINS)
field = ng.NeuronGuardField(sensory_count=vocab_size, motor_count=num_experts)
load_weights_from_txt(field, "wikipedia_weights.txt")

# Run inference
text = "The new software update introduces a high-performance compiler and optimized memory allocation algorithms for quantum computing processors."
tokens = tokenize(text)
word_indices = [vocab_map[t] for t in tokens if t in vocab_map]

field.reset_potentials()
# If no token is in the vocabulary, nothing is processed and the
# prediction below is not meaningful.
if word_indices:
    field.process_stream_sync(word_indices)
potentials = field.get_potentials()
predicted_idx = potentials.index(max(potentials))
print(f"Routed Domain: {DOMAINS[predicted_idx]}")
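For classifying more than one text, the inference steps above can be wrapped in a small helper. The sketch below is an assumption-laden convenience function, not part of the neuronguard package: `classify` is a hypothetical name, and it relies only on the three field methods already used in the script (`reset_potentials`, `process_stream_sync`, `get_potentials`). It returns `None` when no token is in the vocabulary instead of silently picking a domain.

```python
import re


def tokenize(text):
    # Same tokenizer as in the script above.
    return re.findall(r"\b\w+\b", text.lower())


def classify(field, vocab_map, domains, text):
    """Return (domain_name, potentials), or (None, []) for unknown-only text.

    `field` is any object exposing reset_potentials(),
    process_stream_sync(indices), and get_potentials().
    """
    indices = [vocab_map[t] for t in tokenize(text) if t in vocab_map]
    if not indices:
        return None, []
    field.reset_potentials()
    field.process_stream_sync(indices)
    potentials = list(field.get_potentials())
    # Index of the motor neuron with the highest potential.
    best = max(range(len(potentials)), key=potentials.__getitem__)
    return domains[best], potentials
```

With the objects from the script above, a call would look like `classify(field, vocab_map, DOMAINS, "Some article text")`.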
Open Source & Community
- GitHub Repository: etoxin/NeuronGuard
- PyPI Package: pip install neuronguard