Neuromorphic Wikipedia Domain Classifier

This repository hosts the pre-trained vocabulary and synaptic weights for NeuronGuard, a proof of concept that pairs a cache-aligned Spiking Neural Network (SNN) with a neuromorphic event engine, written in Rust and exposed to Python.

Note: This is a proof of concept built to test whether an event-driven, neuron-inspired neural model could work.

The model is trained on the 44.4 GB 'wikimedia/structured-wikipedia' dataset (10.4M articles total) to classify articles into 5 high-level domains:

  • 0: Science & Technology
  • 1: Geography & Places
  • 2: Biography & People
  • 3: History & Events
  • 4: Arts & Culture

Articles are read through a streaming dataset pipeline (zero disk overhead) and labeled using community-curated Infobox template names. The model was trained on exactly 1,000,000 valid articles (samples) and achieved 93.14% accuracy on a 10,000-article test split. Training took 15.85 seconds using local pre-filtered data on a standard Apple Silicon CPU.
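
As a rough illustration of Infobox-based labeling over a stream, the sketch below maps template names to the five domain ids and yields labeled samples lazily. The keyword table and the function names (`label_from_infobox`, `labeled_stream`) are hypothetical; the actual mapping used for training is not documented here.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical keyword-to-domain table (ids match the domain list above).
# The real template mapping used for training is an unknown; this is illustrative.
INFOBOX_KEYWORDS = {
    "software": 0, "planet": 0, "chemical": 0,
    "settlement": 1, "country": 1, "river": 1,
    "person": 2, "scientist": 2, "officeholder": 2,
    "military conflict": 3, "event": 3,
    "film": 4, "album": 4, "book": 4,
}


def label_from_infobox(template_name: str) -> Optional[int]:
    """Return a domain id for an Infobox template name, or None if unrecognised."""
    name = template_name.lower()
    for keyword, domain_id in INFOBOX_KEYWORDS.items():
        if keyword in name:
            return domain_id
    return None


def labeled_stream(articles: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Lazily yield (text, domain_id); articles without a known Infobox are skipped."""
    for text, template_name in articles:
        domain_id = label_from_infobox(template_name)
        if domain_id is not None:
            yield text, domain_id
```

Because `labeled_stream` is a generator, only one article is held in memory at a time, which is the property a streaming pipeline relies on.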


Performance Metrics (Standard Apple Silicon CPU)

  • Training Set Size: 1,000,000 valid articles (samples)
  • Model Size: 320 KB (synaptic weights and vocabulary file)
  • Training Time: 15.85 seconds (using local pre-filtered data)
  • Inference Latency: ~5–15 µs per article
  • Memory Footprint: < 50 MB
  • Accuracy: 93.14% (evaluated on a 10,000-article test split)
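
For readers who want to check the latency figure on their own hardware, a minimal timing harness might look like the following. It is a sketch, not the benchmark used for the numbers above; `mean_latency_us` is a hypothetical helper, and `classify` stands for any callable that classifies one article.

```python
import time
from typing import Callable, Sequence


def mean_latency_us(classify: Callable[[str], object],
                    articles: Sequence[str],
                    repeats: int = 5) -> float:
    """Return the mean wall-clock time per article, in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for text in articles:
            classify(text)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(articles)) * 1e6
```

Timings this small are sensitive to Python call overhead, so results measured from Python will include the cost of crossing into the Rust engine.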

Comparison: NeuronGuard vs. PyTorch (1,000,000 Samples)

To validate the efficiency of NeuronGuard's neuromorphic architecture, we ran a head-to-head comparison against a standard PyTorch Multi-Layer Perceptron (MLP) on the same Wikipedia data (1,000,000 training samples, 10,000 test samples, and a 10,000-word vocabulary).

Head-to-Head Results (Apple Silicon M-Series CPU/GPU)

| Metric | NeuronGuard (SNN) | PyTorch (MLP, 1 Epoch) | PyTorch (MLP, 10 Epochs) | Trade-Off / Insight |
|---|---|---|---|---|
| Hardware Used | Single CPU Core | Apple Silicon MPS GPU | Apple Silicon MPS GPU | NeuronGuard reaches this speed without GPU acceleration. |
| Training Time | 15.85 seconds | 16.56 seconds | 163.84 seconds | NeuronGuard is 10.3x faster than PyTorch (10 epochs) on a single CPU core. |
| Data Loading Overhead | 0.00 seconds (on-the-fly) | 16.17 seconds | 16.17 seconds | NeuronGuard trains directly on the stream, bypassing memory loading overhead. |
| Total Pipeline Time | 15.85 seconds | 32.73 seconds | 180.01 seconds | NeuronGuard is about 2.1x to 11.4x faster overall from cold start to fully trained. |
| Model Size | ~320 KB | 2.44 MB | 2.44 MB | NeuronGuard's model is 7.6x smaller, which suits edge devices. |
| Overall Accuracy | 93.14% | 97.98% | 98.15% | PyTorch's global backpropagation yields higher peak accuracy; NeuronGuard trails by about 5 percentage points. |

Because NeuronGuard trains on the fly via Hebbian plasticity, it completely bypasses the massive dataset loading and memory overhead required by traditional deep learning frameworks.
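
The ratios in the comparison table follow directly from the reported timings and sizes. The short calculation below reproduces them; the variable names are illustrative, and the size ratio assumes 1 MB = 1000 KB.

```python
# Figures copied from the comparison table.
ng_train_s = 15.85            # NeuronGuard training time (also its total pipeline time)
pt_train_1_epoch_s = 16.56    # PyTorch MLP, 1 epoch
pt_train_10_epochs_s = 163.84 # PyTorch MLP, 10 epochs
pt_load_s = 16.17             # PyTorch data loading overhead
ng_model_kb = 320
pt_model_mb = 2.44
ng_acc = 93.14
pt_acc_10 = 98.15

# PyTorch total pipeline time = data loading + training.
pt_total_1 = pt_train_1_epoch_s + pt_load_s
pt_total_10 = pt_train_10_epochs_s + pt_load_s

train_speedup = pt_train_10_epochs_s / ng_train_s
total_speedup_1 = pt_total_1 / ng_train_s
total_speedup_10 = pt_total_10 / ng_train_s
size_ratio = pt_model_mb * 1000 / ng_model_kb  # assumes 1 MB = 1000 KB
accuracy_gap_points = pt_acc_10 - ng_acc       # percentage points, not percent

print(f"training speedup (10 epochs): {train_speedup:.1f}x")
print(f"total speedup: {total_speedup_1:.1f}x to {total_speedup_10:.1f}x")
print(f"model size ratio: {size_ratio:.1f}x")
print(f"accuracy gap: {accuracy_gap_points:.2f} percentage points")
```

The accuracy difference is an absolute gap of 5.01 percentage points between the 10-epoch MLP and NeuronGuard.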

Architectural Trade-Offs

  • NeuronGuard (Neuromorphic SNN):
    • Pros: Fast training (single-pass online learning), a very small model file (< 350 KB), runs entirely on cheap CPU hardware, zero heavy deep learning dependencies, and accuracy about 5 percentage points below the deep learning baseline.
    • Cons: Slightly lower peak accuracy due to the lack of iterative global optimization (backpropagation).
  • PyTorch (Traditional Deep Learning):
    • Pros: High peak accuracy (98%+) due to multi-pass gradient descent and non-linear optimization.
    • Cons: Requires dedicated GPU acceleration for fast training, larger model files, and significantly higher memory and dependency overhead.

Technology Overview

NeuronGuard operates on a hardware-conscious, matrix-free neuromorphic design. Rather than relying on traditional deep learning architectures (like Transformers or Feedforward networks), it implements the following core technologies:

  • Spiking Neural Network (SNN) Core: Models neural computation using discrete event spikes rather than continuous floating-point activations. Stimuli are processed as temporal events that propagate through synaptic pathways.
  • Hebbian-Style Plasticity: Synaptic weights are updated on the fly using simple transactional increments and decrements based on co-activation, completely bypassing backpropagation and gradient storage.
  • Cache-Aligned Memory Layout: All neural structures are aligned to 64-byte boundaries (the cache-line size common on x86-64 CPUs). This improves L1/L2 cache locality and reduces cache thrashing and false sharing during parallel execution.
  • GIL-Free Parallelism: Drops the Python Global Interpreter Lock (GIL) during stream processing to execute concurrent, lock-free evaluations across background worker threads.
  • Guard/Lease Transactional Pattern: Implements transactional, lock-free leases on specific memory addresses using atomic compare-and-swap (CAS) operations for safe concurrent weight mutations.
  • Flat, Pointerless Serialization: Synaptic weights are stored as flat, contiguous binary arrays, enabling sub-millisecond serialization and deserialization directly to and from disk.
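
To make the list above more concrete, here is a toy pure-Python sketch of three of the ideas: event-driven accumulation of potentials, a Hebbian-style increment/decrement update, and a flat contiguous weight array that serializes without pointers. It is not the Rust engine: `TinyField`, `process_stream`, `learn`, `dump`, and `load` are hypothetical names, and the exact update rule is an assumption. Cache alignment, GIL release, and CAS leases are not modeled.

```python
from array import array


class TinyField:
    """Toy event-driven classifier: sensory spikes add synaptic weight to motor potentials."""

    def __init__(self, sensory_count: int, motor_count: int) -> None:
        self.sensory_count = sensory_count
        self.motor_count = motor_count
        # One flat, contiguous, row-major array: weights[s * motor_count + m].
        self.weights = array("f", [0.0] * (sensory_count * motor_count))
        self.potentials = [0.0] * motor_count

    def reset_potentials(self) -> None:
        self.potentials = [0.0] * self.motor_count

    def process_stream(self, spikes) -> None:
        # Each spike touches only its own row of weights; no matrix multiply is needed.
        for s in spikes:
            base = s * self.motor_count
            for m in range(self.motor_count):
                self.potentials[m] += self.weights[base + m]

    def learn(self, spikes, target: int, step: float = 1.0) -> None:
        # Assumed Hebbian-style rule: strengthen synapses to the co-active
        # target neuron and weaken synapses to every other motor neuron.
        for s in spikes:
            base = s * self.motor_count
            for m in range(self.motor_count):
                self.weights[base + m] += step if m == target else -step

    def dump(self) -> bytes:
        # The flat array is already the on-disk format: raw float32 values.
        return self.weights.tobytes()

    def load(self, blob: bytes) -> None:
        restored = array("f")
        restored.frombytes(blob)
        self.weights = restored
```

Training here is a single pass of `learn` calls with no gradients stored, and saving is a single `tobytes()` call, which is why this style of model can be written to and read from disk so quickly.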

How to Load and Use in Python

To load and run inference using these pre-trained weights, make sure you have neuronguard installed:

pip install neuronguard

Then, run the following Python script:

import base64
import os
import re
import neuronguard as ng

DOMAINS = [
    "Science & Technology",
    "Geography & Places",
    "Biography & People",
    "History & Events",
    "Arts & Culture",
]

def tokenize(text):
    return re.findall(r"\b\w+\b", text.lower())

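def load_weights_from_txt(field, path):
    """Decode base64-encoded weights from a .txt file and load them into the field."""
    with open(path, "r", encoding="utf-8") as f_txt:
        data = base64.b64decode(f_txt.read())
    # load_weights() takes a file path, so stage the decoded bytes in a temp file.
    temp_bin = path + ".tmp"
    try:
        with open(temp_bin, "wb") as f_bin:
            f_bin.write(data)
        field.load_weights(temp_bin)
    finally:
        # Clean up the temp file even if loading fails.
        if os.path.exists(temp_bin):
            os.remove(temp_bin)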

# Note: the weights and vocab are stored as `.txt` files
# because Hugging Face does not allow `.bin` files.

# Load vocabulary
vocab_map = {}
vocab_list = []
with open("wikipedia_vocab.txt", "r", encoding="utf-8") as f:
    for idx, line in enumerate(f):
        word = line.strip()
        vocab_map[word] = idx
        vocab_list.append(word)

# Initialize and load NeuronGuardField
vocab_size = len(vocab_list)
num_experts = len(DOMAINS)
field = ng.NeuronGuardField(sensory_count=vocab_size, motor_count=num_experts)
load_weights_from_txt(field, "wikipedia_weights.txt")

# Run Inference
text = "The new software update introduces a high-performance compiler and optimized memory allocation algorithms for quantum computing processors."
tokens = tokenize(text)
word_indices = [vocab_map[t] for t in tokens if t in vocab_map]

field.reset_potentials()
if word_indices:
    field.process_stream_sync(word_indices)

potentials = field.get_potentials()
predicted_idx = potentials.index(max(potentials))
print(f"Routed Domain: {DOMAINS[predicted_idx]}")
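
The script above reports only the top domain. A small wrapper, sketched below, returns every domain ranked by potential and handles text with no in-vocabulary tokens explicitly; in that case the script above would presumably see all potentials at their reset value and pick the first domain. The helper names `rank_domains` and `route` are hypothetical, and `route` uses only the `field` methods already shown (`reset_potentials`, `process_stream_sync`, `get_potentials`).

```python
import re
from typing import Dict, List, Sequence, Tuple


def rank_domains(potentials: Sequence[float],
                 domains: Sequence[str]) -> List[Tuple[str, float]]:
    """Pair each domain with its potential, highest potential first."""
    return sorted(zip(domains, potentials), key=lambda pair: pair[1], reverse=True)


def route(field, vocab_map: Dict[str, int], text: str,
          domains: Sequence[str]) -> List[Tuple[str, float]]:
    """Classify text with a loaded field; return [] if no token is in the vocabulary."""
    tokens = re.findall(r"\b\w+\b", text.lower())
    word_indices = [vocab_map[t] for t in tokens if t in vocab_map]
    field.reset_potentials()
    if not word_indices:
        return []  # no evidence, so avoid defaulting to the first domain
    field.process_stream_sync(word_indices)
    return rank_domains(field.get_potentials(), domains)
```

With the objects from the script above, `route(field, vocab_map, text, DOMAINS)[0]` would give the same top domain along with its potential.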

Open Source & Community
