Harry00
/

MLE-Morpho-Logic-Engine

Model card Files Files and versions

xet

Community

Harry00 commited on 22 days ago

Commit

51b2aff

verified ·

1 Parent(s): ebaf2ce

docs: comprehensive README with architecture, benchmarks, and examples

Browse files

Files changed (1) hide show

README.md +312 -0

README.md ADDED Viewed

	@@ -0,0 +1,312 @@

+# MLE — Morpho-Logic Engine
+> **A novel energy-based reasoning AI architecture, CPU-native, gradient-free, built on hyperdimensional computing.**
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Tests: 7/7](https://img.shields.io/badge/tests-7%2F7%20passing-brightgreen.svg)]()
+```
+███╗   ███╗ ██╗     ███████╗
+████╗ ████║ ██║     ██╔════╝
+██╔████╔██║ ██║     █████╗      Morpho-Logic Engine v0.1.0
+██║╚██╔╝██║ ██║     ██╔══╝      Energy-Based Reasoning AI
+██║ ╚═╝ ██║ ███████╗███████╗
+╚═╝     ╚═╝ ╚══════╝╚══════╝
+```
+---
+## 🧠 What is MLE?
+MLE is a **new class of reasoning engine** that replaces neural network backpropagation with energy-based dynamics operating on hyperdimensional binary vectors. It draws from:
+- **Kanerva's Sparse Distributed Memory** — memory indexed by proximity in Hamming space
+- **Holographic Reduced Representations** (Plate 1995) — circular convolution for semantic binding
+- **Modern Hopfield Networks** (Ramsauer et al. 2020) — energy-based pattern completion
+- **Binary Spatter Codes** — ultra-fast XOR binding for CPU-native computation
+The result is a system that can reason about concepts, solve analogies, compose meanings, and traverse knowledge graphs — all **without GPU, without gradients, without training** — using pure bitwise operations optimized for CPU SIMD instructions.
+---
+## 🏗️ Architecture
+```
+┌─────────────┐    ┌──────────────┐    ┌──────────────┐    ┌─────────────┐
+│   Query     │───▶│   Routing    │───▶│   Binding    │───▶│   Energy    │
+│  Encoder    │    │  (JIT Beam)  │    │  (Compose)   │    │  (Relax)    │
+│             │    │  Top-500     │    │  XOR / FFT   │    │  Hopfield + │
+│  str→4096b  │    │  LSH+Expand  │    │  Conv Circ.  │    │  Bit-flip   │
+└─────────────┘    └──────────────┘    └──────────────┘    └─────────────┘
+       │                                                          │
+       │           ┌──────────────┐    ┌──────────────┐          │
+       └───────────│   Response   │◀───│   Decode     │◀─────────┘
+                   │              │    │   NN + Role   │
+                   └──────────────┘    └──────────────┘
+```
+### 5 Modules
+| Module | File | Description |
+|--------|------|-------------|
+| **`memory`** | `sparse_address_table.py` | Sparse Address Table: 4096-bit binary vectors, SoA layout, LSH index (32 tables × 8-bit signatures), multi-probe search, activation tracking |
+| **`routing`** | `recursive_jit_router.py` | Recursive JIT Routing: LSH init → beam-500 refinement → neighbor expansion → convergence check. Multi-hop chaining for chain-of-thought |
+| **`binding`** | `semantic_binding.py` | Dual binding: **Binary (XOR)** O(N/64) exact recovery + **HRR (FFT)** O(N log N) approximate. Triple encoding, analogy queries, sequence binding via permutation |
+| **`energy`** | `energy_model.py` | Composite energy: compatibility + binding coherence + sparsity + smoothness. **Hopfield** (continuous, attention-based) + **Binary relaxation** (simulated annealing). Hybrid mode for best results |
+| **`inference`** | `reasoning_engine.py` | Full pipeline: encode → route → bind → relax → decode. Association, analogy, composition, structured queries. Multi-step reasoning with convergence detection |
+### SIMD-Optimized Core (`utils/simd_ops.py`)
+All core operations are backed by a **GCC-compiled C library** with `-march=native` for automatic SIMD vectorization:
+```c
+// Compiles to AVX-512 VPOPCNTQ or AVX2 POPCNT automatically
+int hamming_single(const uint64_t *a, const uint64_t *b, int n) {
+    int cnt = 0;
+    for (int i = 0; i < n; i++)
+        cnt += __builtin_popcountll(a[i] ^ b[i]);
+    return cnt;
+}
+```
+| Operation | Throughput | Notes |
+|-----------|-----------|-------|
+| Hamming distance (single) | ~100M ops/s | 64 × `POPCNT` per pair |
+| Hamming batch (100K vectors) | 25-41M vecs/s | Vectorized XOR + popcount |
+| Top-500 selection | O(N log K) | Max-heap in C |
+| Binary bind (XOR) | ~95K ops/s | 64 × `XOR` per op |
+| HRR bind (FFT) | ~10K ops/s | `numpy.fft.rfft` |
+**Fallback**: Pure NumPy LUT-based popcount when GCC isn't available — portable across all platforms.
+---
+## 📐 Core Concepts
+### 4096-bit Binary Vectors
+Every concept, relation, and memory address is a **4096-bit binary vector** stored as 64 × `uint64` words (512 bytes):
+```python
+# Each vector: 4096 bits = 512 bytes = 64 × uint64
+# Storage layout: Structure of Arrays (SoA) for cache locality
+# addresses: (N, 64) uint64 — contiguous, cache-aligned
+# contents:  (N, 64) uint64 — separate for SIMD batch ops
+```
+**Key property**: Random vectors have Hamming distance ≈ 2048 (50%). Semantic similarity is encoded as deviation from this baseline. Vectors with distance << 2048 are "similar"; vectors at ~2048 are orthogonal.
+### Sparse Address Table
+Memory entries are indexed by binary vectors. Access uses **Hamming distance** as the proximity metric:
+```python
+from mle import SparseAddressTable
+sat = SparseAddressTable(capacity=100_000)
+sat.store(address_vec, content_vec, metadata={'name': 'cat'})
+results = sat.query_nearest(query_vec, k=10)  # [(index, distance), ...]
+```
+**LSH Index**: 32 hash tables with 8-bit random-bit-sampling signatures. Multi-probe search (1-bit and 2-bit flips) for high recall. Sub-linear search time for large memories.
+### Binding Operations
+**Binary (XOR)**: `bind(A, B) = A ⊕ B`. Self-inverse (exact recovery), quasi-orthogonal to inputs, O(N/64).
+**HRR (FFT)**: `bind(A, B) = IFFT(FFT(A) · FFT(B))`. Circular convolution, approximate recovery, similarity-preserving, O(N log N).
+```python
+from mle.binding import BinaryBinding
+# Encode: "king IS_A man"
+triple = BinaryBinding.encode_triple(king_vec, is_a_vec, man_vec)
+# Decode: recover man from triple
+decoded = BinaryBinding.unbind(BinaryBinding.unbind(triple, king_vec), is_a_vec)
+# decoded == man_vec (exact with XOR!)
+# Analogy: king:man :: queen:?
+query = BinaryBinding.create_analogy_query(king_vec, man_vec, queen_vec)
+# query ≈ woman_vec (find nearest in codebook)
+```
+### Energy-Based Reasoning
+**No backpropagation. No gradients stored.** Reasoning is energy minimization:
+```
+E(state) = α·E_compat + β·E_binding + γ·E_sparse + δ·E_smooth
+```
+| Component | Formula | Purpose |
+|-----------|---------|---------|
+| Compatibility | -Σ wᵢ · sim(state, contextᵢ) | State agrees with activated memories |
+| Binding coherence | Σ hamming(unbind(bᵢ, rᵢ), fᵢ) / N | Stored relations remain intact |
+| Sparsity | ‖activations‖₁ | Focused, not diffuse, activation |
+| Smoothness | hamming(current, previous) / N | Stable reasoning trajectory |
+**Two-phase minimization**:
+1. **Hopfield update**: `ξ_new = X @ softmax(β · X^T @ ξ)` — fast coarse convergence via attention over patterns
+2. **Binary relaxation**: bit-flip search with simulated annealing — fine discrete refinement
+---
+## 🚀 Quick Start
+```python
+from mle import MorphoLogicEngine
+# Initialize
+engine = MorphoLogicEngine(beam_width=500, energy_mode='hybrid')
+# Build knowledge
+engine.add_concept("cat")
+engine.add_concept("dog")
+engine.add_concept("animal")
+engine.add_relation("cat", "is_a", "animal")
+engine.add_relation("dog", "is_a", "animal")
+# Reason
+result = engine.reason("cat", max_steps=3)
+print(result['response']['nearest_concepts'])
+# → [('cat', 0.99), ('animal', 0.75), ...]
+# Associations
+assocs = engine.associate("cat", top_k=5)
+# → [('cat_is_a_animal', 0.74), ('dog', 0.52), ...]
+# Analogy: king:man :: queen:?
+analogy = engine.solve_analogy("king", "man", "queen")
+print(analogy['codebook_ranking'][:3])
+# Composition: water + animal → ?
+comp = engine.compose("water", "animal")
+print(comp['response']['nearest_concepts'][:3])
+```
+---
+## 📊 Benchmarks
+Measured on a 2-vCPU machine (cpu-basic), single-threaded:
+### SIMD Throughput
+| Corpus Size | Batch Hamming | Top-500 |
+|-------------|--------------|---------|
+| 1,000 | 0.04ms (28M/s) | 0.06ms |
+| 10,000 | 0.29ms (35M/s) | 0.32ms |
+| 100,000 | 4.56ms (22M/s) | 4.79ms |
+### Routing Latency
+| Memory Size | Avg Latency | P99 | Candidates |
+|-------------|-------------|-----|------------|
+| 1,000 | 3.8ms | 5.4ms | 953 |
+| 10,000 | 2.5ms | 3.2ms | 3,335 |
+| 50,000 | 2.7ms | 3.5ms | 2,679 |
+### Memory Efficiency
+- **1,024 bytes/entry** (512 address + 512 content)
+- **1,000 entries = 1 MB**
+- **100,000 entries = 100 MB**
+### Binding Performance
+- Binary (XOR): **95,000 ops/sec**
+- HRR (FFT): **10,500 ops/sec**
+---
+## 🧪 Tests
+```bash
+pip install numpy scipy
+python -m mle.tests.test_full_system
+```
+**7/7 test groups passing:**
+- ✅ SIMD Operations (correctness + performance)
+- ✅ Memory & LSH (storage, retrieval, 100% cluster recall)
+- ✅ Routing (beam width, convergence, scalability)
+- ✅ Binding (XOR exact recovery, HRR approximate recovery, triple encoding)
+- ✅ Energy Convergence (monotonic decrease, Hopfield attention concentration)
+- ✅ Reasoning (association, query, analogy, composition, structured queries)
+- ✅ Integration (500+ concept KB, batch queries, memory efficiency)
+---
+## 🎯 Demo
+```bash
+python -m mle.demo
+```
+Runs a full demonstration with 40+ concepts, 42 relations, and tests for concept queries, associations, analogies, compositions, structured queries, and multi-step reasoning.
+---
+## 📁 Project Structure
+```
+mle/
+├── __init__.py                    # Package init, public API
+├── demo.py                        # Interactive demonstration
+├── utils/
+│   ├── __init__.py
+│   └── simd_ops.py               # SIMD C library + NumPy fallback
+├── memory/
+│   ├── __init__.py
+│   └── sparse_address_table.py   # SparseAddressTable + HammingLSH
+├── routing/
+│   ├── __init__.py
+│   └── recursive_jit_router.py   # RecursiveJITRouter
+├── binding/
+│   ├── __init__.py
+│   └── semantic_binding.py       # HRRBinding + BinaryBinding + BindingEngine
+├── energy/
+│   ├── __init__.py
+│   └── energy_model.py           # EnergyFunction + Relaxation + Hopfield
+├── inference/
+│   ├── __init__.py
+│   └── reasoning_engine.py       # ReasoningEngine (full pipeline)
+└── tests/
+    ├── __init__.py
+    └── test_full_system.py       # Comprehensive test suite
+```
+---
+## 🔬 Theoretical Foundations
+| Paper | Contribution to MLE |
+|-------|-------------------|
+| Kanerva (1988) "Sparse Distributed Memory" | Binary vector addressing, Hamming distance proximity |
+| Plate (1995) "Holographic Reduced Representations" | Circular convolution binding, FFT implementation |
+| Gayler (2003) "Vector Symbolic Architectures" | XOR binding (BSC), majority-vote bundling |
+| Ramsauer et al. (2020) "Hopfield Networks Is All You Need" | Modern Hopfield energy, exponential capacity, attention ≡ update rule |
+| Frady et al. (2021) "SDM and Transformers" | SDM Hamming threshold ≈ transformer attention |
+| Thomas et al. (2023) "Efficient HDC with Static Optimization" | Optimal BSC dimensions, analytical thresholds |
+| Langford et al. (2024) "Linear Codes for HDC" | GF(2) factorization, 100% XOR recovery |
+---
+## 🛤️ Roadmap
+- [ ] **Persistent storage**: Serialize memory to disk (mmap for instant loading)
+- [ ] **Learned embeddings**: Pre-encode concepts from text corpora (word2vec → binary projection)
+- [ ] **Multi-threaded SIMD**: Parallel batch Hamming with OpenMP
+- [ ] **Graph walk reasoning**: Follow relation chains for multi-hop inference
+- [ ] **Incremental learning**: Hebbian-style weight updates from experience
+- [ ] **Benchmark suite**: Standardized reasoning tasks (bAbI, CLUTRR, etc.)
+---
+## 📜 License
+MIT
+---
+## 🙏 Acknowledgments
+Inspired by the vision of frugal, explainable AI that reasons rather than retrieves. Built on decades of research in hyperdimensional computing, energy-based models, and sparse distributed memory.