LatentBridge: Telepathic Multi-Agent Communication

Disclaimer: This repository is an experimental proof-of-concept created exclusively for personal study and research. It is not intended for production use. The results and capabilities discussed here are hypothetical, and the system has not been rigorously evaluated against standard benchmarks. The code is shared "as is" to explore alternative architectural ideas in neural networks. Claims about improved reasoning or latency are theoretical and lack formal empirical verification.

Introduction: Multi-Agent Systems (MAS)

In modern AI, a Multi-Agent System (MAS) involves multiple LLM instances (agents) working together to solve complex problems. Typically, agents communicate by generating text and reading each other's outputs. For example, an "Analyzer Agent" writes a long chain-of-thought, and a "Speaker Agent" reads that text to produce a final concise answer. While effective, this text-based communication is slow, consumes large amounts of the context window, and forces models to externalize every single thought into tokens.

LatentBridge proposes a radical alternative: what if agents could communicate telepathically?

LatentBridge is a lightweight, standalone PyTorch implementation of Latent Space Communication for Multi-Agent Systems. It allows two instances of an LLM (e.g., Qwen 3.5 4B) to communicate their "thoughts" without generating visible text tokens. Instead, they share their intermediate neural activations directly.

The Architecture: How It Works

Today, getting a complex answer out of an LLM usually means prompting it to "think out loud" (Chain-of-Thought). This can consume thousands of visible tokens (the <think> blocks), increases response latency, and fills up the context window.

LatentBridge aims to remove the need to print text in order to think: it moves the reasoning signal into the latent (vector) space of the neural network.

The architecture relies on two instances of the same base model, assuming two different roles via System Prompts:

Agent B (The Thinker): Analyzes the problem deeply in the background.
Agent A (The Speaker): Provides the final concise answer.

Here is the step-by-step mechanism of the Parallel Injection:

1. Context Capture (Global Intuition)

Agent B reads the user prompt and performs a single forward pass. Instead of letting it generate text, we extract its hidden states (internal neural activations). Specifically, we capture only the final token position (H_B[:, -1:, :]). Because of causal self-attention, the final position can attend to every earlier token, so its hidden state acts as a dense, compressed summary of the whole prompt as Agent B has processed it. It is a "vector of pure intuition."
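The capture step can be illustrated without downloading a model. The sketch below is only a toy: TinyCausalBlock and capture_last_token are hypothetical names, and a three-block single-head stack stands in for Agent B. It shows the H_B[:, -1:, :] slice (which keeps the sequence dimension) and checks that perturbing the first prompt token changes the final-position state, which is what lets that position serve as a summary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyCausalBlock(nn.Module):
    """Single-head causal self-attention block; a toy stand-in for one LLM layer."""

    def __init__(self, d_model):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5)
        t = x.shape[1]
        # Mask out future positions so each token only attends to itself and the past.
        future = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))
        return x + self.out(torch.softmax(scores, dim=-1) @ v)


def capture_last_token(blocks, x):
    """Run the stack and return the final-position hidden state after each block."""
    captured = []
    with torch.no_grad():
        for block in blocks:
            x = block(x)
            captured.append(x[:, -1:, :])  # shape (batch, 1, d_model)
    return captured


blocks = nn.ModuleList([TinyCausalBlock(16) for _ in range(3)])
prompt = torch.randn(1, 5, 16)  # fake embeddings for a 5-token prompt
H_B_last = capture_last_token(blocks, prompt)

# Perturb only the first token and capture again.
perturbed = prompt.clone()
perturbed[:, 0, :] += 1.0
H_B_last_perturbed = capture_last_token(blocks, perturbed)
last_token_changed = not torch.allclose(H_B_last[-1], H_B_last_perturbed[-1])
```

With a real Hugging Face model, the same slice would typically be taken from the per-layer hidden states returned by the forward pass; that wiring is not shown here.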

2. Deep Layer Injection (Hooks)

Instead of merging minds at the very beginning (word embeddings) or at the very end (probabilities), LatentBridge hooks into the deep intermediate layers of the model (e.g., layers 11, 19, and 27). The working assumption is that these layers are where the network handles logical abstraction, complex syntax, and problem-solving.
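As a rough illustration of the hooking mechanism, the sketch below uses PyTorch's standard register_forward_hook on a toy stack of linear layers. The helper make_injection_hook, the fixed scale, and the toy layer indices (1 and 3 instead of 11, 19, 27) are assumptions made for the example, not the repository's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def make_injection_hook(latent, scale):
    """Return a forward hook that adds a scaled latent vector to a layer's output."""

    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the layer's output.
        # latent has shape (batch, 1, d) and broadcasts over the sequence axis.
        return output + scale * latent

    return hook


# Toy stand-in for a decoder stack: four linear "layers".
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])


def run(x):
    for layer in layers:
        x = layer(x)
    return x


x = torch.randn(1, 3, 8)
latent = torch.randn(1, 1, 8)  # plays the role of Agent B's captured vector

baseline = run(x)

# Attach hooks to two of the layers, run again, then detach them.
handles = [
    layers[i].register_forward_hook(make_injection_hook(latent, 0.1))
    for i in (1, 3)
]
injected = run(x)
for handle in handles:
    handle.remove()

restored = run(x)  # identical to baseline once the hooks are removed
```

In a real transformer, a decoder layer may return a tuple rather than a bare tensor, in which case the hook would need to unpack and repack the output.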

3. The Safety Translation ($W_{ab}$)

If you naively added Agent B's vector to Agent A's hidden state, you would push the activations off the manifold (the region of activation space the network was trained on), and Agent A would likely output gibberish. To prevent this, LatentBridge uses a trainable neural projection (an MLP or linear matrix, $W_{ab}$). This network acts as a simultaneous translator: it takes B's intuition and remaps it so that it is mathematically "digestible" by Agent A's normalization layers.
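A minimal sketch of such a projection, assuming a two-layer MLP. The class name BridgeProjection, the hidden width, and the zero initialization of the last layer are all assumptions for illustration; zero-init is just one common way to make sure an untrained bridge leaves Agent A's activations untouched.

```python
import torch
import torch.nn as nn


class BridgeProjection(nn.Module):
    """Hypothetical W_ab: maps Agent B's vector into Agent A's activation space."""

    def __init__(self, d_model, d_hidden=None):
        super().__init__()
        d_hidden = d_hidden or d_model
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )
        # Assumption: start with a zero output so the injection is a no-op
        # until training moves the weights.
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, h_b):
        return self.net(h_b)


torch.manual_seed(0)
w_ab = BridgeProjection(d_model=16)

h_a = torch.randn(1, 4, 16)       # Agent A's hidden states for 4 positions
h_b_last = torch.randn(1, 1, 16)  # Agent B's final-token vector

projected = w_ab(h_b_last)        # shape (1, 1, 16)
h_a_injected = h_a + projected    # broadcasts over Agent A's positions
```

Because the last layer starts at zero, h_a_injected equals h_a before any training, so the base model's behavior is preserved at initialization.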

4. The Dynamic Gate (The Smart Valve)

This is the core of the system. During generation, Agent A produces one token at a time. For every generated token, the Dynamic Gate computes:

$\text{Gate} = \sigma(W_{a}H_A + W_{b}H_B + \text{bias})$

Here $\sigma$ is the sigmoid function, so the gate takes a value between 0 and 1 and acts as a "valve" controlling the injection. The intended behavior is:

Agent A dynamically decides, token by token, whether it needs Agent B's help.
If Agent A is writing obvious words ("The", "answer", "is"), the Gate closes toward zero to avoid distortion.
If Agent A reaches a crucial logical crossroad, the Gate opens toward 1, and the solution encoded by B flows into A's calculations to guide the correct word.
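The gate equation can be sketched as a small module. This assumes one scalar gate per token position (the text does not say whether the gate is scalar or per-dimension), and the names DynamicGate, w_a, w_b, and projected_b are hypothetical.

```python
import torch
import torch.nn as nn


class DynamicGate(nn.Module):
    """Gate = sigmoid(W_a H_A + W_b H_B + bias), one scalar per token (assumed)."""

    def __init__(self, d_model):
        super().__init__()
        self.w_a = nn.Linear(d_model, 1, bias=False)
        self.w_b = nn.Linear(d_model, 1, bias=False)
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, h_a, h_b):
        # h_a: (batch, seq, d), h_b: (batch, 1, d) -> gate: (batch, seq, 1)
        return torch.sigmoid(self.w_a(h_a) + self.w_b(h_b) + self.bias)


torch.manual_seed(0)
gate_module = DynamicGate(d_model=16)

h_a = torch.randn(1, 4, 16)          # Agent A's current hidden states
h_b = torch.randn(1, 1, 16)          # Agent B's captured vector
projected_b = torch.randn(1, 1, 16)  # stand-in for the W_ab output

gate = gate_module(h_a, h_b)
mixed = h_a + gate * projected_b     # gated injection

# A strongly negative bias closes the valve: the injection almost vanishes.
with torch.no_grad():
    gate_module.bias.fill_(-20.0)
closed_gate = gate_module(h_a, h_b)
mixed_closed = h_a + closed_gate * projected_b
```

When the gate is near 0, mixed_closed is numerically indistinguishable from h_a; as the gate approaches 1, the full projected vector is added.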

5. Autoregressive Decay

As Agent A produces more tokens, its own context carries more of the answer, and it needs B's initial intuition less. A decay_rate gradually lowers the injection strength token after token, so that Agent A can finish its sentence largely on its own.
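The exact decay schedule is not specified above. A minimal sketch, assuming a geometric schedule in which the injection is multiplied by decay_rate at every generated token (the function name injection_scale and the floor parameter are hypothetical):

```python
def injection_scale(step, decay_rate=0.9, floor=0.0):
    """Scale applied to the injected vector at a given generation step.

    Assumes geometric decay: 1.0 at step 0, multiplied by decay_rate each step,
    never dropping below floor.
    """
    return max(floor, decay_rate ** step)


# With decay_rate=0.5 the injection halves at every generated token.
schedule = [injection_scale(t, decay_rate=0.5) for t in range(4)]
print(schedule)  # [1.0, 0.5, 0.25, 0.125]
```

Under this assumption, a decay_rate closer to 1 keeps Agent B's influence around for longer, while a smaller value hands control back to Agent A after only a few tokens.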

Quick Start

This repository provides everything you need to test LatentBridge out of the box, with no API servers or complex setup.

Prerequisites

pip install torch transformers accelerate

Note: The scripts are optimized to use Scaled Dot Product Attention (sdpa) and run completely on GPU (.to("cuda")). An 8GB+ GPU is recommended.

1. The Logic Puzzle Test

A logic test meant to probe whether a 4B model can improve its deductive reasoning without writing out explicit Chain-of-Thought text. Agent B processes the prompt in the background, and Agent A attempts to output the correct final box letter based on the latent context.

python test_logic_puzzle.py

2. The Code Architecture Review

An experimental scenario testing large context processing. The script automatically loads up to 25,000 characters of source code from your local directory. Agent B is prompted to act as a Staff Engineer to build an internal representation of the code. Agent A is then prompted to act as a Senior Architect, attempting to translate that internal representation into an architectural summary.
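The loading step might look roughly like the sketch below. The function name load_source, the default .py filter, and the per-file header line are assumptions; only the 25,000-character cap comes from the description above.

```python
from pathlib import Path


def load_source(target_dir, max_chars=25_000, suffixes=(".py",)):
    """Concatenate source files under target_dir, stopping at max_chars characters."""
    chunks = []
    used = 0
    for path in sorted(Path(target_dir).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        remaining = max_chars - used
        if remaining <= 0:
            break
        # Prefix each file with its name so the model can tell files apart.
        text = f"# File: {path.name}\n"
        text += path.read_text(encoding="utf-8", errors="ignore") + "\n"
        chunk = text[:remaining]  # truncate the last file to respect the cap
        chunks.append(chunk)
        used += len(chunk)
    return "".join(chunks)
```

Calling load_source(target_dir) would then return a single string of at most 25,000 characters to place in Agent B's prompt.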

python test_coding_arch.py

(Tip: You can change the target_dir variable in the script to point to any coding project on your computer!)

Technical Details

Base Model: Qwen/Qwen3.5-4B
Training Methodology (Knowledge Distillation): The neural bridge (bridge_weights.pt) was trained by distilling the explicit Chain-of-Thought (CoT) reasoning from Claude Opus 4.6. The MLP projections and dynamic gates learned to map the complex, multi-step logical deductions generated by Opus 4.6 directly into the latent space of the 4B model.
Trainable Parameters: bridge_weights.pt contains the trained MLP projections and dynamic gates for layers 11, 19, and 27.
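To make the split between frozen base weights and trainable bridge weights concrete, here is a toy sketch. Everything in it is an assumption for illustration: linear layers stand in for the hooked decoder layers, the bridge is a plain MLP per layer, and a mean-squared-error loss against a random target is only a placeholder for the actual distillation objective, which is not described above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 16
layer_ids = ("11", "19", "27")

# Toy stand-ins for the three hooked layers of the frozen base model.
base = nn.ModuleDict({k: nn.Linear(d_model, d_model) for k in layer_ids})
for p in base.parameters():
    p.requires_grad_(False)

# One trainable bridge MLP per hooked layer (hypothetical layout).
bridge = nn.ModuleDict({
    k: nn.Sequential(
        nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
    )
    for k in layer_ids
})
optimizer = torch.optim.AdamW(bridge.parameters(), lr=1e-3)

h_a = torch.randn(4, d_model)     # Agent A activations (fake)
h_b = torch.randn(4, d_model)     # Agent B captured vectors (fake)
target = torch.randn(4, d_model)  # placeholder for a teacher-derived signal

base_before = base["11"].weight.clone()
bridge_before = bridge["11"][0].weight.clone()

# Forward: each toy layer's output receives its own bridge injection.
x = h_a
for k in layer_ids:
    x = base[k](x) + bridge[k](h_b)

loss = nn.functional.mse_loss(x, target)  # placeholder loss, not the real one
loss.backward()
optimizer.step()

n_trainable = sum(p.numel() for p in bridge.parameters())
n_base_trainable = sum(p.numel() for p in base.parameters() if p.requires_grad)
```

After the step, only the bridge parameters have moved; the base weights are unchanged, which mirrors the idea that the file of bridge weights is the only trained artifact.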


Model tree for massimolauri/LatentBridge-4B: Qwen/Qwen3.5-4B-Base (base model), finetuned into Qwen/Qwen3.5-4B, finetuned into this model.