Codette-Reasoning / PHASE2_SUMMARY.md

Phase 2 Implementation Summary

Status: COMPLETE ✓

All Phase 2 components have been successfully implemented, integrated, and validated.


What Was Built

1. MemoryWeighting Engine (reasoning_forge/memory_weighting.py)

  • Purpose: Score adapter performance and weight future adapter selection based on historical memory

  • Key Components:

    • AdapterWeight dataclass: Tracks adapter metrics (coherence, conflict success, recency, composite weight)
    • MemoryWeighting class: Main engine for weight computation and selection
  • Key Features:

    • compute_weights(): Aggregates memory cocoons per adapter, computes composite weights [0, 2.0]
      • Base coherence contribution: ±0.5 (mean coherence from past uses)
      • Conflict success contribution: ±0.3 (% of "tension" memories with coherence > 0.7)
      • Recency contribution: ±0.2 (exponential decay with ~7 day half-life)
    • select_primary(): Choose best adapter for specific conflict context
    • get_boosted_confidence(): Modulate router confidence based on weight (soft boost: -50% to +50%)
    • explain_weight(): Expose weight breakdown for debugging/transparency
    • get_all_weights(): Export full weighting state
  • Output: Weight scores [0, 2.0] where:

    • 0.5 = Poor adapter (suppress by 50%)
    • 1.0 = Average adapter (neutral)
    • 2.0 = Excellent adapter (boost by 100%)
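As a concrete illustration, the composite formula above can be sketched as follows. The function name and exact scaling are assumptions derived from the stated contribution ranges (±0.5, ±0.3, ±0.2), not the actual memory_weighting.py implementation:

```python
import time

# Hypothetical sketch of the composite weight: 1.0 is neutral, and each
# signal shifts the weight within its stated range before clamping to [0, 2.0].
def composite_weight(mean_coherence, conflict_success_rate, last_used_ts,
                     now=None, half_life_days=7.0):
    now = time.time() if now is None else now
    # Base coherence: mean coherence in [0, 1] mapped to a +/-0.5 contribution.
    coherence_term = (mean_coherence - 0.5) * 1.0          # +/-0.5
    # Conflict success: share of "tension" memories with coherence > 0.7.
    conflict_term = (conflict_success_rate - 0.5) * 0.6    # +/-0.3
    # Recency: exponential decay with ~7-day half-life, mapped to +/-0.2.
    age_days = (now - last_used_ts) / 86400.0
    recency = 0.5 ** (age_days / half_life_days)           # in (0, 1]
    recency_term = (recency - 0.5) * 0.4                   # +/-0.2
    weight = 1.0 + coherence_term + conflict_term + recency_term
    return max(0.0, min(2.0, weight))
```

With this mapping, an adapter with perfect coherence, perfect conflict success, and a fresh last use lands at the 2.0 ceiling, while a stale, poorly performing one decays toward 0.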

2. TokenConfidenceEngine Enhancement (reasoning_forge/token_confidence.py)

  • Phase 2 Upgrade: Wired living_memory into learning signal computation
  • Enhanced _compute_learning_signal() method:
    • Now queries memory for past responses by agent
    • Weights recent memories higher (exponential decay with 168-hour half-life)
    • Computes weighted average of historical coherence
    • Signal ranges [0.5, 1.0] based on past performance
  • Impact: the 4th confidence signal (learning signal) now draws on actual historical data instead of a neutral fallback
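A minimal sketch of this recency-weighted learning signal, assuming a simple `(timestamp, coherence)` history format (the real engine reads memory cocoons rather than bare tuples):

```python
import time

# Illustrative sketch: a recency-weighted average of an agent's past
# coherence, mapped into [0.5, 1.0]. Names are assumptions, not the real API.
def learning_signal(past, now=None, half_life_hours=168.0):
    """past: list of (timestamp, coherence) pairs for one agent."""
    now = time.time() if now is None else now
    if not past:
        return 0.5  # neutral fallback when no history exists
    num = den = 0.0
    for ts, coherence in past:
        age_h = (now - ts) / 3600.0
        w = 0.5 ** (age_h / half_life_hours)  # exponential decay, 168h half-life
        num += w * coherence
        den += w
    avg = num / den          # weighted mean coherence in [0, 1]
    return 0.5 + 0.5 * avg   # map into the [0.5, 1.0] signal range
```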

3. ForgeEngine Integration (reasoning_forge/forge_engine.py)

  • Modified __init__() (lines 52-88):
    • Now accepts living_memory parameter (defaults to None for backward compat)
    • Accepts enable_memory_weighting parameter (defaults to True)
    • Passes living_memory to TokenConfidenceEngine
    • Initializes MemoryWeighting if memory provided
  • Enhanced forge_with_debate() (lines 294-313):
    • After Round 0 conflict detection, stores top 5 conflicts in memory
    • Stores resolution outcomes for later analysis
    • Creates resolution_outcome dict with conflict metadata
  • Backward Compatible: ForgeEngine works without memory (memory_weighting=None; the token_confidence learning signal falls back to a neutral 0.5)
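The optional-memory wiring can be sketched like this; the stub classes below stand in for the real MemoryWeighting and TokenConfidenceEngine, and the signatures are illustrative only:

```python
# Minimal stand-ins so the wiring sketch runs on its own; the real classes
# live in reasoning_forge and take richer arguments.
class MemoryWeighting:
    def __init__(self, memory):
        self.memory = memory

class TokenConfidenceEngine:
    def __init__(self, living_memory=None):
        self.living_memory = living_memory

class ForgeEngine:
    def __init__(self, living_memory=None, enable_memory_weighting=True):
        self.living_memory = living_memory
        # Memory weighting only activates when a memory kernel is provided.
        if living_memory is not None and enable_memory_weighting:
            self.memory_weighting = MemoryWeighting(living_memory)
        else:
            self.memory_weighting = None  # neutral routing fallback
        # The confidence engine receives the (possibly None) memory handle;
        # without it, the learning signal stays at the neutral 0.5.
        self.token_confidence = TokenConfidenceEngine(living_memory=living_memory)
```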

4. Conflict → Adapter Learning Bridge

  • Data Flow:
```text
Debate with Conflict Detection
       ↓
Conflicts stored in LivingMemoryKernel
       ↓
MemoryCocoon with:
  - agent_pair (e.g., "Newton,Quantum")
  - conflict_type (contradiction/emphasis/framework)
  - coherence outcome
  - tension metric
       ↓
MemoryWeighting aggregates per adapter
       ↓
Next query: Router uses memory weights to boost/suppress adapters
```
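The cocoon fields above can be pictured as a small record type. The dataclass below is a hypothetical shape for illustration only; the real MemoryCocoon likely carries additional fields:

```python
from dataclasses import dataclass

# Hypothetical shape of one stored conflict record, mirroring the fields
# listed in the data flow above.
@dataclass
class ConflictRecord:
    agent_pair: str        # e.g. "Newton,Quantum"
    conflict_type: str     # "contradiction" | "emphasis" | "framework"
    coherence: float       # outcome coherence in [0, 1]
    tension: float         # tension metric at detection time
    emotional_tag: str = "tension"  # tag used for recall by emotion
```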

Test Results

Phase 2 End-to-End Test Output (from test_phase2_e2e.py):

```text
[OK] PASS: MemoryWeighting Initialization
[OK] PASS: ForgeEngine with Living Memory
[OK] PASS: forge_with_debate() Storage
[OK] PASS: Memory Weight Explanations

Total: 4/4 tests passed
```

Validation Results:

  • [OK] MemoryWeighting computes weights [0, 2.0] correctly
  • [OK] Memory cocoons stored with conflict metadata
  • [OK] Tensions tagged and indexed for recall
  • [OK] Token confidence queries memory for learning signal
  • [OK] ForgeEngine initializes with/without memory (backward compatible)
  • [OK] Weight explanations expose all components

How to Use Phase 2

Quick Start with Memory-Weighted Routing

```python
from reasoning_forge.forge_engine import ForgeEngine
from reasoning_forge.living_memory import LivingMemoryKernel

# Create memory kernel
memory = LivingMemoryKernel(max_memories=100)

# Initialize forge with memory-weighted adapter selection
forge = ForgeEngine(
    living_memory=memory,
    enable_memory_weighting=True
)

# Run debate (conflicts stored automatically)
result = forge.forge_with_debate(
    "Complex multi-perspective question",
    debate_rounds=1
)

# Access memory weighting
weights = forge.memory_weighting.get_all_weights()
print(f"Adapter weights: {weights}")

# Explain a specific weight
explanation = forge.memory_weighting.explain_weight("newton")
print(explanation)
```

Access Memory-Stored Conflicts

```python
# Recall conflicts by emotional tag
tensions = memory.recall_by_emotion("tension", limit=10)
for cocoon in tensions:
    print(f"Conflict: {cocoon.title}")
    print(f"  Coherence: {cocoon.coherence:.3f}")
    print(f"  Agents: {cocoon.adapter_used}")
```

Query Learning Signal from Memory

```python
# TokenConfidenceEngine now uses real historical data
scores = forge.token_confidence.score_tokens(
    agent_response,
    agent_name="newton",
    peer_responses={...}
)

# learning_signal component now includes adaptive boost
# based on Newton's historical coherence
```

Files Created/Modified

New Files (1)

  • reasoning_forge/memory_weighting.py (400 lines)

Modified Files (3)

  • reasoning_forge/forge_engine.py (+~30 lines for init + conflict storage)
  • reasoning_forge/token_confidence.py (+~20 lines for recency weighting)
  • test_phase2_e2e.py (220 lines - validation script)

Architecture: Memory-Cost Loop

```text
Debate Cycle N
    ↓
Phase 1: Conflict Detection (existing)
    - Detects conflicts between agent perspectives
    - Scores by confidence + opposition
    ↓
Phase 2: Memory Storage (NEW)
    - Store top 5 conflicts in LivingMemoryKernel
    - Tag with emotional_tag="tension"
    - Track agent pair, type, and final coherence
    ↓
Phase 2: Memory Weighting (NEW)
    - MemoryWeighting queries memory
    - Computes per-adapter performance scores
    - Base coherence, conflict success, recency signals
    ↓
Debate Cycle N+1
    ↓
Phase 2: Adapter Selection (OPTIONAL)
    - Router uses memory weights to modulate confidence
    - High-performing adapters get +50% boost
    - Poor adapters get -50% suppression
    ↓
Phase 1: Token Confidence (ENHANCED)
    - Learning signal now queries memory (not just neutral 0.5)
    - Boosts confidence for agents with high historical coherence
    ↓
Improved multi-perspective reasoning through learning
```

Key Design Decisions

  1. Weight Range [0, 2.0]: Allows significant boost/suppression without breaking router confidence scores
  2. Soft Boost Strategy: Memory weights modulate existing router confidence, preserving keyword intelligence
  3. Recency Decay: ~7 day half-life prevents old, outdated memories from dominating
  4. Conflict Success Rate: Prioritizes adapters that handled high-tension moments well
  5. Backward Compatibility: ForgeEngine works without memory (living_memory=None)

Success Criteria Met

  • MemoryWeighting computes weights [0, 2.0] correctly
  • Memory cocoons store conflict metadata
  • living_memory wired into TokenConfidenceEngine
  • ForgeEngine accepts memory parameter
  • Conflict→Adapter learning pathway established
  • Recency weighting implemented (7-day half-life)
  • Weight explanations expose all components
  • End-to-end test passes all 4 validations
  • Backward compatible (no breaking changes)

What's Next (Phase 3+)

  1. Strict Memory-Only Routing (optional):

    • Ignore keywords entirely
    • Select adapters purely by memory weight
    • Pure learning approach (higher risk, higher reward)
  2. Conflict → Resolution Feedback:

    • Track if conflicts were actually resolved
    • Boost adapters that resolve conflicts more effectively
    • Multi-round learning (not just single-round)
  3. Semantic Conflict Clustering:

    • Group similar recurring conflicts
    • Identify systematic weaknesses (e.g., "Quantum agents struggle with deterministic questions")
    • Targeted adapter boosting by conflict class
  4. Probabilistic Routing:

    • Sample adapters by weight (not just pick best)
    • Enables exploration vs exploitation
    • Learn from failures, not just successes
  5. Cross-Query Memory:

    • Link queries to past conflicts
    • Recognize when similar conflicts arise
    • Pre-select adapters before round 0
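The probabilistic routing idea in item 4 amounts to weighted sampling over the [0, 2.0] scores instead of always picking the maximum. A sketch (function name assumed, not part of the current codebase):

```python
import random

# Sample an adapter in proportion to its memory weight. This trades a
# little exploitation for exploration of currently lower-weighted adapters.
def sample_adapter(weights, rng=None):
    """weights: dict of adapter name -> weight in [0, 2.0]."""
    rng = rng or random.Random()
    names = list(weights)
    totals = [max(w, 1e-6) for w in weights.values()]  # avoid zero total mass
    return rng.choices(names, weights=totals, k=1)[0]
```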

Code Quality

  • Tested: All components validated via end-to-end test
  • Documented: Docstrings on all public methods
  • Dataclasses: Type-safe with @dataclass
  • Error Handling: Graceful fallbacks (no memory → neutral weights)
  • No Dependencies: Uses only existing imports (numpy, json, time, math)
  • Backward Compatible: ForgeEngine/TokenConfidenceEngine work without memory

Notes for Implementation

  1. Adapter Naming: Currently stores as agent pairs (e.g., "Newton,Quantum"). For adapter-specific routing, need to track actual adapter names from inference layer.
  2. Weight Update Frequency: Default 1 hour (update_interval_hours). Can tune based on memory size and query frequency.
  3. Conflict Retention: Top 5 conflicts stored per debate (configurable). Tune based on memory budget (max_memories=100).
  4. Soft Boost Modulation: Currently -50% to +50% via weight_modifier = (weight - 1.0) / 2.0. Can adjust range in AdapterRouter integration.
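Note 4's modulation formula, written out as code (the function name and router hookup are illustrative; the formula itself is the one quoted above):

```python
# Soft boost: a weight in [0, 2.0] becomes a multiplier in [0.5, 1.5]
# on the router's own confidence, rather than replacing it.
def boosted_confidence(router_confidence, weight):
    weight_modifier = (weight - 1.0) / 2.0   # in [-0.5, +0.5]
    return router_confidence * (1.0 + weight_modifier)
```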

Integration with Existing Systems

Integrates with:

  • Phase 1: Conflict detection (uses conflicts as learning signal)
  • EpistemicMetrics: Coherence/tension metrics (returned in metadata)
  • LivingMemoryKernel: Stores/recalls conflicts as cocoons
  • TokenConfidenceEngine: Uses memory for 4th signal

Compatible with:

  • AdapterRouter (ready for memory-weighted confidence boost)
  • TrustCalibrator (independent, can use weights as secondary signal)
  • SynthesisEngine (no changes needed)

Generated: 2026-03-19

Status: Ready for Phase 3 or production deployment