Codette-Reasoning / PHASE2_SUMMARY.md

Phase 2 Implementation Summary

Status: COMPLETE ✓

All Phase 2 components have been successfully implemented, integrated, and validated.


What Was Built

1. MemoryWeighting Engine (reasoning_forge/memory_weighting.py)

  • Purpose: Score adapter performance and weight future adapter selection based on historical memory

  • Key Components:

    • AdapterWeight dataclass: Tracks adapter metrics (coherence, conflict success, recency, composite weight)
    • MemoryWeighting class: Main engine for weight computation and selection
  • Key Features:

    • compute_weights(): Aggregates memory cocoons per adapter, computes composite weights [0, 2.0]
      • Base coherence contribution: ±0.5 (mean coherence from past uses)
      • Conflict success contribution: ±0.3 (% of "tension" memories with coherence > 0.7)
      • Recency contribution: ±0.2 (exponential decay with ~7 day half-life)
    • select_primary(): Choose best adapter for specific conflict context
    • get_boosted_confidence(): Modulate router confidence based on weight (soft boost: -50% to +50%)
    • explain_weight(): Expose weight breakdown for debugging/transparency
    • get_all_weights(): Export full weighting state
  • Output: Weight scores [0, 2.0] where:

    • 0.5 = Poor adapter (suppress by 50%)
    • 1.0 = Average adapter (neutral)
    • 2.0 = Excellent adapter (boost by 100%)
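As a concrete illustration, the composite formula above can be sketched as follows. The function name and exact scaling are assumptions derived from the stated contribution ranges (±0.5, ±0.3, ±0.2), not the actual memory_weighting.py implementation:

```python
import time

# Hypothetical sketch of the composite weight: 1.0 is neutral, and each
# signal shifts the weight within its stated range before clamping to [0, 2.0].
def composite_weight(mean_coherence, conflict_success_rate, last_used_ts,
                     now=None, half_life_days=7.0):
    now = time.time() if now is None else now
    # Base coherence: mean coherence in [0, 1] mapped to a +/-0.5 contribution.
    coherence_term = (mean_coherence - 0.5) * 1.0          # +/-0.5
    # Conflict success: share of "tension" memories with coherence > 0.7.
    conflict_term = (conflict_success_rate - 0.5) * 0.6    # +/-0.3
    # Recency: exponential decay with ~7-day half-life, mapped to +/-0.2.
    age_days = (now - last_used_ts) / 86400.0
    recency = 0.5 ** (age_days / half_life_days)           # in (0, 1]
    recency_term = (recency - 0.5) * 0.4                   # +/-0.2
    weight = 1.0 + coherence_term + conflict_term + recency_term
    return max(0.0, min(2.0, weight))
```

With this mapping, an adapter with perfect coherence, perfect conflict success, and a fresh last use lands at the 2.0 ceiling, while a stale, poorly performing one decays toward 0.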

2. TokenConfidenceEngine Enhancement (reasoning_forge/token_confidence.py)

  • Phase 2 Upgrade: Wired living_memory into learning signal computation
  • Enhanced _compute_learning_signal() method:
    • Now queries memory for past responses by agent
    • Weights recent memories higher (exponential decay with 168-hour half-life)
    • Computes weighted average of historical coherence
    • Signal ranges [0.5, 1.0] based on past performance
  • Impact: the 4th confidence signal (learning signal) now draws on actual historical data instead of a neutral fallback
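A minimal sketch of this recency-weighted learning signal, assuming a simple `(timestamp, coherence)` history format (the real engine reads memory cocoons rather than bare tuples):

```python
import time

# Illustrative sketch: a recency-weighted average of an agent's past
# coherence, mapped into [0.5, 1.0]. Names are assumptions, not the real API.
def learning_signal(past, now=None, half_life_hours=168.0):
    """past: list of (timestamp, coherence) pairs for one agent."""
    now = time.time() if now is None else now
    if not past:
        return 0.5  # neutral fallback when no history exists
    num = den = 0.0
    for ts, coherence in past:
        age_h = (now - ts) / 3600.0
        w = 0.5 ** (age_h / half_life_hours)  # exponential decay, 168h half-life
        num += w * coherence
        den += w
    avg = num / den          # weighted mean coherence in [0, 1]
    return 0.5 + 0.5 * avg   # map into the [0.5, 1.0] signal range
```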

3. ForgeEngine Integration (reasoning_forge/forge_engine.py)

  • Modified __init__() (lines 52-88):
    • Now accepts living_memory parameter (defaults to None for backward compat)
    • Accepts enable_memory_weighting parameter (defaults to True)
    • Passes living_memory to TokenConfidenceEngine
    • Initializes MemoryWeighting if memory provided
  • Enhanced forge_with_debate() (lines 294-313):
    • After Round 0 conflict detection, stores top 5 conflicts in memory
    • Stores resolution outcomes for later analysis
    • Creates resolution_outcome dict with conflict metadata
  • Backward Compatible: ForgeEngine works without memory (memory_weighting=None; the token_confidence learning signal falls back to a neutral 0.5)
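The optional-memory wiring can be sketched like this; the stub classes below stand in for the real MemoryWeighting and TokenConfidenceEngine, and the signatures are illustrative only:

```python
# Minimal stand-ins so the wiring sketch runs on its own; the real classes
# live in reasoning_forge and take richer arguments.
class MemoryWeighting:
    def __init__(self, memory):
        self.memory = memory

class TokenConfidenceEngine:
    def __init__(self, living_memory=None):
        self.living_memory = living_memory

class ForgeEngine:
    def __init__(self, living_memory=None, enable_memory_weighting=True):
        self.living_memory = living_memory
        # Memory weighting only activates when a memory kernel is provided.
        if living_memory is not None and enable_memory_weighting:
            self.memory_weighting = MemoryWeighting(living_memory)
        else:
            self.memory_weighting = None  # neutral routing fallback
        # The confidence engine receives the (possibly None) memory handle;
        # without it, the learning signal stays at the neutral 0.5.
        self.token_confidence = TokenConfidenceEngine(living_memory=living_memory)
```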

4. Conflict → Adapter Learning Bridge

  • Data Flow:
```text
Debate with Conflict Detection
       ↓
Conflicts stored in LivingMemoryKernel
       ↓
MemoryCocoon with:
  - agent_pair (e.g., "Newton,Quantum")
  - conflict_type (contradiction/emphasis/framework)
  - coherence outcome
  - tension metric
       ↓
MemoryWeighting aggregates per adapter
       ↓
Next query: Router uses memory weights to boost/suppress adapters
```
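The cocoon fields above can be pictured as a small record type. The dataclass below is a hypothetical shape for illustration only; the real MemoryCocoon likely carries additional fields:

```python
from dataclasses import dataclass

# Hypothetical shape of one stored conflict record, mirroring the fields
# listed in the data flow above.
@dataclass
class ConflictRecord:
    agent_pair: str        # e.g. "Newton,Quantum"
    conflict_type: str     # "contradiction" | "emphasis" | "framework"
    coherence: float       # outcome coherence in [0, 1]
    tension: float         # tension metric at detection time
    emotional_tag: str = "tension"  # tag used for recall by emotion
```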

Test Results

Phase 2 End-to-End Test Output (from test_phase2_e2e.py):

```text
[OK] PASS: MemoryWeighting Initialization
[OK] PASS: ForgeEngine with Living Memory
[OK] PASS: forge_with_debate() Storage
[OK] PASS: Memory Weight Explanations

Total: 4/4 tests passed
```

Validation Results:

  • [OK] MemoryWeighting computes weights [0, 2.0] correctly
  • [OK] Memory cocoons stored with conflict metadata
  • [OK] Tensions tagged and indexed for recall
  • [OK] Token confidence queries memory for learning signal
  • [OK] ForgeEngine initializes with/without memory (backward compatible)
  • [OK] Weight explanations expose all components

How to Use Phase 2

Quick Start with Memory-Weighted Routing

```python
from reasoning_forge.forge_engine import ForgeEngine
from reasoning_forge.living_memory import LivingMemoryKernel

# Create memory kernel
memory = LivingMemoryKernel(max_memories=100)

# Initialize forge with memory-weighted adapter selection
forge = ForgeEngine(
    living_memory=memory,
    enable_memory_weighting=True
)

# Run debate (conflicts stored automatically)
result = forge.forge_with_debate(
    "Complex multi-perspective question",
    debate_rounds=1
)

# Access memory weighting
weights = forge.memory_weighting.get_all_weights()
print(f"Adapter weights: {weights}")

# Explain a specific weight
explanation = forge.memory_weighting.explain_weight("newton")
print(explanation)
```

Access Memory-Stored Conflicts

```python
# Recall conflicts by emotional tag
tensions = memory.recall_by_emotion("tension", limit=10)
for cocoon in tensions:
    print(f"Conflict: {cocoon.title}")
    print(f"  Coherence: {cocoon.coherence:.3f}")
    print(f"  Agents: {cocoon.adapter_used}")
```

Query Learning Signal from Memory

```python
# TokenConfidenceEngine now uses real historical data
scores = forge.token_confidence.score_tokens(
    agent_response,
    agent_name="newton",
    peer_responses={...}
)

# learning_signal component now includes adaptive boost
# based on Newton's historical coherence
```

Files Created/Modified

New Files (1)

  • reasoning_forge/memory_weighting.py (400 lines)

Modified Files (3)

  • reasoning_forge/forge_engine.py (+~30 lines for init + conflict storage)
  • reasoning_forge/token_confidence.py (+~20 lines for recency weighting)
  • test_phase2_e2e.py (220 lines - validation script)

Architecture: Memory-Cost Loop

```text
Debate Cycle N
    ↓
Phase 1: Conflict Detection (existing)
    - Detects conflicts between agent perspectives
    - Scores by confidence + opposition
    ↓
Phase 2: Memory Storage (NEW)
    - Store top 5 conflicts in LivingMemoryKernel
    - Tag with emotional_tag="tension"
    - Track agent pair, type, and final coherence
    ↓
Phase 2: Memory Weighting (NEW)
    - MemoryWeighting queries memory
    - Computes per-adapter performance scores
    - Base coherence, conflict success, recency signals
    ↓
Debate Cycle N+1
    ↓
Phase 2: Adapter Selection (OPTIONAL)
    - Router uses memory weights to modulate confidence
    - High-performing adapters get +50% boost
    - Poor adapters get -50% suppression
    ↓
Phase 1: Token Confidence (ENHANCED)
    - Learning signal now queries memory (not just neutral 0.5)
    - Boosts confidence for agents with high historical coherence
    ↓
Improved multi-perspective reasoning through learning
```

Key Design Decisions

  1. Weight Range [0, 2.0]: Allows significant boost/suppression without breaking router confidence scores
  2. Soft Boost Strategy: Memory weights modulate existing router confidence, preserving keyword intelligence
  3. Recency Decay: ~7 day half-life prevents old, outdated memories from dominating
  4. Conflict Success Rate: Prioritizes adapters that handled high-tension moments well
  5. Backward Compatibility: ForgeEngine works without memory (living_memory=None)

Success Criteria Met

  • MemoryWeighting computes weights [0, 2.0] correctly
  • Memory cocoons store conflict metadata
  • living_memory wired into TokenConfidenceEngine
  • ForgeEngine accepts memory parameter
  • Conflict→Adapter learning pathway established
  • Recency weighting implemented (7-day half-life)
  • Weight explanations expose all components
  • End-to-end test passes all 4 validations
  • Backward compatible (no breaking changes)

What's Next (Phase 3+)

  1. Strict Memory-Only Routing (optional):

    • Ignore keywords entirely
    • Select adapters purely by memory weight
    • Pure learning approach (higher risk, higher reward)
  2. Conflict → Resolution Feedback:

    • Track if conflicts were actually resolved
    • Boost adapters that resolve conflicts more effectively
    • Multi-round learning (not just single-round)
  3. Semantic Conflict Clustering:

    • Group similar recurring conflicts
    • Identify systematic weaknesses (e.g., "Quantum agents struggle with deterministic questions")
    • Targeted adapter boosting by conflict class
  4. Probabilistic Routing:

    • Sample adapters by weight (not just pick best)
    • Enables exploration vs exploitation
    • Learn from failures, not just successes
  5. Cross-Query Memory:

    • Link queries to past conflicts
    • Recognize when similar conflicts arise
    • Pre-select adapters before round 0
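The probabilistic routing idea in item 4 amounts to weighted sampling over the [0, 2.0] scores instead of always picking the maximum. A sketch (function name assumed, not part of the current codebase):

```python
import random

# Sample an adapter in proportion to its memory weight. This trades a
# little exploitation for exploration of currently lower-weighted adapters.
def sample_adapter(weights, rng=None):
    """weights: dict of adapter name -> weight in [0, 2.0]."""
    rng = rng or random.Random()
    names = list(weights)
    totals = [max(w, 1e-6) for w in weights.values()]  # avoid zero total mass
    return rng.choices(names, weights=totals, k=1)[0]
```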

Code Quality

  • Tested: All components validated via end-to-end test
  • Documented: Docstrings on all public methods
  • Dataclasses: Type-safe with @dataclass
  • Error Handling: Graceful fallbacks (no memory → neutral weights)
  • No Dependencies: Uses only existing imports (numpy, json, time, math)
  • Backward Compatible: ForgeEngine/TokenConfidenceEngine work without memory

Notes for Implementation

  1. Adapter Naming: Currently stores as agent pairs (e.g., "Newton,Quantum"). For adapter-specific routing, need to track actual adapter names from inference layer.
  2. Weight Update Frequency: Default 1 hour (update_interval_hours). Can tune based on memory size and query frequency.
  3. Conflict Retention: Top 5 conflicts stored per debate (configurable). Tune based on memory budget (max_memories=100).
  4. Soft Boost Modulation: Currently -50% to +50% via weight_modifier = (weight - 1.0) / 2.0. Can adjust range in AdapterRouter integration.
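Note 4's modulation formula, written out as code (the function name and router hookup are illustrative; the formula itself is the one quoted above):

```python
# Soft boost: a weight in [0, 2.0] becomes a multiplier in [0.5, 1.5]
# on the router's own confidence, rather than replacing it.
def boosted_confidence(router_confidence, weight):
    weight_modifier = (weight - 1.0) / 2.0   # in [-0.5, +0.5]
    return router_confidence * (1.0 + weight_modifier)
```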

Integration with Existing Systems

Integrates with:

  • Phase 1: Conflict detection (uses conflicts as learning signal)
  • EpistemicMetrics: Coherence/tension metrics (returned in metadata)
  • LivingMemoryKernel: Stores/recalls conflicts as cocoons
  • TokenConfidenceEngine: Uses memory for 4th signal

Compatible with:

  • AdapterRouter (ready for memory-weighted confidence boost)
  • TrustCalibrator (independent, can use weights as secondary signal)
  • SynthesisEngine (no changes needed)

Generated: 2026-03-19

Status: Ready for Phase 3 or production deployment