Spaces:

MCP-1st-Birthday
/

rewardpilot-web-ui

Running

App Files Files Community

rewardpilot-web-ui / docs /agent_reasoning.md

sammy786

Create agent_reasoning.md

dd916d8 verified 15 days ago

preview code

raw

history blame contribute delete

25.6 kB

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

# Agent Reasoning Flow Guide

## Overview

RewardPilot uses a multi-stage reasoning process powered by Claude 3.5 Sonnet (planning) and Gemini 2.0 Flash (synthesis). This guide explains how the agent thinks through complex credit card optimization decisions.

## Why Multi-LLM Architecture?

| Stage | LLM | Reason |
|-------|-----|--------|
| **Planning** | Claude 3.5 Sonnet | Best at strategic thinking, tool use |
| **Synthesis** | Gemini 2.0 Flash | Fast context processing, cost-effective |
| **Verification** | GPT-4o | High accuracy for critical decisions |

**Cost Comparison:**
- Single GPT-4o: $0.15 per recommendation
- Multi-LLM: $0.03 per recommendation (5x cheaper)

---

## Four-Phase Reasoning Process

┌─────────────────────────────────────────────────────────┐ │ USER TRANSACTION │ │ "Whole Foods, $127.50, Groceries" │ └────────────────────┬────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ PHASE 1: PLANNING │ │ (Claude 3.5 Sonnet) │ │ │ │ Input: Transaction context │ │ Output: Execution strategy │ │ │ │ Questions: │ │ 1. What category is this? (Groceries) │ │ 2. Which cards have grocery bonuses? │ │ 3. Are there spending caps to check? │ │ 4. Need to forecast future spending? │ │ 5. Any special merchant restrictions? │ │ │ │ Strategy: │ │ - Call Smart Wallet MCP (get card recommendations) │ │ - Call RAG MCP (check merchant acceptance) │ │ - Call Forecast MCP (check cap status) │ └────────────────────┬────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ PHASE 2: EXECUTION │ │ (Parallel MCP Server Calls) │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Smart Wallet │ │ Rewards RAG │ │ Forecast │ │ │ │ MCP │ │ MCP │ │ MCP │ │ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌──────────────────────────────────────────────────┐ │ │ │ Results: │ │ │ │ - Amex Gold: 4x = $5.10 │ │ │ │ - Citi Custom: 5% but cap hit │ │ │ │ - Chase Freedom: Not in grocery quarter │ │ │ │ │ │ │ │ - Merchant: Amex accepted at Whole Foods │ │ │ │ │ │ │ │ - Forecast: $450/$500 cap remaining this month │ │ │ └──────────────────────────────────────────────────┘ │ └────────────────────┬────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ PHASE 3: REASONING │ │ (Gemini 2.0 Flash Exp) │ │ │ │ Input: All MCP results + transaction context │ │ Output: Synthesized explanation │ │ │ │ Reasoning Chain: │ │ │ │ 1. Compare Rewards: │ │ - Amex Gold: 4x points = $5.10 cash value │ │ - Citi Custom Cash: Would be 5% ($6.38) but │ │ monthly cap already hit │ │ - Winner: Amex Gold ($5.10 > $1.28) │ │ │ │ 2. Check Constraints: │ │ - Amex accepted at Whole Foods? ✅ Yes │ │ - Annual cap status? $2,450/$25,000 (safe) │ │ - Foreign transaction fee? ✅ None │ │ │ │ 3. Future Optimization: │ │ - Forecast shows 3 more grocery trips this month │ │ - Total: $127.50 × 3 = $382.50 │ │ - Rewards: $382.50 × 4% = $15.30 │ │ - Recommendation: Continue using Amex Gold │ │ │ │ 4. Alternative Scenarios: │ │ - If Citi cap not hit: Use Citi ($6.38 > $5.10) │ │ - If at Costco: Use Citi (Amex not accepted) │ │ - If annual cap near: Switch to Citi next month │ │ │ │ Confidence: 95% (high certainty) │ └────────────────────┬────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────┐ │ PHASE 4: RESPONSE FORMATTING │ │ (Structured Output) │ │ │ │ { │ │ "recommended_card": { │ │ "card_id": "c_amex_gold", │ │ "card_name": "American Express Gold", │ │ "issuer": "American Express" │ │ }, │ │ "rewards": { │ │ "points_earned": 510, │ │ "cash_value": 5.10, │ │ "earn_rate": "4x points" │ │ }, │ │ "reasoning": "Amex Gold offers 4x points...", │ │ "confidence": 0.95, │ │ "alternatives": [...], │ │ "warnings": [...] │ │ } │ └─────────────────────────────────────────────────────────┘


---

## Phase 1: Planning (Claude 3.5 Sonnet)

### Implementation

```python
from anthropic import Anthropic

anthropic = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

async def create_execution_plan(transaction: dict) -> dict:
    """
    Claude analyzes transaction and creates execution strategy
    """
    
    prompt = f"""
You are a credit card optimization expert. Analyze this transaction and create an execution plan.

Transaction:
- Merchant: {transaction['merchant']}
- Category: {transaction['category']}
- Amount: ${transaction['amount_usd']}
- MCC Code: {transaction['mcc']}
- User ID: {transaction['user_id']}

Available MCP servers:
1. smart_wallet - Analyzes user's cards and calculates rewards
2. rewards_rag - Semantic search of card benefits and restrictions
3. spend_forecast - Predicts spending and cap warnings

Your task:
1. Determine which MCP servers to call
2. Prioritize the calls (some may depend on others)
3. Identify key decision factors
4. Set confidence threshold for recommendation

Return a JSON plan with:
{{
  "strategy": "optimization approach (e.g., 'max_rewards', 'cap_aware')",
  "mcp_calls": [
    {{
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Need to know available cards and base rewards"
    }},
    {{
      "service": "rewards_rag",
      "priority": 2,
      "reason": "Check if merchant accepts top card"
    }},
    {{
      "service": "spend_forecast",
      "priority": 3,
      "reason": "Verify monthly cap status"
    }}
  ],
  "decision_factors": [
    "reward_rate",
    "merchant_acceptance",
    "spending_caps",
    "annual_fees"
  ],
  "confidence_threshold": 0.85,
  "complexity": "medium"
}}
"""
    
    response = anthropic.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        temperature=0.3,  # Lower temperature for consistent planning
        messages=[{
            "role": "user",
            "content": prompt
        }]
    )
    
    # Parse JSON response
    plan = json.loads(response.content[0].text)
    
    return plan

Example Plans

Simple Transaction

{
  "strategy": "max_rewards",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Straightforward category bonus"
    }
  ],
  "decision_factors": ["reward_rate"],
  "confidence_threshold": 0.90,
  "complexity": "low"
}

Complex Transaction

{
  "strategy": "cap_aware_optimization",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Get all card options"
    },
    {
      "service": "spend_forecast",
      "priority": 2,
      "reason": "Check if near monthly/annual caps"
    },
    {
      "service": "rewards_rag",
      "priority": 3,
      "reason": "Verify merchant acceptance for top 2 cards"
    }
  ],
  "decision_factors": [
    "reward_rate",
    "spending_caps",
    "merchant_acceptance",
    "future_spending"
  ],
  "confidence_threshold": 0.80,
  "complexity": "high"
}

Phase 2: Execution (Parallel MCP Calls)

Implementation

import asyncio
import httpx

async def execute_mcp_calls(plan: dict, transaction: dict) -> dict:
    """
    Execute MCP calls based on plan
    """
    
    # Sort by priority
    sorted_calls = sorted(
        plan["mcp_calls"],
        key=lambda x: x["priority"]
    )
    
    results = {}
    
    # Execute in priority order (can parallelize same priority)
    current_priority = sorted_calls[0]["priority"]
    priority_group = []
    
    for call in sorted_calls:
        if call["priority"] == current_priority:
            priority_group.append(call)
        else:
            # Execute current priority group in parallel
            group_results = await execute_priority_group(
                priority_group,
                transaction
            )
            results.update(group_results)
            
            # Move to next priority
            current_priority = call["priority"]
            priority_group = [call]
    
    # Execute final group
    if priority_group:
        group_results = await execute_priority_group(
            priority_group,
            transaction
        )
        results.update(group_results)
    
    return results

async def execute_priority_group(calls: list, transaction: dict) -> dict:
    """Execute MCP calls of same priority in parallel"""
    
    tasks = []
    for call in calls:
        if call["service"] == "smart_wallet":
            tasks.append(call_smart_wallet(transaction))
        elif call["service"] == "rewards_rag":
            tasks.append(call_rewards_rag(transaction))
        elif call["service"] == "spend_forecast":
            tasks.append(call_forecast(transaction))
    
    results = await asyncio.gather(*tasks)
    
    return dict(zip([c["service"] for c in calls], results))

async def call_smart_wallet(transaction: dict) -> dict:
    """Call Smart Wallet MCP"""
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(
            f"{MCP_ENDPOINTS['smart_wallet']}/analyze",
            json=transaction
        )
        response.raise_for_status()
        return response.json()

# Similar for other MCP servers...

Phase 3: Reasoning (Gemini 2.0 Flash)

Implementation

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash-exp")

async def synthesize_reasoning(
    transaction: dict,
    mcp_results: dict,
    plan: dict
) -> str:
    """
    Gemini synthesizes all information into coherent explanation
    """
    
    prompt = f"""
You are a credit card optimization expert. Synthesize the following information into a clear recommendation.

Transaction:
{json.dumps(transaction, indent=2)}

MCP Results:
{json.dumps(mcp_results, indent=2)}

Decision Factors (in order of importance):
{json.dumps(plan['decision_factors'], indent=2)}

Your task:
1. Compare all card options on the decision factors
2. Identify the optimal card with clear reasoning
3. Explain why alternatives are suboptimal
4. Provide any warnings or caveats
5. Suggest future optimizations

Format your response as:

## Recommended Card
[Card name and key benefit]

## Reasoning
[Step-by-step logic]

## Comparison
[Table comparing top 3 options]

## Warnings
[Any caveats or cap warnings]

## Future Optimization
[How to maximize rewards going forward]

Be specific with numbers and percentages.
"""
    
    response = model.generate_content(
        prompt,
        generation_config={
            "temperature": 0.7,
            "max_output_tokens": 2048
        }
    )
    
    return response.text

Example Reasoning Output

## Recommended Card
**American Express Gold** - 4x points on U.S. supermarkets

## Reasoning

1. **Reward Rate Comparison:**
   - Amex Gold: 4x points = $5.10 cash value (1.3 cpp transfer)
   - Citi Custom Cash: Would be 5% = $6.38, but monthly cap hit
   - Chase Freedom Flex: 1x points = $1.28 (not grocery quarter)
   
   Winner: Amex Gold ($5.10 actual rewards)

2. **Merchant Acceptance:**
   - Whole Foods accepts American Express ✅
   - No foreign transaction fees ✅

3. **Spending Cap Status:**
   - Current: $2,450 / $25,000 annual cap (9.8% used)
   - This transaction: $127.50 (0.5% of cap)
   - Safe to use ✅

4. **Future Spending Forecast:**
   - Predicted 3 more grocery trips this month ($382.50 total)
   - Projected rewards: $15.30
   - Still well under annual cap

## Comparison

| Card | Earn Rate | Rewards | Cap Status | Accepted? |
|------|-----------|---------|------------|-----------|
| **Amex Gold** | 4x | **$5.10** | 9.8% used | ✅ Yes |
| Citi Custom Cash | 5% | $1.28 | Cap hit | ✅ Yes |
| Chase Freedom Flex | 1x | $1.28 | N/A | ✅ Yes |

## Warnings

⚠️ **Citi Custom Cash Cap Hit**: You've reached the $500 monthly limit on Citi Custom Cash. It will reset on Feb 1st. Consider using it for non-grocery purchases this month.

⚠️ **Annual Cap Tracking**: You're at $2,450/$25,000 on Amex Gold's supermarket bonus. At current pace, you'll hit the cap in November. Plan to switch to Citi Custom Cash after that.

## Future Optimization

1. **This Month**: Continue using Amex Gold for groceries (best rate)
2. **Next Month**: Switch to Citi Custom Cash (5% > 4x after cap resets)
3. **After $25k Cap**: Use Citi Custom Cash or Chase Freedom (if grocery quarter)
4. **Consider**: Blue Cash Preferred (6% groceries, no cap) if spending exceeds $25k/year

**Estimated Annual Savings**: $523 by following this strategy vs. using single card

Phase 4: Response Formatting

Implementation

from pydantic import BaseModel
from typing import List, Optional

class RecommendedCard(BaseModel):
    card_id: str
    card_name: str
    issuer: str

class Rewards(BaseModel):
    points_earned: int
    cash_value: float
    earn_rate: str

class Alternative(BaseModel):
    card_name: str
    rewards: float
    reason: str

class FinalRecommendation(BaseModel):
    recommended_card: RecommendedCard
    rewards: Rewards
    reasoning: str
    confidence: float
    alternatives: List[Alternative]
    warnings: List[str]
    processing_time_ms: float

def format_recommendation(
    mcp_results: dict,
    reasoning: str,
    processing_time: float
) -> FinalRecommendation:
    """Format final response"""
    
    smart_wallet_result = mcp_results["smart_wallet"]
    best_card = smart_wallet_result["recommended_card"]
    
    # Extract alternatives
    alternatives = []
    for card in smart_wallet_result["all_cards_comparison"][1:4]:
        alternatives.append(Alternative(
            card_name=card["card_name"],
            rewards=card["rewards"],
            reason=card.get("note", "Lower rewards rate")
        ))
    
    # Extract warnings
    warnings = []
    if "forecast" in mcp_results:
        warnings.extend(mcp_results["forecast"].get("warnings", []))
    
    return FinalRecommendation(
        recommended_card=RecommendedCard(**best_card),
        rewards=Rewards(**smart_wallet_result["rewards"]),
        reasoning=reasoning,
        confidence=calculate_confidence(mcp_results),
        alternatives=alternatives,
        warnings=warnings,
        processing_time_ms=processing_time
    )

Advanced Reasoning Patterns

1. Chain-of-Thought Reasoning

prompt = """
Let's think through this step-by-step:

Step 1: Identify the category
- Merchant: {merchant}
- MCC: {mcc}
- Likely category: ?

Step 2: List cards with bonuses in this category
- Card A: X% on category
- Card B: Y points per dollar
- Card C: Z% cashback

Step 3: Calculate actual rewards
- Card A: ${amount} × X% = $?
- Card B: ${amount} × Y points × $0.01 = $?
- Card C: ${amount} × Z% = $?

Step 4: Check constraints
- Is Card A accepted at merchant?
- Is Card B near spending cap?
- Does Card C have annual fee?

Step 5: Make recommendation
Based on steps 1-4, the best card is...
"""

2. Self-Consistency

# Generate multiple reasoning paths
reasoning_paths = []
for i in range(5):
    response = model.generate_content(prompt, temperature=0.8)
    reasoning_paths.append(response.text)

# Vote on most common recommendation
from collections import Counter
recommendations = [extract_card(path) for path in reasoning_paths]
most_common = Counter(recommendations).most_common(1)[0][0]

# Use the reasoning path that led to most common answer
final_reasoning = next(
    path for path in reasoning_paths 
    if extract_card(path) == most_common
)

3. Reflection & Verification

# Initial recommendation
initial_rec = await generate_recommendation(transaction, mcp_results)

# Self-critique
critique_prompt = f"""
Review this credit card recommendation:

{initial_rec}

Are there any errors or oversights?
- Did we miss a better card?
- Are the math calculations correct?
- Did we consider all constraints?
- Is the reasoning sound?

If you find issues, provide corrections.
"""

critique = model.generate_content(critique_prompt)

# Refine if needed
if "error" in critique.text.lower() or "issue" in critique.text.lower():
    final_rec = await refine_recommendation(initial_rec, critique.text)
else:
    final_rec = initial_rec

Confidence Scoring

def calculate_confidence(mcp_results: dict) -> float:
    """
    Calculate confidence score based on multiple factors
    """
    
    confidence = 1.0
    
    # Factor 1: Reward difference (higher difference = higher confidence)
    best_reward = mcp_results["smart_wallet"]["recommended_card"]["rewards"]
    second_best = mcp_results["smart_wallet"]["all_cards_comparison"][1]["rewards"]
    
    reward_gap = (best_reward - second_best) / best_reward
    if reward_gap < 0.1:  # Less than 10% difference
        confidence *= 0.8
    
    # Factor 2: Merchant acceptance certainty
    if "rewards_rag" in mcp_results:
        rag_confidence = mcp_results["rewards_rag"]["sources"][0]["relevance_score"]
        confidence *= rag_confidence
    
    # Factor 3: Cap warnings
    if "forecast" in mcp_results:
        if mcp_results["forecast"].get("warnings"):
            confidence *= 0.9
    
    # Factor 4: Data freshness
    # (Lower confidence for stale data)
    
    return round(confidence, 2)

Error Handling & Fallbacks

async def recommend_with_fallback(transaction: dict):
    """Graceful degradation if MCP servers fail"""
    
    try:
        # Try full reasoning pipeline
        plan = await create_execution_plan(transaction)
        mcp_results = await execute_mcp_calls(plan, transaction)
        reasoning = await synthesize_reasoning(transaction, mcp_results, plan)
        return format_recommendation(mcp_results, reasoning)
    
    except Exception as e:
        logger.error(f"Full pipeline failed: {e}")
        
        try:
            # Fallback: Use only Smart Wallet MCP
            result = await call_smart_wallet(transaction)
            return format_simple_recommendation(result)
        
        except Exception as e2:
            logger.error(f"Fallback failed: {e2}")
            
            # Last resort: Rule-based recommendation
            return rule_based_recommendation(transaction)

def rule_based_recommendation(transaction: dict):
    """Simple rule-based fallback"""
    
    rules = {
        "Groceries": "Amex Gold (4x points)",
        "Dining": "Amex Gold (4x points)",
        "Travel": "Chase Sapphire Reserve (3x points)",
        "Gas": "Costco Anywhere Visa (4% cashback)",
        "Default": "Citi Double Cash (2% on everything)"
    }
    
    category = transaction["category"]
    recommended = rules.get(category, rules["Default"])
    
    return {
        "recommended_card": recommended,
        "reasoning": f"Based on category rules for {category}",
        "confidence": 0.60,  # Lower confidence for rule-based
        "warnings": ["Recommendation based on simplified rules (MCP servers unavailable)"]
    }

Testing & Evaluation

Unit Tests

import pytest

@pytest.mark.asyncio
async def test_planning_phase():
    """Test Claude's planning logic"""
    transaction = {
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    
    plan = await create_execution_plan(transaction)
    
    assert "strategy" in plan
    assert "mcp_calls" in plan
    assert len(plan["mcp_calls"]) > 0
    assert plan["confidence_threshold"] >= 0.5

@pytest.mark.asyncio
async def test_reasoning_phase():
    """Test Gemini's synthesis"""
    mcp_results = {
        "smart_wallet": {
            "recommended_card": {"card_name": "Amex Gold"},
            "rewards": {"cash_value": 5.10}
        }
    }
    
    reasoning = await synthesize_reasoning({}, mcp_results, {})
    
    assert "Amex Gold" in reasoning
    assert "$5.10" in reasoning

Integration Tests

@pytest.mark.asyncio
async def test_end_to_end_recommendation():
    """Test full recommendation pipeline"""
    transaction = {
        "user_id": "test_user",
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    
    result = await recommend_with_fallback(transaction)
    
    assert result["recommended_card"]["card_name"]
    assert result["rewards"]["cash_value"] > 0
    assert result["confidence"] >= 0.5
    assert len(result["reasoning"]) > 100

Related Documentation:

---