# Agent Reasoning Flow Guide
## Overview
RewardPilot uses a multi-stage reasoning process powered by Claude 3.5 Sonnet (planning) and Gemini 2.0 Flash (synthesis). This guide explains how the agent thinks through complex credit card optimization decisions.
## Why Multi-LLM Architecture?
| Stage | LLM | Reason |
|-------|-----|--------|
| **Planning** | Claude 3.5 Sonnet | Best at strategic thinking, tool use |
| **Synthesis** | Gemini 2.0 Flash | Fast context processing, cost-effective |
| **Verification** | GPT-4o | High accuracy for critical decisions |
**Cost Comparison:**
- Single GPT-4o: $0.15 per recommendation
- Multi-LLM: $0.03 per recommendation (5x cheaper)
---
## Four-Phase Reasoning Process
```text
┌──────────────────────────────────────────────────────
│ USER TRANSACTION
│ "Whole Foods, $127.50, Groceries"
└──────────┬───────────────────────────────────────────
           ▼
┌──────────────────────────────────────────────────────
│ PHASE 1: PLANNING (Claude 3.5 Sonnet)
│
│ Input:  Transaction context
│ Output: Execution strategy
│
│ Questions:
│   1. What category is this? (Groceries)
│   2. Which cards have grocery bonuses?
│   3. Are there spending caps to check?
│   4. Do we need to forecast future spending?
│   5. Any special merchant restrictions?
│
│ Strategy:
│   - Call Smart Wallet MCP (get card recommendations)
│   - Call RAG MCP (check merchant acceptance)
│   - Call Forecast MCP (check cap status)
└──────────┬───────────────────────────────────────────
           ▼
┌──────────────────────────────────────────────────────
│ PHASE 2: EXECUTION (Parallel MCP Server Calls)
│
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│  │ Smart Wallet │  │ Rewards RAG  │  │   Forecast   │
│  │     MCP      │  │     MCP      │  │     MCP      │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
│         ▼                 ▼                 ▼
│  Results:
│   - Amex Gold: 4x = $5.10
│   - Citi Custom Cash: 5%, but monthly cap hit
│   - Chase Freedom: not in grocery quarter
│   - Merchant: Amex accepted at Whole Foods
│   - Forecast: $450/$500 cap remaining this month
└──────────┬───────────────────────────────────────────
           ▼
┌──────────────────────────────────────────────────────
│ PHASE 3: REASONING (Gemini 2.0 Flash Exp)
│
│ Input:  All MCP results + transaction context
│ Output: Synthesized explanation
│
│ Reasoning Chain:
│
│ 1. Compare Rewards:
│    - Amex Gold: 4x points = $5.10 cash value
│    - Citi Custom Cash: would be 5% ($6.38), but the
│      monthly cap is already hit, so it earns 1% ($1.28)
│    - Winner: Amex Gold ($5.10 > $1.28)
│
│ 2. Check Constraints:
│    - Amex accepted at Whole Foods?  ✅ Yes
│    - Annual cap status?  $2,450/$25,000 (safe)
│    - Foreign transaction fee?  ✅ None
│
│ 3. Future Optimization:
│    - Forecast shows 3 more grocery trips this month
│    - Total: $127.50 × 3 = $382.50
│    - Rewards: $382.50 × 4% = $15.30
│    - Recommendation: Continue using Amex Gold
│
│ 4. Alternative Scenarios:
│    - If Citi cap not hit: Use Citi ($6.38 > $5.10)
│    - If at Costco: Use Citi (Amex not accepted)
│    - If annual cap near: Switch to Citi next month
│
│ Confidence: 95% (high certainty)
└──────────┬───────────────────────────────────────────
           ▼
┌──────────────────────────────────────────────────────
│ PHASE 4: RESPONSE FORMATTING (Structured Output)
│
│ {
│   "recommended_card": {
│     "card_id": "c_amex_gold",
│     "card_name": "American Express Gold",
│     "issuer": "American Express"
│   },
│   "rewards": {
│     "points_earned": 510,
│     "cash_value": 5.10,
│     "earn_rate": "4x points"
│   },
│   "reasoning": "Amex Gold offers 4x points...",
│   "confidence": 0.95,
│   "alternatives": [...],
│   "warnings": [...]
│ }
└──────────────────────────────────────────────────────
```
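The Phase 3 arithmetic can be reproduced with a small calculator. This is a minimal sketch, not RewardPilot's Smart Wallet code: it assumes points are worth 1 cent each (so 4x means 4% back) and models each card as a bonus rate up to a remaining cap with a 1% base rate beyond it. `EarnRule` and `rank_cards` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional, Tuple

CENT = Decimal("0.01")


@dataclass
class EarnRule:
    """Hypothetical, simplified earn rule for one card in one spending category."""
    card_name: str
    bonus_rate: Decimal                      # cash back per $ inside the bonus (4x at 1 cent/point = 0.04)
    base_rate: Decimal                       # cash back per $ once the bonus cap is used up
    cap_remaining: Optional[Decimal] = None  # bonus-eligible spend left; None means uncapped

    def cash_value(self, amount: Decimal) -> Decimal:
        """Cash value of one purchase, split between bonus and base rate at the cap."""
        if self.cap_remaining is None:
            bonus_spend = amount
        else:
            bonus_spend = min(amount, max(self.cap_remaining, Decimal("0")))
        base_spend = amount - bonus_spend
        raw = bonus_spend * self.bonus_rate + base_spend * self.base_rate
        return raw.quantize(CENT, rounding=ROUND_HALF_UP)


def rank_cards(rules: List[EarnRule], amount: Decimal) -> List[Tuple[str, Decimal]]:
    """Return (card_name, cash_value) pairs, best first (ties keep input order)."""
    scored = [(rule.card_name, rule.cash_value(amount)) for rule in rules]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


amount = Decimal("127.50")
rules = [
    # Amex Gold: $25,000 annual supermarket cap with $2,450 already used
    EarnRule("Amex Gold", Decimal("0.04"), Decimal("0.01"), cap_remaining=Decimal("22550")),
    # Citi Custom Cash: monthly cap already hit, so the whole purchase earns 1%
    EarnRule("Citi Custom Cash", Decimal("0.05"), Decimal("0.01"), cap_remaining=Decimal("0")),
    # Chase Freedom Flex: groceries are not this quarter's bonus category
    EarnRule("Chase Freedom Flex", Decimal("0.01"), Decimal("0.01")),
]

for name, value in rank_cards(rules, amount):
    print(f"{name}: ${value}")
```

`Decimal` with half-up rounding keeps cent values such as $1.275 rounding to $1.28, as in the diagram; Python's `round()` on a binary float would give $1.27 because 1.275 is stored as slightly less than 1.275.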
---
## Phase 1: Planning (Claude 3.5 Sonnet)
### Implementation
```python
import json
import os

from anthropic import AsyncAnthropic

# Async client, so awaiting the API call does not block the event loop
anthropic_client = AsyncAnthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))


async def create_execution_plan(transaction: dict) -> dict:
    """
    Claude analyzes the transaction and creates an execution strategy.
    """
    prompt = f"""
You are a credit card optimization expert. Analyze this transaction and create an execution plan.
Transaction:
- Merchant: {transaction['merchant']}
- Category: {transaction['category']}
- Amount: ${transaction['amount_usd']}
- MCC Code: {transaction['mcc']}
- User ID: {transaction['user_id']}
Available MCP servers:
1. smart_wallet - Analyzes user's cards and calculates rewards
2. rewards_rag - Semantic search of card benefits and restrictions
3. spend_forecast - Predicts spending and cap warnings
Your task:
1. Determine which MCP servers to call
2. Prioritize the calls (some may depend on others)
3. Identify key decision factors
4. Set a confidence threshold for the recommendation
Return only a JSON object (no prose, no code fences) with this shape:
{{
  "strategy": "optimization approach (e.g., 'max_rewards', 'cap_aware')",
  "mcp_calls": [
    {{
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Need to know available cards and base rewards"
    }},
    {{
      "service": "rewards_rag",
      "priority": 2,
      "reason": "Check if merchant accepts top card"
    }},
    {{
      "service": "spend_forecast",
      "priority": 3,
      "reason": "Verify monthly cap status"
    }}
  ],
  "decision_factors": [
    "reward_rate",
    "merchant_acceptance",
    "spending_caps",
    "annual_fees"
  ],
  "confidence_threshold": 0.85,
  "complexity": "medium"
}}
"""
    response = await anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        temperature=0.3,  # Lower temperature for consistent planning
        messages=[{
            "role": "user",
            "content": prompt
        }]
    )
    # The reply's first content block holds the JSON text
    plan = json.loads(response.content[0].text)
    return plan
```
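Models occasionally wrap JSON in prose or markdown fences, in which case `json.loads` on the raw reply fails. A defensive parser could extract the outermost object and check it against the plan schema before Phase 2 runs. This is a sketch: `parse_plan`, `REQUIRED_KEYS`, and `ALLOWED_SERVICES` are illustrative names, and the required keys are an assumption based on the example prompt.

```python
import json

ALLOWED_SERVICES = {"smart_wallet", "rewards_rag", "spend_forecast"}
REQUIRED_KEYS = {"strategy", "mcp_calls", "decision_factors", "confidence_threshold"}


def parse_plan(text: str) -> dict:
    """Extract and validate the JSON plan from a model reply (hypothetical helper)."""
    # Tolerate surrounding prose or markdown fences by taking the outermost {...}
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    plan = json.loads(text[start:end + 1])

    missing = REQUIRED_KEYS - set(plan)
    if missing:
        raise ValueError(f"plan missing keys: {sorted(missing)}")

    services = {call.get("service") for call in plan["mcp_calls"]}
    unknown = services - ALLOWED_SERVICES
    if unknown:
        raise ValueError(f"plan references unknown services: {sorted(map(str, unknown))}")

    if not 0.0 <= plan["confidence_threshold"] <= 1.0:
        raise ValueError("confidence_threshold must be between 0 and 1")
    return plan
```

Rejecting unknown service names here means Phase 2 never has to guess what to do with a hallucinated server.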
### Example Plans
#### Simple Transaction
```json
{
  "strategy": "max_rewards",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Straightforward category bonus"
    }
  ],
  "decision_factors": ["reward_rate"],
  "confidence_threshold": 0.90,
  "complexity": "low"
}
```
#### Complex Transaction
```json
{
  "strategy": "cap_aware_optimization",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Get all card options"
    },
    {
      "service": "spend_forecast",
      "priority": 2,
      "reason": "Check if near monthly/annual caps"
    },
    {
      "service": "rewards_rag",
      "priority": 3,
      "reason": "Verify merchant acceptance for top 2 cards"
    }
  ],
  "decision_factors": [
    "reward_rate",
    "spending_caps",
    "merchant_acceptance",
    "future_spending"
  ],
  "confidence_threshold": 0.80,
  "complexity": "high"
}
```
## Phase 2: Execution (Parallel MCP Calls)
### Implementation
```python
import asyncio
from itertools import groupby

import httpx

# MCP_ENDPOINTS (defined elsewhere) maps each service name to its base URL.


async def execute_mcp_calls(plan: dict, transaction: dict) -> dict:
    """
    Execute MCP calls based on the plan: priority groups run in order,
    and calls that share a priority run in parallel.
    """
    results = {}
    # groupby only merges adjacent items, so sort by priority first
    sorted_calls = sorted(plan["mcp_calls"], key=lambda call: call["priority"])
    for _priority, group in groupby(sorted_calls, key=lambda call: call["priority"]):
        group_results = await execute_priority_group(list(group), transaction)
        results.update(group_results)
    return results


async def execute_priority_group(calls: list, transaction: dict) -> dict:
    """Execute MCP calls of the same priority in parallel."""
    handlers = {
        "smart_wallet": call_smart_wallet,
        "rewards_rag": call_rewards_rag,
        "spend_forecast": call_forecast,
    }
    # Skip unknown services so each result stays paired with the right name
    services = [call["service"] for call in calls if call["service"] in handlers]
    results = await asyncio.gather(
        *(handlers[service](transaction) for service in services)
    )
    return dict(zip(services, results))


async def call_smart_wallet(transaction: dict) -> dict:
    """Call the Smart Wallet MCP server."""
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(
            f"{MCP_ENDPOINTS['smart_wallet']}/analyze",
            json=transaction
        )
        response.raise_for_status()
        return response.json()

# call_rewards_rag and call_forecast follow the same pattern for their servers.
```
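To see the scheduling behaviour without live MCP servers, the grouping can be exercised with stub coroutines. In this sketch `asyncio.sleep` stands in for network latency, `fake_service` and `run_plan` are hypothetical, and grouping uses `itertools.groupby` on the priority-sorted calls.

```python
import asyncio
from itertools import groupby


async def fake_service(name: str, delay: float, log: list) -> dict:
    """Stand-in for an MCP call: records start/finish order, returns a stub payload."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)
    log.append(f"end {name}")
    return {"service": name}


async def run_plan(plan: dict, delays: dict, log: list) -> dict:
    results = {}
    calls = sorted(plan["mcp_calls"], key=lambda call: call["priority"])
    for _priority, group in groupby(calls, key=lambda call: call["priority"]):
        names = [call["service"] for call in group]
        # Same-priority calls run concurrently; the next group waits for all of them
        payloads = await asyncio.gather(*(fake_service(n, delays[n], log) for n in names))
        results.update(zip(names, payloads))
    return results


plan = {"mcp_calls": [
    {"service": "smart_wallet", "priority": 1},
    {"service": "rewards_rag", "priority": 2},
    {"service": "spend_forecast", "priority": 2},
]}
delays = {"smart_wallet": 0.01, "rewards_rag": 0.02, "spend_forecast": 0.01}
log: list = []
results = asyncio.run(run_plan(plan, delays, log))
print(log)
```

The log shows `smart_wallet` finishing before either priority-2 call starts, while `rewards_rag` and `spend_forecast` overlap.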
## Phase 3: Reasoning (Gemini 2.0 Flash)
### Implementation
```python
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash-exp")


async def synthesize_reasoning(
    transaction: dict,
    mcp_results: dict,
    plan: dict
) -> str:
    """
    Gemini synthesizes all information into a coherent explanation.
    """
    prompt = f"""
You are a credit card optimization expert. Synthesize the following information into a clear recommendation.
Transaction:
{json.dumps(transaction, indent=2)}
MCP Results:
{json.dumps(mcp_results, indent=2)}
Decision Factors (in order of importance):
{json.dumps(plan.get('decision_factors', []), indent=2)}
Your task:
1. Compare all card options on the decision factors
2. Identify the optimal card with clear reasoning
3. Explain why alternatives are suboptimal
4. Provide any warnings or caveats
5. Suggest future optimizations
Format your response as:
## Recommended Card
[Card name and key benefit]
## Reasoning
[Step-by-step logic]
## Comparison
[Table comparing top 3 options]
## Warnings
[Any caveats or cap warnings]
## Future Optimization
[How to maximize rewards going forward]
Be specific with numbers and percentages.
"""
    # The async variant keeps the event loop free while Gemini responds
    response = await model.generate_content_async(
        prompt,
        generation_config={
            "temperature": 0.7,
            "max_output_tokens": 2048
        }
    )
    return response.text
```
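Because the prompt asks for five fixed `##` sections, the reply can be checked before it reaches the user, for example to retry or fall back when Gemini drops a section. A minimal sketch; `missing_sections` is a hypothetical helper, and it assumes each heading sits on its own line as `## Title`.

```python
from typing import List

REQUIRED_SECTIONS = [
    "Recommended Card",
    "Reasoning",
    "Comparison",
    "Warnings",
    "Future Optimization",
]


def missing_sections(markdown: str) -> List[str]:
    """Return required section titles that do not appear as '## ' headings."""
    headings = {
        line[3:].strip()
        for line in markdown.splitlines()
        if line.startswith("## ")
    }
    return [title for title in REQUIRED_SECTIONS if title not in headings]


reply = "## Recommended Card\n**Amex Gold**\n## Reasoning\n...\n## Comparison\n..."
print(missing_sections(reply))
```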
### Example Reasoning Output
```markdown
## Recommended Card
**American Express Gold** - 4x points on U.S. supermarkets

## Reasoning
1. **Reward Rate Comparison:**
   - Amex Gold: 4x points = $5.10 cash value (510 points at 1 cent each)
   - Citi Custom Cash: Would be 5% = $6.38, but the monthly cap is hit, so it earns 1% = $1.28
   - Chase Freedom Flex: 1x points = $1.28 (not the grocery quarter)

   Winner: Amex Gold ($5.10 actual rewards)
2. **Merchant Acceptance:**
   - Whole Foods accepts American Express ✅
   - No foreign transaction fees ✅
3. **Spending Cap Status:**
   - Current: $2,450 / $25,000 annual cap (9.8% used)
   - This transaction: $127.50 (0.5% of cap)
   - Safe to use ✅
4. **Future Spending Forecast:**
   - Predicted 3 more grocery trips this month ($382.50 total)
   - Projected rewards: $15.30
   - Still well under the annual cap

## Comparison
| Card | Earn Rate | Rewards | Cap Status | Accepted? |
|------|-----------|---------|------------|-----------|
| **Amex Gold** | 4x | **$5.10** | 9.8% used | ✅ Yes |
| Citi Custom Cash | 5% (1% after cap) | $1.28 | Cap hit | ✅ Yes |
| Chase Freedom Flex | 1x | $1.28 | N/A | ✅ Yes |

## Warnings
⚠️ **Citi Custom Cash Cap Hit**: You've reached the $500 monthly limit on Citi Custom Cash. It will reset on Feb 1st. Consider using it for non-grocery purchases this month.

⚠️ **Annual Cap Tracking**: You're at $2,450/$25,000 on Amex Gold's supermarket bonus. At the current pace, you'll hit the cap in November. Plan to switch to Citi Custom Cash after that.

## Future Optimization
1. **This Month**: Continue using Amex Gold for groceries (best rate)
2. **Next Month**: Switch to Citi Custom Cash (5% > 4x after the cap resets)
3. **After $25k Cap**: Use Citi Custom Cash or Chase Freedom (if grocery quarter)
4. **Consider**: Blue Cash Preferred (6% groceries, no cap) if spending exceeds $25k/year

**Estimated Annual Savings**: $523 by following this strategy vs. using a single card
```
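The cap figures in this example follow from simple ratios. A sketch for checking them, with `cap_status` as a hypothetical helper that reports percentages to one decimal place:

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict


def cap_status(spent: Decimal, cap: Decimal, amount: Decimal) -> Dict[str, Decimal]:
    """Summarize bonus-cap usage before and after one purchase (hypothetical helper)."""
    one_decimal = Decimal("0.1")
    return {
        "used_pct": (spent / cap * 100).quantize(one_decimal, rounding=ROUND_HALF_UP),
        "txn_pct": (amount / cap * 100).quantize(one_decimal, rounding=ROUND_HALF_UP),
        # Never report a negative remainder once the cap is exhausted
        "remaining_after": max(cap - spent - amount, Decimal("0")),
    }


status = cap_status(Decimal("2450"), Decimal("25000"), Decimal("127.50"))
print(status)
```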
## Phase 4: Response Formatting
### Implementation
```python
from typing import List

from pydantic import BaseModel


class RecommendedCard(BaseModel):
    card_id: str
    card_name: str
    issuer: str


class Rewards(BaseModel):
    points_earned: int
    cash_value: float
    earn_rate: str


class Alternative(BaseModel):
    card_name: str
    rewards: float
    reason: str


class FinalRecommendation(BaseModel):
    recommended_card: RecommendedCard
    rewards: Rewards
    reasoning: str
    confidence: float
    alternatives: List[Alternative]
    warnings: List[str]
    processing_time_ms: float


def format_recommendation(
    mcp_results: dict,
    reasoning: str,
    processing_time: float
) -> FinalRecommendation:
    """Format the final response (processing_time is in milliseconds)."""
    smart_wallet_result = mcp_results["smart_wallet"]
    best_card = smart_wallet_result["recommended_card"]

    # Alternatives: the next three cards in Smart Wallet's best-first comparison
    alternatives = []
    for card in smart_wallet_result["all_cards_comparison"][1:4]:
        alternatives.append(Alternative(
            card_name=card["card_name"],
            rewards=card["rewards"],
            reason=card.get("note", "Lower rewards rate")
        ))

    # Warnings come from the forecast service (results are keyed by service name)
    warnings = []
    if "spend_forecast" in mcp_results:
        warnings.extend(mcp_results["spend_forecast"].get("warnings", []))

    return FinalRecommendation(
        # Extra keys in best_card (e.g. "rewards") are ignored by pydantic's default config
        recommended_card=RecommendedCard(**best_card),
        rewards=Rewards(**smart_wallet_result["rewards"]),
        reasoning=reasoning,
        confidence=calculate_confidence(mcp_results),
        alternatives=alternatives,
        warnings=warnings,
        processing_time_ms=processing_time
    )
```
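`format_recommendation` slices `all_cards_comparison[1:4]`, which assumes Smart Wallet returns cards sorted best-first with the recommended card at index 0. If that ordering is not guaranteed, a defensive variant could sort and exclude the winner explicitly. The payload shape (`card_name`, `rewards`, optional `note`) is inferred from the code above; `pick_alternatives` is a hypothetical helper whose dicts match the `Alternative` fields.

```python
from typing import Dict, List


def pick_alternatives(comparison: List[Dict], best_name: str, k: int = 3) -> List[Dict]:
    """Top-k runner-up cards by rewards, excluding the recommended card."""
    runners_up = [card for card in comparison if card["card_name"] != best_name]
    # sort() is stable, so cards with equal rewards keep Smart Wallet's order
    runners_up.sort(key=lambda card: card["rewards"], reverse=True)
    return [
        {
            "card_name": card["card_name"],
            "rewards": card["rewards"],
            "reason": card.get("note", "Lower rewards rate"),
        }
        for card in runners_up[:k]
    ]


comparison = [
    {"card_name": "Chase Freedom Flex", "rewards": 1.28},
    {"card_name": "American Express Gold", "rewards": 5.10},
    {"card_name": "Citi Custom Cash", "rewards": 1.28, "note": "Monthly cap reached"},
]
print(pick_alternatives(comparison, "American Express Gold"))
```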
## Advanced Reasoning Patterns
### 1. Chain-of-Thought Reasoning
```python
# Prompt template; fill it with prompt.format(merchant=..., mcc=..., amount=...)
prompt = """
Let's think through this step-by-step:
Step 1: Identify the category
- Merchant: {merchant}
- MCC: {mcc}
- Likely category: ?
Step 2: List cards with bonuses in this category
- Card A: X% on category
- Card B: Y points per dollar
- Card C: Z% cashback
Step 3: Calculate actual rewards
- Card A: ${amount} × X% = $?
- Card B: ${amount} × Y points × $0.01 = $?
- Card C: ${amount} × Z% = $?
Step 4: Check constraints
- Is Card A accepted at the merchant?
- Is Card B near its spending cap?
- Does Card C have an annual fee?
Step 5: Make recommendation
Based on steps 1-4, the best card is...
"""
```
### 2. Self-Consistency
```python
from collections import Counter

# Generate multiple independent reasoning paths at a higher temperature
# (prompt is the filled-in reasoning prompt; model is the Gemini model from Phase 3)
reasoning_paths = []
for _ in range(5):
    response = model.generate_content(
        prompt,
        generation_config={"temperature": 0.8}
    )
    reasoning_paths.append(response.text)

# Vote on the most common recommendation
# (extract_card parses the recommended card name from one path; not shown)
recommendations = [extract_card(path) for path in reasoning_paths]
most_common = Counter(recommendations).most_common(1)[0][0]

# Use the first reasoning path that led to the most common answer
final_reasoning = next(
    path for path, card in zip(reasoning_paths, recommendations)
    if card == most_common
)
```
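The voting step can be tested offline. The sketch below assumes each path follows the Phase 3 format, with the card name in bold on the line after `## Recommended Card`; `extract_card` here is a hypothetical regex-based parser, and the agreement ratio it returns is an extra signal that could be folded into the confidence score.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

CARD_PATTERN = re.compile(r"^## Recommended Card[ \t]*\r?\n\*\*(.+?)\*\*", re.MULTILINE)


def extract_card(reasoning: str) -> Optional[str]:
    """Return the bolded card name under '## Recommended Card', or None."""
    match = CARD_PATTERN.search(reasoning)
    return match.group(1).strip() if match else None


def majority_vote(paths: List[str]) -> Tuple[str, str, float]:
    """Return (winning card, first path that chose it, share of paths agreeing)."""
    cards = [extract_card(path) for path in paths]
    votes = Counter(card for card in cards if card is not None)
    if not votes:
        raise ValueError("no recommendation could be parsed from any path")
    # Counter.most_common breaks ties by first occurrence
    winner, count = votes.most_common(1)[0]
    chosen = next(path for path, card in zip(paths, cards) if card == winner)
    return winner, chosen, count / len(paths)


paths = [
    "## Recommended Card\n**American Express Gold** - 4x points",
    "## Recommended Card\n**American Express Gold** - best rate",
    "## Recommended Card\n**Citi Custom Cash** - 5%",
]
winner, chosen, agreement = majority_vote(paths)
print(winner, round(agreement, 2))
```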
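### 3. Reflection & Verification
```python
# Inside an async function. generate_recommendation and refine_recommendation
# are application helpers not shown in this guide.

# Initial recommendation
initial_rec = await generate_recommendation(transaction, mcp_results)

# Self-critique, ending with a machine-readable verdict line
critique_prompt = f"""
Review this credit card recommendation:
{initial_rec}
Are there any errors or oversights?
- Did we miss a better card?
- Are the math calculations correct?
- Did we consider all constraints?
- Is the reasoning sound?
If you find issues, provide corrections.
Finish with exactly one line: VERDICT: OK or VERDICT: REVISE
"""
critique = model.generate_content(critique_prompt)

# Skip refinement only on an explicit OK verdict; matching words such as
# "issue" would also fire on replies like "no issues found"
lines = critique.text.strip().splitlines()
verdict = lines[-1].strip().upper() if lines else ""
if verdict != "VERDICT: OK":
    final_rec = await refine_recommendation(initial_rec, critique.text)
else:
    final_rec = initial_rec
```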
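To make the refine decision deterministic, the critique can be asked to end with a single verdict line, and only that line parsed. The `VERDICT:` convention is an assumption of this sketch (it is not part of any model API), and `needs_revision` is a hypothetical helper that fails safe toward refinement.

```python
def needs_revision(critique: str) -> bool:
    """True unless the last 'VERDICT:' line says OK (hypothetical reply convention)."""
    verdicts = [
        line.split(":", 1)[1].strip().rstrip(".").upper()
        for line in critique.splitlines()
        if line.strip().upper().startswith("VERDICT:")
    ]
    # Fail safe: a missing or unrecognized verdict triggers a refinement pass
    return not verdicts or verdicts[-1] != "OK"


print(needs_revision("Math checks out, no issues found.\nVERDICT: OK"))
```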
## Confidence Scoring
```python
def calculate_confidence(mcp_results: dict) -> float:
    """
    Calculate a confidence score (0 to 1) from several signals.
    """
    confidence = 1.0

    # Factor 1: Reward gap (a clear winner means higher confidence)
    wallet = mcp_results["smart_wallet"]
    best_reward = wallet["recommended_card"]["rewards"]
    comparison = wallet.get("all_cards_comparison", [])
    if len(comparison) > 1 and best_reward > 0:
        second_best = comparison[1]["rewards"]
        reward_gap = (best_reward - second_best) / best_reward
        if reward_gap < 0.1:  # Less than 10% difference
            confidence *= 0.8

    # Factor 2: Merchant acceptance certainty (relevance of the top RAG source)
    sources = mcp_results.get("rewards_rag", {}).get("sources", [])
    if sources:
        confidence *= sources[0]["relevance_score"]

    # Factor 3: Cap warnings from the forecast service
    if mcp_results.get("spend_forecast", {}).get("warnings"):
        confidence *= 0.9

    # Factor 4: Data freshness (not implemented yet; stale data should lower confidence)

    return round(confidence, 2)
```
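Plugging in the Whole Foods numbers shows how the factors combine. This scalar sketch mirrors the multiplicative rules above; `confidence_from_signals` is a hypothetical name, and the 0.95 RAG relevance score is an assumed input chosen to match the 95% in the flow diagram.

```python
from typing import Optional


def confidence_from_signals(
    best_reward: float,
    second_reward: Optional[float],
    rag_relevance: Optional[float],
    has_cap_warnings: bool,
) -> float:
    """Scalar version of the multiplicative confidence rules (illustrative only)."""
    confidence = 1.0
    if second_reward is not None and best_reward > 0:
        if (best_reward - second_reward) / best_reward < 0.1:  # near-tie between top two cards
            confidence *= 0.8
    if rag_relevance is not None:
        confidence *= rag_relevance  # weaker acceptance evidence lowers confidence
    if has_cap_warnings:
        confidence *= 0.9
    return round(confidence, 2)


# Whole Foods: $5.10 vs $1.28 is a ~75% gap, so the near-tie penalty does not apply
print(confidence_from_signals(5.10, 1.28, rag_relevance=0.95, has_cap_warnings=False))
# A near-tie ($5.10 vs $4.90) with the same RAG score is penalized
print(confidence_from_signals(5.10, 4.90, rag_relevance=0.95, has_cap_warnings=False))
```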
## Error Handling & Fallbacks
```python
import logging
import time

logger = logging.getLogger(__name__)


async def recommend_with_fallback(transaction: dict) -> dict:
    """Graceful degradation if the LLMs or MCP servers fail"""
    start = time.perf_counter()
    try:
        # Try the full reasoning pipeline
        plan = await create_execution_plan(transaction)
        mcp_results = await execute_mcp_calls(plan, transaction)
        reasoning = await synthesize_reasoning(transaction, mcp_results, plan)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # model_dump() (Pydantic v2) returns a plain dict like the fallbacks below
        return format_recommendation(mcp_results, reasoning, elapsed_ms).model_dump()
    except Exception as e:
        logger.error(f"Full pipeline failed: {e}")
        try:
            # Fallback: use only the Smart Wallet MCP
            # (format_simple_recommendation is not shown in this guide)
            result = await call_smart_wallet(transaction)
            return format_simple_recommendation(result)
        except Exception as e2:
            logger.error(f"Fallback failed: {e2}")
            # Last resort: rule-based recommendation
            return rule_based_recommendation(transaction)


def rule_based_recommendation(transaction: dict) -> dict:
    """Simple rule-based fallback"""
    rules = {
        "Groceries": "Amex Gold (4x points)",
        "Dining": "Amex Gold (4x points)",
        "Travel": "Chase Sapphire Reserve (3x points)",
        "Gas": "Costco Anywhere Visa (4% cashback)",
        "Default": "Citi Double Cash (2% on everything)"
    }
    category = transaction.get("category", "Default")
    recommended = rules.get(category, rules["Default"])
    return {
        # Same nesting as the full pipeline's recommended_card
        "recommended_card": {"card_name": recommended},
        "reasoning": f"Based on category rules for {category}",
        "confidence": 0.60,  # Lower confidence for rule-based
        "warnings": ["Recommendation based on simplified rules (MCP servers unavailable)"]
    }
```
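The nested `try` blocks generalize to an ordered list of strategies tried until one succeeds. A minimal, self-contained sketch: the stub stages stand in for the full pipeline, the Smart Wallet-only path, and the rule table, and `first_successful` plus the stage names are hypothetical.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, List, Tuple

logger = logging.getLogger(__name__)

Stage = Tuple[str, Callable[[], Awaitable[Any]]]


async def first_successful(stages: List[Stage]) -> Tuple[str, Any]:
    """Run stages in order and return (stage_name, result) from the first that succeeds."""
    errors = []
    for name, run in stages:
        try:
            return name, await run()
        except Exception as exc:  # degrade to the next stage on any failure
            logger.error("%s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all stages failed: " + "; ".join(errors))


async def full_pipeline() -> dict:
    raise TimeoutError("planner timed out")  # simulate an LLM outage


async def smart_wallet_only() -> dict:
    raise ConnectionError("MCP server unreachable")  # simulate an MCP outage


async def rule_based() -> dict:
    return {"recommended_card": {"card_name": "Amex Gold (4x points)"}, "confidence": 0.60}


stage_name, result = asyncio.run(first_successful([
    ("full_pipeline", full_pipeline),
    ("smart_wallet_only", smart_wallet_only),
    ("rule_based", rule_based),
]))
print(stage_name, result["confidence"])
```

Returning the stage name alongside the result makes it easy to log which tier actually served each recommendation.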
## Testing & Evaluation
### Unit Tests
```python
import pytest

# These tests call the live Claude and Gemini APIs (API keys required)
# and use the pytest-asyncio plugin for async test functions.


@pytest.mark.asyncio
async def test_planning_phase():
    """Test Claude's planning logic"""
    transaction = {
        "user_id": "test_user",  # the planning prompt interpolates user_id
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    plan = await create_execution_plan(transaction)
    assert "strategy" in plan
    assert "mcp_calls" in plan
    assert len(plan["mcp_calls"]) > 0
    assert plan["confidence_threshold"] >= 0.5


@pytest.mark.asyncio
async def test_reasoning_phase():
    """Test Gemini's synthesis"""
    mcp_results = {
        "smart_wallet": {
            "recommended_card": {"card_name": "Amex Gold"},
            "rewards": {"cash_value": 5.10}
        }
    }
    plan = {"decision_factors": ["reward_rate"]}  # the prompt reads plan['decision_factors']
    reasoning = await synthesize_reasoning({}, mcp_results, plan)
    assert "Amex Gold" in reasoning
    assert "$5.10" in reasoning
```
### Integration Tests
```python
import pytest


@pytest.mark.asyncio
async def test_end_to_end_recommendation():
    """Test the full recommendation pipeline"""
    transaction = {
        "user_id": "test_user",
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    result = await recommend_with_fallback(transaction)
    assert result["recommended_card"]["card_name"]
    assert result["rewards"]["cash_value"] > 0
    assert result["confidence"] >= 0.5
    assert len(result["reasoning"]) > 100
```
---