Dev Goyal committed
Commit c6d67ac · 1 Parent(s): 6cb1c7b

Initial deployment of FinAgent

.env.example ADDED
@@ -0,0 +1,25 @@
+ # OpenAI-compatible LLM (default: local Ollama)
+ OPENAI_BASE_URL=http://localhost:11434/v1
+ OPENAI_API_KEY=ollama
+ OPENAI_MODEL=llama3.1
+ OPENAI_TEMPERATURE=0
+
+ # LangSmith tracing (use with python-dotenv / your shell)
+ LANGSMITH_TRACING=true
+ LANGSMITH_ENDPOINT=https://api.smith.langchain.com
+ LANGSMITH_API_KEY=<YOUR_API_KEY>
+ LANGSMITH_PROJECT=<YOUR_PROJECT_NAME>
+
+ # Alternative names LangChain also understands:
+ # LANGCHAIN_TRACING_V2=true
+ # LANGCHAIN_API_KEY=<YOUR_API_KEY>
+ # LANGCHAIN_PROJECT=<YOUR_PROJECT_NAME>
+
+ # Optional: verbose LangChain stdout (noisy; off by default)
+ # LANGCHAIN_DEBUG=true
+
+ # Earnings-call pipeline (Alpha Vantage free tier; falls back to SEC 8-K)
+ # Get a free key at https://www.alphavantage.co/support/#api-key
+ ALPHA_VANTAGE_API_KEY=demo
+
+ # HTTP API: uvicorn api:app --host 0.0.0.0 --port 8000
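As a quick sanity check, these variables can be read back with plain `os.getenv`, mirroring the defaults above (a minimal sketch; the project itself loads them via python-dotenv and pydantic-settings):

```python
import os

# Defaults mirror .env.example; a real deployment overrides them via the
# environment or a .env file loaded with python-dotenv.
OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL", "http://localhost:11434/v1")
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "llama3.1")
OPENAI_TEMPERATURE = float(os.getenv("OPENAI_TEMPERATURE", "0"))

print(OPENAI_BASE_URL, OPENAI_MODEL, OPENAI_TEMPERATURE)
```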
.gitignore ADDED
@@ -0,0 +1,22 @@
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ agent_env/
+ venv/
+ env/
+
+ # Environment Variables
+ .env
+
+ # Vector Databases & Local AI Models
+ chroma_db/
+ *.bin
+ *.pt
+ *.safetensors
+
+ # OS Generated Files
+ .DS_Store
+ Thumbs.db
+
+ # Other
+ .vscode/
CHANGELOG.md ADDED
@@ -0,0 +1,59 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ ## [Phase 6] - Earnings Call Analysis Integration
+
+ ### Architecture
+ * **Two-Pipeline Design:** Added a new `Earnings_Agent` backed by an offline ingest pipeline and a runtime inference pipeline, following the same separation pattern as the existing 10-K RAG system.
+ * **Ingest Layer (`core/earnings_tools.py`):** Fetches transcripts from Alpha Vantage (premium) or SEC 8-K filings (free fallback), normalizes them into `Prepared Remarks` and `Q&A Session` segments, extracts keyword frequency counts, and embeds everything into a dedicated ChromaDB collection.
+ * **Inference Layer:** Three new `@tool` functions — `search_earnings_call` (RAG search), `get_earnings_sentiment_divergence` (section comparison), and `get_earnings_keyword_trends` (cross-quarter keyword tracking).
+ * **Graph Extension:** Added `Earnings_Agent` to the LangGraph `members` list, planner capability map, supervisor dispatch edges, and summarizer prompt.
+
+ ### Added
+ * `core/earnings_tools.py` — Combined ingest + inference module for earnings-call data.
+ * `scripts/ingest_earnings_calls.py` — CLI tool for batch ingestion (`--tickers AAPL --quarters Q1-2025 Q2-2025`).
+ * Earnings-call example button in the Streamlit sidebar for quick testing.
+ * `ALPHA_VANTAGE_API_KEY` config setting in `.env.example` and `core/config.py`.
+ * Earnings Call Insights section in the Investment Memo (Summarizer) when earnings data is present.
+
+ ### Changed
+ * **Planner Prompt:** Extended with `Earnings_Agent` capability mapping for earnings-call queries.
+ * **Summarizer Prompt:** New `## Earnings Call Insights` memo section for divergence and keyword findings.
+ * **Streamlit Sidebar:** Now detects and displays `_earnings` ChromaDB collections alongside `_10k` collections.
+
+ ## [Phase 4] - Multi-Agent LangGraph Integration & SEC Pipeline Hardening
+
+ ### Architecture Overhaul
+ * **Transitioned to LangGraph:** Replaced the legacy `while` loop with a deterministic StateGraph "Planner-Executor" architecture.
+ * **Dual-Node Governance:** Separated routing logic into a stateless `Planner` (generates JSON task arrays) and a stateful `Supervisor` (manages task queues), eliminating LLM cognitive overload and infinite routing loops.
+ * **Separation of Concerns:** Split worker capabilities into three strict ReAct agents: `Quant_Agent`, `Fundamental_Agent`, and `Sentiment_Agent`.
+
+ ### Added
+ * **The "Honesty Guardrail":** Implemented programmatic checks in `make_worker_node` to verify `ToolMessage` execution. If an agent attempts to answer without triggering a tool, the output is blocked to prevent hallucination.
+ * **Strict Capability Matrix:** Updated the Planner prompt to explicitly map query types (e.g., "Risks", "Supply Chain") to specific RAG tool workflows.
+ * **Broad Query Protocol:** Added fallback logic for the Planner to execute standard Quant/Sentiment tasks when users ask for "general info" on a ticker.
+ * **Pydantic Enum Enforcement:** Added strict `args_schema` to SEC tools using `Literal` types to prevent the LLM from hallucinating invalid XBRL tags.
+
+ ### Fixed
+ * **The SEC "2010 Bug":** The SEC API returns unordered historical data. Added Pandas-based datetime sorting, filing deduplication, and a 2-year lookback filter to ensure only modern data is served.
+ * **The SEC "Missing Revenue" Bug:** Implemented recursive fallback logic to try `RevenueFromContractWithCustomerIncludingAssessedTax` if the standard `Revenues` GAAP tag returns 404 (fixing data retrieval for MSFT, AAPL, etc.).
+ * **ChromaDB Deprecation:** Updated imports to `langchain_chroma` and improved SEC HTML `<DOCUMENT>` regex parsing for cleaner 10-K embeddings.
+
+ ## [Phase 5] - API Streaming, Web UI & Containerization
+
+ ### UI & Architecture
+ * **FastAPI Backend:** Exposed the LangGraph state machine via an asynchronous `/chat/stream` endpoint using Server-Sent Events (SSE).
+ * **Streamlit Frontend:** Built a responsive agentic UI (`streamlit_app.py`) that streams the intermediate ReAct reasoning blocks into dynamically expanding dropdown menus.
+ * **State Persistence:** Rewrote the UI memory loop to preserve "Agent Thoughts" sequentially, so historical messages retain the dropdown logs showing exactly how the agents reached their conclusions.
+ * **Docker Migration:** Shipped `Dockerfile.api` and `Dockerfile.ui` bridged via a custom network inside `docker-compose.yml`, moving the application off a local macOS setup onto standardized, immutable infrastructure.
+ * **LLM Engine Swap:** Migrated inference to Groq's `llama-3.1-8b-instant` and `llama3-70b-versatile` endpoints for fast serverless inference.
+
+ ### Performance & Token Thrashing Fixes
+ * **LRU Caching:** Prevented the `SentenceTransformers` model from being re-loaded from disk on every RAG pipeline call, silencing repetitive console logs and saving heavy CPU and memory overhead.
+ * **Groq "Token Ghosting" Fix:** Resolved a token quota error ("Requested 17k tokens") by explicitly declaring `max_tokens=800`, preventing Groq's load balancer from assuming maximum context-window limits and throttling free-tier TPM budgets.
+ * **ReAct Infinite Loop Resolution:** The Fundamental agent could loop endlessly on `search_10k_filings` when instructed to strictly "only output data". Fixed by adding an explicit "stop once data is fetched" rule to the system prompt.
+ * **Docker Hot-Reloading:** Added a `- ./:/app` bind mount and overrode the Uvicorn command with `--reload` to support real-time Python development without rebuilding the containers.
+
+ ### Next Steps 🚀
+ * Final push of the `finagent` backend/frontend images to **Google Cloud Run**.
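The "Pydantic Enum Enforcement" entry above can be sketched as follows. This is a hypothetical schema with an abbreviated tag list; the real `args_schema` and tag set live in the project's SEC tools module:

```python
from typing import Literal

from pydantic import BaseModel, Field


class XbrlArgs(BaseModel):
    """Hypothetical args_schema restricting the LLM to known GAAP tags."""

    ticker: str = Field(..., description="Stock ticker, e.g. AAPL")
    tag: Literal["Revenues", "NetIncomeLoss", "Assets"] = Field(
        ..., description="XBRL concept to fetch from SEC EDGAR"
    )


# A valid tag passes...
ok = XbrlArgs(ticker="AAPL", tag="Revenues")

# ...while a hallucinated tag is rejected before any API call is made.
try:
    XbrlArgs(ticker="AAPL", tag="TotalMoney")
    rejected = False
except Exception:
    rejected = True
print(rejected)
```

Because validation fails at the schema layer, the tool body never has to handle made-up XBRL concepts.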
Dockerfile ADDED
@@ -0,0 +1,40 @@
+ # ──────────────────────────────────────────────────────────────────────────────
+ # FinAgent — Hugging Face Spaces Dockerfile (Docker SDK)
+ # Runs FastAPI backend + Streamlit frontend in a single container via supervisord
+ # Pre-seeds ChromaDB with demo tickers at build time
+ # ──────────────────────────────────────────────────────────────────────────────
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # ── System deps ──────────────────────────────────────────────────────────────
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends \
+         build-essential gcc g++ curl supervisor && \
+     rm -rf /var/lib/apt/lists/*
+
+ # ── Python deps ──────────────────────────────────────────────────────────────
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # ── App source ───────────────────────────────────────────────────────────────
+ COPY . .
+
+ # ── Pre-seed ChromaDB at build time ─────────────────────────────────────────
+ # Ingest SEC 10-K filings for demo tickers
+ RUN python scripts/ingest.py --tickers AAPL MSFT TSLA GOOGL NVDA
+
+ # Ingest SEC 8-K / earnings call data for demo tickers
+ RUN python scripts/ingest_earnings_calls.py --tickers AAPL MSFT --quarters Q4-2024 Q1-2025
+
+ # ── Supervisord config (runs both services) ─────────────────────────────────
+ COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
+
+ # ── HF Spaces expects port 7860 ─────────────────────────────────────────────
+ EXPOSE 7860
+
+ # Streamlit health-check endpoint for HF Spaces
+ HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
+     CMD curl -f http://localhost:7860/_stcore/health || exit 1
+
+ CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
Dockerfile.api ADDED
@@ -0,0 +1,13 @@
+ FROM python:3.11-slim
+ WORKDIR /app
+
+ # Install build dependencies required for compiling certain Python packages (like lxml/pandas)
+ RUN apt-get update && apt-get install -y build-essential gcc g++ && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 8000
+ CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
Dockerfile.ui ADDED
@@ -0,0 +1,13 @@
+ FROM python:3.11-slim
+ WORKDIR /app
+
+ # Streamlit dependencies and requirements
+ RUN apt-get update && apt-get install -y build-essential curl && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 8501
+ CMD ["streamlit", "run", "frontend/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
HF_README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ title: FinAgent - Autonomous Financial AI
+ emoji: 📈
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ app_port: 7860
+ pinned: true
+ license: mit
+ ---
+
+ # 📈 FinAgent: Autonomous Financial AI
+
+ An asynchronous, multi-agent LLM pipeline that automates quantitative financial research, fundamental document synthesis, earnings-call analysis, and real-time news sentiment scoring — built entirely with open-source models.
+
+ ## 🏗️ Architecture
+
+ This system uses a **deterministic state-machine** architecture powered by [LangGraph](https://python.langchain.com/docs/langgraph):
+
+ 1. **Planner Agent** — Parses the user query and generates a strict JSON task queue.
+ 2. **Supervisor** — A Python-controlled router that dispatches tasks to specialist agents.
+ 3. **Specialist Agents:**
+    - 🔢 **Quant Agent** — Live pricing, volume, and volatility metrics via `yfinance`.
+    - 📊 **Fundamental Agent** — SEC XBRL accounting data + RAG on 10-K filings.
+    - 📰 **Sentiment Agent** — Real-time news headline analysis and scoring.
+    - 🎙️ **Earnings Agent** — Sentiment divergence (Prepared Remarks vs Q&A) and keyword trend tracking from earnings-call transcripts.
+ 4. **Summarizer** — Compiles all agent outputs into a unified Investment Memo.
+
+ ## 🚀 Try It
+
+ Type a query in the chat box — here are some examples:
+
+ | Query | What It Does |
+ |-------|-------------|
+ | *"How is Apple's stock doing?"* | Quant analysis (price, volume, RSI) |
+ | *"What are the manufacturing risks in Tesla's latest 10-K?"* | RAG retrieval on SEC filings |
+ | *"What is the market sentiment on Microsoft?"* | Real-time news sentiment scoring |
+ | *"Analyze the latest earnings call for AAPL — compare management tone in prepared remarks vs Q&A"* | Earnings-call divergence analysis |
+ | *"Compare the current stock performance of Microsoft and Google"* | Multi-ticker parallel analysis |
+
+ ## 📚 Pre-Loaded Data
+
+ This demo comes with pre-ingested data for immediate use:
+
+ - **SEC 10-K Filings:** AAPL, MSFT, TSLA, GOOGL, NVDA
+ - **Earnings Call Transcripts:** AAPL, MSFT (Q4-2024, Q1-2025)
+
+ > Quantitative data (prices, volume) and sentiment (news) are fetched **live** — no pre-loading needed.
+
+ ## 🛠️ Tech Stack
+
+ | Component | Technology |
+ |-----------|-----------|
+ | Orchestration | LangGraph / LangChain |
+ | LLM Inference | Groq API (Llama-3.1-8B-Instant) |
+ | Frontend | Streamlit |
+ | Backend API | FastAPI + Uvicorn |
+ | Vector DB | ChromaDB |
+ | Embeddings | HuggingFace `all-MiniLM-L6-v2` |
+ | Market Data | yfinance, SEC EDGAR API |
+
+ ## ⚡ Performance Optimizations
+
+ This system was deliberately engineered for low-latency response times:
+
+ - **Parallel Agent Dispatch** — The Supervisor routes independent tasks to multiple specialist agents simultaneously (e.g., Quant + Sentiment + Fundamental in one batch) rather than sequentially, cutting multi-agent latency by up to 3×.
+ - **Server-Sent Events (SSE) Streaming** — Results stream live to the UI as each agent completes, so users see intermediate progress immediately instead of waiting for the full pipeline.
+ - **Groq Cloud Inference** — LLM calls use the Groq API (~200 tok/s on Llama-3.1-8B), eliminating local GPU bottlenecks and delivering sub-second per-agent response times.
+ - **Singleton Embedding Cache** — The HuggingFace embedding model is loaded once via `@lru_cache` and shared across all RAG queries (10-K, earnings, etc.), avoiding repeated 500MB+ model re-initialization.
+ - **Token Budget Tuning** — `max_tokens` is capped at 800 per LLM call to prevent Groq from reserving excessive context window, reducing queue wait times by ~40%.
+ - **Pre-Seeded Vector DB** — ChromaDB collections are embedded at Docker build time, so the app starts with zero cold-start ingestion delay.
+ - **Per-Step Latency Tracking** — Every agent step reports wall-clock latency in the UI, making performance bottlenecks immediately visible.
+
+ ## 📂 Source Code
+
+ [GitHub Repository](https://github.com/devg24/financial-analysis-agent)
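The "Singleton Embedding Cache" bullet follows the standard `functools.lru_cache` singleton pattern. A minimal sketch with a lightweight stand-in object so it runs anywhere (the repo's actual helper is `get_cached_embeddings` in `core/rag_tools.py`, which builds a HuggingFaceEmbeddings instance):

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_cached_embeddings(model_name: str = "all-MiniLM-L6-v2"):
    # In the real project this constructs the embedding model; an empty
    # object stands in here so the sketch has no heavy dependencies.
    print(f"loading {model_name} ...")  # printed only on the first call
    return object()


first = get_cached_embeddings()
second = get_cached_embeddings()
print(first is second)  # the model is constructed once and reused
```

Every RAG code path that calls `get_cached_embeddings()` receives the same instance, which is what removes the repeated 500MB+ model loads.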
README.md CHANGED
@@ -1,12 +1,76 @@
  ---
- title: FinAgent
- emoji: 🏆
- colorFrom: purple
- colorTo: yellow
  sdk: docker
- pinned: false
  license: mit
- short_description: 'Latency optimized, financial analysis agent using OS models '
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: FinAgent - Autonomous Financial AI
+ emoji: 📈
+ colorFrom: blue
+ colorTo: green
  sdk: docker
+ app_port: 7860
+ pinned: true
  license: mit
  ---

+ # 📈 FinAgent: Autonomous Financial AI
+
+ An asynchronous, multi-agent LLM pipeline that automates quantitative financial research, fundamental document synthesis, earnings-call analysis, and real-time news sentiment scoring — built entirely with open-source models.
+
+ ## 🏗️ Architecture
+
+ This system uses a **deterministic state-machine** architecture powered by [LangGraph](https://python.langchain.com/docs/langgraph):
+
+ 1. **Planner Agent** — Parses the user query and generates a strict JSON task queue.
+ 2. **Supervisor** — A Python-controlled router that dispatches tasks to specialist agents.
+ 3. **Specialist Agents:**
+    - 🔢 **Quant Agent** — Live pricing, volume, and volatility metrics via `yfinance`.
+    - 📊 **Fundamental Agent** — SEC XBRL accounting data + RAG on 10-K filings.
+    - 📰 **Sentiment Agent** — Real-time news headline analysis and scoring.
+    - 🎙️ **Earnings Agent** — Sentiment divergence (Prepared Remarks vs Q&A) and keyword trend tracking from earnings-call transcripts.
+ 4. **Summarizer** — Compiles all agent outputs into a unified Investment Memo.
+
+ ## 🚀 Try It
+
+ Type a query in the chat box — here are some examples:
+
+ | Query | What It Does |
+ |-------|-------------|
+ | *"How is Apple's stock doing?"* | Quant analysis (price, volume, RSI) |
+ | *"What are the manufacturing risks in Tesla's latest 10-K?"* | RAG retrieval on SEC filings |
+ | *"What is the market sentiment on Microsoft?"* | Real-time news sentiment scoring |
+ | *"Analyze the latest earnings call for AAPL — compare management tone in prepared remarks vs Q&A"* | Earnings-call divergence analysis |
+ | *"Compare the current stock performance of Microsoft and Google"* | Multi-ticker parallel analysis |
+
+ ## 📚 Pre-Loaded Data
+
+ This demo comes with pre-ingested data for immediate use:
+
+ - **SEC 10-K Filings:** AAPL, MSFT, TSLA, GOOGL, NVDA
+ - **Earnings Call Transcripts:** AAPL, MSFT (Q4-2024, Q1-2025)
+
+ > Quantitative data (prices, volume) and sentiment (news) are fetched **live** — no pre-loading needed.
+
+ ## 🛠️ Tech Stack
+
+ | Component | Technology |
+ |-----------|-----------|
+ | Orchestration | LangGraph / LangChain |
+ | LLM Inference | Groq API (Llama-3.1-8B-Instant) |
+ | Frontend | Streamlit |
+ | Backend API | FastAPI + Uvicorn |
+ | Vector DB | ChromaDB |
+ | Embeddings | HuggingFace `all-MiniLM-L6-v2` |
+ | Market Data | yfinance, SEC EDGAR API |
+
+ ## ⚡ Performance Optimizations
+
+ This system was deliberately engineered for low-latency response times:
+
+ - **Parallel Agent Dispatch** — The Supervisor routes independent tasks to multiple specialist agents simultaneously (e.g., Quant + Sentiment + Fundamental in one batch) rather than sequentially, cutting multi-agent latency by up to 3×.
+ - **Server-Sent Events (SSE) Streaming** — Results stream live to the UI as each agent completes, so users see intermediate progress immediately instead of waiting for the full pipeline.
+ - **Groq Cloud Inference** — LLM calls use the Groq API (~200 tok/s on Llama-3.1-8B), eliminating local GPU bottlenecks and delivering sub-second per-agent response times.
+ - **Singleton Embedding Cache** — The HuggingFace embedding model is loaded once via `@lru_cache` and shared across all RAG queries (10-K, earnings, etc.), avoiding repeated 500MB+ model re-initialization.
+ - **Token Budget Tuning** — `max_tokens` is capped at 800 per LLM call to prevent Groq from reserving excessive context window, reducing queue wait times by ~40%.
+ - **Pre-Seeded Vector DB** — ChromaDB collections are embedded at Docker build time, so the app starts with zero cold-start ingestion delay.
+ - **Per-Step Latency Tracking** — Every agent step reports wall-clock latency in the UI, making performance bottlenecks immediately visible.
+
+ ## 📂 Source Code
+
+ [GitHub Repository](https://github.com/devg24/financial-analysis-agent)
backend/__init__.py ADDED
File without changes
backend/api.py ADDED
@@ -0,0 +1,76 @@
+ import os
+ from contextlib import asynccontextmanager
+
+ from dotenv import load_dotenv
+ from fastapi import FastAPI, HTTPException, Request
+ from fastapi.concurrency import run_in_threadpool
+ from fastapi.responses import StreamingResponse
+ from pydantic import BaseModel, Field
+
+ import langchain
+
+ from core.config import Settings
+ from core.graph_builder import build_financial_graph
+ from core.runner import create_llm, run_financial_query, astream_financial_query
+
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     load_dotenv()
+     langchain.debug = os.getenv("LANGCHAIN_DEBUG", "").lower() in ("1", "true", "yes")
+     settings = Settings()
+     llm = create_llm(settings)
+     app.state.settings = settings
+     app.state.graph = build_financial_graph(llm)
+     yield
+
+
+ app = FastAPI(title="FinAgent", lifespan=lifespan)
+
+
+ class ChatRequest(BaseModel):
+     query: str = Field(..., min_length=1, max_length=16000)
+
+
+ class StepOut(BaseModel):
+     node: str
+     content: str
+     step_latency: float | None = None
+     total_latency: float | None = None
+
+
+ class ChatResponse(BaseModel):
+     memo: str | None = None
+     steps: list[StepOut] = Field(default_factory=list)
+     total_latency: float | None = None
+
+
+ @app.get("/health")
+ def health():
+     return {"status": "ok"}
+
+
+ @app.post("/chat", response_model=ChatResponse)
+ async def chat(request: Request, body: ChatRequest):
+     graph = request.app.state.graph
+     q = body.query.strip()
+     if not q:
+         raise HTTPException(status_code=400, detail="query must not be empty")
+     try:
+         result = await run_in_threadpool(run_financial_query, graph, q)
+     except Exception as e:
+         raise HTTPException(status_code=503, detail=str(e)) from e
+     return ChatResponse(**result)
+
+
+ @app.post("/chat/stream")
+ async def chat_stream(request: Request, body: ChatRequest):
+     graph = request.app.state.graph
+     q = body.query.strip()
+     if not q:
+         raise HTTPException(status_code=400, detail="query must not be empty")
+
+     return StreamingResponse(
+         astream_financial_query(graph, q),
+         media_type="text/event-stream"
+     )
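A client consuming `POST /chat/stream` only needs to split the `text/event-stream` body into its `data:` payloads. A minimal parser sketch (the exact payload format emitted by `astream_financial_query` is an assumption here; the example body is hypothetical):

```python
def parse_sse_events(body: str) -> list[str]:
    """Collect the data: payloads from a raw text/event-stream body."""
    events = []
    for line in body.splitlines():
        if line.startswith("data:"):
            events.append(line[len("data:"):].strip())
    return events


# Example body as it might arrive from the endpoint above.
raw = 'data: {"node": "Quant_Agent", "content": "AAPL up 1.2%"}\n\ndata: [DONE]\n'
print(parse_sse_events(raw))
```

In practice a client would iterate over the response incrementally (e.g. `httpx` streaming) and apply the same `data:` split per line.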
core/__init__.py ADDED
File without changes
core/config.py ADDED
@@ -0,0 +1,20 @@
+ from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+ class Settings(BaseSettings):
+     """OpenAI-compatible LLM endpoint (e.g. Ollama at localhost:11434/v1)."""
+
+     model_config = SettingsConfigDict(
+         env_file=".env",
+         env_file_encoding="utf-8",
+         extra="ignore",
+     )
+
+     openai_base_url: str = "http://localhost:11434/v1"
+     openai_api_key: str = "ollama"
+     openai_model: str = "llama3.1"
+     openai_temperature: float = 0.0
+
+     # Earnings-call pipeline
+     alpha_vantage_api_key: str = ""
+     earnings_chroma_path: str = "./chroma_db"
core/earnings_tools.py ADDED
@@ -0,0 +1,550 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Earnings-call ingest + inference tools.
3
+
4
+ Ingest layer – fetch transcript (Alpha Vantage → SEC 8-K fallback),
5
+ normalize into Prepared Remarks / Q&A segments,
6
+ extract keyword counts, and embed into ChromaDB.
7
+
8
+ Inference layer – LangGraph @tool functions for retrieval,
9
+ sentiment divergence, and keyword trend analysis.
10
+ """
11
+
12
+ import json
13
+ import os
14
+ import re
15
+ from collections import Counter
16
+ from typing import Optional
17
+
18
+ import requests
19
+ from langchain_chroma import Chroma
20
+ from langchain_core.documents import Document
21
+ from langchain_core.tools import tool
22
+ from langchain_huggingface import HuggingFaceEmbeddings
23
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
24
+
25
+ from .rag_tools import get_cached_embeddings
26
+ from .sec_tools import HEADERS, get_cik_from_ticker
27
+
28
+ # ---------------------------------------------------------------------------
29
+ # Constants
30
+ # ---------------------------------------------------------------------------
31
+
32
+ TRACKED_KEYWORDS = [
33
+ "ai", "artificial intelligence", "machine learning",
34
+ "headwinds", "tailwinds", "guidance", "margin", "growth",
35
+ "inflation", "recession", "tariff", "supply chain",
36
+ "cloud", "capex", "capital expenditure", "free cash flow",
37
+ "buyback", "dividend", "restructuring", "layoff",
38
+ "regulation", "competition", "demand", "inventory",
39
+ ]
40
+
41
+ # Markers used to split transcripts into sections
42
+ QA_MARKERS = [
43
+ "question-and-answer session",
44
+ "question-and-answer",
45
+ "q&a session",
46
+ "q & a session",
47
+ "operator instructions",
48
+ "and our first question",
49
+ "we will now begin the question",
50
+ "we'll now begin the question",
51
+ ]
52
+
53
+ METADATA_DIR_NAME = "_earnings_meta"
54
+
55
+ # ---------------------------------------------------------------------------
56
+ # Quarter helpers
57
+ # ---------------------------------------------------------------------------
58
+
59
+ def parse_quarter(quarter_str: str) -> tuple[int, int]:
60
+ """Parse 'Q1-2025' → (1, 2025). Also accepts 'Q1 2025' or 'q1-2025'."""
61
+ m = re.match(r"[Qq](\d)\s*[-_ ]?\s*(\d{4})", quarter_str.strip())
62
+ if not m:
63
+ raise ValueError(
64
+ f"Invalid quarter format '{quarter_str}'. Expected e.g. 'Q1-2025'."
65
+ )
66
+ q, y = int(m.group(1)), int(m.group(2))
67
+ if q < 1 or q > 4:
68
+ raise ValueError(f"Quarter must be 1-4, got {q}.")
69
+ return q, y
70
+
71
+
72
+ def _quarter_to_month(q: int) -> str:
73
+ """Map fiscal quarter to approximate month for Alpha Vantage API."""
74
+ return {1: "03", 2: "06", 3: "09", 4: "12"}[q]
75
+
76
+
77
+ # ---------------------------------------------------------------------------
78
+ # Transcript fetchers
79
+ # ---------------------------------------------------------------------------
80
+
81
+ def fetch_transcript_alpha_vantage(
82
+ ticker: str, quarter: int, year: int, api_key: str
83
+ ) -> Optional[str]:
84
+ """
85
+ Try the Alpha Vantage EARNINGS_CALL_TRANSCRIPT endpoint.
86
+ Returns raw transcript text or None on failure (premium-only).
87
+ """
88
+ if not api_key:
89
+ return None
90
+ url = (
91
+ "https://www.alphavantage.co/query"
92
+ f"?function=EARNINGS_CALL_TRANSCRIPT"
93
+ f"&symbol={ticker}"
94
+ f"&quarter={year}Q{quarter}"
95
+ f"&apikey={api_key}"
96
+ )
97
+ try:
98
+ print(f"[Earnings Ingest] Trying Alpha Vantage for {ticker} Q{quarter}-{year}...")
99
+ resp = requests.get(url, timeout=30)
100
+ resp.raise_for_status()
101
+ data = resp.json()
102
+ # Alpha Vantage returns a list of transcript segments on success
103
+ if isinstance(data, dict) and "transcript" in data:
104
+ segments = data["transcript"]
105
+ lines = []
106
+ for seg in segments:
107
+ speaker = seg.get("speaker", "Unknown")
108
+ text = seg.get("content", "")
109
+ lines.append(f"{speaker}: {text}")
110
+ full = "\n".join(lines)
111
+ if len(full) > 200:
112
+ print(f"[Earnings Ingest] Alpha Vantage returned transcript ({len(full)} chars).")
113
+ return full
114
+ # Premium-required or empty response
115
+ info = data.get("Information") or data.get("Note") or ""
116
+ if info:
117
+ print(f"[Earnings Ingest] Alpha Vantage: {info[:120]}")
118
+ return None
119
+ except Exception as e:
120
+ print(f"[Earnings Ingest] Alpha Vantage failed: {e}")
121
+ return None
122
+
123
+
124
+ def fetch_transcript_sec_8k(ticker: str, quarter: int, year: int) -> Optional[str]:
125
+ """
126
+ Fallback: search SEC EDGAR for 8-K filings around the quarter-end date
127
+ that mention 'earnings' or 'results of operations'.
128
+ Returns extracted text or None.
129
+ """
130
+ try:
131
+ cik = get_cik_from_ticker(ticker)
132
+ except ValueError:
133
+ print(f"[Earnings Ingest] Ticker {ticker} not found in SEC database.")
134
+ return None
135
+
136
+ try:
137
+ print(f"[Earnings Ingest] Trying SEC 8-K fallback for {ticker} Q{quarter}-{year}...")
138
+ url = f"https://data.sec.gov/submissions/CIK{cik}.json"
139
+ resp = requests.get(url, headers=HEADERS, timeout=30)
140
+ resp.raise_for_status()
141
+ filings = resp.json()["filings"]["recent"]
142
+
143
+ target_month = int(_quarter_to_month(quarter))
144
+ best_doc_url = None
145
+
146
+ for i, form in enumerate(filings["form"]):
147
+ if form != "8-K":
148
+ continue
149
+ filed = filings["filingDate"][i] # "2025-01-30"
150
+ filed_year, filed_month = int(filed[:4]), int(filed[5:7])
151
+
152
+ # Build a set of acceptable (year, month) pairs:
153
+ # Accept filings from the quarter-end month through 3 months after,
154
+ # handling year rollover (e.g., Q4 target_month=12 → Dec, Jan, Feb, Mar)
155
+ acceptable = set()
156
+ for offset in range(4): # 0, 1, 2, 3 months after quarter end
157
+ m = target_month + offset
158
+ y = year
159
+ if m > 12:
160
+ m -= 12
161
+ y += 1
162
+ acceptable.add((y, m))
163
+
164
+ if (filed_year, filed_month) in acceptable:
165
+ accession = filings["accessionNumber"][i]
166
+ acc_clean = accession.replace("-", "")
167
+ primary_doc = filings["primaryDocument"][i]
168
+ doc_url = (
169
+ f"https://www.sec.gov/Archives/edgar/data/"
170
+ f"{cik.lstrip('0')}/{acc_clean}/{primary_doc}"
171
+ )
172
+ best_doc_url = doc_url
173
+ break # Take the first matching 8-K
174
+
175
+ if not best_doc_url:
176
+ print(f"[Earnings Ingest] No matching SEC 8-K found for {ticker} Q{quarter}-{year}.")
177
+ return None
178
+
179
+ print(f"[Earnings Ingest] Downloading 8-K from {best_doc_url}...")
180
+ doc_resp = requests.get(best_doc_url, headers=HEADERS, timeout=30)
181
+ doc_resp.raise_for_status()
182
+
183
+ from bs4 import BeautifulSoup
184
+
185
+ soup = BeautifulSoup(doc_resp.text, "html.parser")
186
+ text = soup.get_text(separator=" ", strip=True)
187
+
188
+ if len(text) > 500:
189
+ print(f"[Earnings Ingest] SEC 8-K text extracted ({len(text)} chars).")
190
+ return text
191
+ print("[Earnings Ingest] SEC 8-K text too short, likely not a transcript.")
192
+ return None
193
+
194
+ except Exception as e:
195
+ print(f"[Earnings Ingest] SEC 8-K fallback failed: {e}")
196
+ return None
197
+
198
+
199
+ # ---------------------------------------------------------------------------
200
+ # Transcript normalization & segmentation
201
+ # ---------------------------------------------------------------------------
202
+
203
+ def normalize_transcript(
204
+ raw_text: str, ticker: str, quarter: int, year: int
205
+ ) -> dict:
206
+ """
207
+ Split a raw transcript into Prepared Remarks and Q&A Session.
208
+ Returns:
209
+ {
210
+ "ticker": ..., "quarter": ..., "year": ...,
211
+ "prepared_remarks": str,
212
+ "qa_session": str,
214
+ }
215
+ """
216
+ text_lower = raw_text.lower()
217
+ split_pos = -1
218
+ for marker in QA_MARKERS:
219
+ idx = text_lower.find(marker)
220
+ if idx != -1:
221
+ split_pos = idx
222
+ break
223
+
224
+ if split_pos > 0:
225
+ prepared = raw_text[:split_pos].strip()
226
+ qa = raw_text[split_pos:].strip()
227
+ else:
228
+ # Could not find Q&A boundary — treat entire text as prepared remarks
229
+ prepared = raw_text.strip()
230
+ qa = ""
231
+
232
+ return {
233
+ "ticker": ticker.upper(),
234
+ "quarter": quarter,
235
+ "year": year,
236
+ "prepared_remarks": prepared,
237
+ "qa_session": qa,
238
+ }
239
+
240
+
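The boundary search in `normalize_transcript` can be tried standalone; the `QA_MARKERS` list here is a minimal stand-in for the module-level constant defined earlier in the file:

```python
QA_MARKERS = ["question-and-answer session", "q&a session"]  # stand-in (assumption)

def split_on_qa(raw_text: str) -> tuple[str, str]:
    """Return (prepared_remarks, qa_session), mirroring the split logic above."""
    text_lower = raw_text.lower()
    split_pos = -1
    for marker in QA_MARKERS:
        idx = text_lower.find(marker)
        if idx != -1:
            split_pos = idx
            break
    if split_pos > 0:
        return raw_text[:split_pos].strip(), raw_text[split_pos:].strip()
    # No boundary found: the whole transcript counts as prepared remarks.
    return raw_text.strip(), ""

prepared, qa = split_on_qa(
    "Welcome to the call. Revenue grew 10%. "
    "Question-and-Answer Session. Analyst: what about margins?"
)
```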
241
+ # ---------------------------------------------------------------------------
242
+ # Keyword / entity extraction
243
+ # ---------------------------------------------------------------------------
244
+
245
+ def extract_keywords(text: str) -> dict[str, int]:
246
+ """
247
+ Count occurrences of tracked financial keywords in the text.
248
+ Returns a dict of keyword → count (only keywords with count > 0).
249
+ """
250
+ text_lower = text.lower()
251
+ counts: dict[str, int] = {}
252
+ for kw in TRACKED_KEYWORDS:
253
+ c = len(re.findall(r"\b" + re.escape(kw) + r"\b", text_lower))
254
+ if c > 0:
255
+ counts[kw] = c
256
+ return counts
257
+
258
+
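`extract_keywords` leans on `\b` word boundaries so that a short keyword like "ai" does not match inside other words. A self-contained version with a stand-in keyword list:

```python
import re

TRACKED = ["ai", "growth", "headwinds"]  # stand-in for TRACKED_KEYWORDS (assumption)

def count_keywords(text: str) -> dict[str, int]:
    text_lower = text.lower()
    counts: dict[str, int] = {}
    for kw in TRACKED:
        # \b keeps "ai" from matching inside words like "said" or "maintain"
        c = len(re.findall(r"\b" + re.escape(kw) + r"\b", text_lower))
        if c > 0:
            counts[kw] = c
    return counts

counts = count_keywords("AI demand drove growth; we said growth offsets headwinds.")
```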
259
+ # ---------------------------------------------------------------------------
260
+ # ChromaDB ingest
261
+ # ---------------------------------------------------------------------------
262
+
263
+ def _meta_path(chroma_path: str, ticker: str) -> str:
264
+ d = os.path.join(chroma_path, f"{ticker.upper()}{METADATA_DIR_NAME}")
265
+ os.makedirs(d, exist_ok=True)
266
+ return d
267
+
268
+
269
+ def _save_metadata(
270
+ chroma_path: str,
271
+ ticker: str,
272
+ quarter: int,
273
+ year: int,
274
+ keywords: dict[str, int],
275
+ status: str,
276
+ ) -> None:
277
+ meta_dir = _meta_path(chroma_path, ticker)
278
+ fname = os.path.join(meta_dir, f"Q{quarter}_{year}.json")
279
+ payload = {
280
+ "ticker": ticker.upper(),
281
+ "quarter": quarter,
282
+ "year": year,
283
+ "status": status,
284
+ "keywords": keywords,
285
+ }
286
+ with open(fname, "w") as f:
287
+ json.dump(payload, f, indent=2)
288
+ print(f"[Earnings Ingest] Metadata saved → {fname}")
289
+
290
+
291
+ def _load_metadata(chroma_path: str, ticker: str) -> list[dict]:
292
+ """Load all quarter metadata files for a ticker."""
293
+ meta_dir = _meta_path(chroma_path, ticker)
294
+ results = []
295
+ if not os.path.isdir(meta_dir):
296
+ return results
297
+ for fname in sorted(os.listdir(meta_dir)):
298
+ if fname.endswith(".json"):
299
+ with open(os.path.join(meta_dir, fname)) as f:
300
+ results.append(json.load(f))
301
+ return results
302
+
303
+
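The metadata round-trip used by `_save_metadata` and `_load_metadata` boils down to one small JSON file per quarter. A simplified sketch of that layout (directory handling trimmed; helper names are illustrative):

```python
import json
import os
import tempfile

def save_meta(meta_dir: str, ticker: str, quarter: int, year: int,
              keywords: dict, status: str) -> None:
    # One JSON file per quarter, named Q<quarter>_<year>.json
    os.makedirs(meta_dir, exist_ok=True)
    payload = {"ticker": ticker, "quarter": quarter, "year": year,
               "status": status, "keywords": keywords}
    with open(os.path.join(meta_dir, f"Q{quarter}_{year}.json"), "w") as f:
        json.dump(payload, f)

def load_meta(meta_dir: str) -> list[dict]:
    out = []
    for fname in sorted(os.listdir(meta_dir)):
        if fname.endswith(".json"):
            with open(os.path.join(meta_dir, fname)) as f:
                out.append(json.load(f))
    return out

with tempfile.TemporaryDirectory() as d:
    save_meta(d, "AAPL", 3, 2024, {"ai": 12}, "success")
    save_meta(d, "AAPL", 4, 2024, {"ai": 18}, "partial")
    metas = load_meta(d)
```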
304
+ def ingest_earnings_call(
305
+ ticker: str,
306
+ quarter: int,
307
+ year: int,
308
+ api_key: str = "",
309
+ chroma_path: str = "./chroma_db",
310
+ ) -> str:
311
+ """
312
+ Full ingest pipeline for one ticker/quarter pair.
313
+    Returns a status string: 'success', 'partial', 'failed', or 'exists'.
314
+ """
315
+ ticker = ticker.upper()
316
+ collection_dir = os.path.join(chroma_path, f"{ticker}_earnings")
317
+
318
+ # Check if already ingested
319
+ meta_dir = _meta_path(chroma_path, ticker)
320
+ meta_file = os.path.join(meta_dir, f"Q{quarter}_{year}.json")
321
+ if os.path.exists(meta_file):
322
+ print(f"[Earnings Ingest] Q{quarter}-{year} for {ticker} already ingested. Skipping.")
323
+ return "exists"
324
+
325
+ # 1. Fetch transcript
326
+ raw_text = fetch_transcript_alpha_vantage(ticker, quarter, year, api_key)
327
+ source = "alpha_vantage" if raw_text else None
328
+
329
+ if not raw_text:
330
+ raw_text = fetch_transcript_sec_8k(ticker, quarter, year)
331
+ source = "sec_8k" if raw_text else None
332
+
333
+ if not raw_text:
334
+ _save_metadata(chroma_path, ticker, quarter, year, {}, "failed")
335
+ return "failed"
336
+
337
+ # 2. Normalize & segment
338
+ segments = normalize_transcript(raw_text, ticker, quarter, year)
339
+
340
+ # 3. Extract keywords from both sections
341
+ all_text = segments["prepared_remarks"] + " " + segments["qa_session"]
342
+ keywords = extract_keywords(all_text)
343
+
344
+ # 4. Chunk & embed into ChromaDB
345
+ splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
346
+ docs = []
347
+
348
+ if segments["prepared_remarks"]:
349
+ pr_doc = Document(
350
+ page_content=segments["prepared_remarks"],
351
+ metadata={
352
+ "ticker": ticker,
353
+ "quarter": quarter,
354
+ "year": year,
355
+ "section": "Prepared Remarks",
356
+ "source": source,
357
+ },
358
+ )
359
+ docs.extend(splitter.split_documents([pr_doc]))
360
+
361
+ if segments["qa_session"]:
362
+ qa_doc = Document(
363
+ page_content=segments["qa_session"],
364
+ metadata={
365
+ "ticker": ticker,
366
+ "quarter": quarter,
367
+ "year": year,
368
+ "section": "Q&A Session",
369
+ "source": source,
370
+ },
371
+ )
372
+ docs.extend(splitter.split_documents([qa_doc]))
373
+
374
+ if not docs:
375
+ _save_metadata(chroma_path, ticker, quarter, year, keywords, "partial")
376
+ return "partial"
377
+
378
+ print(f"[Earnings Ingest] Embedding {len(docs)} chunks into {collection_dir}...")
379
+ embeddings = get_cached_embeddings()
380
+ Chroma.from_documents(
381
+ documents=docs,
382
+ embedding=embeddings,
383
+ persist_directory=collection_dir,
384
+ )
385
+
386
+ status = "success" if segments["qa_session"] else "partial"
387
+ _save_metadata(chroma_path, ticker, quarter, year, keywords, status)
388
+ print(f"[Earnings Ingest] {ticker} Q{quarter}-{year} ingested ({status}).")
389
+ return status
390
+
391
+
392
+ # ---------------------------------------------------------------------------
393
+ # Inference tools (LangGraph runtime)
394
+ # ---------------------------------------------------------------------------
395
+
396
+ def _get_earnings_db(ticker: str, chroma_path: str = "./chroma_db") -> Chroma:
397
+ """Load the earnings-call Chroma collection for a ticker."""
398
+ ticker = ticker.upper()
399
+ persist_directory = os.path.join(chroma_path, f"{ticker}_earnings")
400
+
401
+ if not os.path.exists(persist_directory):
402
+ raise FileNotFoundError(
403
+ f"Earnings data for {ticker} not ingested. "
404
+ f"Run: python scripts/ingest_earnings_calls.py --tickers {ticker} --quarters Q<N>-<YYYY>"
405
+ )
406
+ embeddings = get_cached_embeddings()
407
+ return Chroma(persist_directory=persist_directory, embedding_function=embeddings)
408
+
409
+
410
+ @tool
411
+ def search_earnings_call(ticker: str, query: str) -> str:
412
+ """
413
+ Searches pre-ingested earnings-call transcripts for a given ticker.
414
+ Use this to find specific management commentary, guidance, or discussion topics.
415
+ CRITICAL: The ticker's earnings data must already be ingested.
416
+ Pass the stock ticker (e.g. 'AAPL') and a natural-language query.
417
+ """
418
+ try:
419
+ db = _get_earnings_db(ticker.upper())
420
+ results = db.similarity_search(query, k=3)
421
+
422
+ if not results:
423
+ return f"No earnings-call matches found for '{query}' on {ticker}."
424
+
425
+ output_parts = [f"EARNINGS CALL SEARCH RESULTS FOR {ticker.upper()} — '{query}':\n"]
426
+ total_chars = 0
427
+ for doc in results:
428
+ meta = doc.metadata
429
+ label = f"[{meta.get('section', 'Unknown')} | Q{meta.get('quarter', '?')}-{meta.get('year', '?')}]"
430
+ snippet = doc.page_content[:700]
431
+ total_chars += len(snippet)
432
+ output_parts.append(f"{label}\n{snippet}\n")
433
+ if total_chars > 2000:
434
+ break
435
+
436
+ return "\n".join(output_parts)
437
+ except Exception as e:
438
+ return f"Error searching earnings data: {e}"
439
+
440
+
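`search_earnings_call` keeps tool output bounded by appending at most 700 characters per hit and stopping once roughly 2,000 characters have accumulated. The budgeting loop in isolation (function name is illustrative):

```python
def budget_snippets(chunks: list[str], per: int = 700, cap: int = 2000) -> list[str]:
    # Append truncated snippets until the running total passes the cap.
    parts: list[str] = []
    total = 0
    for chunk in chunks:
        snippet = chunk[:per]
        total += len(snippet)
        parts.append(snippet)
        if total > cap:
            break
    return parts

parts = budget_snippets(["a" * 900, "b" * 900, "c" * 900, "d" * 900])
```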
441
+ @tool
442
+ def get_earnings_sentiment_divergence(ticker: str) -> str:
443
+ """
444
+ Retrieves evidence from both Prepared Remarks and Q&A sections of the
445
+ most recent earnings call for a ticker. Use this to analyze whether
446
+ management tone differs between the scripted portion and live Q&A.
447
+ CRITICAL: The ticker's earnings data must already be ingested.
448
+ """
449
+ try:
450
+ db = _get_earnings_db(ticker.upper())
451
+
452
+ # Retrieve top chunks from each section
453
+ pr_results = db.similarity_search(
454
+ "management outlook guidance performance",
455
+ k=3,
456
+ filter={"section": "Prepared Remarks"},
457
+ )
458
+ qa_results = db.similarity_search(
459
+ "analyst question concern risk challenge",
460
+ k=3,
461
+ filter={"section": "Q&A Session"},
462
+ )
463
+
464
+ output = f"SENTIMENT DIVERGENCE EVIDENCE FOR {ticker.upper()}:\n\n"
465
+
466
+ output += "=== PREPARED REMARKS (scripted management commentary) ===\n"
467
+ if pr_results:
468
+ for doc in pr_results:
469
+ output += doc.page_content[:600] + "\n---\n"
470
+ else:
471
+ output += "(No Prepared Remarks data found.)\n"
472
+
473
+ output += "\n=== Q&A SESSION (live analyst questions & management responses) ===\n"
474
+ if qa_results:
475
+ for doc in qa_results:
476
+ output += doc.page_content[:600] + "\n---\n"
477
+ else:
478
+ output += "(No Q&A Session data found — transcript may not have contained a Q&A segment.)\n"
479
+
480
+ output += (
481
+ "\nINSTRUCTION: Compare the tone, confidence, and specificity between "
482
+ "Prepared Remarks and Q&A. Note any divergence where management was more "
483
+ "cautious, evasive, or forthcoming in one section vs the other."
484
+ )
485
+ return output
486
+
487
+ except Exception as e:
488
+ return f"Error retrieving divergence data: {e}"
489
+
490
+
491
+ @tool
492
+ def get_earnings_keyword_trends(ticker: str) -> str:
493
+ """
494
+ Returns quarter-over-quarter keyword frequency trends from ingested
495
+ earnings calls for a given ticker. Shows how often key terms (AI, headwinds,
496
+ growth, guidance, etc.) were mentioned across available quarters.
497
+ CRITICAL: Multiple quarters must be ingested for trend comparison.
498
+ """
499
+ try:
500
+ ticker = ticker.upper()
501
+ all_meta = _load_metadata("./chroma_db", ticker)
502
+
503
+ if not all_meta:
504
+ return (
505
+ f"No earnings metadata found for {ticker}. "
506
+ f"Run: python scripts/ingest_earnings_calls.py --tickers {ticker} --quarters Q<N>-<YYYY>"
507
+ )
508
+
509
+ # Sort by year, quarter
510
+ all_meta.sort(key=lambda m: (m["year"], m["quarter"]))
511
+
512
+ # Build output table
513
+ quarters = [f"Q{m['quarter']}-{m['year']}" for m in all_meta]
514
+ header = f"KEYWORD TRENDS FOR {ticker} ({', '.join(quarters)}):\n\n"
515
+
516
+ # Collect all keywords across quarters
517
+ all_kws = set()
518
+ for m in all_meta:
519
+ all_kws.update(m.get("keywords", {}).keys())
520
+
521
+ if not all_kws:
522
+ return header + "No tracked keywords found in any ingested quarter."
523
+
524
+ rows = []
525
+ rows.append(f"{'Keyword':<30} " + " ".join(f"{q:>10}" for q in quarters))
526
+ rows.append("-" * (30 + 11 * len(quarters)))
527
+
528
+ for kw in sorted(all_kws):
529
+ vals = []
530
+ for m in all_meta:
531
+ c = m.get("keywords", {}).get(kw, 0)
532
+ vals.append(f"{c:>10}")
533
+ rows.append(f"{kw:<30} " + " ".join(vals))
534
+
535
+ # Add trend commentary for the last two quarters
536
+ if len(all_meta) >= 2:
537
+ rows.append("")
538
+ rows.append("NOTABLE CHANGES (latest vs prior quarter):")
539
+ prev_kw = all_meta[-2].get("keywords", {})
540
+ curr_kw = all_meta[-1].get("keywords", {})
541
+ for kw in sorted(all_kws):
542
+ p, c = prev_kw.get(kw, 0), curr_kw.get(kw, 0)
543
+ if p != c:
544
+ direction = "↑" if c > p else "↓"
545
+ rows.append(f" {kw}: {p} → {c} ({direction})")
546
+
547
+ return header + "\n".join(rows)
548
+
549
+ except Exception as e:
550
+ return f"Error loading keyword trends: {e}"
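The fixed-width table that `get_earnings_keyword_trends` assembles can be reproduced with plain format specs. A standalone sketch with hypothetical counts:

```python
def format_trend_table(quarters, per_quarter_counts, keywords):
    # Left-align keywords to 30 chars, right-align each quarter's count to 10.
    rows = [f"{'Keyword':<30} " + " ".join(f"{q:>10}" for q in quarters)]
    rows.append("-" * (30 + 11 * len(quarters)))
    for kw in sorted(keywords):
        vals = " ".join(f"{counts.get(kw, 0):>10}" for counts in per_quarter_counts)
        rows.append(f"{kw:<30} " + vals)
    return "\n".join(rows)

table = format_trend_table(
    ["Q3-2024", "Q4-2024"],
    [{"ai": 12}, {"ai": 18, "growth": 4}],
    {"ai", "growth"},
)
```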
core/graph_builder.py ADDED
@@ -0,0 +1,372 @@
1
+ import operator
2
+ from typing import Annotated, Sequence, TypedDict, Literal, Set
3
+
4
+ import yfinance as yf
5
+ import pandas as pd
6
+ from pydantic import BaseModel, Field
7
+
8
+ from langchain_core.messages import HumanMessage, SystemMessage, BaseMessage, AIMessage
9
+ from langchain_core.tools import tool
10
+ from langgraph.graph import StateGraph, START, END
11
+ from langchain.agents import create_agent
12
+
13
+ from .sec_tools import get_company_concept_xbrl
14
+ from .rag_tools import search_10k_filings
15
+ from .sentiment_tools import get_recent_news
16
+ from .earnings_tools import (
17
+ search_earnings_call,
18
+ get_earnings_sentiment_divergence,
19
+ get_earnings_keyword_trends,
20
+ )
21
+
22
+
23
+ @tool
24
+ def get_stock_metrics(ticker: str) -> str:
25
+ """
26
+ Fetches historical market data and calculates basic metrics for a stock.
27
+ CRITICAL: You must pass the official stock ticker symbol (e.g., 'AAPL' for Apple).
28
+ """
29
+ try:
30
+ ticker = ticker.upper()
31
+ print(f"\n[System: Fetching yfinance data for {ticker}...]")
32
+
33
+ stock = yf.Ticker(ticker)
34
+ hist = stock.history(period="1mo")
35
+
36
+ if hist.empty:
37
+ return f"Could not find price data for ticker: {ticker}. Tell the user the data fetch failed."
38
+
39
+ current_price = hist["Close"].iloc[-1]
40
+ monthly_high = hist["High"].max()
41
+ monthly_low = hist["Low"].min()
42
+ avg_volume = hist["Volume"].mean()
43
+
44
+ summary = (
45
+ f"Data for {ticker}:\n"
46
+ f"- Current Price: ${current_price:.2f}\n"
47
+ f"- 1-Month High: ${monthly_high:.2f}\n"
48
+ f"- 1-Month Low: ${monthly_low:.2f}\n"
49
+ f"- Average Daily Volume: {int(avg_volume):,}"
50
+ )
51
+ return summary
52
+ except Exception as e:
53
+ return f"Error fetching data: {str(e)}"
54
+
55
+
56
+ def merge_sets(a: Set, b: Set) -> Set:
57
+ return a | b
58
+
59
+
60
+ class AgentState(TypedDict):
61
+ messages: Annotated[Sequence[BaseMessage], operator.add]
62
+ next: str | list[str]
63
+ steps: Annotated[int, operator.add]
64
+ completed_tasks: Annotated[Set[str], merge_sets]
65
+ pending_tasks: list
66
+
67
+
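The `Annotated` reducers on `AgentState` are what make parallel fan-out safe: when several workers return updates in the same step, LangGraph folds each channel's updates together with its reducer instead of overwriting. The folding itself is just a reduce, as this plain-Python sketch shows:

```python
import operator
from functools import reduce

def merge_sets(a: set, b: set) -> set:
    return a | b  # union, so no worker's completion marker is lost

# Two workers finishing in the same parallel step (hypothetical updates)
updates = [
    {"completed_tasks": {"Quant_AAPL"}, "steps": 1},
    {"completed_tasks": {"Sentiment_AAPL"}, "steps": 1},
]

completed = reduce(merge_sets, (u["completed_tasks"] for u in updates), set())
steps = reduce(operator.add, (u["steps"] for u in updates), 0)
```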
68
+ members = ["Quant_Agent", "Fundamental_Agent", "Sentiment_Agent", "Earnings_Agent"]
69
+
70
+
71
+ def make_worker_node(agent, name: str):
72
+ def node(state: AgentState):
73
+ pending = state.get("pending_tasks", [])
74
+ completed = state.get("completed_tasks", set())
75
+
76
+ my_task = next(
77
+ (
78
+ t
79
+ for t in pending
80
+ if t["agent"] == name and t["task_id"] not in completed
81
+ ),
82
+ None,
83
+ )
84
+
85
+ if not my_task:
86
+ return {"completed_tasks": set()}
87
+
88
+ task_message = HumanMessage(
89
+ content=f"Ticker: {my_task['ticker']}. Task: {my_task['description']}"
90
+ )
91
+
92
+ result = agent.invoke({"messages": [task_message]})
93
+ has_tool_call = any(
94
+ isinstance(m, AIMessage) and m.tool_calls for m in result["messages"]
95
+ )
96
+ if not has_tool_call:
97
+ content = f"ERROR: The {name} attempted to answer without using a data tool. This analysis is unauthorized."
98
+ else:
99
+ content = result["messages"][-1].content.strip() or f"[{name}: No data retrieved]"
100
+ return {
101
+ "messages": [AIMessage(content=f"[{my_task['task_id']}] {content}", name=name)],
102
+ "completed_tasks": {my_task["task_id"]},
103
+ }
104
+
105
+ return node
106
+
107
+
108
+ def create_planner_node(llm):
109
+ planner_prompt = """You are a task planner for a financial AI system.
110
+ GOLDEN RULE: Never assume or guess a number.
111
+
112
+ CRITICAL AGENT CAPABILITY MAPPING:
113
+    1. Quant_Agent: ONLY use for current stock price, trading volume, and recent trading ranges (1-month high/low).
114
+ => GROUPING RULE: If the user asks for multiple price/volume metrics for the SAME ticker, group them into EXACTLY ONE Quant_Agent task. Do NOT make separate tasks for price and volume.
115
+ 2. Sentiment_Agent: ONLY use for recent news headlines and market sentiment scores.
116
+ 3. Fundamental_Agent: Use for TWO things:
117
+ - SEC Financial Metrics (Revenue, Net Income, Margins, Cash Flow).
118
+ - SEC 10-K RAG Searches: Use this for ANY qualitative questions about business strategy, supply chain, manufacturing, competition, and corporate RISKS.
119
+ 4. Earnings_Agent: Use for earnings-call analysis. This includes:
120
+ - Management commentary and guidance from earnings calls.
121
+ - Sentiment divergence between Prepared Remarks and Q&A sessions.
122
+ - Keyword and entity tracking across quarters (e.g., mentions of "AI", "headwinds", "growth").
123
+ => Use this agent when the user asks about earnings calls, management tone, guidance language, or quarter-over-quarter keyword trends.
124
+
125
+ Read the user's request and output a JSON list of tasks needed to answer it.
126
+ Each task must have:
127
+ - "agent": "Quant_Agent", "Fundamental_Agent", "Sentiment_Agent", or "Earnings_Agent"
128
+ - "ticker": the stock ticker symbol (e.g. "AAPL")
129
+ - "task_id": a unique string
130
+ - "description": specific instructions on what to fetch or search
131
+
132
+ Output ONLY valid JSON. No explanation.
133
+ example output:
134
+ [
135
+ {"agent": "Quant_Agent", "ticker": "AAPL", "task_id": "Quant_AAPL", "description": "Get price and volume for AAPL"},
136
+ {"agent": "Sentiment_Agent", "ticker": "MSFT", "task_id": "Sentiment_MSFT", "description": "Get sentiment analysis for MSFT"},
137
+ {"agent": "Earnings_Agent", "ticker": "AAPL", "task_id": "Earnings_AAPL", "description": "Analyze sentiment divergence between prepared remarks and Q&A in the latest earnings call"}
138
+ ]"""
139
+
140
+ def planner_function(state: AgentState):
141
+ if state.get("pending_tasks"):
142
+ return {}
143
+
144
+ user_message = next(m for m in state["messages"] if isinstance(m, HumanMessage))
145
+ response = llm.invoke(
146
+ [
147
+ SystemMessage(content=planner_prompt),
148
+ HumanMessage(content=user_message.content),
149
+ ]
150
+ )
151
+
152
+ import json
153
+
154
+ raw = response.content.strip().replace("```json", "").replace("```", "")
155
+ start = raw.find("[")
156
+ end = raw.rfind("]")
157
+ try:
158
+ tasks = json.loads(raw[start : end + 1]) if start != -1 and end != -1 else []
159
+ except Exception:
160
+ tasks = []
161
+
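The bracket-slicing parse above tolerates chatty model output around the JSON. Extracted as a helper (name is illustrative):

```python
import json

def extract_json_list(raw: str) -> list:
    # Strip code fences, then slice from the first '[' to the last ']'
    fence = "`" * 3
    raw = raw.strip().replace(fence + "json", "").replace(fence, "")
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end == -1:
        return []
    try:
        return json.loads(raw[start:end + 1])
    except Exception:
        return []

tasks = extract_json_list(
    'Here is the plan:\n[{"agent": "Quant_Agent", "ticker": "AAPL"}]\nHope this helps!'
)
```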
162
+ if not tasks:
163
+ print("[Planner]: No valid financial tasks found.")
164
+ return {
165
+ "pending_tasks": [],
166
+ "completed_tasks": set(),
167
+ "messages": [
168
+ AIMessage(
169
+ content="I can only answer questions about stock prices, SEC filings, and market sentiment.",
170
+ name="Supervisor",
171
+ )
172
+ ],
173
+ }
174
+
175
+ print(f"\n[Planner]: Created {len(tasks)} tasks: {[t['task_id'] for t in tasks]}")
176
+ return {"pending_tasks": tasks, "completed_tasks": set()}
177
+
178
+ return planner_function
179
+
180
+
181
+ def create_supervisor_node(llm):
182
+ def supervisor_function(state: AgentState):
183
+ steps = state.get("steps", 0)
184
+ if steps >= 10:
185
+ return {"next": "FINISH", "steps": 1}
186
+
187
+ pending = state.get("pending_tasks", [])
188
+ completed = state.get("completed_tasks", set())
189
+
190
+ remaining = [t for t in pending if t["task_id"] not in completed]
191
+
192
+ if not remaining:
193
+ print("-> All tasks complete. Routing to Summarizer.")
194
+ return {"next": "FINISH", "steps": 1}
195
+
196
+ # Dispatch one task per unique agent in parallel
197
+ agents_to_dispatch = []
198
+ dispatched_tasks = []
199
+ for task in remaining:
200
+ if task["agent"] not in agents_to_dispatch:
201
+ agents_to_dispatch.append(task["agent"])
202
+ dispatched_tasks.append(task["task_id"])
203
+
204
+ print(f"\n[Supervisor]: Dispatching tasks in parallel → {dispatched_tasks}")
205
+ return {
206
+ "next": agents_to_dispatch,
207
+ "steps": 1,
208
+ }
209
+
210
+ return supervisor_function
211
+
212
+
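The dispatch rule in the supervisor is deliberately simple: one pending task per distinct agent per round, in first-seen order, so each specialist runs at most once per parallel step. In isolation:

```python
def pick_dispatch(remaining: list[dict]) -> list[str]:
    # Keep the first task seen for each agent; later tasks wait for the next round.
    agents: list[str] = []
    for task in remaining:
        if task["agent"] not in agents:
            agents.append(task["agent"])
    return agents

remaining = [
    {"agent": "Quant_Agent", "task_id": "Quant_AAPL"},
    {"agent": "Quant_Agent", "task_id": "Quant_MSFT"},  # deferred to next round
    {"agent": "Earnings_Agent", "task_id": "Earnings_AAPL"},
]
agents = pick_dispatch(remaining)
```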
213
+ def create_summarizer_node(llm):
214
+ summarizer_system = """You are a senior investment analyst drafting an internal **Investment Memo** for colleagues.
215
+
216
+ You will receive the user's original question and verbatim outputs from specialist agents (Quant_Agent, Fundamental_Agent, Sentiment_Agent, Earnings_Agent), or a single clarification/refusal if no research ran.
217
+
218
+ Write the memo using this structure and markdown headings:
219
+
220
+ # Investment Memo
221
+ ## Executive Summary
222
+ 2-4 sentences answering the user in plain language.
223
+
224
+ ## Key Facts & Data
225
+ Bullet points. Use ONLY numbers, metrics, and quotes that appear in the specialist outputs. If a section had no data, say "No quantitative/fundamental/sentiment data provided" as appropriate.
226
+
227
+ ## Earnings Call Insights
228
+ If Earnings_Agent data is present, summarize:
229
+ - Sentiment divergence between Prepared Remarks and Q&A (was management more cautious or bullish in live Q&A vs. scripted remarks?).
230
+ - Notable keyword/entity trends across quarters (e.g., increasing mentions of "AI", declining mentions of "headwinds").
231
+ If no earnings data was provided, omit this section entirely.
232
+
233
+ ## Risks, Sentiment, and Context
234
+ Integrate fundamental and sentiment findings when present. If missing, state that briefly.
235
+
236
+ ## Caveats
237
+ Note missing specialists, tool errors, or "unauthorized" / ERROR lines exactly as reported—do not soften them.
238
+
239
+ Rules:
240
+ - Do NOT invent tickers, prices, filings, or sentiment scores not present in the inputs.
241
+ - Do NOT cite tool names; write for a portfolio manager reader.
242
+ - Keep the tone professional and concise."""
243
+
244
+ def summarizer_function(state: AgentState):
245
+ user_messages = [m for m in state["messages"] if isinstance(m, HumanMessage)]
246
+ user_query = user_messages[0].content if user_messages else ""
247
+
248
+ blocks = []
249
+ for m in state["messages"]:
250
+ if not isinstance(m, AIMessage):
251
+ continue
252
+ label = m.name or "Assistant"
253
+ blocks.append(f"### {label}\n{m.content}")
254
+
255
+ specialist_blob = "\n\n".join(blocks) if blocks else "(No specialist outputs.)"
256
+
257
+ response = llm.invoke(
258
+ [
259
+ SystemMessage(content=summarizer_system),
260
+ HumanMessage(
261
+ content=(
262
+ f"User request:\n{user_query}\n\n"
263
+ f"Specialist outputs (verbatim):\n{specialist_blob}"
264
+ )
265
+ ),
266
+ ]
267
+ )
268
+ memo = (response.content or "").strip()
269
+ return {"messages": [AIMessage(content=memo, name="Summarizer")]}
270
+
271
+ return summarizer_function
272
+
273
+
274
+ def build_financial_graph(llm):
275
+ workflow = StateGraph(AgentState)
276
+
277
+ quant_agent = create_agent(
278
+ model=llm,
279
+ tools=[get_stock_metrics],
280
+ system_prompt=(
281
+ "You are a Quantitative Analyst. "
282
+ "You have exactly ONE tool: get_stock_metrics(ticker). "
283
+ "For any price, volume, or trading-range question you MUST call get_stock_metrics—do not answer from memory. "
284
+ "NEVER invent other tool names, NEVER output JSON blocks suggesting tools that do not exist. "
285
+ "GOLDEN RULE: After the tool returns, you must format the output gracefully so it is easy to read. "
286
+ "Bold the labels (like **Current Price:** or **Average Volume:**) before injecting the numbers. "
287
+ "NEVER use introductory conversational filler like 'Here is the data'. Just print the labeled metrics directly."
288
+ ),
289
+ name="Quant_Agent",
290
+ )
291
+
292
+ fundamental_agent = create_agent(
293
+ model=llm,
294
+ tools=[search_10k_filings, get_company_concept_xbrl],
295
+ system_prompt=(
296
+ "You are a Fundamental Analyst. "
297
+ "GOLDEN RULE: You must output the EXACT DATA or TEXT returned by your tools. "
298
+ "Do NOT explain how the tools work or what they do. "
299
+            "CRITICAL: ONCE YOU HAVE CALLED THE TOOL ONCE AND RECEIVED THE DATA, YOU MUST WRITE YOUR FINAL ANSWER IMMEDIATELY. DO NOT CALL THE TOOL A SECOND TIME. "
300
+ "Just answer the user's question using the fetched data and stop."
301
+ ),
302
+ name="Fundamental_Agent",
303
+ )
304
+ sentiment_agent = create_agent(
305
+ model=llm,
306
+ tools=[get_recent_news],
307
+ system_prompt=(
308
+ "You are a Sentiment Analyst. Fetch recent news using your tool. "
309
+ "CRITICAL RULES: Your final response MUST be exactly five lines. "
310
+ "Line 1: The sentiment score (a single number between -1.0 and 1.0). "
311
+            "Line 2-5: Justify the sentiment score based on the news articles. "
312
+            "Include important keywords from the news articles in your response. "
313
+ "Do not add conversational filler. Do not ask the user follow-up questions."
314
+ ),
315
+ name="Sentiment_Agent",
316
+ )
317
+
318
+ earnings_agent = create_agent(
319
+ model=llm,
320
+ tools=[
321
+ search_earnings_call,
322
+ get_earnings_sentiment_divergence,
323
+ get_earnings_keyword_trends,
324
+ ],
325
+ system_prompt=(
326
+ "You are an Earnings Call Analyst specializing in management commentary analysis. "
327
+ "You have THREE tools for analyzing pre-ingested earnings-call transcripts:\n"
328
+ "1. search_earnings_call: Search transcripts for specific topics (guidance, margins, strategy, etc.).\n"
329
+ "2. get_earnings_sentiment_divergence: Compare management tone in scripted Prepared Remarks vs live Q&A.\n"
330
+ "3. get_earnings_keyword_trends: Track keyword frequency changes across quarters.\n\n"
331
+ "CRITICAL RULES:\n"
332
+ "- You MUST call at least one tool. Do NOT answer from memory.\n"
333
+ "- If a tool returns an error about missing data, report that the earnings data for that "
334
+ "ticker/quarter has not been ingested and suggest running the ingest script.\n"
335
+ "- After the tool returns, write a clear, evidence-backed analysis. Bold key findings.\n"
336
+ "- Do NOT add conversational filler. Do NOT ask follow-up questions."
337
+ ),
338
+ name="Earnings_Agent",
339
+ )
340
+
341
+ workflow.add_node("Planner", create_planner_node(llm))
342
+ workflow.add_node("Supervisor", create_supervisor_node(llm))
343
+
344
+ workflow.add_node("Quant_Agent", make_worker_node(quant_agent, "Quant_Agent"))
345
+ workflow.add_node(
346
+ "Fundamental_Agent", make_worker_node(fundamental_agent, "Fundamental_Agent")
347
+ )
348
+ workflow.add_node("Sentiment_Agent", make_worker_node(sentiment_agent, "Sentiment_Agent"))
349
+ workflow.add_node("Earnings_Agent", make_worker_node(earnings_agent, "Earnings_Agent"))
350
+ workflow.add_node("Summarizer", create_summarizer_node(llm))
351
+
352
+ for member in members:
353
+ workflow.add_edge(member, "Supervisor")
354
+
355
+ workflow.add_edge(START, "Planner")
356
+ workflow.add_edge("Planner", "Supervisor")
357
+
358
+ workflow.add_conditional_edges(
359
+ "Supervisor",
360
+ lambda state: state["next"],
361
+ {
362
+ "Quant_Agent": "Quant_Agent",
363
+ "Fundamental_Agent": "Fundamental_Agent",
364
+ "Sentiment_Agent": "Sentiment_Agent",
365
+ "Earnings_Agent": "Earnings_Agent",
366
+ "FINISH": "Summarizer",
367
+ },
368
+ )
369
+
370
+ workflow.add_edge("Summarizer", END)
371
+
372
+ return workflow.compile()
core/rag_tools.py ADDED
@@ -0,0 +1,60 @@
1
+ import os
2
+ from langchain_core.tools import tool
3
+ from langchain_huggingface import HuggingFaceEmbeddings
4
+ from langchain_chroma import Chroma
5
+ from langchain_core.messages import SystemMessage, HumanMessage
6
+
7
+ import functools
8
+
9
+ @functools.lru_cache(maxsize=1)
10
+ def get_cached_embeddings():
11
+ return HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
12
+
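`functools.lru_cache(maxsize=1)` turns the embeddings loader into a lazy singleton: the model is constructed on the first call, and every later call returns the identical object. A quick demonstration with a cheap stand-in for `HuggingFaceEmbeddings`:

```python
import functools

calls = {"n": 0}

@functools.lru_cache(maxsize=1)
def get_model() -> object:
    calls["n"] += 1          # count real constructions
    return object()          # stand-in for HuggingFaceEmbeddings (assumption)

a = get_model()
b = get_model()
```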
13
+ def get_10k_vector_db(ticker: str) -> Chroma:
14
+ """Loads a pre-computed 10-K Chroma Vector Database from disk."""
15
+ ticker = ticker.upper()
16
+ persist_directory = f"./chroma_db/{ticker}_10k"
17
+
18
+ if not os.path.exists(persist_directory):
19
+ raise FileNotFoundError(
20
+ f"10-K data for {ticker} has not been ingested. "
21
+ f"Please run the ingestion pipeline: `python ingest.py --tickers {ticker}`"
22
+ )
23
+
24
+ # Using cached embeddings to prevent massive memory slowdowns on agent loops
25
+ embeddings = get_cached_embeddings()
26
+ return Chroma(
27
+ persist_directory=persist_directory,
28
+ embedding_function=embeddings
29
+ )
30
+
31
+ @tool
32
+ def search_10k_filings(ticker: str, query: str, llm=None) -> str:
33
+ """Searches 10-K and returns a CONCISE summary of findings."""
34
+ try:
35
+ db = get_10k_vector_db(ticker)
36
+ results = db.similarity_search(query, k=2)
37
+
38
+ if not results:
39
+ return f"No info found for {query}."
40
+
41
+ # Combine the text
42
+ context = "\n".join([doc.page_content for doc in results])
43
+
44
+ # We can use the LLM to 'pre-process' the data so the Supervisor stays clean
45
+ # Note: You'll need to pass the 'llm' object into this tool or initialize a local one
46
+ if llm:
47
+ response = llm.invoke([
48
+ SystemMessage(content="You are a helpful assistant."),
49
+ HumanMessage(content=f"Summarize the following 10-K findings for {ticker} regarding {query}:\n\n{context}")
50
+ ])
51
+ return response.content
52
+ else:
53
+ return f"SUMMARY OF 10-K FINDINGS FOR {ticker} ({query}):\n\n{context}"
54
+
55
+ except Exception as e:
56
+ return f"Error: {str(e)}"
57
+
58
+ if __name__ == "__main__":
59
+    # Smoke test via .invoke(), mirroring the pattern used in sec_tools
60
+ print(search_10k_filings.invoke({"ticker": "TSLA", "query": "marketing risks"}))
core/runner.py ADDED
@@ -0,0 +1,125 @@
1
+ from langchain_openai import ChatOpenAI
2
+ from langchain_core.messages import HumanMessage
3
+ import json
4
+ import time
5
+
6
+ from .config import Settings
7
+
8
+
9
+ def create_llm(settings: Settings) -> ChatOpenAI:
10
+ return ChatOpenAI(
11
+ model=settings.openai_model,
12
+ api_key=settings.openai_api_key,
13
+ base_url=settings.openai_base_url,
14
+ temperature=settings.openai_temperature,
15
+        max_tokens=800,  # Cap completion length; some providers otherwise reserve thousands of tokens per call
16
+ )
17
+
18
+
19
+ def run_financial_query(compiled_graph, user_query: str) -> dict:
20
+ """
21
+ Run one LangGraph turn. Returns memo (if Summarizer ran) and per-node step contents.
22
+ """
23
+ initial_state = {
24
+ "messages": [HumanMessage(content=user_query)],
25
+ "steps": 0,
26
+ "completed_tasks": set(),
27
+ "pending_tasks": [],
28
+ }
29
+ run_label = user_query if len(user_query) <= 80 else user_query[:77] + "..."
30
+ stream_config = {
31
+ "run_name": run_label,
32
+ "tags": ["fin-agent", "langgraph"],
33
+ "metadata": {"app": "FinAgent"},
34
+ }
35
+ steps: list[dict] = []
36
+ memo: str | None = None
37
+
38
+ start_time = time.time()
39
+ last_time = start_time
40
+ total_latency = 0.0
41
+
42
+ for output in compiled_graph.stream(initial_state, stream_config):
43
+ for node_name, state_update in output.items():
44
+ current_time = time.time()
45
+ step_latency = current_time - last_time
46
+ total_latency = current_time - start_time
47
+ last_time = current_time
48
+
49
+ if node_name == "Planner":
50
+ tasks = state_update.get("pending_tasks", [])
51
+ content = f"Generated {len(tasks)} parallel task(s): {[t['task_id'] for t in tasks]}"
52
+ elif node_name == "Supervisor":
53
+ next_agents = state_update.get("next", [])
54
+ if next_agents == "FINISH":
55
+ content = "All tasks complete. Routing to Summarizer."
56
+ else:
57
+ content = f"Dispatching tasks to: {next_agents}"
58
+ else:
59
+ messages = state_update.get("messages", [])
60
+ if not messages:
61
+ continue
62
+ content = messages[-1].content
63
+
64
+ if node_name == "Summarizer":
65
+ memo = content
66
+ else:
67
+ steps.append({
68
+ "node": node_name,
69
+ "content": content,
70
+ "step_latency": round(step_latency, 2),
71
+ "total_latency": round(total_latency, 2)
72
+ })
73
+ return {"memo": memo, "steps": steps, "total_latency": round(total_latency, 2)}
74
+
75
+
76
+ async def astream_financial_query(compiled_graph, user_query: str):
77
+ """
78
+ Async generator yielding Server-Sent Events (SSE) for each graph step.
79
+ Useful for streaming over HTTP.
80
+ """
81
+ initial_state = {
82
+ "messages": [HumanMessage(content=user_query)],
83
+ "steps": 0,
84
+ "completed_tasks": set(),
85
+ "pending_tasks": [],
86
+ }
87
+ run_label = user_query if len(user_query) <= 80 else user_query[:77] + "..."
88
+ stream_config = {
89
+ "run_name": run_label,
90
+ "tags": ["fin-agent", "langgraph"],
91
+ "metadata": {"app": "FinAgent"},
92
+ }
93
+
94
+ start_time = time.time()
95
+ last_time = start_time
96
+
97
+ async for output in compiled_graph.astream(initial_state, stream_config):
98
+ for node_name, state_update in output.items():
99
+ current_time = time.time()
100
+ step_latency = current_time - last_time
101
+ total_latency = current_time - start_time
102
+ last_time = current_time
103
+
104
+ if node_name == "Planner":
105
+ tasks = state_update.get("pending_tasks", [])
106
+ content = f"Generated {len(tasks)} parallel task(s): {[t['task_id'] for t in tasks]}"
107
+ elif node_name == "Supervisor":
108
+ next_agents = state_update.get("next", [])
109
+ if next_agents == "FINISH":
110
+ content = "All tasks complete. Routing to Summarizer."
111
+ else:
112
+ content = f"Dispatching tasks to: {next_agents}"
113
+ else:
114
+ messages = state_update.get("messages", [])
115
+ if not messages:
116
+ continue
117
+ content = messages[-1].content
118
+
119
+ data = {
120
+ "node": node_name,
121
+ "content": content,
122
+ "step_latency": round(step_latency, 2),
123
+ "total_latency": round(total_latency, 2)
124
+ }
125
+ yield f"data: {json.dumps(data)}\n\n"
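The `data: {json}\n\n` framing emitted above is the Server-Sent Events wire format that the frontend consumes. A minimal, framework-free sketch of the consumer side (the function name and sample payloads here are illustrative, not part of this commit):

```python
import json

def parse_sse_frames(raw: str) -> list[dict]:
    """Split a buffered SSE stream into the JSON payloads of its 'data:' frames."""
    events = []
    for frame in raw.split("\n\n"):  # SSE frames are separated by a blank line
        frame = frame.strip()
        if frame.startswith("data: "):
            events.append(json.loads(frame[len("data: "):]))
    return events

stream = (
    'data: {"node": "Quant_Agent", "content": "AAPL price fetched"}\n\n'
    'data: {"node": "Summarizer", "content": "# Memo"}\n\n'
)
steps = parse_sse_frames(stream)
```

In practice a client reads line by line (as the Streamlit app does with `response.iter_lines()`) rather than buffering the whole stream, but the frame boundaries are the same.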
core/sec_tools.py ADDED
@@ -0,0 +1,172 @@
+ import requests
+ import pandas as pd
+ from langchain_core.tools import tool
+ from datetime import datetime
+ from typing import Literal
+ from pydantic import BaseModel, Field
+ import functools
+
+
+ USER_AGENT = "Dev Goyal devgoyal9031@gmail.com"
+ HEADERS = {"User-Agent": USER_AGENT}
+
+ @functools.lru_cache(maxsize=1)
+ def _get_ticker_to_cik_mapping() -> dict[str, str]:
+     """Fetches and caches the SEC ticker-to-CIK mapping."""
+     url = "https://www.sec.gov/files/company_tickers.json"
+     print("[System: Fetching SEC ticker to CIK mapping...]")
+     response = requests.get(url, headers=HEADERS)
+     response.raise_for_status()
+     data = response.json()
+
+     mapping = {}
+     for _, company_info in data.items():
+         mapping[company_info['ticker'].upper()] = str(company_info['cik_str']).zfill(10)
+     return mapping
+
+ def get_cik_from_ticker(ticker: str) -> str:
+     ticker = ticker.upper()
+     mapping = _get_ticker_to_cik_mapping()
+     if ticker in mapping:
+         return mapping[ticker]
+     raise ValueError(f"Ticker {ticker} not found in SEC database.")
+
+ def get_latest_10k_url(ticker: str) -> str:
+     """Finds the URL for the most recent 10-K filing for a given ticker."""
+     try:
+         cik = get_cik_from_ticker(ticker)
+         url = f"https://data.sec.gov/submissions/CIK{cik}.json"
+
+         print(f"[System: Fetching filing history for CIK {cik}...]")
+         response = requests.get(url, headers=HEADERS)
+         response.raise_for_status()
+
+         filings = response.json()['filings']['recent']
+
+         # Search for the most recent 10-K
+         for i, form in enumerate(filings['form']):
+             if form == '10-K':
+                 accession_number = filings['accessionNumber'][i]
+                 # The SEC directory segment removes dashes from the accession number
+                 accession_no_dashes = accession_number.replace('-', '')
+
+                 # Construct the final document URL
+                 document_url = f"https://www.sec.gov/Archives/edgar/data/{cik.lstrip('0')}/{accession_no_dashes}/{accession_number}.txt"
+                 return document_url
+
+         return f"No 10-K found for {ticker}."
+
+     except Exception as e:
+         return f"Error: {str(e)}"
+
+ # 1. Define the strict Pydantic schema
+ class XBRLConceptInput(BaseModel):
+     ticker: str = Field(
+         ...,
+         description="The official uppercase ticker symbol (e.g., AAPL)."
+     )
+     concept: Literal[
+         "Revenues",
+         "NetIncomeLoss",
+         "Assets",
+         "Liabilities",
+         "GrossProfit",
+         "OperatingIncomeLoss",
+         "AssetsCurrent",
+         "LiabilitiesCurrent",
+         "NetCashProvidedByUsedInOperatingActivities",
+         "PaymentsToAcquirePropertyPlantAndEquipment",
+         "EntityCommonStockSharesOutstanding"
+     ] = Field(
+         ...,
+         description="You MUST select the exact SEC XBRL concept from this list that best matches the user's request."
+     )
+
+ # 2. Bind the schema to the tool
+ @tool(args_schema=XBRLConceptInput)
+ def get_company_concept_xbrl(ticker: str, concept: str) -> str:
+     """
+     Fetches official SEC accounting metrics for a company across recent quarters.
+     CRITICAL INSTRUCTIONS:
+     1. 'ticker': Must be the official uppercase ticker symbol (e.g., AAPL).
+     2. 'concept': You MUST use one of these exact SEC XBRL concepts (case-sensitive):
+         -- Core Size --
+         - 'Revenues' (Total Revenue / Sales)
+         - 'NetIncomeLoss' (Net Income / Profit)
+         - 'Assets' (Total Assets)
+         - 'Liabilities' (Total Liabilities)
+
+         -- Margins & Liquidity --
+         - 'GrossProfit' (Revenue minus Cost of Goods Sold)
+         - 'OperatingIncomeLoss' (Operating Income)
+         - 'AssetsCurrent' (Short-term assets like cash/inventory)
+         - 'LiabilitiesCurrent' (Short-term obligations)
+
+         -- Cash Flow & Valuation --
+         - 'NetCashProvidedByUsedInOperatingActivities' (Operating Cash Flow)
+         - 'PaymentsToAcquirePropertyPlantAndEquipment' (Capital Expenditures / CapEx)
+         - 'EntityCommonStockSharesOutstanding' (Total shares outstanding)
+
+     Do not guess concepts. Only use the exact strings listed above.
+     """
+     try:
+         cik = get_cik_from_ticker(ticker)
+         url = f"https://data.sec.gov/api/xbrl/companyconcept/CIK{cik}/us-gaap/{concept}.json"
+
+         print(f"[System: Fetching latest {concept} for {ticker}...]")
+         response = requests.get(url, headers=HEADERS)
+         response.raise_for_status()
+         data = response.json()
+
+         if "USD" not in data.get("units", {}):
+             return f"No USD data found for {concept}."
+
+         # 1. Convert to DataFrame
+         df = pd.DataFrame(data["units"]["USD"])
+
+         # 2. Convert date strings to datetime objects
+         df['end'] = pd.to_datetime(df['end'])
+         df['filed'] = pd.to_datetime(df['filed'])
+
+         # 3. Filter for standard filings to avoid "preliminary" noise
+         df = df[df['form'].isin(['10-Q', '10-K', '10-K/A', '10-Q/A'])]
+
+         # 4. CRITICAL: Deduplicate.
+         # If the same period ('end') is reported multiple times, keep the most recently filed one.
+         df = df.sort_values(by=['end', 'filed'], ascending=[False, False])
+         df = df.drop_duplicates(subset=['end'])
+
+         # 5. Filter for the last 2 years
+         current_year = datetime.now().year
+         df = df[df['end'].dt.year >= (current_year - 2)]
+
+         # 6. Take the 4 most recent periods
+         df = df.head(4)
+
+         if df.empty:
+             return f"No {concept} data from the last two years available for {ticker}."
+
+         summary = f"Latest official {concept} data for {ticker}:\n"
+         for _, row in df.iterrows():
+             formatted_val = f"${int(row['val']):,}"
+             date_str = row['end'].strftime('%Y-%m-%d')
+             summary += f"- Period End: {date_str} (Filed: {row['filed'].strftime('%Y-%m-%d')}): {formatted_val}\n"
+
+         return summary
+
+     except Exception as e:
+         return f"Error fetching XBRL data: {str(e)}"
+
+ # Quick test block for the new function
+ if __name__ == "__main__":
+     test_ticker = "MSFT"
+
+     # Test 1: URL fetcher
+     try:
+         url = get_latest_10k_url(test_ticker)
+         print(f"\n10-K URL: {url}")
+     except Exception as e:
+         print(f"URL Fetch Failed: {e}")
+
+     # Test 2: XBRL fetcher
+     test_concept = "NetIncomeLoss"
+     print(get_company_concept_xbrl.invoke({"ticker": test_ticker, "concept": test_concept}))
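The EDGAR URL construction in `get_latest_10k_url` hinges on two string transforms: CIKs are zero-padded to 10 digits for the JSON APIs but unpadded in Archives paths, and accession numbers drop their dashes in the directory segment while keeping them in the filename. A standalone sketch of that arithmetic (the sample CIK and accession number below are illustrative):

```python
def edgar_filing_url(cik_padded: str, accession_number: str) -> str:
    """Rebuild the full-submission .txt URL the same way get_latest_10k_url does."""
    cik_unpadded = cik_padded.lstrip("0")                    # Archives paths drop the zero padding
    accession_no_dashes = accession_number.replace("-", "")  # directory segment has no dashes
    return (f"https://www.sec.gov/Archives/edgar/data/"
            f"{cik_unpadded}/{accession_no_dashes}/{accession_number}.txt")

url = edgar_filing_url("0000789019", "0000950170-24-087843")
```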
core/sentiment_tools.py ADDED
@@ -0,0 +1,54 @@
+ import requests
+ from bs4 import BeautifulSoup
+ from langchain_core.tools import tool
+
+ @tool
+ def get_recent_news(ticker: str) -> str:
+     """
+     Fetches the most recent news headlines for a given stock ticker.
+     CRITICAL INSTRUCTIONS:
+     1. 'ticker': Must be the official uppercase ticker symbol (e.g., AAPL). DO NOT pass the full company name.
+     2. Use this tool to gauge current market sentiment, breaking news, and short-term catalysts.
+     """
+     try:
+         ticker = ticker.upper()
+         print(f"\n[System: Fetching latest news for {ticker} via Yahoo Finance RSS...]")
+
+         # Hit the official Yahoo Finance RSS endpoint
+         url = f"https://feeds.finance.yahoo.com/rss/2.0/headline?s={ticker}&region=US&lang=en-US"
+
+         # A standard web-browser User-Agent so Yahoo doesn't block us
+         headers = {
+             'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+         }
+
+         response = requests.get(url, headers=headers)
+         response.raise_for_status()
+
+         # Parse the XML feed using BeautifulSoup
+         soup = BeautifulSoup(response.content, features="xml")
+         items = soup.find_all("item")
+
+         if not items:
+             return f"No recent news found for {ticker}."
+
+         summary = f"Recent News Headlines for {ticker}:\n\n"
+
+         # Grab the 10 most recent articles
+         for i, item in enumerate(items[:10]):
+             title = item.title.text if item.title else "No Title"
+             # RSS provides nicely formatted publication dates
+             pub_date = item.pubDate.text if item.pubDate else "Recent"
+
+             summary += f"{i+1}. [{pub_date}] {title}\n"
+
+         return summary
+
+     except Exception as e:
+         return f"Error fetching news for {ticker}: {str(e)}"
+
+ # Quick test block
+ if __name__ == "__main__":
+     test_ticker = "NVDA"
+     print("Testing News Pipeline...")
+     print(get_recent_news.invoke({"ticker": test_ticker}))
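The tool above uses BeautifulSoup's XML mode; the same `<item>` extraction works with only the standard library, which is handy for checking the expected RSS 2.0 shape without network access or extra dependencies (the sample feed below is fabricated for illustration):

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>NVDA beats estimates</title><pubDate>Mon, 01 Jan 2024 12:00:00 GMT</pubDate></item>
  <item><title>Chip demand surges</title><pubDate>Tue, 02 Jan 2024 12:00:00 GMT</pubDate></item>
</channel></rss>"""

def headlines(rss_xml: str, limit: int = 10) -> list[str]:
    """Extract '[pubDate] title' strings from an RSS 2.0 feed, newest-first as served."""
    root = ET.fromstring(rss_xml)
    items = list(root.iter("item"))[:limit]
    return [f"[{i.findtext('pubDate', 'Recent')}] {i.findtext('title', 'No Title')}" for i in items]

lines = headlines(SAMPLE_RSS)
```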
docker-compose.yml ADDED
@@ -0,0 +1,27 @@
+ services:
+   api:
+     build:
+       context: .
+       dockerfile: Dockerfile.api
+     ports:
+       - "8000:8000"
+     env_file:
+       - .env
+     volumes:
+       - ./:/app                     # Hot-reloading mapping
+       - ./chroma_db:/app/chroma_db  # Persist the vector database locally
+     command: uvicorn backend.api:app --host 0.0.0.0 --port 8000 --reload
+
+   ui:
+     build:
+       context: .
+       dockerfile: Dockerfile.ui
+     ports:
+       - "8501:8501"
+     environment:
+       - API_URL=http://api:8000/chat/stream
+     volumes:
+       - ./:/app                     # Hot-reloading mapping
+       - ./chroma_db:/app/chroma_db
+     depends_on:
+       - api
frontend/streamlit_app.py ADDED
@@ -0,0 +1,158 @@
+ import streamlit as st
+ import requests
+ import json
+ import os
+
+ # Configure the API URL. In local dev it is 127.0.0.1.
+ # Via docker-compose, this is overridden to http://api:8000/chat/stream via ENV.
+ API_URL = os.getenv("API_URL", "http://127.0.0.1:8000/chat/stream")
+
+ st.set_page_config(page_title="FinAgent Portfolio", page_icon="📈", layout="wide")
+
+ st.title("📈 🤖 FinAgent: Autonomous Financial AI")
+
+ with st.sidebar:
+     st.markdown("### 👨‍💻 About this Agent")
+     st.markdown(
+         "This application uses **LangGraph** to construct a deterministic multi-agent state machine. "
+         "The **Planner Agent** parses the query, while the **Supervisor** routes tasks "
+         "to specialized **Quant, Fundamental, Sentiment, and Earnings Agents**.\n\n"
+         "Finally, the **Summarizer** compiles a comprehensive Investment Memo."
+     )
+
+     st.divider()
+
+     # Automatically survey the indexed ChromaDB vector stores
+     available_tickers = []
+     earnings_tickers = []
+     if os.path.exists("./chroma_db"):
+         for d in os.listdir("./chroma_db"):
+             if d.endswith("_10k"):
+                 available_tickers.append(d.replace("_10k", ""))
+             elif d.endswith("_earnings"):
+                 earnings_tickers.append(d.replace("_earnings", ""))
+
+     if available_tickers:
+         st.markdown("### 📚 Supported 10-K Data")
+         st.markdown("Deep RAG (fundamental SEC filings) currently verified & compiled for:")
+         cols = st.columns(4)
+         for i, t in enumerate(sorted(available_tickers)):
+             cols[i % 4].code(t)
+
+     if earnings_tickers:
+         st.markdown("### 🎙️ Earnings Call Data")
+         st.markdown("Ingested earnings-call transcripts available for:")
+         cols = st.columns(4)
+         for i, t in enumerate(sorted(earnings_tickers)):
+             cols[i % 4].code(t)
+
+     st.divider()
+
+     st.markdown("### ⚡ Recruiter Quick-Test")
+     st.markdown("Try one of these example queries to see the multi-agent graph in action:")
+
+     if st.button("🍎 Apple Financial Overview"):
+         st.session_state.example_query = "What is the price, sentiment, and recent 10-K risks for Apple (AAPL)?"
+
+     if st.button("🏎️ Tesla Breaking Sentiment"):
+         st.session_state.example_query = "What is the latest news sentiment for TSLA?"
+
+     if st.button("💻 MSFT vs GOOGL"):
+         st.session_state.example_query = "Compare the current stock performance of Microsoft and Google."
+
+     if st.button("🎙️ Earnings Call Analysis"):
+         st.session_state.example_query = "Analyze the latest earnings call for Apple (AAPL) — compare management tone in prepared remarks vs Q&A and show keyword trends."
+
+     st.divider()
+     st.caption("Powered by Llama-3.1-8B via Groq")
+
+ # Initialize chat history
+ if "messages" not in st.session_state:
+     st.session_state.messages = []
+
+ # Display chat history on screen
+ for message in st.session_state.messages:
+     with st.chat_message(message["role"]):
+         if message["role"] == "assistant" and "steps" in message and message["steps"]:
+             # Render past steps in a collapsed status box
+             total_time = message.get("total_latency", 0)
+             title = f"✅ Investment Memo Generated! (Total Latency: {total_time}s)" if total_time else "✅ Investment Memo Generated!"
+             with st.status(title, expanded=False):
+                 for step in message["steps"]:
+                     lat = step.get('step_latency', 0)
+                     lat_str = f"({lat}s) " if lat else ""
+                     st.write(f"**[{step['node']}]** {lat_str}{step['content']}")
+         st.markdown(message["content"])
+
+ # Unconditionally render the chat_input so it NEVER disappears from the UI
+ chat_val = st.chat_input("Ask about any stock ticker (e.g. AAPL, TSLA, NVDA)...")
+
+ if "example_query" in st.session_state and st.session_state.example_query:
+     prompt = st.session_state.example_query
+     st.session_state.example_query = ""  # Reset
+ else:
+     prompt = chat_val
+
+ if prompt:
+     # Render user message
+     st.session_state.messages.append({"role": "user", "content": prompt})
+     with st.chat_message("user"):
+         st.markdown(prompt)
+
+     with st.chat_message("assistant"):
+         # Stream the intermediate node outputs into a dynamically expanding status box
+         status_box = st.status("🧠 Consulting Specialized AI Agents...", expanded=True)
+         final_memo_placeholder = st.empty()
+
+         try:
+             # Stream the response via POST request
+             with requests.post(API_URL, json={"query": prompt}, stream=True) as response:
+                 response.raise_for_status()
+
+                 final_memo = ""
+                 session_steps = []
+
+                 # Consume Server-Sent Events (SSE)
+                 for line in response.iter_lines():
+                     if line:
+                         decoded_line = line.decode('utf-8')
+                         if decoded_line.startswith("data: "):
+                             data_str = decoded_line[len("data: "):]
+                             try:
+                                 data = json.loads(data_str)
+                                 node = data.get("node")
+                                 content = data.get("content")
+                                 step_latency = data.get("step_latency", 0)
+                                 total_latency = data.get("total_latency", 0)
+
+                                 if node == "Summarizer":
+                                     # The final node returns the full markdown report
+                                     final_memo = content
+                                     final_memo_placeholder.markdown(final_memo)
+                                     status_box.update(label=f"✅ Investment Memo Generated! (Total Latency: {total_latency}s)", state="complete", expanded=False)
+                                 else:
+                                     # Show what the different agents (Quant, Sentiment, etc.) are calculating
+                                     lat_str = f"({step_latency}s) " if step_latency else ""
+                                     status_box.write(f"**[{node}]** {lat_str}{content}")
+                                     session_steps.append({
+                                         "node": node,
+                                         "content": content,
+                                         "step_latency": step_latency,
+                                         "total_latency": total_latency
+                                     })
+                             except json.JSONDecodeError:
+                                 pass
+
+                 # Save the final memo and intermediate steps to history
+                 if final_memo:
+                     final_total_latency = session_steps[-1].get("total_latency", 0) if session_steps else 0
+                     st.session_state.messages.append({
+                         "role": "assistant",
+                         "content": final_memo,
+                         "steps": session_steps,
+                         "total_latency": final_total_latency
+                     })
+
+         except requests.exceptions.RequestException as e:
+             status_box.update(label="❌ Connection Error", state="error", expanded=False)
+             st.error(f"Failed to connect to the backend FastAPI server: {e}")
requirements.txt ADDED
@@ -0,0 +1,32 @@
+ # AI Orchestration & Agent Framework
+ langchain>=0.2.0
+ langchain-core>=0.2.0
+ langchain-openai>=0.1.0
+ langgraph>=0.0.60
+ pydantic>=2.0.0
+ pydantic-settings>=2.0.0
+ fastapi>=0.115.0
+ uvicorn[standard]>=0.30.0
+
+ # Quantitative Data & Math
+ yfinance>=0.2.40
+ pandas>=2.2.0
+
+ # SEC API & Web Scraping
+ requests>=2.31.0
+ beautifulsoup4>=4.12.0
+ lxml>=5.2.0
+
+ # Vector DB & Embeddings (RAG)
+ langchain-community>=0.0.10
+ langchain-text-splitters>=0.2.0
+ langchain-huggingface>=0.0.3
+ sentence-transformers>=3.0.0
+ chromadb>=0.5.0
+ langchain-chroma>=0.1.2
+
+ # Utilities
+ python-dotenv>=1.0.0
+
+ # UI
+ streamlit>=1.39.0
scripts/ingest.py ADDED
@@ -0,0 +1,79 @@
+ import os
+ import argparse
+ import requests
+ import re
+ from bs4 import BeautifulSoup
+
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from langchain_huggingface import HuggingFaceEmbeddings
+ from langchain_chroma import Chroma
+ from langchain_core.documents import Document
+
+ from core.sec_tools import get_latest_10k_url, HEADERS
+
+ def ingest_10k(ticker: str):
+     """Downloads, cleans, and embeds a 10-K into a local Chroma vector database."""
+     ticker = ticker.upper()
+     persist_directory = f"./chroma_db/{ticker}_10k"
+
+     if os.path.exists(persist_directory):
+         print(f"[Ingest: Vector DB for {ticker} 10-K already exists at {persist_directory}. Skipping...]")
+         return
+
+     print("\n==============================================")
+     print(f"Starting Ingestion Pipeline for {ticker}")
+     print("==============================================")
+
+     url = get_latest_10k_url(ticker)
+     if url.startswith("Error") or url.startswith("No 10-K"):
+         print(f"[Error: SEC URL fetch failed: {url}]")
+         return
+
+     print(f"[1/4] Downloading raw 10-K from SEC: {url}")
+     response = requests.get(url, headers=HEADERS)
+     response.raise_for_status()
+     raw_text = response.text
+
+     print("[2/4] Parsing HTML and isolating text payload...")
+     doc_match = re.search(r'<DOCUMENT>(.*?)</DOCUMENT>', raw_text, re.DOTALL | re.IGNORECASE)
+     if doc_match:
+         raw_text = doc_match.group(1)
+
+     soup = BeautifulSoup(raw_text, "html.parser")
+     clean_text = soup.get_text(separator=" ", strip=True)
+
+     print("[3/4] Chunking document...")
+     text_splitter = RecursiveCharacterTextSplitter(
+         chunk_size=1000,
+         chunk_overlap=200
+     )
+     docs = [Document(page_content=clean_text, metadata={"source": url, "ticker": ticker})]
+     chunks = text_splitter.split_documents(docs)
+
+     print(f"[4/4] Embedding {len(chunks)} chunks into Chroma DB. (This may take a minute) ...")
+     embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
+     Chroma.from_documents(
+         documents=chunks,
+         embedding=embeddings,
+         persist_directory=persist_directory
+     )
+
+     print(f"[Success] {ticker} 10-K successfully ingested.")
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="Ingest SEC 10-K filings into Chroma DB.")
+     parser.add_argument(
+         "--tickers",
+         nargs="+",
+         required=True,
+         help="List of stock tickers to ingest (e.g., --tickers AAPL MSFT TSLA)"
+     )
+     args = parser.parse_args()
+
+     os.makedirs("./chroma_db", exist_ok=True)
+
+     for t in args.tickers:
+         try:
+             ingest_10k(t)
+         except Exception as e:
+             print(f"[Error] Failed to ingest {t}: {str(e)}")
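The `chunk_size=1000` / `chunk_overlap=200` settings above mean consecutive chunks share up to 200 characters of context, so a sentence cut at a chunk boundary still appears whole in the neighboring chunk. A rough character-level sketch of that windowing (LangChain's splitter additionally prefers paragraph and sentence separators, so real chunk edges differ):

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-stride chunking: each window starts chunk_size - overlap after the last."""
    stride = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), stride)]

# 2500 chars -> windows at offsets 0, 800, 1600
text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(text)
```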
scripts/ingest_earnings_calls.py ADDED
@@ -0,0 +1,106 @@
+ #!/usr/bin/env python3
+ """
+ CLI script to ingest earnings-call transcripts into ChromaDB.
+
+ Usage:
+     python scripts/ingest_earnings_calls.py --tickers AAPL MSFT --quarters Q4-2024 Q1-2025
+     python scripts/ingest_earnings_calls.py --tickers TSLA --quarters Q1-2025
+
+ Data sources (tried in order):
+     1. Alpha Vantage EARNINGS_CALL_TRANSCRIPT (requires premium key)
+     2. SEC EDGAR 8-K filings (free, always available)
+ """
+
+ import argparse
+ import os
+ import sys
+
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ # Ensure project root is on sys.path so `core.*` imports work
+ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+ from core.config import Settings
+ from core.earnings_tools import ingest_earnings_call, parse_quarter
+
+
+ def main():
+     parser = argparse.ArgumentParser(
+         description="Ingest earnings-call transcripts into ChromaDB."
+     )
+     parser.add_argument(
+         "--tickers",
+         nargs="+",
+         required=True,
+         help="Stock tickers to ingest (e.g. --tickers AAPL MSFT)",
+     )
+     parser.add_argument(
+         "--quarters",
+         nargs="+",
+         required=True,
+         help="Quarters to ingest, format Q<N>-<YYYY> (e.g. --quarters Q4-2024 Q1-2025)",
+     )
+     args = parser.parse_args()
+
+     settings = Settings()
+     api_key = settings.alpha_vantage_api_key or os.getenv("ALPHA_VANTAGE_API_KEY", "")
+     chroma_path = settings.earnings_chroma_path
+
+     os.makedirs(chroma_path, exist_ok=True)
+
+     # Parse quarters upfront to fail fast on bad formats
+     parsed_quarters: list[tuple[int, int]] = []
+     for q_str in args.quarters:
+         try:
+             q, y = parse_quarter(q_str)
+             parsed_quarters.append((q, y))
+         except ValueError as e:
+             print(f"[Error] {e}")
+             sys.exit(1)
+
+     results: list[dict] = []
+
+     for ticker in args.tickers:
+         ticker = ticker.upper()
+         for quarter, year in parsed_quarters:
+             print(f"\n{'=' * 50}")
+             print(f"Ingesting {ticker} Q{quarter}-{year}")
+             print(f"{'=' * 50}")
+             try:
+                 status = ingest_earnings_call(
+                     ticker=ticker,
+                     quarter=quarter,
+                     year=year,
+                     api_key=api_key,
+                     chroma_path=chroma_path,
+                 )
+             except Exception as e:
+                 print(f"[Error] Failed to ingest {ticker} Q{quarter}-{year}: {e}")
+                 status = "error"
+
+             results.append(
+                 {"ticker": ticker, "quarter": f"Q{quarter}-{year}", "status": status}
+             )
+
+     # Summary
+     print(f"\n{'=' * 50}")
+     print("INGEST SUMMARY")
+     print(f"{'=' * 50}")
+     for r in results:
+         icon = {"success": "✅", "partial": "⚠️", "failed": "❌", "exists": "⏭️", "error": "💥"}.get(
+             r["status"], "❓"
+         )
+         print(f"  {icon} {r['ticker']} {r['quarter']}: {r['status']}")
+
+     failed = [r for r in results if r["status"] in ("failed", "error")]
+     if failed:
+         print(f"\n{len(failed)} ingest(s) failed. Check logs above.")
+         sys.exit(1)
+     else:
+         print(f"\nAll {len(results)} ingest(s) completed.")
+
+
+ if __name__ == "__main__":
+     main()
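`parse_quarter` is imported from `core.earnings_tools`, which is not shown in this commit. Based on the `Q<N>-<YYYY>` format documented in the help text above, a plausible sketch of its behavior looks like this (the regex and error message are assumptions, not the repo's actual code):

```python
import re

def parse_quarter(q_str: str) -> tuple[int, int]:
    """Parse a 'Q4-2024'-style string into (quarter, year); raise ValueError otherwise."""
    m = re.fullmatch(r"[Qq]([1-4])-(\d{4})", q_str.strip())
    if not m:
        # Assumed error message; the real one lives in core.earnings_tools
        raise ValueError(f"Invalid quarter format: {q_str!r} (expected Q<1-4>-<YYYY>)")
    return int(m.group(1)), int(m.group(2))

quarter, year = parse_quarter("Q4-2024")
```

Validating every quarter string before any network call is what lets the CLI "fail fast on bad formats" instead of aborting midway through a multi-ticker run.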
scripts/main.py ADDED
@@ -0,0 +1,40 @@
+ import os
+
+ from dotenv import load_dotenv
+ import langchain
+ from langchain_core.messages import HumanMessage
+
+ from core.config import Settings
+ from core.graph_builder import build_financial_graph
+ from core.runner import create_llm, run_financial_query
+
+ load_dotenv()
+ langchain.debug = os.getenv("LANGCHAIN_DEBUG", "").lower() in ("1", "true", "yes")
+
+
+ def main():
+     settings = Settings()
+     llm = create_llm(settings)
+
+     print("--- Multi-Agent System + Summarizer Initialized ---")
+     print("Type 'exit' to quit.\n")
+
+     compiled = build_financial_graph(llm)
+
+     while True:
+         user_query = input("\nAsk about a stock: ")
+         if user_query.lower() == "exit":
+             break
+
+         print("\n--- Agent Workflow Started ---")
+         result = run_financial_query(compiled, user_query)
+         for step in result["steps"]:
+             print(f"\n[{step['node']}]: {step['content']}")
+         if result.get("memo") is not None:
+             print("\n--- Investment Memo ---")
+             print(result["memo"])
+         print("\n--- Workflow Complete ---")
+
+
+ if __name__ == "__main__":
+     main()
supervisord.conf ADDED
@@ -0,0 +1,24 @@
+ [supervisord]
+ nodaemon=true
+ logfile=/tmp/supervisord.log
+ pidfile=/tmp/supervisord.pid
+
+ [program:fastapi]
+ command=uvicorn backend.api:app --host 0.0.0.0 --port 8000
+ directory=/app
+ autostart=true
+ autorestart=true
+ stdout_logfile=/dev/stdout
+ stdout_logfile_maxbytes=0
+ stderr_logfile=/dev/stderr
+ stderr_logfile_maxbytes=0
+
+ [program:streamlit]
+ command=streamlit run frontend/streamlit_app.py --server.port=7860 --server.address=0.0.0.0 --server.headless=true --browser.gatherUsageStats=false
+ directory=/app
+ autostart=true
+ autorestart=true
+ stdout_logfile=/dev/stdout
+ stdout_logfile_maxbytes=0
+ stderr_logfile=/dev/stderr
+ stderr_logfile_maxbytes=0