
Production-Grade RAG: Developer Documentation Assistant

Implementation Plan


Project Overview

A domain-specific "Ask my Docs" system for developer documentation. Users query SDK, API, or framework documentation and receive answers grounded in the indexed pages, each with citations to its sources.

Initial Target Domain: FastAPI Documentation

Locked MVP Choices

The MVP is intentionally constrained to one implementation path. Any alternatives listed in earlier notes are out of scope until the baseline system is working and evaluated.

| Area | MVP Choice |
| --- | --- |
| Documentation source | Official FastAPI documentation pages only |
| Ingestion method | BeautifulSoup crawler + local normalized markdown/json output |
| Extra sources | No GitHub /docs sync, no PDF upload, no arbitrary .md ingestion in MVP |
| Chunking | Structure-aware chunking with LangChain RecursiveCharacterTextSplitter |
| Embeddings | OpenAI text-embedding-3-small |
| Vector store | ChromaDB only |
| Keyword retrieval | rank_bm25 |
| Fusion | Reciprocal Rank Fusion (RRF) |
| Re-ranker | Local cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Answer model | OpenAI gpt-4o-mini |
| API layer | FastAPI |
| Evaluation | Ragas + a hand-written FastAPI eval set |
| CI gate | Lightweight retrieval/regression checks on PRs; full eval run on demand or nightly |

MVP Scope Boundaries

  • Index only the official FastAPI docs corpus.
  • Treat the MVP as a single-tenant local/dev system.
  • Optimize for answer quality and citation correctness before UI polish.
  • Defer multi-doc search, version-aware retrieval, PDF ingestion, and production vector DB migration until after MVP validation.

Phase 1: Fundamentals (Data Ingestion Pipeline)

Step 1: Document Collection

  • Crawl official FastAPI documentation using BeautifulSoup
  • Normalize each page into local markdown/json before chunking
  • Store raw HTML plus cleaned content locally for reproducibility
  • Exclude GitHub docs sync, PDF upload, and arbitrary file ingestion from MVP
  • Acceptance criteria:
    • All target FastAPI pages are fetched successfully
    • Each page is stored with source_url, page_title, and crawl timestamp
    • Re-running ingestion updates changed pages without duplicating records
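
The re-run criterion above can be met by keying each stored page on its URL and comparing a content hash. The following is a minimal sketch under stated assumptions: the `upsert_page` helper, the in-memory `store` dict, and the `content_hash` and `crawled_at` field names are illustrative and not part of the plan, which only names source_url, page_title, and a crawl timestamp.

```python
import hashlib
from datetime import datetime, timezone


def upsert_page(store: dict, source_url: str, page_title: str, content: str) -> str:
    """Insert or update one crawled page, keyed by its URL.

    Returns "created", "updated", or "unchanged".
    `store` is a hypothetical stand-in for the local JSON output.
    """
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    existing = store.get(source_url)
    if existing is not None and existing["content_hash"] == content_hash:
        # Same content as last crawl: keep the old record and its timestamp.
        return "unchanged"
    store[source_url] = {
        "source_url": source_url,
        "page_title": page_title,
        "content": content,
        "content_hash": content_hash,
        "crawled_at": datetime.now(timezone.utc).isoformat(),
    }
    return "updated" if existing is not None else "created"
```

Because the URL is the key, a second crawl of the same page can only overwrite its record, never add a second one.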

Step 2: Chunking Strategy

  • Use LangChain RecursiveCharacterTextSplitter
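  • Chunk size target: 700 tokens, measured with a token-based length function (RecursiveCharacterTextSplitter counts characters by default)
  • Overlap: 100 tokens, measured the same way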
  • Preserve code blocks and heading boundaries whenever possible
  • Tag each chunk with metadata:
    • source_url
    • page_title
    • section_title
    • chunk_id
    • doc_version
  • Acceptance criteria:
    • No chunk splits in the middle of fenced code blocks
    • Every chunk can be traced back to an exact source page and section
    • Chunk output is deterministic for unchanged source documents
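
Two of these criteria lend themselves to small automated checks. The sketch below assumes a content-derived chunk_id (so unchanged input yields unchanged IDs) and a fence-counting heuristic for split code blocks. Both `make_chunk_id` and `splits_code_block` are hypothetical helpers, and the odd-count test is only reliable when run over every chunk of a page in order; overlapping chunks that begin inside a code block may need a stricter check.

```python
import hashlib

# Markdown code-fence marker, built programmatically so this example
# does not contain a literal fence inside its own code block.
FENCE = "`" * 3


def make_chunk_id(source_url: str, section_title: str, text: str) -> str:
    """Derive a stable ID from the chunk's origin and content."""
    payload = "\n".join([source_url, section_title, text]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def splits_code_block(chunk_text: str) -> bool:
    """Return True if the chunk holds an odd number of fence lines,
    which suggests a fenced code block was cut at a chunk boundary."""
    fence_lines = [
        line for line in chunk_text.splitlines()
        if line.lstrip().startswith(FENCE)
    ]
    return len(fence_lines) % 2 == 1
```

A test suite could run `splits_code_block` over all chunks and assert that none return True, and compare chunk IDs across two runs on the same input to confirm determinism.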

Step 3: Embeddings + Vector Store

  • Embedding model: OpenAI text-embedding-3-small
  • Store embeddings in ChromaDB
  • Each vector carries full metadata from Step 2
  • Acceptance criteria:
    • Full FastAPI corpus is embedded successfully
    • Re-indexing does not create duplicate vectors for unchanged chunks
    • Querying Chroma returns chunk metadata required for citation display

Phase 2: Hybrid Retrieval

Step 4: BM25 + Semantic Search

  • BM25 via rank_bm25 library for keyword matching
  • Vector similarity search via ChromaDB
  • Merge both result sets using Reciprocal Rank Fusion (RRF)
  • Retrieve top 10 BM25 hits and top 10 vector hits before fusion
  • Acceptance criteria:
    • Known keyword-heavy queries surface exact-term matches
    • Known semantic queries surface relevant conceptual matches
    • Fused retrieval performs better than vector-only on the seed eval set
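
For reference, Reciprocal Rank Fusion scores each chunk as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below operates on lists of chunk IDs. The constant k = 60 is a commonly used default and an assumption here, since the plan does not fix a value, and `reciprocal_rank_fusion` is an illustrative name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs; each input list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); break ties by chunk ID so the
    # output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


bm25_hits = ["a", "b", "c"]      # toy keyword ranking
vector_hits = ["b", "d", "a"]    # toy semantic ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# fused == ["b", "a", "d", "c"]: chunks found by both retrievers rise to the top
```

RRF uses only rank positions, so BM25 scores and vector distances never need to be normalized onto a common scale.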

Step 5: Cross Encoder Re-ranker

  • Use local cross-encoder/ms-marco-MiniLM-L-6-v2
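  • Re-rank the fused candidate set (up to 20 chunks: 10 BM25 hits plus 10 vector hits, fewer after deduplication) and pass the top 5 to the LLM
  • Expected to give the largest single quality gain in the pipeline; confirm this against the seed eval set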
  • Acceptance criteria:
    • Re-ranking is applied on every answer path
    • Top 5 contexts are logged for debugging and eval review
    • Reranked results improve answer grounding on the seed eval set
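
The re-ranking step reduces to scoring each (query, chunk) pair and keeping the best five. The sketch below keeps the scorer pluggable: `rerank` and its `score_fn` parameter are hypothetical, and in the planned system `score_fn` would wrap the cross-encoder model rather than a simple function.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> list[str]:
    """Score every candidate against the query and return the top_k texts.

    score_fn is an assumed interface: (query, chunk_text) -> relevance score,
    where higher means more relevant.
    """
    scored = [(score_fn(query, text), index, text)
              for index, text in enumerate(candidates)]
    # Highest score first; ties keep the original (fused) order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, _, text in scored[:top_k]]
```

Keeping the scorer behind a plain callable also makes it easy to log the top 5 contexts and to swap in a cheap stand-in scorer for unit tests.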

Step 6: Citation Enforcement

  • Each retrieved chunk carries source_url + section_title
  • Prompt the LLM to answer only using provided context
  • Force structured output: answer + list of cited chunks
  • If no relevant chunk is found, return "I don't know" rather than an unsupported answer
  • Validate that every cited chunk ID exists in the retrieved context set
  • Acceptance criteria:
    • API response returns answer text plus machine-readable citations
    • Unsupported answers are rejected or converted to "I don't know"
    • Manual spot checks confirm citations map to relevant evidence
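
The validation rule in this step can be expressed as a small post-processing check on the model's structured output. This is a sketch under assumptions: the `Answer` dataclass, its field names, and `enforce_citations` are illustrative, and the plan does not specify the exact response schema.

```python
from dataclasses import dataclass, field

IDK = "I don't know"


@dataclass
class Answer:
    text: str
    cited_chunk_ids: list[str] = field(default_factory=list)


def enforce_citations(answer: Answer, retrieved_ids: set[str]) -> Answer:
    """Pass the answer through only if it cites at least one chunk and every
    cited chunk ID was present in the retrieved context set."""
    cited = answer.cited_chunk_ids
    if not cited or any(cid not in retrieved_ids for cid in cited):
        # Uncited or mis-cited answers are converted to a refusal.
        return Answer(text=IDK, cited_chunk_ids=[])
    return answer
```

This check only confirms that citations point at retrieved chunks; whether a cited chunk actually supports the answer still needs the faithfulness metric and manual spot checks.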

Phase 3: Evaluation Pipeline

Step 7: Build Eval Dataset

  • Manually write 30-50 Q&A pairs from FastAPI docs for the initial eval set
  • Cover: factual questions, code questions, comparison questions
  • Store as JSON/CSV in the repo
  • Expand to 100+ only after the first reliable baseline is in place
  • Acceptance criteria:
    • Eval set covers multiple sections of the docs
    • Each question has an expected answer and supporting source reference
    • The dataset is stable enough to compare runs over time
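
A small validator can keep the eval file consistent as it grows. The sketch below assumes a JSON array of objects; the field names (`question`, `expected_answer`, `source_url`, `category`) and the category labels are hypothetical choices that mirror the bullets above, not a schema defined by the plan.

```python
import json

REQUIRED_FIELDS = ("question", "expected_answer", "source_url", "category")
CATEGORIES = {"factual", "code", "comparison"}


def validate_eval_set(raw_json: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, item in enumerate(json.loads(raw_json)):
        for name in REQUIRED_FIELDS:
            # Treat absent, empty, and whitespace-only values as missing.
            if not str(item.get(name, "")).strip():
                problems.append(f"item {i}: missing {name}")
        if item.get("category") not in CATEGORIES:
            problems.append(f"item {i}: unknown category {item.get('category')!r}")
    return problems
```

Running this in the lightweight PR workflow would catch malformed entries before they affect a full evaluation run.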

Step 8: Ragas Evaluation Metrics

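| Metric | What it measures |
| --- | --- |
| faithfulness | Are the claims in the answer supported by the retrieved context? |
| answer_relevancy | Does the answer address the question that was asked? |
| context_precision | Are the relevant retrieved chunks ranked near the top? |
| context_recall | Does the retrieved context cover the information needed for the expected answer? |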

Step 9: CI Integration

  • Run lightweight regression checks on every PR via GitHub Actions
  • Reserve full Ragas evaluation for scheduled or manually triggered runs
  • Use an initial target such as faithfulness >= 0.85 as a baseline, not a permanent hard-coded gate
  • Store eval results over time to track regression
  • Acceptance criteria:
    • PR workflow catches obvious retrieval or API regressions quickly
    • Full eval workflow produces repeatable metrics artifacts
    • Baseline metrics are documented before strict CI thresholds are enforced
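
One way to treat the baseline as a moving reference rather than a fixed gate is to compare each run against stored baseline metrics and flag only meaningful drops. The sketch below is illustrative: `find_regressions` and the 0.02 tolerance are assumptions, and the metric values in the usage lines are made-up inputs, not measured results.

```python
def find_regressions(
    current: dict[str, float],
    baseline: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return metrics that fell more than `tolerance` below the baseline,
    mapped to the size of the drop. A metric missing from `current`
    counts as 0.0."""
    drops = {}
    for name, base_value in baseline.items():
        drop = base_value - current.get(name, 0.0)
        if drop > tolerance:
            drops[name] = round(drop, 4)
    return drops


# Made-up numbers, for illustration only.
baseline = {"faithfulness": 0.85, "context_recall": 0.80}
current = {"faithfulness": 0.86, "context_recall": 0.70}
regressions = find_regressions(current, baseline)
# regressions == {"context_recall": 0.1}
```

A scheduled workflow could fail when the returned dict is non-empty and otherwise save the current metrics as a results artifact for tracking over time.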

Tech Stack

| Layer | Tool |
| --- | --- |
| Orchestration | LangChain |
| Vector Store (dev) | ChromaDB |
| Vector Store (prod) | Deferred until post-MVP |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Evaluation | Ragas |
| LLM | gpt-4o-mini |
| Scraping | BeautifulSoup |
| CI/CD | GitHub Actions |
| API Layer | FastAPI |
| Frontend (optional) | Deferred until post-MVP |

Suggested Build Timeline

| Week | Focus |
| --- | --- |
| Week 1 | Phase 1: Ingest, chunk, embed, store |
| Week 2 | Phase 2: BM25 + rerank + citation enforcement |
| Week 3 | Phase 3: Eval dataset + Ragas + CI pipeline |
| Week 4 | API hardening + README + baseline evaluation review |

Bonus Features (Post-MVP)

  • Version-aware retrieval: "In v2 vs v3, how does X work?"
  • Code snippet extraction: return relevant code blocks alongside answers
  • Multi-doc search: compare two frameworks side by side
  • Query rewriting: auto-expand vague queries before retrieval

Based on project notes from January 10–11, 2026