Agentic RAG
Introduction to Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval to produce more accurate, factual, and contextually relevant responses. At its core, RAG is about “using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base.”
Why Use RAG?
RAG offers several significant advantages over using vanilla or fine-tuned LLMs:
- Factual Grounding: Reduces hallucinations by anchoring responses in retrieved facts
- Domain Specialization: Provides domain-specific knowledge without model retraining
- Knowledge Recency: Allows access to information beyond the model’s training cutoff
- Transparency: Enables citation of sources for generated content
- Control: Offers fine-grained control over what information the model can access
Limitations of Traditional RAG
Despite its benefits, traditional RAG approaches face several challenges:
- Single Retrieval Step: If the initial retrieval results are poor, the final generation will suffer
- Query-Document Mismatch: User queries (often questions) may not match well with documents containing answers (often statements)
- Limited Reasoning: Simple RAG pipelines don’t allow for multi-step reasoning or query refinement
- Context Window Constraints: Retrieved documents must fit within the model’s context window
Agentic RAG: A More Powerful Approach
We can overcome these limitations by implementing an Agentic RAG system: essentially, an agent equipped with retrieval capabilities. This approach transforms RAG from a rigid pipeline into an interactive, reasoning-driven process.
Key Benefits of Agentic RAG
An agent with retrieval tools can:
- ✅ Formulate optimized queries: The agent can transform user questions into retrieval-friendly queries
- ✅ Perform multiple retrievals: The agent can retrieve information iteratively as needed
- ✅ Reason over retrieved content: The agent can analyze, synthesize, and draw conclusions from multiple sources
- ✅ Self-critique and refine: The agent can evaluate retrieval results and adjust its approach
This approach naturally implements advanced RAG techniques, as sketched after this list:
- Hypothetical Document Embedding (HyDE): Instead of using the user query directly, the agent formulates retrieval-optimized queries (Gao et al., 2022)
- Self-Query Refinement: The agent can analyze initial results and perform follow-up retrievals with refined queries
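To make the HyDE idea concrete, here is a minimal sketch of agent-side query rewriting. The llm callable below is a hypothetical stand-in for any prompt-to-text function, not a specific smolagents API; the point is that the retriever receives a hypothetical answer passage instead of the raw question:

# HyDE-style rewriting sketch; `llm` is a hypothetical prompt -> text callable
def hyde_query(llm, user_question: str) -> str:
    prompt = (
        "Write a short passage, in the style of technical documentation, "
        "that plausibly answers this question:\n" + user_question
    )
    return llm(prompt)

# The generated passage, not the question, is what gets retrieved against:
# docs = retriever.invoke(hyde_query(llm, "How do I resume training from a checkpoint?"))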
Building an Agentic RAG System
Let’s build a complete Agentic RAG system step by step. We’ll create an agent that can answer questions about the Hugging Face Transformers library by retrieving information from its documentation.
You can follow along with the code snippets below, or check out the full example in the smolagents GitHub repository: examples/rag.py.
Step 1: Install Required Dependencies
First, we need to install the necessary packages:
pip install smolagents pandas langchain langchain-community sentence-transformers datasets python-dotenv rank_bm25 --upgrade
If you plan to use Hugging Face’s Inference API, you’ll need to set up your API token:
# Load environment variables (including HF_TOKEN)
from dotenv import load_dotenv
load_dotenv()
Step 2: Prepare the Knowledge Base
We’ll use a dataset containing Hugging Face documentation and prepare it for retrieval:
import datasets
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever
# Load the Hugging Face documentation dataset
knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")
# Filter to include only Transformers documentation
knowledge_base = knowledge_base.filter(lambda row: row["source"].startswith("huggingface/transformers"))
# Convert dataset entries to Document objects with metadata
source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]})
    for doc in knowledge_base
]
# Split documents into smaller chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Characters per chunk
    chunk_overlap=50,  # Overlap between chunks to maintain context
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],  # Priority order for splitting
)
docs_processed = text_splitter.split_documents(source_docs)
print(f"Knowledge base prepared with {len(docs_processed)} document chunks")
Step 3: Create a Retriever Tool
Now we’ll create a custom tool that our agent can use to retrieve information from the knowledge base:
from smolagents import Tool
class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve the parts of transformers documentation that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        # Initialize the retriever with our processed documents
        self.retriever = BM25Retriever.from_documents(
            docs, k=10  # Return top 10 most relevant documents
        )

    def forward(self, query: str) -> str:
        """Execute the retrieval based on the provided query."""
        assert isinstance(query, str), "Your search query must be a string"
        # Retrieve relevant documents
        docs = self.retriever.invoke(query)
        # Format the retrieved documents for readability
        return "\nRetrieved documents:\n" + "".join(
            f"\n\n===== Document {i} =====\n" + doc.page_content
            for i, doc in enumerate(docs)
        )
# Initialize our retriever tool with the processed documents
retriever_tool = RetrieverTool(docs_processed)
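Before handing the tool to an agent, you can sanity-check it by calling it directly; smolagents Tool instances are callable and dispatch to forward():

# Quick manual test of the retriever tool
print(retriever_tool(query="To fine-tune a model, use the Trainer class"))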
We’re using BM25, a lexical retrieval method, for simplicity and speed. For production systems, you might want to use semantic search with embeddings for better retrieval quality. Check the MTEB Leaderboard for high-quality embedding models.
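As a rough sketch of that upgrade, you could swap BM25 for a FAISS vector store built on sentence-transformers embeddings. This assumes faiss-cpu is installed in addition to the packages above, and the embedding model named here is just one reasonable choice:

# Semantic-search alternative to BM25 (assumes: pip install faiss-cpu)
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(docs_processed, embeddings)
semantic_retriever = vector_store.as_retriever(search_kwargs={"k": 10})
# Drop-in replacement inside RetrieverTool.__init__:
# self.retriever = semantic_retriever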
Step 4: Create an Advanced Retrieval Agent
Now we’ll create an agent that can use our retriever tool to answer questions:
from smolagents import InferenceClientModel, CodeAgent
# Initialize the agent with our retriever tool
agent = CodeAgent(
    tools=[retriever_tool],  # List of tools available to the agent
    model=InferenceClientModel(),  # Default model "Qwen/Qwen2.5-Coder-32B-Instruct"
    max_steps=4,  # Limit the number of reasoning steps
    verbosity_level=2,  # Show detailed agent reasoning
)
# To use a specific model, you can specify it like this:
# model=InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
Note that the Inference API hosts models based on various criteria, and deployed models may be updated or replaced without prior notice.
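If you would rather avoid remote inference entirely, smolagents also provides model classes that run locally. As a hedged sketch, TransformersModel loads a model through the transformers library (this assumes transformers and torch are installed; the model id is just an illustrative small model):

# Local alternative to InferenceClientModel (assumes transformers + torch installed)
from smolagents import TransformersModel

local_model = TransformersModel(model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct")
# agent = CodeAgent(tools=[retriever_tool], model=local_model, max_steps=4)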
Step 5: Run the Agent to Answer Questions
Let’s use our agent to answer a question about Transformers:
# Ask a question that requires retrieving information
question = "For a transformers model training, which is slower, the forward or the backward pass?"
# Run the agent to get an answer
agent_output = agent.run(question)
# Display the final answer
print("\nFinal answer:")
print(agent_output)
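If you want to ask a follow-up in the same session, agent.run takes a reset flag; passing reset=False keeps the memory of the previous run (behavior in recent smolagents versions; verify against the version you have installed):

# Follow-up question that reuses the previous run's memory
followup = agent.run(
    "Based on your previous answer, explain why that pass is slower.",
    reset=False,  # Keep prior steps instead of starting fresh
)
print(followup)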
Practical Applications of Agentic RAG
Agentic RAG systems can be applied to various use cases:
- Technical Documentation Assistance: Help users navigate complex technical documentation
- Research Paper Analysis: Extract and synthesize information from scientific papers
- Legal Document Review: Find relevant precedents and clauses in legal documents
- Customer Support: Answer questions based on product documentation and knowledge bases
- Educational Tutoring: Provide explanations based on textbooks and learning materials
Conclusion
Agentic RAG represents a significant advancement over traditional RAG pipelines. By combining the reasoning capabilities of LLM agents with the factual grounding of retrieval systems, we can build more powerful, flexible, and accurate information systems.
The approach we’ve demonstrated:
- Overcomes the limitations of single-step retrieval
- Enables more natural interactions with knowledge bases
- Provides a framework for continuous improvement through self-critique and query refinement
As you build your own Agentic RAG systems, consider experimenting with different retrieval methods, agent architectures, and knowledge sources to find the optimal configuration for your specific use case.