# RAG Pipeline API Documentation

## Overview

A FastAPI-based RAG (Retrieval-Augmented Generation) pipeline with OpenRouter GLM integration for intelligent tool calling.
## Base URL

```
http://localhost:8000
```
## Endpoints

### `/chat` - Main Chat Endpoint

**Method:** POST

**Description:** Intelligent chat with RAG tool calling. GLM automatically determines when to use RAG vs. general conversation.
#### Request Body

```json
{
  "messages": [
    {
      "role": "user | assistant | system",
      "content": "string"
    }
  ]
}
```
#### Response Format

```json
{
  "response": "string",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"string\", \"dataset\": \"string\"}"
    }
  ]
}
```

`tool_calls` is `null` when the model answered without invoking RAG.
#### Examples

**1. General Greeting (No RAG):**

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
```

Response:

```json
{
  "response": "Hi! I'm Rohit's AI assistant. I can help you learn about his professional background, skills, and experience. What would you like to know about Rohit?",
  "tool_calls": null
}
```
**2. Portfolio Question (RAG Enabled):**

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is your current role?"}]}'
```

Response:

```json
{
  "response": "Based on the portfolio information, Rohit is currently working as a Tech Lead at FleetEnable, where he leads UI development for a logistics SaaS product focused on drayage and freight management...",
  "tool_calls": [
    {
      "name": "rag_qa",
      "arguments": "{\"question\": \"What is your current role?\"}"
    }
  ]
}
```
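The curl examples above can also be exercised from Python. The sketch below is a minimal client, assuming the server is running at `localhost:8000`; the `build_payload` and `ask` helper names are ours, not part of the API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local dev server


def build_payload(content, role="user"):
    """Build the request body expected by POST /chat."""
    return {"messages": [{"role": role, "content": content}]}


def ask(content):
    """Send a single-turn chat request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(build_payload(content)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires a running server):
#   reply = ask("What is your current role?")
#   reply["tool_calls"] is None for general chat, a list for RAG queries
```

For multi-turn conversations, extend the `messages` list with prior `user`/`assistant` turns before posting.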
### `/health` - Health Check

**Method:** GET

**Description:** Check API and dataset loading status.

#### Response

```json
{
  "status": "healthy",
  "datasets_loaded": 1,
  "available_datasets": ["developer-portfolio"]
}
```
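Because datasets may still be loading at startup, scripts often poll `/health` before sending chat requests. This is an illustrative helper, not part of the pipeline: the `fetch_health` callable is injected so the polling logic stays transport-agnostic.

```python
import time


def wait_until_healthy(fetch_health, timeout=30.0, interval=1.0):
    """Poll the /health endpoint until it reports "healthy".

    fetch_health() should GET /health and return the parsed JSON body,
    raising OSError while the server is unreachable.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            body = fetch_health()
            if body.get("status") == "healthy":
                return body
        except OSError:
            pass  # server not up yet; retry after a short pause
        time.sleep(interval)
    raise TimeoutError("API did not report healthy within the timeout")
```

With the standard library, `fetch_health` could be `lambda: json.load(urllib.request.urlopen("http://localhost:8000/health"))`.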
### `/datasets` - List Available Datasets

**Method:** GET

**Description:** Get the list of available datasets.

#### Response

```json
{
  "datasets": ["developer-portfolio"]
}
```
## Features

### 🔧 Intelligent Tool Calling

- **Automatic Detection:** GLM determines when questions need RAG vs. general conversation
- **Context-Aware:** Uses portfolio information for relevant questions
- **Natural Responses:** Synthesizes RAG results into conversational answers
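Automatic detection of this kind is typically driven by a function schema the model sees on every request. The pipeline's actual schema is not shown in these docs; the sketch below is an assumption following the OpenAI-style tool format that OpenRouter accepts, with the `rag_qa` name and `question`/`dataset` arguments inferred from the `/chat` response examples.

```python
# Hypothetical tool schema; field values inferred from the /chat
# response examples, not taken from the pipeline's source.
RAG_TOOL = {
    "type": "function",
    "function": {
        "name": "rag_qa",
        "description": (
            "Answer questions about Rohit's portfolio by retrieving "
            "relevant documents from a loaded dataset."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "dataset": {
                    "type": "string",
                    "enum": ["developer-portfolio"],
                },
            },
            "required": ["question"],
        },
    },
}
```

The model emits a `rag_qa` call only when the question matches this description; greetings and small talk fall through to ordinary generation.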
### 🎯 Third-Person AI Assistant

- **Portfolio Focus:** Responds about Rohit's experience (not "my" experience)
- **Professional Tone:** Maintains proper third-person references
- **Context Integration:** Combines multiple data points coherently
### ⚡ Performance Optimizations

- **On-Demand Loading:** Datasets load only when RAG is needed
- **Clean Output:** No verbose ML logging for general conversations
- **Fast Responses:** Sub-second for greetings, ~20s for the first RAG query
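On-demand loading explains the latency profile above: the expensive document/embedding work runs once, on the first RAG call, then hits a cache. A minimal memoized-loader sketch (the `_CACHE` and `load_dataset` names are illustrative, not the pipeline's actual internals):

```python
_CACHE = {}  # dataset name -> loaded documents/index


def load_dataset(name):
    """Load (and cache) a dataset only when RAG actually needs it."""
    if name not in _CACHE:
        # Expensive work (reading documents, building embeddings)
        # happens once, on the first RAG query (~20s).
        _CACHE[name] = _build_index(name)
    return _CACHE[name]  # subsequent calls are cache hits


def _build_index(name):
    # Placeholder for the real document-loading/embedding step.
    return {"name": name, "documents": []}
```

Greetings never call `load_dataset`, which is why they stay sub-second even on a cold server.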
## Available Datasets

### developer-portfolio

- **Content:** Work experience, skills, projects, achievements
- **Topics:** FleetEnable, Coditude, technologies, leadership
- **Size:** 19 documents with full metadata
## Error Handling

### Common Responses

- **Datasets Loading:** "RAG Pipeline is running but datasets are still loading..."
- **Dataset Not Found:** "Dataset 'xyz' not available. Available datasets: [...]"
- **API Errors:** HTTP 500 with error details

### Status Codes

- `200` - Success
- `400` - Bad Request (invalid JSON, missing fields)
- `500` - Internal Server Error
## Environment Variables

Create a `.env` file:

```env
OPENROUTER_API_KEY=sk-or-v1-your-key-here
PORT=8000
TOKENIZERS_PARALLELISM=false
```
## Development

### Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Start the server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Or use the convenience script
./start.sh
```
### Testing

```bash
# Health check
curl http://localhost:8000/health

# Chat test
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
```
## Deployment

### Docker

```bash
# Build the image
docker build -t rag-pipeline .

# Run the container
docker run -p 8000:8000 rag-pipeline
```
### Hugging Face Spaces

- Push code to the repository
- Connect the Space to the repository
- Set environment variables in the Space settings
- Automatic deployment from the `main` branch
## Architecture

```
OpenRouter GLM-4.5-air (Parent AI)
├── Tool Calling Logic
│   ├── Automatically detects RAG-worthy questions
│   └── Falls back to general knowledge
├── RAG Tool Function
│   ├── Dataset selection (developer-portfolio)
│   ├── Document retrieval
│   └── Context formatting
└── Response Generation
    ├── Tool results integration
    └── Natural language responses
```
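The architecture implies a standard two-pass tool-calling loop: ask the model, run `rag_qa` if it requests the tool, then ask again with the tool result appended. The sketch below is a simplified illustration with a pluggable `call_model` function; the real pipeline calls OpenRouter's GLM-4.5-air, and a production loop would also carry tool-call IDs and the tool schema, omitted here.

```python
import json


def chat_with_tools(messages, call_model, run_rag):
    """Two-pass tool-calling loop (simplified).

    call_model(messages) -> {"content": str, "tool_calls": list | None}
    run_rag(question, dataset) -> str (retrieved context / answer)
    """
    first = call_model(messages)
    tool_calls = first.get("tool_calls")
    if not tool_calls:
        # General conversation: answer directly, no retrieval.
        return {"response": first["content"], "tool_calls": None}

    # Execute each requested rag_qa call and feed results back.
    followup = list(messages)
    for call in tool_calls:
        args = json.loads(call["arguments"])
        result = run_rag(args["question"],
                         args.get("dataset", "developer-portfolio"))
        followup.append({"role": "tool", "content": result})

    # Second pass: the model synthesizes a natural-language answer
    # from the retrieved context.
    final = call_model(followup)
    return {"response": final["content"], "tool_calls": tool_calls}
```

This mirrors the `/chat` response shape: `tool_calls` is `None` for the direct path and echoes the executed calls for the RAG path.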
## Changelog

### v2.0 - Current

- ✅ OpenRouter GLM integration with tool calling
- ✅ Intelligent RAG vs. conversation detection
- ✅ Third-person AI assistant for Rohit's portfolio
- ✅ On-demand dataset loading
- ✅ Removed `/answer` endpoint (use `/chat` only)
- ✅ Environment variable configuration
- ✅ Performance optimizations
### v1.0 - Legacy

- Google Gemini integration
- Multiple endpoints (`/answer`, `/chat`)
- Background dataset loading
- First-person responses