
Flask API Documentation

Overview

The Research AI Assistant API provides a RESTful interface for interacting with an AI-powered research assistant. The API uses local GPU models for inference and supports conversational interactions with context management.

Base URL (HF Spaces): https://jatinautonomouslabs-research-ai-assistant-api.hf.space

Alternative Base URL: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant_API

API Version: 1.0

Content-Type: application/json

Note: For Hugging Face Spaces Docker deployments, use the .hf.space domain format. The space name is converted to lowercase with hyphens.
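For illustration, the owner/space-name-to-subdomain conversion described above can be sketched in Python (`space_base_url` is a hypothetical helper, not part of this API):

```python
def space_base_url(owner: str, space_name: str) -> str:
    # HF Spaces Docker URLs use "<owner>-<space_name>", lowercased,
    # with underscores replaced by hyphens.
    slug = f"{owner}-{space_name}".lower().replace("_", "-")
    return f"https://{slug}.hf.space"

print(space_base_url("JatinAutonomousLabs", "Research_AI_Assistant_API"))
# https://jatinautonomouslabs-research-ai-assistant-api.hf.space
```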

Features

  • 🤖 AI-Powered Responses - Local GPU model inference (Tesla T4)
  • 💬 Conversational Context - Maintains conversation history and user context
  • 🔒 CORS Enabled - Ready for web integration
  • ⚡ Async Processing - Efficient request handling
  • 📊 Transparent Reasoning - Returns reasoning chains and performance metrics

Authentication

Currently, the API does not require authentication. However, for production use, you should:

  1. Set HF_TOKEN environment variable for Hugging Face model access
  2. Implement API key authentication if needed
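If you add API key authentication in front of this API, the client side could look like the sketch below. Both the `X-API-Key` header name and the `HONESTAI_API_KEY` environment variable are assumptions; the API does not check them today:

```python
import os

def build_headers() -> dict:
    # Hypothetical scheme: attach an API key header only when one is configured.
    # The X-API-Key header name is an assumption, not part of this API.
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("HONESTAI_API_KEY")
    if api_key:
        headers["X-API-Key"] = api_key
    return headers
```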

Endpoints

1. Get API Information

Endpoint: GET /

Description: Returns API information, version, and available endpoints.

Request:

GET / HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space

Response:

{
  "name": "AI Assistant Flask API",
  "version": "1.0",
  "status": "running",
  "orchestrator_ready": true,
  "features": {
    "local_gpu_models": true,
    "max_workers": 4,
    "hardware": "NVIDIA T4 Medium"
  },
  "endpoints": {
    "health": "GET /api/health",
    "chat": "POST /api/chat",
    "initialize": "POST /api/initialize",
    "context_mode_get": "GET /api/context/mode",
    "context_mode_set": "POST /api/context/mode"
  }
}

Status Codes:

  • 200 OK - Success

2. Health Check

Endpoint: GET /api/health

Description: Checks if the API and orchestrator are ready to handle requests.

Request:

GET /api/health HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space

Response:

{
  "status": "healthy",
  "orchestrator_ready": true
}

Status Codes:

  • 200 OK - API is healthy
    • orchestrator_ready: true - Ready to process requests
    • orchestrator_ready: false - Still initializing

Example Response (Initializing):

{
  "status": "initializing",
  "orchestrator_ready": false
}
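Because the orchestrator may still be initializing after a cold start, clients can poll this endpoint before sending chat requests. A minimal sketch, where `fetch_health` is any callable returning the parsed health JSON (e.g. `lambda: requests.get(BASE_URL + "/api/health", timeout=5).json()`):

```python
import time

def wait_until_ready(fetch_health, timeout: float = 60.0, interval: float = 2.0) -> bool:
    # Poll the health endpoint until orchestrator_ready is true or we time out.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if fetch_health().get("orchestrator_ready"):
                return True
        except Exception:
            pass  # the Space may still be building; keep polling
        time.sleep(interval)
    return False
```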

3. Chat Endpoint

Endpoint: POST /api/chat

Description: Send a message to the AI assistant and receive a response with reasoning and context.

Request Headers:

Content-Type: application/json

Request Body:

{
  "message": "Explain quantum entanglement in simple terms",
  "history": [
    ["User message 1", "Assistant response 1"],
    ["User message 2", "Assistant response 2"]
  ],
  "session_id": "session-123",
  "user_id": "user-456"
}

Request Fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| message | string | ✅ Yes | User's message/question (max 10,000 characters) |
| history | array | ❌ No | Conversation history as array of [user, assistant] pairs |
| session_id | string | ❌ No | Unique session identifier for context continuity |
| user_id | string | ❌ No | User identifier (defaults to "anonymous") |
| context_mode | string | ❌ No | Context retrieval mode: "fresh" (no user context) or "relevant" (only relevant context). Defaults to "fresh" if not set. |

Response (Success):

{
  "success": true,
  "message": "Quantum entanglement is when two particles become linked...",
  "history": [
    ["Explain quantum entanglement", "Quantum entanglement is when two particles become linked..."]
  ],
  "reasoning": {
    "intent": "educational_query",
    "steps": ["Understanding request", "Gathering information", "Synthesizing response"],
    "confidence": 0.95
  },
  "performance": {
    "response_time_ms": 2345,
    "tokens_generated": 156,
    "model_used": "mistralai/Mistral-7B-Instruct-v0.2"
  }
}

Response Fields:

| Field | Type | Description |
|-------|------|-------------|
| success | boolean | Whether the request was successful |
| message | string | AI assistant's response |
| history | array | Updated conversation history including the new exchange |
| reasoning | object | AI reasoning process and confidence metrics |
| performance | object | Performance metrics (response time, tokens, model used) |

Status Codes:

  • 200 OK - Request processed successfully
  • 400 Bad Request - Invalid request (missing message, empty message, too long, wrong type)
  • 500 Internal Server Error - Server error processing request
  • 503 Service Unavailable - Orchestrator not ready (still initializing)

Error Response:

{
  "success": false,
  "error": "Message is required",
  "message": "Error processing your request. Please try again."
}

Context Mode Feature:

The context_mode parameter controls how user context is retrieved and used:

  • "fresh" (default): No user context is included; each conversation starts fresh. Ideal for:
    • General questions requiring no prior context
    • Avoiding context contamination
    • Faster responses (no context retrieval overhead)
  • "relevant": Only relevant user context is included, based on relevance classification. The system:
    • Analyzes all previous interactions for the session
    • Classifies which interactions are relevant to the current query
    • Includes only relevant context summaries
    This mode is ideal for:
    • Follow-up questions that build on previous conversations
    • Maintaining continuity within a research session
    • Personalized responses based on user history

Example with Context Mode:

{
  "message": "Can you remind me what we discussed about quantum computing?",
  "session_id": "session-123",
  "user_id": "user-456",
  "context_mode": "relevant"
}
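Which mode to request can also be decided client-side. The heuristic below is purely illustrative — the cue list is an assumption, not something the API provides:

```python
def choose_context_mode(message: str, history: list) -> str:
    # Use "relevant" only for follow-ups that reference earlier conversation;
    # default to "fresh" otherwise, matching the API's own default.
    followup_cues = ("earlier", "before", "we discussed", "remind me", "last time")
    if history and any(cue in message.lower() for cue in followup_cues):
        return "relevant"
    return "fresh"
```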

4. Initialize Orchestrator

Endpoint: POST /api/initialize

Description: Manually trigger orchestrator initialization (useful if initialization failed on startup).

Request:

POST /api/initialize HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space
Content-Type: application/json

Request Body:

{}

Response (Success):

{
  "success": true,
  "message": "Orchestrator initialized successfully"
}

Response (Failure):

{
  "success": false,
  "message": "Initialization failed. Check logs for details."
}

Status Codes:

  • 200 OK - Initialization successful
  • 500 Internal Server Error - Initialization failed

5. Get Context Mode

Endpoint: GET /api/context/mode

Description: Retrieve the current context retrieval mode for a session.

Request:

GET /api/context/mode?session_id=session-123 HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space

Query Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| session_id | string | ✅ Yes | Session identifier |

Response (Success):

{
  "success": true,
  "session_id": "session-123",
  "context_mode": "fresh",
  "description": {
    "fresh": "No user context included - starts fresh each time",
    "relevant": "Only relevant user context included based on relevance classification"
  }
}

Response Fields:

| Field | Type | Description |
|-------|------|-------------|
| success | boolean | Whether the request was successful |
| session_id | string | Session identifier |
| context_mode | string | Current mode: "fresh" or "relevant" |
| description | object | Description of each mode |

Status Codes:

  • 200 OK - Success
  • 400 Bad Request - Missing session_id parameter
  • 500 Internal Server Error - Server error
  • 503 Service Unavailable - Orchestrator not ready or context mode not available

Error Response:

{
  "success": false,
  "error": "session_id query parameter is required"
}

6. Set Context Mode

Endpoint: POST /api/context/mode

Description: Set the context retrieval mode for a session (fresh or relevant).

Request Headers:

Content-Type: application/json

Request Body:

{
  "session_id": "session-123",
  "mode": "relevant",
  "user_id": "user-456"
}

Request Fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| session_id | string | ✅ Yes | Session identifier |
| mode | string | ✅ Yes | Context mode: "fresh" or "relevant" |
| user_id | string | ❌ No | User identifier (defaults to "anonymous") |

Response (Success):

{
  "success": true,
  "session_id": "session-123",
  "context_mode": "relevant",
  "message": "Context mode set successfully"
}

Response Fields:

| Field | Type | Description |
|-------|------|-------------|
| success | boolean | Whether the request was successful |
| session_id | string | Session identifier |
| context_mode | string | The mode that was set |
| message | string | Success message |

Status Codes:

  • 200 OK - Context mode set successfully
  • 400 Bad Request - Invalid request (missing fields, invalid mode)
  • 500 Internal Server Error - Server error or failed to set mode
  • 503 Service Unavailable - Orchestrator not ready or context mode not available

Error Response:

{
  "success": false,
  "error": "mode must be 'fresh' or 'relevant'"
}

Usage Notes:

  • The context mode persists for the session until changed
  • Setting mode to "relevant" enables relevance classification, which analyzes all previous interactions to include only relevant context
  • Setting mode to "fresh" disables context retrieval, providing faster responses without user history
  • The mode can also be set per-request via the context_mode parameter in /api/chat

Code Examples

Python

import requests
import json

BASE_URL = "https://jatinautonomouslabs-research-ai-assistant-api.hf.space"

# Check health
def check_health():
    response = requests.get(f"{BASE_URL}/api/health")
    return response.json()

# Send chat message
def send_message(message, session_id=None, user_id=None, history=None, context_mode=None):
    payload = {
        "message": message,
        "session_id": session_id,
        "user_id": user_id or "anonymous",
        "history": history or []
    }
    if context_mode:
        payload["context_mode"] = context_mode
    
    response = requests.post(
        f"{BASE_URL}/api/chat",
        json=payload,
        headers={"Content-Type": "application/json"}
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"API Error: {response.status_code} - {response.text}")

# Example usage
if __name__ == "__main__":
    # Check if API is ready
    health = check_health()
    print(f"API Status: {health}")
    
    if health.get("orchestrator_ready"):
        # Send a message
        result = send_message(
            message="What is machine learning?",
            session_id="my-session-123",
            user_id="user-456"
        )
        
        print(f"Response: {result['message']}")
        print(f"Reasoning: {result.get('reasoning', {})}")
        
        # Set context mode to relevant for follow-up
        requests.post(
            f"{BASE_URL}/api/context/mode",
            json={
                "session_id": "my-session-123",
                "mode": "relevant",
                "user_id": "user-456"
            }
        )
        
        # Continue conversation with relevant context
        history = result['history']
        result2 = send_message(
            message="Can you explain neural networks?",
            session_id="my-session-123",
            user_id="user-456",
            history=history,
            context_mode="relevant"
        )
        print(f"Follow-up Response: {result2['message']}")

JavaScript (Fetch API)

const BASE_URL = 'https://jatinautonomouslabs-research-ai-assistant-api.hf.space';

// Check health
async function checkHealth() {
    const response = await fetch(`${BASE_URL}/api/health`);
    return await response.json();
}

// Get context mode for a session
async function getContextMode(sessionId) {
    const response = await fetch(`${BASE_URL}/api/context/mode?session_id=${sessionId}`);
    if (!response.ok) {
        throw new Error(`API Error: ${response.status}`);
    }
    return await response.json();
}

// Set context mode for a session
async function setContextMode(sessionId, mode, userId = null) {
    const payload = {
        session_id: sessionId,
        mode: mode
    };
    if (userId) {
        payload.user_id = userId;
    }
    
    const response = await fetch(`${BASE_URL}/api/context/mode`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(payload)
    });
    
    if (!response.ok) {
        const error = await response.json();
        throw new Error(`API Error: ${response.status} - ${error.error || error.message}`);
    }
    
    return await response.json();
}

// Send chat message
async function sendMessage(message, sessionId = null, userId = null, history = [], contextMode = null) {
    const payload = {
        message: message,
        session_id: sessionId,
        user_id: userId || 'anonymous',
        history: history
    };
    if (contextMode) {
        payload.context_mode = contextMode;
    }
    
    const response = await fetch(`${BASE_URL}/api/chat`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(payload)
    });
    
    if (!response.ok) {
        const error = await response.json();
        throw new Error(`API Error: ${response.status} - ${error.error || error.message}`);
    }
    
    return await response.json();
}

// Example usage
async function main() {
    try {
        // Check if API is ready
        const health = await checkHealth();
        console.log('API Status:', health);
        
        if (health.orchestrator_ready) {
            // Send a message
            const result = await sendMessage(
                'What is machine learning?',
                'my-session-123',
                'user-456'
            );
            
            console.log('Response:', result.message);
            console.log('Reasoning:', result.reasoning);
            
            // Continue conversation with relevant context
            await setContextMode('my-session-123', 'relevant', 'user-456');
            const result2 = await sendMessage(
                'Can you explain neural networks?',
                'my-session-123',
                'user-456',
                result.history,
                'relevant'
            );
            console.log('Follow-up Response:', result2.message);
            
            // Check current context mode
            const modeInfo = await getContextMode('my-session-123');
            console.log('Current context mode:', modeInfo.context_mode);
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

cURL

# Check health
curl -X GET "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/health"

# Get context mode
curl -X GET "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/context/mode?session_id=my-session-123"

# Set context mode to relevant
curl -X POST "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/context/mode" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "my-session-123",
    "mode": "relevant",
    "user_id": "user-456"
  }'

# Send chat message
curl -X POST "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is machine learning?",
    "session_id": "my-session-123",
    "user_id": "user-456",
    "context_mode": "relevant",
    "history": []
  }'

# Continue conversation
curl -X POST "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Can you explain neural networks?",
    "session_id": "my-session-123",
    "user_id": "user-456",
    "history": [
      ["What is machine learning?", "Machine learning is a subset of artificial intelligence..."]
    ]
  }'

Node.js (Axios)

const axios = require('axios');

const BASE_URL = 'https://jatinautonomouslabs-research-ai-assistant-api.hf.space';

// Check health
async function checkHealth() {
    const response = await axios.get(`${BASE_URL}/api/health`);
    return response.data;
}

// Get context mode
async function getContextMode(sessionId) {
    const response = await axios.get(`${BASE_URL}/api/context/mode`, {
        params: { session_id: sessionId }
    });
    return response.data;
}

// Set context mode
async function setContextMode(sessionId, mode, userId = null) {
    const payload = {
        session_id: sessionId,
        mode: mode
    };
    if (userId) payload.user_id = userId;
    
    const response = await axios.post(`${BASE_URL}/api/context/mode`, payload);
    return response.data;
}

// Send chat message
async function sendMessage(message, sessionId = null, userId = null, history = [], contextMode = null) {
    try {
        const payload = {
            message: message,
            session_id: sessionId,
            user_id: userId || 'anonymous',
            history: history
        };
        if (contextMode) payload.context_mode = contextMode;
        
        const response = await axios.post(`${BASE_URL}/api/chat`, payload, {
            headers: {
                'Content-Type': 'application/json'
            }
        });
        
        return response.data;
    } catch (error) {
        if (error.response) {
            throw new Error(`API Error: ${error.response.status} - ${error.response.data.error || error.response.data.message}`);
        }
        throw error;
    }
}

// Example usage
(async () => {
    try {
        const health = await checkHealth();
        console.log('API Status:', health);
        
        if (health.orchestrator_ready) {
            // Set context mode to relevant
            await setContextMode('my-session-123', 'relevant', 'user-456');
            
            const result = await sendMessage(
                'What is machine learning?',
                'my-session-123',
                'user-456',
                [],
                'relevant'
            );
            
            console.log('Response:', result.message);
            
            // Check current mode
            const modeInfo = await getContextMode('my-session-123');
            console.log('Context mode:', modeInfo.context_mode);
        }
    } catch (error) {
        console.error('Error:', error.message);
    }
})();

Error Handling

Common Error Responses

400 Bad Request

Missing Message:

{
  "success": false,
  "error": "Message is required"
}

Empty Message:

{
  "success": false,
  "error": "Message cannot be empty"
}

Message Too Long:

{
  "success": false,
  "error": "Message too long. Maximum length is 10000 characters"
}

Invalid Type:

{
  "success": false,
  "error": "Message must be a string"
}

503 Service Unavailable

Orchestrator Not Ready:

{
  "success": false,
  "error": "Orchestrator not ready",
  "message": "AI system is initializing. Please try again in a moment."
}

Solution: Wait a few seconds and retry, or check the /api/health endpoint.

500 Internal Server Error

Generic Error:

{
  "success": false,
  "error": "Error message here",
  "message": "Error processing your request. Please try again."
}

Best Practices

1. Session Management

  • Use consistent session IDs for maintaining conversation context
  • Generate unique session IDs per user conversation thread
  • Include conversation history in subsequent requests for better context
# Good: Maintains context
session_id = "user-123-session-1"
history = []

# First message
result1 = send_message("What is AI?", session_id=session_id, history=history)
history = result1['history']

# Follow-up message (includes context)
result2 = send_message("Can you explain more?", session_id=session_id, history=history)

2. Error Handling

Always implement retry logic for 503 errors:

import time

def send_message_with_retry(message, max_retries=3, retry_delay=2):
    for attempt in range(max_retries):
        try:
            result = send_message(message)
            return result
        except Exception as e:
            if "503" in str(e) and attempt < max_retries - 1:
                time.sleep(retry_delay)
                continue
            raise

3. Health Checks

Check API health before sending requests:

def is_api_ready():
    try:
        health = check_health()
        return health.get("orchestrator_ready", False)
    except Exception:
        return False

if is_api_ready():
    # Send request
    result = send_message("Hello")
else:
    print("API is not ready yet")

4. Rate Limiting

  • No explicit rate limits are currently enforced
  • Recommended: Implement client-side rate limiting (e.g., 1 request per second)
  • Consider: Implementing request queuing for high-volume applications
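A minimal client-side limiter along those lines (the interval and where you call it are up to you; nothing here is enforced by the API):

```python
import time

class RateLimiter:
    # Allow at most one call per min_interval seconds by sleeping as needed.
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last_call = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()

limiter = RateLimiter(min_interval=1.0)
# limiter.wait()  # call before each request to space them out
```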

5. Message Length

  • Maximum: 10,000 characters per message
  • Recommended: Keep messages concise for faster processing
  • For long content: Split into multiple messages or summarize
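Splitting long content could be sketched as below. The paragraph-preferring strategy is just one option; only the 10,000-character limit comes from the API:

```python
MAX_CHARS = 10_000  # server-side per-message limit

def split_message(text: str, limit: int = MAX_CHARS) -> list:
    # Split on paragraph breaks where possible; hard-split oversized paragraphs.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = ""
        for i in range(0, len(para), limit):
            piece = para[i:i + limit]
            if len(piece) == limit:
                chunks.append(piece)
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```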

6. Context Management

  • Include history in requests to maintain conversation context
  • Session IDs help track conversations across multiple requests
  • User IDs enable personalization and user-specific context

Integration Examples

React Component

import React, { useState, useEffect } from 'react';

const AIAssistant = () => {
    const [message, setMessage] = useState('');
    const [history, setHistory] = useState([]);
    const [loading, setLoading] = useState(false);
    const [sessionId] = useState(`session-${Date.now()}`);
    
    const sendMessage = async () => {
        if (!message.trim()) return;
        
        setLoading(true);
        try {
            const response = await fetch('https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/chat', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({
                    message: message,
                    session_id: sessionId,
                    user_id: 'user-123',
                    history: history
                })
            });
            
            const data = await response.json();
            if (data.success) {
                setHistory(data.history);
                setMessage('');
            }
        } catch (error) {
            console.error('Error:', error);
        } finally {
            setLoading(false);
        }
    };
    
    return (
        <div>
            <div className="chat-history">
                {history.map(([user, assistant], idx) => (
                    <div key={idx}>
                        <div><strong>You:</strong> {user}</div>
                        <div><strong>Assistant:</strong> {assistant}</div>
                    </div>
                ))}
            </div>
            <input
                value={message}
                onChange={(e) => setMessage(e.target.value)}
                onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
                disabled={loading}
            />
            <button onClick={sendMessage} disabled={loading}>
                {loading ? 'Sending...' : 'Send'}
            </button>
        </div>
    );
};

Python CLI Tool

#!/usr/bin/env python3
import uuid

import requests

BASE_URL = "https://jatinautonomouslabs-research-ai-assistant-api.hf.space"

class ChatCLI:
    def __init__(self):
        self.session_id = f"cli-session-{uuid.uuid4().hex[:8]}"
        self.history = []
    
    def chat(self, message):
        response = requests.post(
            f"{BASE_URL}/api/chat",
            json={
                "message": message,
                "session_id": self.session_id,
                "user_id": "cli-user",
                "history": self.history
            }
        )
        
        if response.status_code == 200:
            data = response.json()
            self.history = data['history']
            return data['message']
        else:
            return f"Error: {response.status_code} - {response.text}"
    
    def run(self):
        print("AI Assistant CLI (Type 'exit' to quit)")
        print("=" * 50)
        
        while True:
            user_input = input("\nYou: ").strip()
            if user_input.lower() in ['exit', 'quit']:
                break
            
            print("Assistant: ", end="", flush=True)
            response = self.chat(user_input)
            print(response)

if __name__ == "__main__":
    cli = ChatCLI()
    cli.run()

Response Times

  • Typical Response: 2-10 seconds
  • First Request: May take longer due to model loading (10-30 seconds)
  • Subsequent Requests: Faster due to cached models (2-5 seconds)

Factors Affecting Response Time:

  • Message length
  • Model loading (first request)
  • GPU availability
  • Concurrent requests

Troubleshooting

Common Issues

404 Not Found

Problem: Getting 404 when accessing the API

Solutions:

  1. Verify the Space is running:
     • Check the Hugging Face Space page to ensure it's built and running
     • Wait for the initial build to complete (5-10 minutes)
  2. Check the URL format:
     • ✅ Correct: https://jatinautonomouslabs-research-ai-assistant-api.hf.space
     • ❌ Wrong: https://jatinautonomouslabs-research_ai_assistant_api.hf.space (underscores)
     • ✅ Alternative: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant_API
  3. Verify the endpoint paths:
     • Health: GET /api/health
     • Chat: POST /api/chat
     • Root: GET /
  4. Test with the root endpoint first:

     curl https://jatinautonomouslabs-research-ai-assistant-api.hf.space/

503 Service Unavailable

Problem: Orchestrator not ready

Solutions:

  1. Wait 30-60 seconds for initialization
  2. Check /api/health endpoint
  3. Use /api/initialize to manually trigger initialization

CORS Errors

Problem: CORS errors in browser

Solutions:

  • The API has CORS enabled for all origins
  • If issues persist, check browser console for specific errors
  • Ensure you're using the correct base URL

Testing API Connectivity

Quick Health Check:

# Test root endpoint
curl https://jatinautonomouslabs-research-ai-assistant-api.hf.space/

# Test health endpoint
curl https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/health

Python Test Script:

import requests

BASE_URL = "https://jatinautonomouslabs-research-ai-assistant-api.hf.space"

# Test root
try:
    response = requests.get(f"{BASE_URL}/", timeout=10)
    print(f"Root endpoint: {response.status_code} - {response.json()}")
except Exception as e:
    print(f"Root endpoint failed: {e}")

# Test health
try:
    response = requests.get(f"{BASE_URL}/api/health", timeout=10)
    print(f"Health endpoint: {response.status_code} - {response.json()}")
except Exception as e:
    print(f"Health endpoint failed: {e}")

Support

For issues, questions, or contributions:


Changelog

Version 1.0 (Current)

  • Initial API release
  • Chat endpoint with context management
  • Health check endpoint
  • Local GPU model inference
  • CORS enabled for web integration

License

This API is provided as-is. Please refer to the main project README for license information.