---
title: Sema Chat API
emoji: πŸ’¬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Chat with LLMs
---

# Sema Chat API πŸ’¬

A modern chatbot API with streaming, flexible model backends, and production-ready features. Built with FastAPI and designed to keep pace with rapid GenAI advancements.

## πŸš€ Quick Start with Gemma

### Option 1: Automated HuggingFace Spaces Deployment

```bash
cd backend/sema-chat
./setup_huggingface.sh
```

### Option 2: Manual Local Setup

```bash
cd backend/sema-chat
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env

# For Gemma via Google AI Studio (recommended), edit .env:
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_google_api_key

# Run the API
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

### Option 3: Local Gemma (Free, No API Key)

```bash
# Edit .env:
MODEL_TYPE=local
MODEL_NAME=google/gemma-2b-it
DEVICE=cpu

# Run (the model is downloaded on first run)
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

## 🌐 Access Your API

Once running, access:

- Interactive API docs (Swagger UI): http://localhost:7860/docs
- Alternative API docs (ReDoc): http://localhost:7860/redoc
- Health check: http://localhost:7860/api/v1/health

## πŸ§ͺ Quick Test

```bash
# Test chat
curl -X POST "http://localhost:7860/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello! Can you introduce yourself?",
    "session_id": "test-session"
  }'

# Test streaming
curl -N -H "Accept: text/event-stream" \
  "http://localhost:7860/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
```

## 🎯 Features

### Core Capabilities

- βœ… Real-time Streaming: Server-Sent Events and WebSocket support (a Python consumer sketch follows this list)
- βœ… Multiple Model Backends: Local, HuggingFace API, OpenAI, Anthropic, Google AI, MiniMax
- βœ… Session Management: Persistent conversation contexts
- βœ… Rate Limiting: Built-in protection with configurable limits
- βœ… Health Monitoring: Comprehensive health checks and metrics

### Supported Models

- Local: TinyLlama, DialoGPT, Gemma, Qwen
- Google AI: Gemma-2-9b-it, Gemini-1.5-flash, Gemini-1.5-pro
- OpenAI: GPT-3.5-turbo, GPT-4, GPT-4-turbo
- Anthropic: Claude-3-haiku, Claude-3-sonnet, Claude-3-opus
- HuggingFace API: Any model via the Inference API
- MiniMax: M1 model with reasoning capabilities

## πŸ”§ Configuration

### Environment Variables

```bash
# Model Backend (local, google, openai, anthropic, hf_api, minimax)
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it

# API Keys (as needed)
GOOGLE_API_KEY=your_key
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_API_TOKEN=your_token
MINIMAX_API_KEY=your_key

# Generation Settings
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
TOP_P=0.9

# Server Settings
HOST=0.0.0.0
PORT=7860
DEBUG=false
```
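
These variables are loaded by `app/core/config.py`. As a rough sketch of how that loading might look with `pydantic-settings` (field names and defaults here are illustrative, not the repo's actual code):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() silences pydantic's warning about
    # field names that start with "model_".
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    # Model backend: local, google, openai, anthropic, hf_api, minimax
    model_type: str = "local"
    model_name: str = "google/gemma-2b-it"

    # API keys; only the one matching MODEL_TYPE is needed
    google_api_key: str | None = None
    openai_api_key: str | None = None
    anthropic_api_key: str | None = None
    hf_api_token: str | None = None
    minimax_api_key: str | None = None

    # Generation settings
    temperature: float = 0.7
    max_new_tokens: int = 512
    top_p: float = 0.9

    # Server settings
    host: str = "0.0.0.0"
    port: int = 7860
    debug: bool = False

settings = Settings()  # values come from the environment or .env
```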

## πŸ“š Documentation

Interactive API documentation (Swagger UI and ReDoc) is served at `/docs` and `/redoc` while the server is running; the endpoint reference below covers the REST surface.

## πŸ§ͺ Testing

```bash
# Run comprehensive tests
python tests/test_api.py

# Test different backends
python examples/test_backends.py

# Test specific backend
python examples/test_backends.py --backend google
```
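
The scripts above are the canonical tests. If you prefer pytest, a minimal smoke test against the endpoints listed in this README could look like this (hypothetical test file, not part of the repo):

```python
from fastapi.testclient import TestClient

from app.main import app  # the FastAPI application

client = TestClient(app)

def test_status():
    assert client.get("/api/v1/status").status_code == 200

def test_chat():
    resp = client.post(
        "/api/v1/chat",
        json={"message": "ping", "session_id": "pytest-session"},
    )
    assert resp.status_code == 200
```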

## πŸš€ Deployment

### HuggingFace Spaces (Recommended)

1. Run the setup script: `./setup_huggingface.sh`
2. Create your Space on HuggingFace
3. Push the generated code
4. Set environment variables in the Space settings
5. Your API will be live at `https://username-spacename.hf.space/`

### Docker

```bash
docker build -t sema-chat-api .
docker run -e MODEL_TYPE=google \
           -e GOOGLE_API_KEY=your_key \
           -p 7860:7860 \
           sema-chat-api
```

## πŸ”— API Endpoints

### Chat

- `POST /api/v1/chat` - Send a chat message
- `GET /api/v1/chat/stream` - Streaming chat (SSE)
- `WebSocket /api/v1/chat/ws` - Real-time WebSocket chat (client sketch below)

### Sessions

- `GET /api/v1/sessions/{id}` - Get conversation history
- `DELETE /api/v1/sessions/{id}` - Clear conversation
- `GET /api/v1/sessions` - List active sessions

### System

- `GET /api/v1/health` - Comprehensive health check
- `GET /api/v1/model/info` - Current model information
- `GET /api/v1/status` - Basic status

## πŸ’‘ Why This Architecture?

  1. Future-Proof: Modular design adapts to rapid GenAI advancements
  2. Flexible: Switch between local models and APIs with environment variables
  3. Production-Ready: Rate limiting, monitoring, error handling built-in
  4. Cost-Effective: Start free with local models, scale with APIs
  5. Developer-Friendly: Comprehensive docs, tests, and examples

πŸ› οΈ Development

Project Structure

app/
β”œβ”€β”€ main.py                     # FastAPI application
β”œβ”€β”€ api/v1/endpoints.py         # API routes
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ config.py              # Environment-based configuration
β”‚   └── logging.py             # Structured logging
β”œβ”€β”€ models/schemas.py           # Pydantic request/response models
β”œβ”€β”€ services/
β”‚   β”œβ”€β”€ chat_manager.py        # Chat orchestration
β”‚   β”œβ”€β”€ model_manager.py       # Backend selection
β”‚   β”œβ”€β”€ session_manager.py     # Conversation management
β”‚   └── model_backends/        # Model implementations
└── utils/helpers.py           # Utility functions

### Adding New Backends

1. Create a new backend in `app/services/model_backends/`
2. Inherit from the `ModelBackend` base class
3. Implement the required methods (see the sketch below)
4. Add it to `ModelManager._create_backend()`
5. Update configuration and documentation
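
A hypothetical sketch of steps 2-3. The import path and method names here are assumptions; check the base class in `app/services/model_backends/` for the actual interface:

```python
from typing import AsyncIterator

# Hypothetical import path; see app/services/model_backends/ for the real one.
from app.services.model_backends.base import ModelBackend

class EchoBackend(ModelBackend):
    """Toy backend that echoes the prompt back; useful as a wiring test."""

    async def generate(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"

    async def generate_stream(self, prompt: str, **kwargs) -> AsyncIterator[str]:
        # Stream the reply word by word.
        for token in f"echo: {prompt}".split():
            yield token + " "
```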

## 🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

## πŸ“„ License

MIT License - see LICENSE file for details.

πŸ™ Acknowledgments

  • HuggingFace for model hosting and Spaces platform
  • Google for Gemma models and AI Studio
  • FastAPI for the excellent web framework
  • OpenAI, Anthropic, MiniMax for their APIs

Ready to chat? Deploy your Sema Chat API today! πŸš€πŸ’¬