---
title: Sema Chat API
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Chat with LLMs
---
# Sema Chat API 💬

A modern chatbot API with streaming support, pluggable model backends, and production-ready features. Built with FastAPI and designed to keep pace with rapid GenAI advancements.
## 🚀 Quick Start with Gemma

### Option 1: Automated HuggingFace Spaces Deployment

```bash
cd backend/sema-chat
./setup_huggingface.sh
```
### Option 2: Manual Local Setup

```bash
cd backend/sema-chat
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env

# For Gemma via Google AI Studio (recommended), edit .env:
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_google_api_key

# Run the API
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```
### Option 3: Local Gemma (Free, No API Key)

```bash
# Edit .env:
MODEL_TYPE=local
MODEL_NAME=google/gemma-2b-it
DEVICE=cpu

# Run the API (the model is downloaded on first run)
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```
## 🌐 Access Your API

Once running, access:

- **Swagger UI**: http://localhost:7860/
- **Health Check**: http://localhost:7860/api/v1/health
- **Chat Endpoint**: http://localhost:7860/api/v1/chat
## 🧪 Quick Test

```bash
# Test chat
curl -X POST "http://localhost:7860/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello! Can you introduce yourself?",
    "session_id": "test-session"
  }'

# Test streaming
curl -N -H "Accept: text/event-stream" \
  "http://localhost:7860/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
```
## 🎯 Features

### Core Capabilities

- ✅ **Real-time Streaming**: Server-Sent Events and WebSocket support
- ✅ **Multiple Model Backends**: Local, HuggingFace API, OpenAI, Anthropic, Google AI, MiniMax
- ✅ **Session Management**: Persistent conversation contexts
- ✅ **Rate Limiting**: Built-in protection with configurable limits
- ✅ **Health Monitoring**: Comprehensive health checks and metrics

### Supported Models

- **Local**: TinyLlama, DialoGPT, Gemma, Qwen
- **Google AI**: Gemma-2-9b-it, Gemini-1.5-flash, Gemini-1.5-pro
- **OpenAI**: GPT-3.5-turbo, GPT-4, GPT-4-turbo
- **Anthropic**: Claude-3-haiku, Claude-3-sonnet, Claude-3-opus
- **HuggingFace API**: Any model via the Inference API
- **MiniMax**: M1 model with reasoning capabilities
## 🔧 Configuration

### Environment Variables

```bash
# Model backend (local, google, openai, anthropic, hf_api, minimax)
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it

# API keys (as needed)
GOOGLE_API_KEY=your_key
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_API_TOKEN=your_token
MINIMAX_API_KEY=your_key

# Generation settings
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
TOP_P=0.9

# Server settings
HOST=0.0.0.0
PORT=7860
DEBUG=false
```
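For orientation, `app/core/config.py` is described below as environment-based configuration; a settings class along these lines would map the variables above (a pydantic-settings sketch, not the project's actual code; field names and defaults are assumptions):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Loads the variables above from the environment or a .env file."""

    # protected_namespaces=() silences pydantic's warning about model_* field names
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    # Model backend selection
    model_type: str = "local"  # local, google, openai, anthropic, hf_api, minimax
    model_name: str = "google/gemma-2b-it"

    # API keys (only the one matching model_type is needed)
    google_api_key: str | None = None
    openai_api_key: str | None = None
    anthropic_api_key: str | None = None
    hf_api_token: str | None = None
    minimax_api_key: str | None = None

    # Generation settings
    temperature: float = 0.7
    max_new_tokens: int = 512
    top_p: float = 0.9

    # Server settings
    host: str = "0.0.0.0"
    port: int = 7860
    debug: bool = False


settings = Settings()
```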
## 📚 Documentation

- **Configuration Guide** - detailed setup for all backends
- **HuggingFace Deployment** - step-by-step deployment guide
- **API Documentation** - interactive Swagger UI
## 🧪 Testing

```bash
# Run comprehensive tests
python tests/test_api.py

# Test different backends
python examples/test_backends.py

# Test a specific backend
python examples/test_backends.py --backend google
```
## 🚀 Deployment

### HuggingFace Spaces (Recommended)

1. Run the setup script: `./setup_huggingface.sh`
2. Create your Space on HuggingFace
3. Push the generated code
4. Set environment variables in the Space settings
5. Your API will be live at `https://username-spacename.hf.space/`
### Docker

```bash
docker build -t sema-chat-api .
docker run -e MODEL_TYPE=google \
  -e GOOGLE_API_KEY=your_key \
  -p 7860:7860 \
  sema-chat-api
```
## 📋 API Endpoints

### Chat

- `POST /api/v1/chat` - Send a chat message
- `GET /api/v1/chat/stream` - Streaming chat (SSE)
- `WebSocket /api/v1/chat/ws` - Real-time WebSocket chat

### Sessions

- `GET /api/v1/sessions/{id}` - Get conversation history
- `DELETE /api/v1/sessions/{id}` - Clear a conversation
- `GET /api/v1/sessions` - List active sessions

### System

- `GET /api/v1/health` - Comprehensive health check
- `GET /api/v1/model/info` - Current model information
- `GET /api/v1/status` - Basic status
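For the WebSocket route, a client sketch using the `websockets` package; the JSON payload shape mirrors the REST endpoint but is an assumption, since the wire format is not documented in this README:

```python
import asyncio
import json

import websockets


async def chat() -> None:
    uri = "ws://localhost:7860/api/v1/chat/ws"
    async with websockets.connect(uri) as ws:
        # Payload shape assumed to match the POST /chat body.
        await ws.send(json.dumps({
            "message": "Hello over WebSocket!",
            "session_id": "ws-test",
        }))
        # Print the first server message; a streaming server may send many.
        print(await ws.recv())


asyncio.run(chat())
```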
## 💡 Why This Architecture?

- **Future-Proof**: Modular design adapts to rapid GenAI advancements
- **Flexible**: Switch between local models and APIs with environment variables
- **Production-Ready**: Rate limiting, monitoring, and error handling built in
- **Cost-Effective**: Start free with local models, scale with APIs
- **Developer-Friendly**: Comprehensive docs, tests, and examples
## 🛠️ Development

### Project Structure

```
app/
├── main.py                 # FastAPI application
├── api/v1/endpoints.py     # API routes
├── core/
│   ├── config.py           # Environment-based configuration
│   └── logging.py          # Structured logging
├── models/schemas.py       # Pydantic request/response models
├── services/
│   ├── chat_manager.py     # Chat orchestration
│   ├── model_manager.py    # Backend selection
│   ├── session_manager.py  # Conversation management
│   └── model_backends/     # Model implementations
└── utils/helpers.py        # Utility functions
```
### Adding New Backends

1. Create a new backend in `app/services/model_backends/`
2. Inherit from the `ModelBackend` base class
3. Implement the required methods
4. Add it to `ModelManager._create_backend()`
5. Update configuration and documentation
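As a rough illustration of the steps above, a toy backend might look like the following; the `ModelBackend` method names and signatures here are assumptions for illustration, so check the base class in the repo for the real interface:

```python
# app/services/model_backends/echo_backend.py (hypothetical example)
from typing import AsyncIterator

from app.services.model_backends.base import ModelBackend  # import path assumed


class EchoBackend(ModelBackend):
    """Toy backend that echoes the prompt; handy for wiring tests."""

    async def generate(self, message: str, history: list[dict]) -> str:
        # A real backend would call a local model or external API here.
        return f"echo: {message}"

    async def generate_stream(self, message: str, history: list[dict]) -> AsyncIterator[str]:
        # Yield chunks one at a time to drive SSE/WebSocket streaming.
        for token in f"echo: {message}".split():
            yield token + " "
```

Registering it in `ModelManager._create_backend()` and selecting it with `MODEL_TYPE=echo` (a hypothetical value) would exercise the chat pipeline without loading a model.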
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
## 📄 License

MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- **HuggingFace** for model hosting and the Spaces platform
- **Google** for Gemma models and AI Studio
- **FastAPI** for the excellent web framework
- **OpenAI, Anthropic, MiniMax** for their APIs
**Ready to chat? Deploy your Sema Chat API today!** 🚀💬