---
title: Sema Chat API
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Chat with LLMs
---
# Sema Chat API 💬

A modern chatbot API with streaming support, pluggable model backends, and production-ready features. Built with FastAPI and designed to keep pace with rapid GenAI advancements.
## 🚀 Quick Start with Gemma

### Option 1: Automated HuggingFace Spaces Deployment

```bash
cd backend/sema-chat
./setup_huggingface.sh
```
### Option 2: Manual Local Setup

```bash
cd backend/sema-chat
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env

# For Gemma via Google AI Studio (recommended), edit .env:
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_google_api_key

# Run the API
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```
### Option 3: Local Gemma (Free, No API Key)

```bash
# Edit .env:
MODEL_TYPE=local
MODEL_NAME=google/gemma-2b-it
DEVICE=cpu

# Run the API (the model is downloaded on first run)
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```
## 🌐 Access Your API

Once running, access:

- **Swagger UI**: http://localhost:7860/
- **Health Check**: http://localhost:7860/api/v1/health
- **Chat Endpoint**: http://localhost:7860/api/v1/chat
## 🧪 Quick Test

```bash
# Test chat
curl -X POST "http://localhost:7860/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello! Can you introduce yourself?",
    "session_id": "test-session"
  }'

# Test streaming
curl -N -H "Accept: text/event-stream" \
  "http://localhost:7860/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
```
## 🎯 Features

### Core Capabilities

- ✅ **Real-time Streaming**: Server-Sent Events and WebSocket support
- ✅ **Multiple Model Backends**: Local, HuggingFace API, OpenAI, Anthropic, Google AI, MiniMax
- ✅ **Session Management**: Persistent conversation contexts
- ✅ **Rate Limiting**: Built-in protection with configurable limits
- ✅ **Health Monitoring**: Comprehensive health checks and metrics

### Supported Models

- **Local**: TinyLlama, DialoGPT, Gemma, Qwen
- **Google AI**: Gemma-2-9b-it, Gemini-1.5-flash, Gemini-1.5-pro
- **OpenAI**: GPT-3.5-turbo, GPT-4, GPT-4-turbo
- **Anthropic**: Claude-3-haiku, Claude-3-sonnet, Claude-3-opus
- **HuggingFace API**: Any model via the Inference API
- **MiniMax**: M1 model with reasoning capabilities
## 🔧 Configuration

### Environment Variables

```bash
# Model backend (local, google, openai, anthropic, hf_api, minimax)
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it

# API keys (as needed)
GOOGLE_API_KEY=your_key
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_API_TOKEN=your_token
MINIMAX_API_KEY=your_key

# Generation settings
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
TOP_P=0.9

# Server settings
HOST=0.0.0.0
PORT=7860
DEBUG=false
```
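For orientation, `app/core/config.py` is described below as environment-based configuration; a settings class along these lines would map the variables above (a pydantic-settings sketch, not the project's actual code; field names and defaults are assumptions):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Loads the variables above from the environment or a .env file."""

    # protected_namespaces=() silences pydantic's warning about model_* field names
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    # Model backend selection
    model_type: str = "local"  # local, google, openai, anthropic, hf_api, minimax
    model_name: str = "google/gemma-2b-it"

    # API keys (only the one matching model_type is needed)
    google_api_key: str | None = None
    openai_api_key: str | None = None
    anthropic_api_key: str | None = None
    hf_api_token: str | None = None
    minimax_api_key: str | None = None

    # Generation settings
    temperature: float = 0.7
    max_new_tokens: int = 512
    top_p: float = 0.9

    # Server settings
    host: str = "0.0.0.0"
    port: int = 7860
    debug: bool = False


settings = Settings()
```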
## 📚 Documentation

- **Configuration Guide** - detailed setup for all backends
- **HuggingFace Deployment** - step-by-step deployment guide
- **API Documentation** - interactive Swagger UI
## 🧪 Testing

```bash
# Run comprehensive tests
python tests/test_api.py

# Test different backends
python examples/test_backends.py

# Test a specific backend
python examples/test_backends.py --backend google
```
## 🚀 Deployment

### HuggingFace Spaces (Recommended)

1. Run the setup script: `./setup_huggingface.sh`
2. Create your Space on HuggingFace
3. Push the generated code
4. Set environment variables in the Space settings
5. Your API will be live at `https://username-spacename.hf.space/`
### Docker

```bash
docker build -t sema-chat-api .
docker run -e MODEL_TYPE=google \
  -e GOOGLE_API_KEY=your_key \
  -p 7860:7860 \
  sema-chat-api
```
## 📋 API Endpoints

### Chat

- `POST /api/v1/chat` - Send a chat message
- `GET /api/v1/chat/stream` - Streaming chat (SSE)
- `WebSocket /api/v1/chat/ws` - Real-time WebSocket chat

### Sessions

- `GET /api/v1/sessions/{id}` - Get conversation history
- `DELETE /api/v1/sessions/{id}` - Clear a conversation
- `GET /api/v1/sessions` - List active sessions

### System

- `GET /api/v1/health` - Comprehensive health check
- `GET /api/v1/model/info` - Current model information
- `GET /api/v1/status` - Basic status
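For the WebSocket route, a client sketch using the `websockets` package; the JSON payload shape mirrors the REST endpoint but is an assumption, since the wire format is not documented in this README:

```python
import asyncio
import json

import websockets


async def chat() -> None:
    uri = "ws://localhost:7860/api/v1/chat/ws"
    async with websockets.connect(uri) as ws:
        # Payload shape assumed to match the POST /chat body.
        await ws.send(json.dumps({
            "message": "Hello over WebSocket!",
            "session_id": "ws-test",
        }))
        # Print the first server message; a streaming server may send many.
        print(await ws.recv())


asyncio.run(chat())
```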
## 💡 Why This Architecture?

- **Future-Proof**: Modular design adapts to rapid GenAI advancements
- **Flexible**: Switch between local models and APIs with environment variables
- **Production-Ready**: Rate limiting, monitoring, and error handling built in
- **Cost-Effective**: Start free with local models, scale with APIs
- **Developer-Friendly**: Comprehensive docs, tests, and examples
## 🛠️ Development

### Project Structure

```
app/
├── main.py                 # FastAPI application
├── api/v1/endpoints.py     # API routes
├── core/
│   ├── config.py           # Environment-based configuration
│   └── logging.py          # Structured logging
├── models/schemas.py       # Pydantic request/response models
├── services/
│   ├── chat_manager.py     # Chat orchestration
│   ├── model_manager.py    # Backend selection
│   ├── session_manager.py  # Conversation management
│   └── model_backends/     # Model implementations
└── utils/helpers.py        # Utility functions
```
### Adding New Backends

1. Create a new backend in `app/services/model_backends/`
2. Inherit from the `ModelBackend` base class
3. Implement the required methods
4. Add it to `ModelManager._create_backend()`
5. Update configuration and documentation
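As a rough illustration of the steps above, a toy backend might look like the following; the `ModelBackend` method names and signatures here are assumptions for illustration, so check the base class in the repo for the real interface:

```python
# app/services/model_backends/echo_backend.py (hypothetical example)
from typing import AsyncIterator

from app.services.model_backends.base import ModelBackend  # import path assumed


class EchoBackend(ModelBackend):
    """Toy backend that echoes the prompt; handy for wiring tests."""

    async def generate(self, message: str, history: list[dict]) -> str:
        # A real backend would call a local model or external API here.
        return f"echo: {message}"

    async def generate_stream(self, message: str, history: list[dict]) -> AsyncIterator[str]:
        # Yield chunks one at a time to drive SSE/WebSocket streaming.
        for token in f"echo: {message}".split():
            yield token + " "
```

Registering it in `ModelManager._create_backend()` and selecting it with `MODEL_TYPE=echo` (a hypothetical value) would exercise the chat pipeline without loading a model.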
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
## 📄 License

MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments

- **HuggingFace** for model hosting and the Spaces platform
- **Google** for Gemma models and AI Studio
- **FastAPI** for the excellent web framework
- **OpenAI, Anthropic, MiniMax** for their APIs
**Ready to chat? Deploy your Sema Chat API today!** 🚀💬