# 🔧 Sema Chat API Configuration Guide

## 🎯 MiniMax Integration

### Configuration
```env
MODEL_TYPE=minimax
MODEL_NAME=MiniMax-M1
MINIMAX_API_KEY=your_minimax_api_key
MINIMAX_API_URL=https://api.minimax.chat/v1/text/chatcompletion_v2
MINIMAX_MODEL_VERSION=abab6.5s-chat
```
### Features
- ✅ **Reasoning Capabilities**: Shows the model's thinking process
- ✅ **Streaming Support**: Real-time response generation (see the sketch below)
- ✅ **Custom API Integration**: Direct integration with the MiniMax API
- ✅ **Reasoning Content**: Displays both the reasoning and the final response
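For streaming, a client reads the response incrementally instead of waiting for the full reply. The exact streaming route is not listed in this guide, so the path below is a placeholder; a minimal sketch using `requests`:

```python
import requests

# NOTE: '/api/v1/chat/stream' is a placeholder path; check your deployment's
# routes for the actual streaming endpoint.
with requests.post(
    "http://localhost:7860/api/v1/chat/stream",
    json={"message": "Tell me a short story", "session_id": "stream-test"},
    stream=True,  # keep the connection open and read chunks as they arrive
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line, flush=True)
```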
### Example Usage
```bash
curl -X POST "http://localhost:7860/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Solve this math problem: 2x + 5 = 15",
    "session_id": "minimax-test"
  }'
```
The response includes the reasoning:

```json
{
  "message": "[Reasoning: I need to solve for x. First, subtract 5 from both sides: 2x = 10. Then divide by 2: x = 5]\n\nTo solve 2x + 5 = 15:\n1. Subtract 5 from both sides: 2x = 10\n2. Divide by 2: x = 5\n\nTherefore, x = 5.",
  "session_id": "minimax-test",
  "model_name": "MiniMax-M1"
}
```
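Since the reasoning arrives as a `[Reasoning: ...]` prefix inside `message`, a client can split it from the final answer. A minimal sketch, assuming the exact format shown above:

```python
import requests

resp = requests.post(
    "http://localhost:7860/api/v1/chat",
    json={"message": "Solve this math problem: 2x + 5 = 15",
          "session_id": "minimax-test"},
)
resp.raise_for_status()
text = resp.json()["message"]

if text.startswith("[Reasoning:"):
    # The reasoning block ends at ']' followed by a blank line.
    reasoning, _, answer = text.partition("]\n\n")
    print("Reasoning:", reasoning.removeprefix("[Reasoning:").strip())
    print("Answer:", answer.strip())
else:
    print(text)
```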
## 🔥 Gemma Integration

### Option 1: Local Gemma (Free Tier)
```env
MODEL_TYPE=local
MODEL_NAME=google/gemma-2b-it
DEVICE=auto
```
### Option 2: Gemma via HuggingFace API
```env
MODEL_TYPE=hf_api
MODEL_NAME=google/gemma-2b-it
HF_API_TOKEN=your_hf_token
```
### Option 3: Gemma via Google AI Studio
```env
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_google_api_key
```
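To sanity-check a Google AI Studio key outside this project, you can call Gemma directly with the `google-generativeai` package (`pip install google-generativeai`). This is a standalone sketch, not the project's backend code:

```python
import os

import google.generativeai as genai

# Uses the same GOOGLE_API_KEY environment variable as the config above.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemma-2-9b-it")
response = model.generate_content("Say hello in one sentence.")
print(response.text)
```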
### Available Gemma Models

- `gemma-2-2b-it` (2B parameters, instruction-tuned)
- `gemma-2-9b-it` (9B parameters, instruction-tuned)
- `gemma-2-27b-it` (27B parameters, instruction-tuned)

The `google` backend also serves Gemini models:

- `gemini-1.5-flash` (fast, efficient)
- `gemini-1.5-pro` (most capable)
### Example Usage
```bash
curl -X POST "http://localhost:7860/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain quantum computing in simple terms",
    "session_id": "gemma-test",
    "temperature": 0.7
  }'
```
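Because requests carry a `session_id`, follow-up questions can reuse it; this sketch assumes the server uses `session_id` to maintain the conversation history between turns:

```python
import requests

CHAT_URL = "http://localhost:7860/api/v1/chat"

# Reusing the same session_id associates both turns with one conversation.
for question in [
    "Explain quantum computing in simple terms",
    "Now summarize that in one sentence",
]:
    resp = requests.post(
        CHAT_URL,
        json={"message": question, "session_id": "gemma-test", "temperature": 0.7},
    )
    resp.raise_for_status()
    print(resp.json()["message"])
```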
## 📊 Complete Backend Comparison

| Backend   | Cost      | Setup  | Streaming | Special Features |
|-----------|-----------|--------|-----------|------------------|
| Local     | Free      | Medium | ✅        | Offline, private |
| HF API    | Free/Paid | Easy   | ✅        | Many models      |
| OpenAI    | Paid      | Easy   | ✅        | High quality     |
| Anthropic | Paid      | Easy   | ✅        | Long context     |
| MiniMax   | Paid      | Easy   | ✅        | Reasoning        |
| Google    | Free/Paid | Easy   | ✅        | Multimodal       |
## 🔧 Configuration Examples

### Free Tier Setup (HuggingFace Spaces)
```env
# Best for free deployment
MODEL_TYPE=local
MODEL_NAME=TinyLlama/TinyLlama-1.1B-Chat-v1.0
DEVICE=cpu
MAX_NEW_TOKENS=256
TEMPERATURE=0.7
```
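For reference, this is roughly what a local backend does with that configuration, using the `transformers` library directly; it is an illustration, not the project's actual backend code:

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = os.getenv("MODEL_NAME", "TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build the prompt with the model's own chat template.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(
    inputs,
    max_new_tokens=int(os.getenv("MAX_NEW_TOKENS", "256")),
    temperature=float(os.getenv("TEMPERATURE", "0.7")),
    do_sample=True,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```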
### Production Setup (API-based)
```env
# Best for production with fallbacks
MODEL_TYPE=openai
MODEL_NAME=gpt-3.5-turbo
OPENAI_API_KEY=your_key

# Fallback configuration
FALLBACK_MODEL_TYPE=hf_api
FALLBACK_MODEL_NAME=microsoft/DialoGPT-medium
HF_API_TOKEN=your_token
```
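How the fallback variables are consumed is up to the application; the pattern they suggest looks roughly like this (hypothetical wiring, assuming both backends expose the async `generate_response(messages)` interface used in the testing section below):

```python
async def generate_with_fallback(primary, fallback, messages):
    """Try the primary backend first; use the fallback on any failure."""
    try:
        return await primary.generate_response(messages)
    except Exception as exc:
        # Any error from the primary backend triggers the fallback path.
        print(f"Primary backend failed ({exc}); falling back")
        return await fallback.generate_response(messages)
```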
### Research Setup (Multiple Models)
```env
# Primary: Latest Gemini
MODEL_TYPE=google
MODEL_NAME=gemini-1.5-pro
GOOGLE_API_KEY=your_key

# For reasoning tasks
REASONING_MODEL_TYPE=minimax
REASONING_MODEL_NAME=MiniMax-M1
MINIMAX_API_KEY=your_key
```
## 🎯 Model Selection Guide

**For free deployment (HuggingFace Spaces):**
- `TinyLlama/TinyLlama-1.1B-Chat-v1.0` - Smallest and fastest
- `microsoft/DialoGPT-medium` - Better conversations
- `Qwen/Qwen2.5-0.5B-Instruct` - Good instruction following

**For reasoning tasks:**
- MiniMax M1 - Shows its thinking process
- Claude 3 Opus - Deep reasoning
- GPT-4 - Complex problem solving

**For conversations:**
- Claude 3 Haiku - Natural and fast
- GPT-3.5-turbo - Balanced cost and quality
- Gemini 1.5 Flash - Fast and capable

**For multilingual use:**
- Gemma 2 9B IT - Good multilingual coverage
- GPT-4 - Excellent multilingual coverage
- Local models - Depends on their training data
## 🔄 Dynamic Model Switching

The API supports runtime model switching:
```
# Switch to MiniMax for reasoning
POST /api/v1/model/switch
{
  "model_type": "minimax",
  "model_name": "MiniMax-M1"
}

# Switch back to a fast model
POST /api/v1/model/switch
{
  "model_type": "google",
  "model_name": "gemini-1.5-flash"
}
```
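The same switch can be made from Python, assuming the server runs on localhost:7860 as in the chat examples above:

```python
import requests

# Switch the running server to MiniMax via the documented endpoint.
resp = requests.post(
    "http://localhost:7860/api/v1/model/switch",
    json={"model_type": "minimax", "model_name": "MiniMax-M1"},
)
resp.raise_for_status()
print(resp.json())
```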
## 🧪 Testing Your Setup

### Test All Backends

```bash
python examples/test_backends.py
```
### Test Specific Backend
```bash
# Test MiniMax
MINIMAX_API_KEY=your_key python -c "
import asyncio
import os

from app.services.model_backends.minimax_api import MiniMaxAPIBackend
from app.models.schemas import ChatMessage

async def test():
    backend = MiniMaxAPIBackend(
        'MiniMax-M1',
        api_key=os.environ['MINIMAX_API_KEY'],
        api_url='https://api.minimax.chat/v1/text/chatcompletion_v2',
    )
    await backend.load_model()
    messages = [ChatMessage(role='user', content='Hello')]
    response = await backend.generate_response(messages)
    print(response.message)

asyncio.run(test())
"
```
### Test Gemma
```bash
# Test local Gemma
MODEL_TYPE=local MODEL_NAME=google/gemma-2b-it python tests/test_api.py

# Test Gemma via Google AI
MODEL_TYPE=google MODEL_NAME=gemma-2-9b-it GOOGLE_API_KEY=your_key python tests/test_api.py
```
## 🚀 Deployment Examples

### HuggingFace Spaces (Free)
```yaml
# In your Space settings
MODEL_TYPE: local
MODEL_NAME: TinyLlama/TinyLlama-1.1B-Chat-v1.0
DEVICE: cpu
```
### HuggingFace Spaces (With API)
```yaml
# In your Space settings
MODEL_TYPE: google
MODEL_NAME: gemma-2-9b-it
GOOGLE_API_KEY: your_secret_key
```
### Docker Deployment

```bash
docker run -e MODEL_TYPE=minimax \
  -e MINIMAX_API_KEY=your_key \
  -e MINIMAX_API_URL=your_url \
  -p 7860:7860 \
  sema-chat-api
```
## 💡 Pro Tips

- **Start Small**: Begin with TinyLlama for testing
- **Use APIs for Production**: More reliable than local models
- **Enable Streaming**: Better user experience
- **Monitor Usage**: Track API costs and limits
- **Have Fallbacks**: Configure multiple backends
- **Test Thoroughly**: Use the provided test scripts
## 🔑 Getting API Keys

- **HuggingFace**: https://huggingface.co/settings/tokens
- **OpenAI**: https://platform.openai.com/api-keys
- **Anthropic**: https://console.anthropic.com/
- **Google AI**: https://aistudio.google.com/
- **MiniMax**: Contact MiniMax for API access

Your architecture is now ready for both MiniMax and Gemma! 🎉