sema-chat / docs /CONFIGURATION_GUIDE.md
kamau1's picture
Readme config added
155ccbe
# πŸ”§ Sema Chat API Configuration Guide
## 🎯 **MiniMax Integration**
### Configuration
```bash
MODEL_TYPE=minimax
MODEL_NAME=MiniMax-M1
MINIMAX_API_KEY=your_minimax_api_key
MINIMAX_API_URL=https://api.minimax.chat/v1/text/chatcompletion_v2
MINIMAX_MODEL_VERSION=abab6.5s-chat
```
### Features
- βœ… **Reasoning Capabilities**: Shows model's thinking process
- βœ… **Streaming Support**: Real-time response generation
- βœ… **Custom API Integration**: Direct integration with MiniMax API
- βœ… **Reasoning Content**: Displays both reasoning and final response
### Example Usage
```bash
curl -X POST "http://localhost:7860/api/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"message": "Solve this math problem: 2x + 5 = 15",
"session_id": "minimax-test"
}'
```
**Response includes reasoning:**
```json
{
"message": "[Reasoning: I need to solve for x. First, subtract 5 from both sides: 2x = 10. Then divide by 2: x = 5]\n\nTo solve 2x + 5 = 15:\n1. Subtract 5 from both sides: 2x = 10\n2. Divide by 2: x = 5\n\nTherefore, x = 5.",
"session_id": "minimax-test",
"model_name": "MiniMax-M1"
}
```
---
## πŸ”₯ **Gemma Integration**
### Option 1: Local Gemma (Free Tier)
```bash
MODEL_TYPE=local
MODEL_NAME=google/gemma-2b-it
DEVICE=auto
```
### Option 2: Gemma via HuggingFace API
```bash
MODEL_TYPE=hf_api
MODEL_NAME=google/gemma-2b-it
HF_API_TOKEN=your_hf_token
```
### Option 3: Gemma via Google AI Studio
```bash
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_google_api_key
```
### Available Gemma Models
- **gemma-2-2b-it** (2B parameters, instruction-tuned)
- **gemma-2-9b-it** (9B parameters, instruction-tuned)
- **gemma-2-27b-it** (27B parameters, instruction-tuned)
- **gemini-1.5-flash** (Fast, efficient)
- **gemini-1.5-pro** (Most capable)
### Example Usage
```bash
curl -X POST "http://localhost:7860/api/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"message": "Explain quantum computing in simple terms",
"session_id": "gemma-test",
"temperature": 0.7
}'
```
---
## πŸš€ **Complete Backend Comparison**
| Backend | Cost | Setup | Streaming | Special Features |
|---------|------|-------|-----------|------------------|
| **Local** | Free | Medium | βœ… | Offline, Private |
| **HF API** | Free/Paid | Easy | βœ… | Many models |
| **OpenAI** | Paid | Easy | βœ… | High quality |
| **Anthropic** | Paid | Easy | βœ… | Long context |
| **MiniMax** | Paid | Easy | βœ… | Reasoning |
| **Google** | Free/Paid | Easy | βœ… | Multimodal |
---
## πŸ”§ **Configuration Examples**
### Free Tier Setup (HuggingFace Spaces)
```bash
# Best for free deployment
MODEL_TYPE=local
MODEL_NAME=TinyLlama/TinyLlama-1.1B-Chat-v1.0
DEVICE=cpu
MAX_NEW_TOKENS=256
TEMPERATURE=0.7
```
### Production Setup (API-based)
```bash
# Best for production with fallbacks
MODEL_TYPE=openai
MODEL_NAME=gpt-3.5-turbo
OPENAI_API_KEY=your_key
# Fallback configuration
FALLBACK_MODEL_TYPE=hf_api
FALLBACK_MODEL_NAME=microsoft/DialoGPT-medium
HF_API_TOKEN=your_token
```
### Research Setup (Multiple Models)
```bash
# Primary: Latest Gemini
MODEL_TYPE=google
MODEL_NAME=gemini-1.5-pro
GOOGLE_API_KEY=your_key
# For reasoning tasks
REASONING_MODEL_TYPE=minimax
REASONING_MODEL_NAME=MiniMax-M1
MINIMAX_API_KEY=your_key
```
---
## 🎯 **Model Selection Guide**
### For **Free Deployment** (HuggingFace Spaces):
1. **TinyLlama/TinyLlama-1.1B-Chat-v1.0** - Smallest, fastest
2. **microsoft/DialoGPT-medium** - Better conversations
3. **Qwen/Qwen2.5-0.5B-Instruct** - Good instruction following
### For **Reasoning Tasks**:
1. **MiniMax M1** - Shows thinking process
2. **Claude-3 Opus** - Deep reasoning
3. **GPT-4** - Complex problem solving
### For **Conversations**:
1. **Claude-3 Haiku** - Natural, fast
2. **GPT-3.5-turbo** - Balanced cost/quality
3. **Gemini-1.5-flash** - Fast, capable
### For **Multilingual**:
1. **Gemma-2-9b-it** - Good multilingual
2. **GPT-4** - Excellent multilingual
3. **Local models** - Depends on training
---
## πŸ”„ **Dynamic Model Switching**
The API supports runtime model switching:
```python
# Switch to MiniMax for reasoning
POST /api/v1/model/switch
{
"model_type": "minimax",
"model_name": "MiniMax-M1"
}
# Switch back to fast model
POST /api/v1/model/switch
{
"model_type": "google",
"model_name": "gemini-1.5-flash"
}
```
---
## πŸ§ͺ **Testing Your Setup**
### Test All Backends
```bash
python examples/test_backends.py
```
### Test Specific Backend
```bash
# Test MiniMax
MINIMAX_API_KEY=your_key python -c "
import asyncio
from app.services.model_backends.minimax_api import MiniMaxAPIBackend
from app.models.schemas import ChatMessage
async def test():
backend = MiniMaxAPIBackend('MiniMax-M1', api_key='your_key', api_url='your_url')
await backend.load_model()
messages = [ChatMessage(role='user', content='Hello')]
response = await backend.generate_response(messages)
print(response.message)
asyncio.run(test())
"
```
### Test Gemma
```bash
# Test local Gemma
MODEL_TYPE=local MODEL_NAME=google/gemma-2b-it python tests/test_api.py
# Test Gemma via Google AI
MODEL_TYPE=google MODEL_NAME=gemma-2-9b-it GOOGLE_API_KEY=your_key python tests/test_api.py
```
---
## πŸš€ **Deployment Examples**
### HuggingFace Spaces (Free)
```yaml
# In your Space settings
MODEL_TYPE: local
MODEL_NAME: TinyLlama/TinyLlama-1.1B-Chat-v1.0
DEVICE: cpu
```
### HuggingFace Spaces (With API)
```yaml
# In your Space settings
MODEL_TYPE: google
MODEL_NAME: gemma-2-9b-it
GOOGLE_API_KEY: your_secret_key
```
### Docker Deployment
```bash
docker run -e MODEL_TYPE=minimax \
-e MINIMAX_API_KEY=your_key \
-e MINIMAX_API_URL=your_url \
-p 8000:7860 \
sema-chat-api
```
---
## πŸ’‘ **Pro Tips**
1. **Start Small**: Begin with TinyLlama for testing
2. **Use APIs for Production**: More reliable than local models
3. **Enable Streaming**: Better user experience
4. **Monitor Usage**: Track API costs and limits
5. **Have Fallbacks**: Configure multiple backends
6. **Test Thoroughly**: Use the provided test scripts
---
## πŸ”— **Getting API Keys**
- **HuggingFace**: https://huggingface.co/settings/tokens
- **OpenAI**: https://platform.openai.com/api-keys
- **Anthropic**: https://console.anthropic.com/
- **Google AI**: https://aistudio.google.com/
- **MiniMax**: Contact MiniMax for API access
---
**Your architecture is now ready for both MiniMax and Gemma! πŸŽ‰**