---
title: Sema Chat API
emoji: πŸ’¬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Chat with LLMs
---

# Sema Chat API πŸ’¬

A modern chatbot API with streaming, flexible model backends, and production-ready features. Built with FastAPI and designed to keep pace with rapid GenAI advancements.

## πŸš€ Quick Start with Gemma

### Option 1: Automated HuggingFace Spaces Deployment
```bash
cd backend/sema-chat
./setup_huggingface.sh
```

### Option 2: Manual Local Setup
```bash
cd backend/sema-chat
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env

# For Gemma via Google AI Studio (Recommended)
# Edit .env:
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it
GOOGLE_API_KEY=your_google_api_key

# Run the API
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

### Option 3: Local Gemma (Free, No API Key)
```bash
# Edit .env:
MODEL_TYPE=local
MODEL_NAME=google/gemma-2b-it
DEVICE=cpu

# Run (will download model on first run)
uvicorn app.main:app --reload --host 0.0.0.0 --port 7860
```

## 🌐 Access Your API

Once running, access:
- **Swagger UI**: http://localhost:7860/
- **Health Check**: http://localhost:7860/api/v1/health
- **Chat Endpoint**: http://localhost:7860/api/v1/chat

## πŸ§ͺ Quick Test

```bash
# Test chat
curl -X POST "http://localhost:7860/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello! Can you introduce yourself?",
    "session_id": "test-session"
  }'

# Test streaming
curl -N -H "Accept: text/event-stream" \
  "http://localhost:7860/api/v1/chat/stream?message=Tell%20me%20about%20AI&session_id=test"
```
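
The same checks in Python, as a minimal sketch assuming the `requests` package is installed. The streaming half prints raw SSE lines rather than parsing them, since the exact event payload depends on your configured backend:

```python
import requests

BASE_URL = "http://localhost:7860/api/v1"

# Plain chat: POST a message tied to a session id
resp = requests.post(
    f"{BASE_URL}/chat",
    json={"message": "Hello! Can you introduce yourself?", "session_id": "test-session"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())

# Streaming chat: consume the Server-Sent Events line by line
with requests.get(
    f"{BASE_URL}/chat/stream",
    params={"message": "Tell me about AI", "session_id": "test"},
    headers={"Accept": "text/event-stream"},
    stream=True,
    timeout=60,
) as stream:
    stream.raise_for_status()
    for line in stream.iter_lines(decode_unicode=True):
        if line:  # skip SSE keep-alive blank lines
            print(line)
```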

## 🎯 Features

### Core Capabilities
- βœ… **Real-time Streaming**: Server-Sent Events and WebSocket support
- βœ… **Multiple Model Backends**: Local, HuggingFace API, OpenAI, Anthropic, Google AI, MiniMax
- βœ… **Session Management**: Persistent conversation contexts
- βœ… **Rate Limiting**: Built-in protection with configurable limits
- βœ… **Health Monitoring**: Comprehensive health checks and metrics

### Supported Models
- **Local**: TinyLlama, DialoGPT, Gemma, Qwen
- **Google AI**: Gemma-2-9b-it, Gemini-1.5-flash, Gemini-1.5-pro
- **OpenAI**: GPT-3.5-turbo, GPT-4, GPT-4-turbo
- **Anthropic**: Claude-3-haiku, Claude-3-sonnet, Claude-3-opus
- **HuggingFace API**: Any model via Inference API
- **MiniMax**: M1 model with reasoning capabilities

## πŸ”§ Configuration

### Environment Variables
```bash
# Model Backend (local, google, openai, anthropic, hf_api, minimax)
MODEL_TYPE=google
MODEL_NAME=gemma-2-9b-it

# API Keys (as needed)
GOOGLE_API_KEY=your_key
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_API_TOKEN=your_token
MINIMAX_API_KEY=your_key

# Generation Settings
TEMPERATURE=0.7
MAX_NEW_TOKENS=512
TOP_P=0.9

# Server Settings
HOST=0.0.0.0
PORT=7860
DEBUG=false
```

## πŸ“š Documentation

- **[Configuration Guide](CONFIGURATION_GUIDE.md)** - Detailed setup for all backends
- **[HuggingFace Deployment](HUGGINGFACE_DEPLOYMENT.md)** - Step-by-step deployment guide
- **[API Documentation](http://localhost:7860/)** - Interactive Swagger UI

## πŸ§ͺ Testing

```bash
# Run comprehensive tests
python tests/test_api.py

# Test different backends
python examples/test_backends.py

# Test specific backend
python examples/test_backends.py --backend google
```

## πŸš€ Deployment

### HuggingFace Spaces (Recommended)
1. Run the setup script: `./setup_huggingface.sh`
2. Create your Space on HuggingFace
3. Push the generated code
4. Set environment variables in Space settings
5. Your API will be live at: `https://username-spacename.hf.space/`

### Docker
```bash
docker build -t sema-chat-api .
docker run -e MODEL_TYPE=google \
           -e GOOGLE_API_KEY=your_key \
           -p 7860:7860 \
           sema-chat-api
```

## πŸ”— API Endpoints

### Chat
- **`POST /api/v1/chat`** - Send chat message
- **`GET /api/v1/chat/stream`** - Streaming chat (SSE)
- **`WebSocket /api/v1/chat/ws`** - Real-time WebSocket chat (Python example below)
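
A hedged WebSocket example using the third-party `websockets` package. The JSON payload shape mirrors `POST /api/v1/chat` and is an assumption, not a documented contract; check the Swagger UI for the actual frame format:

```python
# pip install websockets
import asyncio
import json

import websockets


async def chat() -> None:
    uri = "ws://localhost:7860/api/v1/chat/ws"
    async with websockets.connect(uri) as ws:
        # Assumed payload shape, mirroring the REST chat endpoint
        await ws.send(json.dumps({"message": "Hello over WebSocket!", "session_id": "ws-test"}))
        # Print whatever the server streams back until the connection closes
        async for frame in ws:
            print(frame)


asyncio.run(chat())
```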

### Sessions
- **`GET /api/v1/sessions/{id}`** - Get conversation history
- **`DELETE /api/v1/sessions/{id}`** - Clear conversation
- **`GET /api/v1/sessions`** - List active sessions

### System
- **`GET /api/v1/health`** - Comprehensive health check
- **`GET /api/v1/model/info`** - Current model information
- **`GET /api/v1/status`** - Basic status
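
And a quick Python pass over the session and system endpoints, again assuming `requests`; responses are printed raw since their exact schemas are backend-dependent:

```python
import requests

BASE_URL = "http://localhost:7860/api/v1"

print(requests.get(f"{BASE_URL}/health").json())      # comprehensive health check
print(requests.get(f"{BASE_URL}/model/info").json())  # current model information
print(requests.get(f"{BASE_URL}/sessions").json())    # active sessions

# Fetch, then clear, one conversation
print(requests.get(f"{BASE_URL}/sessions/test-session").json())
requests.delete(f"{BASE_URL}/sessions/test-session").raise_for_status()
```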

## πŸ’‘ Why This Architecture?

1. **Future-Proof**: Modular design adapts to rapid GenAI advancements
2. **Flexible**: Switch between local models and APIs with environment variables
3. **Production-Ready**: Rate limiting, monitoring, and error handling built in
4. **Cost-Effective**: Start free with local models, scale with APIs
5. **Developer-Friendly**: Comprehensive docs, tests, and examples

## πŸ› οΈ Development

### Project Structure
```
app/
β”œβ”€β”€ main.py                     # FastAPI application
β”œβ”€β”€ api/v1/endpoints.py         # API routes
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ config.py              # Environment-based configuration
β”‚   └── logging.py             # Structured logging
β”œβ”€β”€ models/schemas.py           # Pydantic request/response models
β”œβ”€β”€ services/
β”‚   β”œβ”€β”€ chat_manager.py        # Chat orchestration
β”‚   β”œβ”€β”€ model_manager.py       # Backend selection
β”‚   β”œβ”€β”€ session_manager.py     # Conversation management
β”‚   └── model_backends/        # Model implementations
└── utils/helpers.py           # Utility functions
```

### Adding New Backends
1. Create new backend in `app/services/model_backends/`
2. Inherit from `ModelBackend` base class
3. Implement the required methods (sketched below)
4. Add to `ModelManager._create_backend()`
5. Update configuration and documentation
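
A minimal sketch of steps 2-3, assuming the base class exposes async generate and streaming hooks. The import path, method names, and signatures below are illustrative guesses; check `ModelBackend` in `app/services/model_backends/` for the real interface:

```python
from typing import AsyncIterator

from app.services.model_backends.base import ModelBackend  # import path assumed


class EchoBackend(ModelBackend):
    """Toy backend that echoes the prompt; swap in real API calls."""

    async def generate(self, message: str, history: list) -> str:
        # Hypothetical hook: return one complete response
        return f"You said: {message}"

    async def generate_stream(self, message: str, history: list) -> AsyncIterator[str]:
        # Hypothetical hook: yield chunks for SSE/WebSocket streaming
        for token in f"You said: {message}".split():
            yield token + " "
```

Once registered in `ModelManager._create_backend()`, the new backend can be selected with `MODEL_TYPE` like any other.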

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## πŸ“„ License

MIT License - see LICENSE file for details.

## πŸ™ Acknowledgments

- **HuggingFace** for model hosting and Spaces platform
- **Google** for Gemma models and AI Studio
- **FastAPI** for the excellent web framework
- **OpenAI, Anthropic, MiniMax** for their APIs

---

**Ready to chat? Deploy your Sema Chat API today! πŸš€πŸ’¬**