# Getting Started with Unified AI Services
This guide will walk you through setting up and running the complete Unified AI Services system.
## 📋 Quick Overview
The Unified AI Services system consists of:
- **NER Service** (Port 8500): Named Entity Recognition with relationship extraction
- **OCR Service** (Port 8400): Optical Character Recognition with document processing
- **RAG Service** (Port 8401): Retrieval-Augmented Generation with vector search
- **Unified App** (Port 8000): Main application coordinating all services
## 🚀 Quick Start (Recommended)
### Step 1: Automated Setup
```bash
# Run the automated setup wizard
python setup.py
```
This will:
- ✅ Check your Python environment
- ✅ Create necessary directories
- ✅ Help configure your `.env` file
- ✅ Install dependencies
- ✅ Validate configuration
- ✅ Create startup scripts
### Step 2: Start the System
```bash
# Start all services automatically
python app.py
```
Or use the generated scripts:
- **Windows**: Double-click `start_services.bat`
- **Linux/Mac**: Run `./start_services.sh`
### Step 3: Test the System
```bash
# Run comprehensive tests
python test_unified.py
```
Or use the generated scripts:
- **Windows**: Double-click `run_tests.bat`
- **Linux/Mac**: Run `./run_tests.sh`
### Step 4: Try the Demo
```bash
# Run interactive demo
python demo.py
```
## 📁 File Structure
After setup, your directory should look like this:
```
unified-ai-services/
├── app.py                # 🌐 Main unified application
├── configs.py            # ⚙️ Configuration management
├── setup.py              # 🛠️ Automated setup script
├── manage_services.py    # 🔧 Service management tool
├── test_unified.py       # 🧪 Comprehensive test suite
├── demo.py               # 🎬 Interactive demo
├── requirements.txt      # 📦 Python dependencies
├── .env                  # 🔐 Environment configuration
├── README.md             # 📖 Documentation
├── GETTING_STARTED.md    # 🚀 This file
├── services/             # 📂 Service implementations
│   ├── ner_service.py    # Named Entity Recognition
│   ├── ocr_service.py    # Optical Character Recognition
│   └── rag_service.py    # Retrieval-Augmented Generation
├── exports/              # 📁 Generated export files
├── logs/                 # 📝 Application logs
└── temp/                 # 🗂️ Temporary files
```
## ⚙️ Manual Setup (Alternative)
If you prefer manual setup:
### Prerequisites
- Python 3.8 or higher
- PostgreSQL with vector extension
- Azure OpenAI account
- Azure Document Intelligence account
- DeepSeek API account
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure Environment
Create a `.env` file with your configuration:
```bash
# Server Configuration
HOST=0.0.0.0
MAIN_PORT=8000
NER_PORT=8500
OCR_PORT=8400
RAG_PORT=8401
# PostgreSQL Configuration
POSTGRES_HOST=your-postgres-server.com
POSTGRES_PORT=5432
POSTGRES_USER=your-username
POSTGRES_PASSWORD=your-password
POSTGRES_DATABASE=postgres
# Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
EMBEDDING_MODEL=text-embedding-3-large
# DeepSeek Configuration (for advanced NER)
DEEPSEEK_ENDPOINT=https://your-deepseek-endpoint/
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_MODEL=DeepSeek-R1-0528
# Azure Document Intelligence Configuration
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-di.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-di-key
# Azure Storage Configuration
AZURE_STORAGE_ACCOUNT_URL=https://yourstorage.blob.core.windows.net/
AZURE_BLOB_SAS_TOKEN=your-sas-token
BLOB_CONTAINER=historylog
```
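Before starting the services, it can help to confirm that the `.env` file is complete. The following is a minimal sketch using only the standard library; it is not the validation performed by `setup.py` or `configs.py`. The `REQUIRED_KEYS` list and the rule that values containing `your-` are unfilled placeholders are assumptions based on the example above.

```python
from pathlib import Path

# Keys taken from the example .env above; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_KEYS = [
    "POSTGRES_HOST", "POSTGRES_USER", "POSTGRES_PASSWORD",
    "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY",
    "DEEPSEEK_ENDPOINT", "DEEPSEEK_API_KEY",
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT", "AZURE_DOCUMENT_INTELLIGENCE_KEY",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still placeholders."""
    return [k for k in required
            if not env.get(k) or "your-" in env[k].lower()]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        missing = missing_keys(parse_env(env_file.read_text(encoding="utf-8")))
        print("Missing or placeholder keys:", missing or "none")
    else:
        print(".env not found in the current directory")
```

Run it from the project root; an empty result suggests every key in the assumed list has a real value.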
### 3. Create Directory Structure
```bash
mkdir -p services exports logs temp tests data
```
### 4. Place Service Files
Ensure your service files are in the correct locations:
- `services/ner_service.py`
- `services/ocr_service.py`
- `services/rag_service.py`
## 🔧 Service Management
### Using the Service Manager
The `manage_services.py` script provides easy service management:
```bash
# Start individual services
python manage_services.py start ner
python manage_services.py start ocr
python manage_services.py start rag
python manage_services.py start unified
# Start all services
python manage_services.py start all
# Check status
python manage_services.py status
# Test services
python manage_services.py test ner
python manage_services.py test all
# Stop services
python manage_services.py stop all
# Restart services
python manage_services.py restart all
# List available services
python manage_services.py list
```
### Direct Service Management
Start services individually for development:
```bash
# Terminal 1: Start OCR service
cd services && python ocr_service.py
# Terminal 2: Start RAG service
cd services && python rag_service.py
# Terminal 3: Start NER service
cd services && python ner_service.py
# Terminal 4: Start unified application
python app.py
```
## 🧪 Testing and Validation
### Comprehensive System Tests
```bash
# Run all tests
python test_unified.py
# Test output will show:
# ✅ Unified App Health Check
# ✅ Individual Service Health
# ✅ Unified Analysis (Text)
# ✅ Unified Analysis (URL)
# ✅ Combined Search
# ✅ Service Proxies
# ✅ File Upload (Unified)
# ✅ Service Discovery
# ✅ System Performance
# ✅ Error Handling
```
### Individual Service Tests
```bash
# Test NER service specifically
python test_ner.py
# Test RAG service specifically
python test_rag.py
```
### Quick Health Checks
```bash
# Check unified system
curl http://localhost:8000/health
# Check individual services
curl http://localhost:8500/health # NER
curl http://localhost:8400/health # OCR
curl http://localhost:8401/health # RAG
```
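The same checks can be scripted so that all four services are polled in one pass. This is a sketch, assuming the services run on `localhost` with the ports listed in this guide and that a healthy service answers `/health` with HTTP 200; the `SERVICES` mapping and function names are illustrative and not part of the project.

```python
import time
import urllib.error
import urllib.request

# Ports come from the overview in this guide; the host is an assumption.
SERVICES = {"unified": 8000, "ner": 8500, "ocr": 8400, "rag": 8401}


def health_url(name, host="localhost"):
    """Build the /health URL for a named service."""
    return f"http://{host}:{SERVICES[name]}/health"


def check(name, timeout=3.0):
    """Return (ok, seconds); ok is True only for an HTTP 200 response."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(health_url(name), timeout=timeout) as resp:
            ok = resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        ok = False
    return ok, time.perf_counter() - start


if __name__ == "__main__":
    for name in SERVICES:
        ok, elapsed = check(name)
        print(f"{name:8s} {'UP' if ok else 'DOWN':4s} {elapsed * 1000:.0f} ms")
```

The elapsed time gives a rough response-time reading alongside the up/down status.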
## 🎬 Interactive Demo
The demo script showcases all system capabilities:
```bash
python demo.py
```
Demo includes:
- Multi-language text analysis (Thai + English)
- Entity and relationship extraction
- RAG document indexing
- Combined search functionality
- Service proxy testing
- Real-time performance monitoring
## 🌐 API Usage
### API Documentation
Once running, access interactive documentation:
- **Unified API**: http://localhost:8000/docs
- **NER Service**: http://localhost:8500/docs
- **OCR Service**: http://localhost:8400/docs
- **RAG Service**: http://localhost:8401/docs
### Key Endpoints
#### Unified Analysis
```bash
# Analyze text with automatic RAG indexing
curl -X POST http://localhost:8000/analyze/unified \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here...",
    "extract_relationships": true,
    "enable_rag_indexing": true,
    "rag_title": "Document Title"
  }'
```
#### Combined Search
```bash
# Search with automatic NER enhancement
curl -X POST http://localhost:8000/search/combined \
  -H "Content-Type: application/json" \
  -d '{
    "query": "search terms",
    "include_ner_analysis": true,
    "limit": 10
  }'
```
#### Service Proxies
```
# Direct access to individual services
POST /ner/analyze/text # NER analysis
POST /ocr/upload # OCR processing
POST /rag/search # RAG search
GET /rag/documents # List documents
```
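The unified analysis and combined search endpoints can also be called from Python. The sketch below uses only the standard library and assumes the unified app listens on `http://localhost:8000` and accepts JSON request bodies; the helper names `build_json_post` and `send` are hypothetical, and the response schema is not documented in this guide, so the result is printed unmodified.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # unified app port from this guide


def build_json_post(path, payload, base_url=BASE_URL):
    """Create a POST request with a JSON body for a unified-app endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


analyze = build_json_post("/analyze/unified", {
    "text": "Your text here...",
    "extract_relationships": True,
    "enable_rag_indexing": True,
    "rag_title": "Document Title",
})
search = build_json_post("/search/combined", {
    "query": "search terms",
    "include_ner_analysis": True,
    "limit": 10,
})

if __name__ == "__main__":
    try:
        print(send(analyze))
        print(send(search))
    except urllib.error.URLError as exc:
        print("Unified app not reachable:", exc)
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.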
## 🔍 Health Monitoring
### System Status
```bash
# Get overall system health
curl http://localhost:8000/health
# Get detailed status
curl http://localhost:8000/status
# Discover available services
curl http://localhost:8000/services
```
### Service Monitoring
Each service provides health information:
- Response times
- Uptime
- Resource usage
- Configuration status
- Error rates
## 🛠️ Troubleshooting
### Common Issues
#### 1. Services Won't Start
**Check ports:**
```bash
netstat -an | grep :8000
netstat -an | grep :8500
netstat -an | grep :8400
netstat -an | grep :8401
```
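On systems where `netstat` or `grep` is unavailable (for example a default Windows shell), a short Python check gives the same information. This is a sketch that assumes the services bind to the local machine on the ports listed in this guide; `port_in_use` is an illustrative helper, not part of the project.

```python
import socket

# Ports from the overview in this guide.
PORTS = {"unified": 8000, "ner": 8500, "ocr": 8400, "rag": 8401}


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in PORTS.items():
        state = "in use" if port_in_use(port) else "free"
        print(f"{name:8s} port {port}: {state}")
```

A port reported as in use before you start a service usually means another process already holds it.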
**Verify configuration:**
```bash
python configs.py
```
**Check dependencies:**
```bash
pip list | grep fastapi
pip list | grep asyncpg
```
#### 2. Database Connection Issues
**Test connection:**
```bash
# Use your actual connection details
python -c "
import asyncio
import asyncpg

async def test():
    conn = await asyncpg.connect('postgresql://user:pass@host:5432/db')
    print('Connected successfully')
    await conn.close()

asyncio.run(test())
"
```
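Rather than typing the connection string by hand, it can be assembled from the `POSTGRES_*` settings. The sketch below assumes those variables are already present in the process environment (a `.env` file is not loaded automatically by Python) and that the defaults match the example configuration; `build_dsn` is a hypothetical helper.

```python
import os
from urllib.parse import quote


def build_dsn(env=os.environ):
    """Assemble a PostgreSQL DSN from the POSTGRES_* settings."""
    # Percent-encode credentials so characters such as @ or / stay valid.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

The returned string can be passed to `asyncpg.connect()` in place of the hard-coded DSN in the test above.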
**Common fixes:**
- Verify PostgreSQL is running
- Check firewall rules
- Confirm SSL requirements
- Validate credentials
#### 3. Azure Service Issues
**Check API keys:**
```bash
# Test Azure OpenAI
curl -X POST "YOUR_ENDPOINT/openai/deployments/YOUR_MODEL/embeddings?api-version=2024-02-01" \
  -H "api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "ping"}'
# Test Document Intelligence
curl -H "Ocp-Apim-Subscription-Key: YOUR_KEY" "YOUR_ENDPOINT/formrecognizer/info?api-version=2023-07-31"
```
**Common fixes:**
- Verify API keys are correct
- Check service regions
- Confirm quota limits
- Validate endpoint URLs
#### 4. Performance Issues
**Monitor resources:**
```bash
# Check system resources
top
htop
python manage_services.py status
```
**Common solutions:**
- Increase system memory
- Optimize database queries
- Reduce concurrent requests
- Check network latency
### Getting Help
1. **Check logs**: Services log to console
2. **Run health checks**: Use `/health` endpoints
3. **Validate configuration**: Run `python configs.py`
4. **Test individual services**: Use service manager
5. **Check database connectivity**: Test connection strings
6. **Verify Azure services**: Check API endpoints
### Debug Mode
Enable debug mode for detailed logging:
```bash
# In .env file
DEBUG=True
# Or set environment variable
export DEBUG=true
python app.py
```
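The examples above spell the flag both as `True` and as `true`. How `app.py` actually reads `DEBUG` is not shown in this guide, so the following is only a sketch of a case-insensitive parse that accepts either spelling; `env_flag` and `configure_logging` are hypothetical helpers.

```python
import logging
import os


def env_flag(name, default=False, env=os.environ):
    """Interpret common truthy spellings such as True, true, 1, yes, on."""
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def configure_logging(debug):
    """Use DEBUG level when the flag is set, INFO otherwise."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(level=level,
                        format="%(asctime)s %(levelname)s %(message)s")
    return level


if __name__ == "__main__":
    configure_logging(env_flag("DEBUG"))
    logging.debug("Debug logging is enabled")
    logging.info("Logging configured")
```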
## 🚀 Production Deployment
### Security Considerations
1. **Environment Variables**: Use secure secret management
2. **HTTPS**: Enable SSL/TLS in production
3. **Authentication**: Implement API authentication
4. **Rate Limiting**: Add request rate limiting
5. **Input Validation**: Validate all input data
### Performance Optimization
1. **Caching**: Implement Redis caching
2. **Load Balancing**: Use reverse proxy (nginx)
3. **Database**: Optimize PostgreSQL configuration
4. **Monitoring**: Set up application monitoring
5. **Scaling**: Consider horizontal scaling
### Deployment Options
1. **Docker**: Containerize services
2. **Cloud**: Deploy to Azure/AWS/GCP
3. **Kubernetes**: Orchestrate with k8s
4. **CI/CD**: Automate deployments
## 📞 Next Steps
After successful setup:
1. **Explore the API**: Use the interactive documentation
2. **Try the demo**: Run `python demo.py`
3. **Run tests**: Execute `python test_unified.py`
4. **Monitor system**: Check health endpoints
5. **Customize**: Modify services for your needs
6. **Scale**: Consider production deployment
## 🎯 Success Indicators
You know the system is working when:
- ✅ All health checks pass
- ✅ Tests complete successfully
- ✅ Demo runs without errors
- ✅ API documentation is accessible
- ✅ Services respond to requests
- ✅ Database connections work
- ✅ Azure integrations function
- ✅ File uploads process correctly
- ✅ Search returns results
- ✅ Export files generate properly
**Congratulations! Your Unified AI Services system is ready to use! πŸŽ‰**