# Unified AI Services

A comprehensive AI platform that integrates Named Entity Recognition (NER), Optical Character Recognition (OCR), and Retrieval-Augmented Generation (RAG) services into a unified application.

## Features

### Core Services

- **NER Service** (Port 8500): Advanced named entity recognition with relationship extraction
- **OCR Service** (Port 8400): Document processing with Azure Document Intelligence
- **RAG Service** (Port 8401): Vector search and document retrieval
- **Unified App** (Port 8000): Coordinated workflows and service management

### Key Capabilities

- ✅ Multi-language support (Thai + English)
- ✅ Complex relationship extraction
- ✅ Entity deduplication
- ✅ Graph database exports (Neo4j, GraphML, GEXF)
- ✅ Vector search with semantic similarity
- ✅ Document processing (PDF, images, text)
- ✅ Real-time service health monitoring
- ✅ Unified workflows combining all services
- ✅ Comprehensive API documentation

## Quick Start

### Prerequisites

- Python 3.8 or higher
- PostgreSQL with vector extension support
- Azure OpenAI account
- Azure Document Intelligence account
- DeepSeek API account (for advanced NER)

### Automated Setup

1. **Clone and navigate to the project directory**

   ```bash
   cd unified-ai-services
   ```

2. **Run the automated setup**

   ```bash
   python setup.py
   ```

   This will:

   - Check your Python environment
   - Create the necessary directories
   - Help you configure the `.env` file
   - Install dependencies
   - Validate the configuration
   - Create startup scripts

3. **Start the unified application**

   ```bash
   python app.py
   ```

   Or use the generated scripts:

   - Windows: `start_services.bat`
   - Unix/Linux/Mac: `./start_services.sh`

4. **Run comprehensive tests**

   ```bash
   python test_unified.py
   ```

   Or use the generated scripts:

   - Windows: `run_tests.bat`
   - Unix/Linux/Mac: `./run_tests.sh`

### Manual Setup

If you prefer manual setup:

1. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

2. **Create .env file** (copy from .env.example)

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

3. **Set up directories**

   ```bash
   mkdir -p services exports logs temp tests data
   ```

4. **Place service files in the services directory**

   ```
   services/
   ├── ner_service.py
   ├── ocr_service.py
   └── rag_service.py
   ```

## Project Structure

```
unified-ai-services/
├── app.py                 # Main unified application
├── configs.py             # Centralized configuration
├── setup.py               # Automated setup script
├── requirements.txt       # Python dependencies
├── test_unified.py        # Comprehensive test suite
├── .env                   # Environment configuration
├── services/              # Individual service files
│   ├── ner_service.py     # NER service implementation
│   ├── ocr_service.py     # OCR service implementation
│   └── rag_service.py     # RAG service implementation
├── exports/               # Generated export files
├── logs/                  # Application logs
├── temp/                  # Temporary files
├── tests/                 # Additional test files
└── data/                  # Data files
```

## Configuration

### Environment Variables

The system uses a `.env` file for configuration. Key variables include:

#### Server Configuration

```bash
HOST=0.0.0.0
DEBUG=True
MAIN_PORT=8000
NER_PORT=8500
OCR_PORT=8400
RAG_PORT=8401
```

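As a rough illustration of how the port variables relate to the service URLs used elsewhere in this README, the sketch below reads them from the environment and builds one base URL per service. The helper name `service_urls`, the `PORT_VARS` table, and the fallback defaults are assumptions made for this example; the project's `configs.py` may load these values differently. Note that `HOST=0.0.0.0` is a bind address, so the sketch uses `localhost` for client-side URLs.

```python
import os

# Hypothetical table (not taken from configs.py): service name ->
# (environment variable holding its port, default port from this README).
PORT_VARS = {
    "unified": ("MAIN_PORT", 8000),
    "ner": ("NER_PORT", 8500),
    "ocr": ("OCR_PORT", 8400),
    "rag": ("RAG_PORT", 8401),
}


def service_urls(env=None, host="localhost"):
    """Return a dict of service name -> base URL built from the port variables."""
    env = os.environ if env is None else env
    urls = {}
    for name, (var, default) in PORT_VARS.items():
        port = int(env.get(var, default))  # env values are strings, defaults are ints
        urls[name] = f"http://{host}:{port}"
    return urls


if __name__ == "__main__":
    for name, url in service_urls().items():
        print(f"{name}: {url}")
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.
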
#### Database Configuration

```bash
POSTGRES_HOST=your-postgres-server.com
POSTGRES_PORT=5432
POSTGRES_USER=your-username
POSTGRES_PASSWORD=your-password
POSTGRES_DATABASE=postgres
```

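Client libraries such as asyncpg accept a single `postgresql://` connection string, so it can help to see how the individual `POSTGRES_*` values might be combined into one. The following is a minimal sketch under that assumption; the function name `postgres_dsn` is hypothetical and the services may assemble their connections in another way.

```python
import os
from urllib.parse import quote


def postgres_dsn(env=None):
    """Build a postgresql:// connection string from the POSTGRES_* variables."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like "@" or "/" cannot break the URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    example = {
        "POSTGRES_HOST": "your-postgres-server.com",
        "POSTGRES_USER": "your-username",
        "POSTGRES_PASSWORD": "your-password",
    }
    print(postgres_dsn(example))
```

Encoding the user name and password is the main design point here: passwords often contain reserved URL characters.
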
#### Azure OpenAI Configuration

```bash
AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
EMBEDDING_MODEL=text-embedding-3-large
```

#### DeepSeek Configuration

```bash
DEEPSEEK_ENDPOINT=https://your-deepseek-endpoint/
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_MODEL=DeepSeek-R1-0528
```

#### Azure Document Intelligence Configuration

```bash
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-di.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-di-key
```

#### Azure Storage Configuration

```bash
AZURE_STORAGE_ACCOUNT_URL=https://yourstorage.blob.core.windows.net/
AZURE_BLOB_SAS_TOKEN=your-sas-token
BLOB_CONTAINER=historylog
```

## API Documentation

Once the services are running, access the interactive API documentation:

- **Unified API**: http://localhost:8000/docs
- **NER Service**: http://localhost:8500/docs
- **OCR Service**: http://localhost:8400/docs
- **RAG Service**: http://localhost:8401/docs

## API Usage Examples

### 1. Unified Analysis (Text + RAG Indexing)

```python
import httpx


async def unified_analysis():
    # Request body for the unified endpoint
    data = {
        "text": "Your text content here...",
        "extract_relationships": True,
        "include_embeddings": False,
        "generate_graph_files": True,
        "export_formats": ["neo4j", "json"],
        "enable_rag_indexing": True,
        "rag_title": "My Document",
        "rag_keywords": ["keyword1", "keyword2"]
    }

    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/analyze/unified", json=data)
        return response.json()
```

### 2. Combined Search with NER Analysis

```python
import httpx


async def combined_search():
    # Search parameters; results below the similarity threshold are excluded
    data = {
        "query": "search query here",
        "limit": 10,
        "similarity_threshold": 0.2,
        "include_ner_analysis": True
    }

    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/search/combined", json=data)
        return response.json()
```

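To make the `limit` and `similarity_threshold` parameters more concrete, here is a small sketch of threshold-based filtering. It assumes cosine similarity over embedding vectors, which this README does not confirm; the RAG service may score results differently. The function names and the `(doc_id, vector)` input shape are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_results(query_vec, documents, similarity_threshold=0.2, limit=10):
    """Score (doc_id, vector) pairs, drop low scores, return the best `limit`."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in documents]
    kept = [item for item in scored if item[1] >= similarity_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # most similar first
    return kept[:limit]


if __name__ == "__main__":
    docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    for doc_id, score in filter_results([1.0, 0.0], docs):
        print(doc_id, round(score, 3))
```

Under this assumption, raising the threshold trades recall for precision: fewer but more closely related documents come back.
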
### 3. File Upload Analysis

```python
import httpx


async def analyze_file():
    # Form fields are sent as strings in a multipart request
    data = {
        "extract_relationships": "true",
        "generate_graph_files": "true",
        "export_formats": "neo4j,json"
    }

    # Keep the file open only for the duration of the upload
    with open("document.pdf", "rb") as f:
        files = {"file": ("document.pdf", f, "application/pdf")}
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "http://localhost:8000/ner/analyze/file", files=files, data=data
            )
            return response.json()
```

## Testing

### Comprehensive Test Suite

The project includes comprehensive tests covering:

- ✅ Service health checks
- ✅ Individual service functionality
- ✅ Unified workflow testing
- ✅ Service proxy functionality
- ✅ Error handling and resilience
- ✅ Performance testing
- ✅ File upload/download testing

Run tests with:

```bash
python test_unified.py
```

### Individual Service Tests

Test individual services:

```bash
# Test NER service
python test_ner.py

# Test RAG service
python test_rag.py
```

### Quick Health Check

```bash
curl http://localhost:8000/health
```

## Monitoring and Health Checks

### Health Endpoints

- **Unified System**: `GET /health`
- **Individual Services**: `GET /ner/health`, `GET /ocr/health`, `GET /rag/health`
- **Detailed Status**: `GET /status`
- **Service Discovery**: `GET /services`

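The health endpoints above can be polled from a script. The sketch below uses only the standard library and treats any HTTP 200 answer as healthy; that rule, the `healthy`/`degraded`/`unavailable` labels, and the function names are assumptions for illustration, since the response format of these endpoints is not described here.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # unified app, as listed in this README
HEALTH_PATHS = {
    "unified": "/health",
    "ner": "/ner/health",
    "ocr": "/ocr/health",
    "rag": "/rag/health",
}


def check(path, base_url=BASE_URL, timeout=5.0):
    """Return True if GET base_url + path answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response
        return False


def summarize(results):
    """Collapse per-service booleans into one overall label (labels are assumed)."""
    if all(results.values()):
        return "healthy"
    if any(results.values()):
        return "degraded"
    return "unavailable"
```

A typical use would be `summarize({name: check(path) for name, path in HEALTH_PATHS.items()})`, run on a schedule or before starting a long batch job.
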
### Monitoring Features

- Real-time service health monitoring
- Response time tracking
- Service uptime monitoring
- Error rate tracking
- Resource usage monitoring

## Service Architecture

```mermaid
graph TB
    Client[Client Applications]

    subgraph "Unified AI Services (Port 8000)"
        UA[Unified App]
        Proxy[Service Proxies]
        Health[Health Monitor]
    end

    subgraph "Core Services"
        NER[NER Service<br/>Port 8500]
        OCR[OCR Service<br/>Port 8400]
        RAG[RAG Service<br/>Port 8401]
    end

    subgraph "External Services"
        Azure[Azure Services]
        DeepSeek[DeepSeek API]
        DB[(PostgreSQL)]
    end

    Client --> UA
    UA --> Proxy
    Proxy --> NER
    Proxy --> OCR
    Proxy --> RAG

    NER --> Azure
    NER --> DeepSeek
    NER --> DB

    OCR --> Azure

    RAG --> Azure
    RAG --> DB
    RAG --> OCR
```

## Development

### Adding New Features

1. **Service Modifications**: Update individual service files in `services/`
2. **Unified Workflows**: Modify `app.py` for new combined workflows
3. **Configuration**: Update `configs.py` for new settings
4. **Tests**: Add tests to `test_unified.py`

### Debugging

1. **Check Service Logs**: Services log to the console
2. **Health Checks**: Use the `/health` endpoints
3. **Configuration**: Run `python configs.py` to validate
4. **Database**: Check PostgreSQL connectivity
5. **Azure Services**: Verify API keys and endpoints

### Service Management

Start individual services for development (run each command from the project root):

```bash
# Start NER service only
cd services && python ner_service.py

# Start OCR service only
cd services && python ocr_service.py

# Start RAG service only
cd services && python rag_service.py
```

## Troubleshooting

### Common Issues

#### 1. Services Won't Start

- Check port availability: `netstat -an | grep :8000`
- Verify Python dependencies: `pip list`
- Check .env configuration: `python configs.py`

#### 2. Database Connection Issues

- Verify PostgreSQL is running
- Check connection string in .env
- Test connectivity: `python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('your-connection-string'))"`

#### 3. Azure Service Issues

- Verify API keys and endpoints
- Check Azure service status
- Review rate limits and quotas

#### 4. Performance Issues

- Monitor resource usage: `top` or Task Manager
- Check database performance
- Review log files for errors

### Error Codes

- **500**: Internal service error
- **503**: Service unavailable
- **400**: Bad request (check input data)
- **422**: Validation error
- **404**: Endpoint not found

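One way a client might act on these codes is to retry the ones that can be transient and surface a hint for the rest. The sketch below is an illustration only: treating 500 and 503 as retryable, the backoff schedule, and the names `call_with_retry`, `RETRYABLE`, and `HINTS` are all assumptions, not behavior of the unified app.

```python
import time

# Assumption: server-side errors may be transient, client-side errors are not.
RETRYABLE = {500, 503}

HINTS = {
    400: "Bad request: check input data",
    404: "Endpoint not found: check the URL path",
    422: "Validation error: check field names and types",
    500: "Internal service error",
    503: "Service unavailable",
}


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is any zero-argument callable returning (status_code, body).
    Returns (status_code, body, hint), where hint is "" for unlisted codes.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, body, HINTS.get(status, "")
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

Because `send` is just a callable, it can wrap an `httpx` request in real use or a stub in tests; the injectable `sleep` keeps tests fast.
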
## Performance Optimization

### Recommended Settings

#### Production Configuration

```bash
DEBUG=False
MAX_FILE_SIZE=50
REQUEST_TIMEOUT=300
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
```

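To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, the sketch below splits text into overlapping windows. It assumes both values are measured in characters, which this README does not state (they could equally be tokens), and `chunk_text` is a hypothetical name rather than a function from the RAG service.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "x" * 2500
    print([len(chunk) for chunk in chunk_text(sample)])
```

With the values above, each window advances by 800 characters, so consecutive chunks share 200 characters of context across the boundary.
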
#### Database Optimization

- Use connection pooling
- Configure appropriate indexes
- Monitor query performance
- Perform regular maintenance

#### Service Optimization

- Enable caching where appropriate
- Use async operations
- Optimize batch processing
- Monitor memory usage

## Security Considerations

### API Security

- Implement authentication/authorization as needed
- Use HTTPS in production
- Validate all input data
- Apply rate limiting

### Data Security

- Secure database connections (SSL)
- Encrypt sensitive data
- Apply security updates regularly
- Monitor access logs

### Azure Security

- Rotate API keys regularly
- Use managed identities where possible
- Monitor usage and costs
- Follow Azure security best practices

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request

## Support

For support and questions:

1. Check this README for common issues
2. Review the test suite for usage examples
3. Check service logs for error details
4. Verify configuration with `python configs.py`

## Roadmap

### Current Version (1.0.0)

- ✅ Unified service integration
- ✅ Comprehensive testing
- ✅ Multi-language support
- ✅ Graph database exports

### Future Enhancements

- Advanced caching mechanisms
- Enhanced monitoring and analytics
- Additional export formats
- Improved error recovery
- Performance optimizations
- Additional language support