Unified AI Services
A comprehensive AI platform that integrates Named Entity Recognition (NER), Optical Character Recognition (OCR), and Retrieval-Augmented Generation (RAG) services into a unified application.
Features
Core Services
- NER Service (Port 8500): Advanced named entity recognition with relationship extraction
- OCR Service (Port 8400): Document processing with Azure Document Intelligence
- RAG Service (Port 8401): Vector search and document retrieval
- Unified App (Port 8000): Coordinated workflows and service management
Key Capabilities
- Multi-language support (Thai + English)
- Complex relationship extraction
- Entity deduplication
- Graph database exports (Neo4j, GraphML, GEXF)
- Vector search with semantic similarity
- Document processing (PDF, images, text)
- Real-time service health monitoring
- Unified workflows combining all services
- Comprehensive API documentation
Quick Start
Prerequisites
- Python 3.8 or higher
- PostgreSQL with vector extension support
- Azure OpenAI account
- Azure Document Intelligence account
- DeepSeek API account (for advanced NER)
Automated Setup
Clone and navigate to the project directory
cd unified-ai-services
Run the automated setup
python setup.py
This will:
- Check your Python environment
- Create necessary directories
- Help you configure .env file
- Install dependencies
- Validate configuration
- Create startup scripts
Start the unified application
python app.py
Or use the generated scripts:
- Windows:
start_services.bat
- Unix/Linux/Mac:
./start_services.sh
Run comprehensive tests
python test_unified.py
Or use the generated scripts:
- Windows:
run_tests.bat
- Unix/Linux/Mac:
./run_tests.sh
Manual Setup
If you prefer manual setup:
Install dependencies
pip install -r requirements.txt
Create .env file (copy from .env.example)
cp .env.example .env # Edit .env with your configuration
Set up directories
mkdir -p services exports logs temp tests data
Place service files in the services directory
services/
├── ner_service.py
├── ocr_service.py
└── rag_service.py
Project Structure
unified-ai-services/
├── app.py # Main unified application
├── configs.py # Centralized configuration
├── setup.py # Automated setup script
├── requirements.txt # Python dependencies
├── test_unified.py # Comprehensive test suite
├── .env # Environment configuration
├── services/ # Individual service files
│   ├── ner_service.py # NER service implementation
│   ├── ocr_service.py # OCR service implementation
│   └── rag_service.py # RAG service implementation
├── exports/ # Generated export files
├── logs/ # Application logs
├── temp/ # Temporary files
├── tests/ # Additional test files
└── data/ # Data files
Configuration
Environment Variables
The system uses a .env file for configuration. Key variables include:
Server Configuration
HOST=0.0.0.0
DEBUG=True
MAIN_PORT=8000
NER_PORT=8500
OCR_PORT=8400
RAG_PORT=8401
Database Configuration
POSTGRES_HOST=your-postgres-server.com
POSTGRES_PORT=5432
POSTGRES_USER=your-username
POSTGRES_PASSWORD=your-password
POSTGRES_DATABASE=postgres
Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
EMBEDDING_MODEL=text-embedding-3-large
DeepSeek Configuration
DEEPSEEK_ENDPOINT=https://your-deepseek-endpoint/
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_MODEL=DeepSeek-R1-0528
Azure Document Intelligence Configuration
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-di.cognitiveservices.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=your-di-key
Azure Storage Configuration
AZURE_STORAGE_ACCOUNT_URL=https://yourstorage.blob.core.windows.net/
AZURE_BLOB_SAS_TOKEN=your-sas-token
BLOB_CONTAINER=historylog
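As an illustration of how the database variables above could be consumed, here is a minimal sketch that reads the POSTGRES_* values and builds a connection URL. It is not the project's configs.py: the names PostgresSettings and load_postgres_settings are hypothetical, the defaults for port and database simply mirror the sample values shown above, and it assumes the variables are already present in the process environment (for example, exported by the startup scripts).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class PostgresSettings:
    """Hypothetical container for the POSTGRES_* variables."""
    host: str
    port: int
    user: str
    password: str
    database: str

    def dsn(self) -> str:
        # Percent-encode credentials so characters such as '@' or ':' stay valid in the URL.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"


def load_postgres_settings(env: Optional[Mapping[str, str]] = None) -> PostgresSettings:
    """Read the POSTGRES_* variables and fail early if a required one is missing."""
    env = os.environ if env is None else env
    required = ["POSTGRES_HOST", "POSTGRES_USER", "POSTGRES_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return PostgresSettings(
        host=env["POSTGRES_HOST"],
        port=int(env.get("POSTGRES_PORT", "5432")),  # assumed default, taken from the sample above
        user=env["POSTGRES_USER"],
        password=env["POSTGRES_PASSWORD"],
        database=env.get("POSTGRES_DATABASE", "postgres"),  # assumed default
    )
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise in tests without touching the real environment.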
API Documentation
Once running, access the interactive API documentation:
- Unified API: http://localhost:8000/docs
- NER Service: http://localhost:8500/docs
- OCR Service: http://localhost:8400/docs
- RAG Service: http://localhost:8401/docs
API Usage Examples
1. Unified Analysis (Text + RAG Indexing)
import httpx

async def unified_analysis():
    data = {
        "text": "Your text content here...",
        "extract_relationships": True,
        "include_embeddings": False,
        "generate_graph_files": True,
        "export_formats": ["neo4j", "json"],
        "enable_rag_indexing": True,
        "rag_title": "My Document",
        "rag_keywords": ["keyword1", "keyword2"]
    }
    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/analyze/unified", json=data)
        return response.json()
2. Combined Search with NER Analysis
import httpx

async def combined_search():
    # Search parameters, including the optional NER analysis flag
    data = {
        "query": "search query here",
        "limit": 10,
        "similarity_threshold": 0.2,
        "include_ner_analysis": True
    }
    async with httpx.AsyncClient() as client:
        response = await client.post("http://localhost:8000/search/combined", json=data)
        return response.json()
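To give a feel for what limit and similarity_threshold do in a search like this, the sketch below filters candidates by cosine similarity and keeps the best matches. This is an assumption made for illustration: the README does not state which similarity metric the RAG service uses, and the function names cosine_similarity and filter_hits are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_hits(
    query_vec: Sequence[float],
    docs: Dict[str, Sequence[float]],
    similarity_threshold: float = 0.2,
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Score every document, drop those below the threshold, return the top `limit`."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    kept = [hit for hit in scored if hit[1] >= similarity_threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:limit]
```

Under this reading, a low threshold such as 0.2 favors recall: weakly related documents still appear, ranked below closer matches.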
3. File Upload Analysis
import httpx

async def analyze_file():
    # Form fields are sent as strings alongside the multipart file upload
    data = {
        "extract_relationships": "true",
        "generate_graph_files": "true",
        "export_formats": "neo4j,json"
    }
    # Open the file in a context manager so the handle is closed after the request
    with open("document.pdf", "rb") as pdf:
        files = {"file": ("document.pdf", pdf, "application/pdf")}
        async with httpx.AsyncClient() as client:
            response = await client.post("http://localhost:8000/ner/analyze/file", files=files, data=data)
    return response.json()
Testing
Comprehensive Test Suite
The project includes comprehensive tests covering:
- Service health checks
- Individual service functionality
- Unified workflow testing
- Service proxy functionality
- Error handling and resilience
- Performance testing
- File upload/download testing
Run tests with:
python test_unified.py
Individual Service Tests
Test individual services:
# Test NER service
python test_ner.py
# Test RAG service
python test_rag.py
Quick Health Check
curl http://localhost:8000/health
Monitoring and Health Checks
Health Endpoints
- Unified System: GET /health
- Individual Services: GET /ner/health, GET /ocr/health, GET /rag/health
- Detailed Status: GET /status
- Service Discovery: GET /services
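The health endpoints can also be polled from a small script. The following is a minimal sketch using only the standard library; it assumes the Unified App is reachable at http://localhost:8000 and that a healthy endpoint answers with an HTTP 2xx status. The response body format is not documented here, so it is ignored. The helper names probe, check_all, and all_healthy are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict

BASE_URL = "http://localhost:8000"  # Unified App port from this README
HEALTH_PATHS = ["/health", "/ner/health", "/ocr/health", "/rag/health"]


def probe(url: str, timeout: float = 5.0) -> str:
    """Return 'up' for an HTTP 2xx response, otherwise a short failure reason."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "up" if 200 <= response.status < 300 else f"http {response.status}"
    except urllib.error.HTTPError as exc:
        # Raised for 4xx/5xx responses; must be caught before URLError (its parent class).
        return f"http {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        return f"unreachable ({exc})"


def check_all(base_url: str = BASE_URL) -> Dict[str, str]:
    """Probe every health path and map each path to its state."""
    return {path: probe(base_url + path) for path in HEALTH_PATHS}


def all_healthy(results: Dict[str, str]) -> bool:
    """True only if every probed endpoint reported 'up'."""
    return all(state == "up" for state in results.values())
```

Calling check_all() while the services are running returns one entry per endpoint, and all_healthy(check_all()) can serve as a simple readiness gate in a deployment script.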
Monitoring Features
- Real-time service health monitoring
- Response time tracking
- Service uptime monitoring
- Error rate tracking
- Resource usage monitoring
Service Architecture
graph TB
Client[Client Applications]
subgraph "Unified AI Services (Port 8000)"
UA[Unified App]
Proxy[Service Proxies]
Health[Health Monitor]
end
subgraph "Core Services"
NER[NER Service<br/>Port 8500]
OCR[OCR Service<br/>Port 8400]
RAG[RAG Service<br/>Port 8401]
end
subgraph "External Services"
Azure[Azure Services]
DeepSeek[DeepSeek API]
DB[(PostgreSQL)]
end
Client --> UA
UA --> Proxy
Proxy --> NER
Proxy --> OCR
Proxy --> RAG
NER --> Azure
NER --> DeepSeek
NER --> DB
OCR --> Azure
RAG --> Azure
RAG --> DB
RAG --> OCR
Development
Adding New Features
- Service Modifications: Update individual service files in services/
- Unified Workflows: Modify app.py for new combined workflows
- Configuration: Update configs.py for new settings
- Tests: Add tests to test_unified.py
Debugging
- Check Service Logs: Services log to console
- Health Checks: Use /health endpoints
- Configuration: Run python configs.py to validate
- Database: Check PostgreSQL connectivity
- Azure Services: Verify API keys and endpoints
Service Management
Start individual services for development:
# Start NER service only
cd services && python ner_service.py
# Start OCR service only
cd services && python ocr_service.py
# Start RAG service only
cd services && python rag_service.py
Troubleshooting
Common Issues
1. Services Won't Start
- Check port availability:
netstat -an | grep :8000
- Verify Python dependencies:
pip list
- Check .env configuration:
python configs.py
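Because grep may not be available on Windows, a cross-platform way to check the ports is sometimes handy. The sketch below tries a TCP connection to each service port listed in this README and reports whether something is already listening. The helper names port_in_use and report are hypothetical.

```python
import socket
from typing import Dict, Mapping

# Ports taken from the Core Services list in this README.
SERVICE_PORTS = {"unified": 8000, "ner": 8500, "ocr": 8400, "rag": 8401}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def report(ports: Mapping[str, int]) -> Dict[str, bool]:
    """Map each service name to whether its port is currently occupied."""
    return {name: port_in_use(port) for name, port in ports.items()}
```

Running report(SERVICE_PORTS) before startup shows which ports are taken; a True value for a service that is not yet running suggests another process holds that port.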
2. Database Connection Issues
- Verify PostgreSQL is running
- Check connection string in .env
- Test connectivity:
python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('your-connection-string'))"
3. Azure Service Issues
- Verify API keys and endpoints
- Check Azure service status
- Review rate limits and quotas
4. Performance Issues
- Monitor resource usage: top or Task Manager
- Check database performance
- Review log files for errors
Error Codes
- 500: Internal service error
- 503: Service unavailable
- 400: Bad request (check input data)
- 422: Validation error
- 404: Endpoint not found
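A client can use these codes to decide whether repeating a request is worthwhile. The sketch below is one possible policy, not the project's behavior: it assumes that only 503 is worth retrying automatically, while 400, 404, and 422 need a corrected request and 500 calls for a look at the service logs. The names should_retry and backoff_delays are hypothetical.

```python
from typing import List

# Assumption: only "service unavailable" is treated as transient.
RETRYABLE_STATUS = {503}


def should_retry(status_code: int) -> bool:
    """True if the request may succeed when repeated unchanged."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

For example, backoff_delays(4) yields waits of 1, 2, 4, and 8 seconds between attempts.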
Performance Optimization
Recommended Settings
Production Configuration
DEBUG=False
MAX_FILE_SIZE=50
REQUEST_TIMEOUT=300
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
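To show how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which this README does not confirm; the services may count tokens or split on sentence boundaries instead. The function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the settings above, each window advances by 800 characters, so a 2,500-character document produces three chunks of 1,000, 1,000, and 900 characters.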
Database Optimization
- Use connection pooling
- Configure appropriate indexes
- Monitor query performance
- Regular maintenance
Service Optimization
- Enable caching where appropriate
- Use async operations
- Optimize batch processing
- Monitor memory usage
Security Considerations
API Security
- Implement authentication/authorization as needed
- Use HTTPS in production
- Validate all input data
- Rate limiting
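Rate limiting can take many forms. As one illustration, the sketch below implements a simple in-process token bucket. It is not part of the project: the class name TokenBucket and its parameters are hypothetical, and a production deployment would more likely rely on a gateway or a shared store so that limits apply across processes.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be throttled."""
        now = self._clock()
        # Refill in proportion to the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A request handler would call allow() once per incoming request and answer with an error status when it returns False.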
Data Security
- Secure database connections (SSL)
- Encrypt sensitive data
- Regular security updates
- Monitor access logs
Azure Security
- Rotate API keys regularly
- Use managed identities where possible
- Monitor usage and costs
- Follow Azure security best practices
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
Support
For support and questions:
- Check this README for common issues
- Review the test suite for usage examples
- Check service logs for error details
- Verify configuration with python configs.py
Roadmap
Current Version (1.0.0)
- Unified service integration
- Comprehensive testing
- Multi-language support
- Graph database exports
Future Enhancements
- Advanced caching mechanisms
- Enhanced monitoring and analytics
- Additional export formats
- Improved error recovery
- Performance optimizations
- Additional language support