# Configuration Refactoring
## Overview
This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.
## Key Changes
### Centralized Configuration
All previously hard-coded parameters have been moved to `config.py` and organized by functional category:
- **PDF_SETTINGS**: Parameters for PDF processing
- **SEGMENTATION_SETTINGS**: Image segmentation configuration
- **CACHE_SETTINGS**: Cache TTL and capacity settings
- **TEXT_REPAIR_SETTINGS**: Duplication detection and repair thresholds
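
As a rough illustration of the grouping above, `config.py` might be laid out along these lines. Only the four group names come from this document; every key, every default value, and the `cache_is_full` helper are hypothetical placeholders, not the project's actual settings.

```python
# Sketch of a possible config.py layout.
# Group names are from this document; all keys and values are assumptions.

PDF_SETTINGS = {
    "default_dpi": 150,  # hypothetical default
}

SEGMENTATION_SETTINGS = {
    "min_region_area": 500,  # hypothetical key
}

CACHE_SETTINGS = {
    "ttl_seconds": 3600,  # hypothetical key
    "max_entries": 20,    # hypothetical default
}

TEXT_REPAIR_SETTINGS = {
    "duplication_threshold": 0.8,  # hypothetical key
}


def cache_is_full(entry_count):
    """Hypothetical consumer: reads the limit from its group, not a literal."""
    return entry_count >= CACHE_SETTINGS["max_entries"]
```

Grouping by dictionary keeps related values together, so a module that only deals with caching needs to look at `CACHE_SETTINGS` and nothing else.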
### Environment Variable Support
All configuration parameters can now be overridden via environment variables:
```bash
# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200
# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
```
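
One way such overrides could be read inside `config.py` is sketched below. The variable names `PDF_DEFAULT_DPI` and `CACHE_MAX_ENTRIES` come from the example above; the `env_int` and `load_settings` helpers, the dictionary keys, the defaults of 150 and 20, and the choice to fall back silently on a malformed value are all assumptions made for illustration.

```python
import os


def env_int(name, default):
    """Return the integer value of environment variable `name`, or `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumption: a malformed override falls back to the default
        # instead of stopping the application at import time.
        return default


def load_settings():
    """Build the settings groups, applying any environment overrides."""
    return {
        "PDF_SETTINGS": {
            "default_dpi": env_int("PDF_DEFAULT_DPI", 150),    # default is hypothetical
        },
        "CACHE_SETTINGS": {
            "max_entries": env_int("CACHE_MAX_ENTRIES", 20),   # default is hypothetical
        },
    }
```

Environment variables are always strings, so each value has to be converted explicitly. Whether a bad value should fall back quietly or raise an error is a design decision this sketch does not settle.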
### Import Strategy
To prevent circular dependencies, configuration is imported at function level where needed:
```python
def process_image():
    # Imported here rather than at module level to avoid a circular import
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings
```
## Benefits
- **Maintainability**: Settings are centralized and documented
- **Flexibility**: Configuration can be adjusted without code changes
- **Consistency**: Standardized approach to configuration across modules
- **Traceability**: Clear overview of all configurable parameters
## Future Improvements
- Add configuration schema validation
- Support configuration profiles (dev/test/prod)
- Add detailed documentation for each parameter |
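
As a starting point for the first item, schema validation could be as simple as a table of expected types and ranges checked at startup. This is a minimal sketch using only the standard library; `CACHE_SCHEMA`, `validate_settings`, the key names, and the bounds are hypothetical and not part of the current codebase.

```python
# Hypothetical schema: setting name -> (expected type, minimum, maximum).
CACHE_SCHEMA = {
    "ttl_seconds": (int, 1, 86_400),
    "max_entries": (int, 1, 10_000),
}


def validate_settings(settings, schema):
    """Return a list of problems found; an empty list means the settings are valid."""
    problems = []
    for key, (expected_type, minimum, maximum) in schema.items():
        if key not in settings:
            problems.append(f"missing key: {key}")
            continue
        value = settings[key]
        # bool is a subclass of int, so reject it explicitly for numeric settings.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        elif not minimum <= value <= maximum:
            problems.append(f"{key}: {value} is outside [{minimum}, {maximum}]")
    # Flag keys the schema does not know about, in a stable order.
    for key in sorted(settings.keys() - schema.keys()):
        problems.append(f"unknown key: {key}")
    return problems
```

Returning every problem at once, rather than raising on the first one, lets a misconfigured deployment be fixed in a single pass.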