# Configuration Refactoring
## Overview
This document outlines the changes made to centralize configuration parameters and reduce technical debt in the OCR processing system.
## Key Changes
### Centralized Configuration
All previously hard-coded parameters have been moved to `config.py` and organized by functional category:
- **PDF_SETTINGS**: Parameters for PDF processing
- **SEGMENTATION_SETTINGS**: Image segmentation configuration
- **CACHE_SETTINGS**: Cache TTL and capacity settings
- **TEXT_REPAIR_SETTINGS**: Duplication detection and repair thresholds
### Environment Variable Support
All configuration parameters can now be overridden via environment variables:
```bash
# Example: Override PDF DPI
export PDF_DEFAULT_DPI=200
# Example: Increase cache size
export CACHE_MAX_ENTRIES=50
```
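As a minimal sketch of how such an override might be wired up inside `config.py`, the helper below reads an integer from the environment and falls back to a default. The helper name `env_int`, the dictionary keys, and the default values (150 and 20) are assumptions for illustration only; the actual `config.py` may structure this differently.

```python
import os


def env_int(name, default, environ=None):
    """Return the integer value of environment variable `name`, or `default` if it is unset or empty."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        # Fail early with a message that names the offending variable.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


# Hypothetical keys and defaults; only the environment variable names come from the examples above.
PDF_SETTINGS = {"default_dpi": env_int("PDF_DEFAULT_DPI", 150)}
CACHE_SETTINGS = {"max_entries": env_int("CACHE_MAX_ENTRIES", 20)}
```

Because the values are resolved when the module is imported, an override has to be exported before the application starts.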
### Import Strategy
To prevent circular dependencies, configuration is imported at function level where needed:
```python
def process_image():
    # Importing inside the function defers loading config until call time
    from config import SEGMENTATION_SETTINGS
    # Function implementation using settings
```
## Benefits
- **Maintainability**: Settings are centralized and documented
- **Flexibility**: Configuration can be adjusted without code changes
- **Consistency**: Standardized approach to configuration across modules
- **Traceability**: Clear overview of all configurable parameters
## Future Improvements
- Add configuration schema validation
- Add support for configuration profiles (dev/test/prod)
- Add detailed documentation for each parameter
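
The schema validation item could take a form like the sketch below, which checks each setting's type and lower bound and collects readable error messages. Everything here is hypothetical: the function `validate_settings`, the schema layout, and the keys `max_entries` and `ttl_seconds` are illustrative assumptions, not part of the current `config.py`.

```python
def validate_settings(settings, schema):
    """Check `settings` against `schema` and return a list of error messages (empty if valid).

    `schema` maps each required key to a tuple of (expected type, minimum value).
    """
    errors = []
    for key, (expected_type, minimum) in schema.items():
        if key not in settings:
            errors.append(f"missing key: {key}")
            continue
        value = settings[key]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        elif value < minimum:
            errors.append(f"{key}: {value} is below the minimum {minimum}")
    return errors


# Hypothetical schema for the cache settings group.
CACHE_SCHEMA = {"max_entries": (int, 1), "ttl_seconds": (int, 0)}

problems = validate_settings({"max_entries": 0}, CACHE_SCHEMA)
for message in problems:
    print(message)
```

Returning a list instead of raising on the first failure lets a startup check report every misconfigured parameter at once.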