System Patterns: HOCR Processing Tool
1. High-Level Architecture
- Modular Pipeline: The system appears structured as a pipeline with distinct modules for different stages of OCR processing. Key modules suggested by filenames include:
  - preprocessing.py: Handles initial image adjustments.
  - image_segmentation.py: Identifies regions of interest (text blocks).
  - ocr_processing.py: Manages the core OCR engine interaction.
  - language_detection.py: Determines the language of the text.
  - pdf_ocr.py: Specific handling for PDF inputs.
  - structured_ocr.py: Likely involved in formatting the output.
- Configuration Driven: config.py suggests a centralized configuration management approach, allowing pipeline behavior to be customized.
- Entry Point / Orchestration: app.py likely serves as the main entry point or orchestrator, possibly for a web UI or API, coordinating the pipeline execution based on user input and configuration. process_file.py might be an alternative entry point or a core processing function called by app.py.
- UI Layer: The ui/ directory (ui/layout.py, ui/ui_components.py) indicates a dedicated user interface layer, possibly built with Streamlit or Flask (as suggested in projectbrief.md).
- Utility Functions: The utils/ directory (utils/image_utils.py, utils/text_utils.py, etc.) points to a pattern of encapsulating reusable helper functions.
- Error Handling: error_handler.py suggests a dedicated mechanism for managing and reporting errors during processing.
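To make the configuration-driven and error-handling points more concrete, here is a minimal sketch of how the two could fit together. Every name in it (PipelineConfig, ProcessingError, reports_errors, the individual settings) is hypothetical and chosen for illustration; the actual contents of config.py and error_handler.py have not been inspected and may look quite different.

```python
from dataclasses import dataclass, replace
from functools import wraps


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings; the real config.py may use other names."""
    language_hint: str = "auto"
    use_segmentation: bool = True
    max_pdf_pages: int = 20


class ProcessingError(Exception):
    """Hypothetical error type that records which stage failed."""

    def __init__(self, stage, cause):
        super().__init__(f"{stage} failed: {cause}")
        self.stage = stage
        self.cause = cause


def reports_errors(stage):
    """Wrap a stage so any failure is re-raised with the stage name attached."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except ProcessingError:
                raise  # already labelled by an inner stage
            except Exception as exc:
                raise ProcessingError(stage, exc) from exc
        return wrapper
    return decorator


@reports_errors("preprocessing")
def preprocess(image_bytes, config):
    # Placeholder stage: rejects empty input, otherwise passes data through.
    if not image_bytes:
        raise ValueError("empty input")
    return image_bytes


default_config = PipelineConfig()
# Settings are overridden by copying, so the defaults stay untouched.
fast_config = replace(default_config, use_segmentation=False)
```

Keeping the settings in one immutable object means a UI layer could build a per-request variant without mutating shared state, and the decorator gives the orchestrator a single exception type to catch and report.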
2. Key Design Patterns (Inferred)
- Pipeline Pattern: The core processing flow seems to follow a pipeline pattern, where data (image/document) passes through sequential processing stages.
- Configuration Management: Centralized configuration (config.py) allows for decoupling settings from code.
- Separation of Concerns: Different functionalities (UI, core processing, utilities, configuration) appear to be separated into distinct modules/files.
- Utility/Helper Modules: Common, reusable functions are grouped into utility modules.
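The pipeline pattern listed above can be sketched in a few lines. This is an illustration only: the stage functions are placeholders that operate on a plain string standing in for an image, and none of the names (run_pipeline, segment, recognise, structure) are taken from the actual modules.

```python
from typing import Callable, Dict, List

# A stage takes the document state and returns an updated copy.
Stage = Callable[[Dict], Dict]


def preprocess(doc: Dict) -> Dict:
    # Stand-in for image clean-up: collapse runs of whitespace.
    return {**doc, "image": " ".join(doc["image"].split())}


def segment(doc: Dict) -> Dict:
    # Stand-in for region detection: treat each sentence as a text block.
    blocks = [b.strip() for b in doc["image"].split(".") if b.strip()]
    return {**doc, "blocks": blocks}


def recognise(doc: Dict) -> Dict:
    # Stand-in for the OCR engine: the fake blocks are already text.
    return {**doc, "lines": list(doc["blocks"])}


def structure(doc: Dict) -> Dict:
    # Stand-in for output formatting.
    return {**doc, "output": {"line_count": len(doc["lines"]), "lines": doc["lines"]}}


def run_pipeline(doc: Dict, stages: List[Stage]) -> Dict:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    {"image": "  Hello   world. Second block.  "},
    [preprocess, segment, recognise, structure],
)
print(result["output"])
```

Because each stage has the same signature, a stage can be added, removed, or reordered by editing the list passed to run_pipeline, which is one way configuration could influence the flow.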
3. Component Relationships (Initial Diagram - Mermaid)
graph TD
    subgraph UI["User Interface / Entry Point"]
        A[app.py / UI Layer] --> B(process_file.py);
    end
    subgraph Config["Configuration"]
        C[config.py];
    end
    subgraph Core["Core OCR Pipeline"]
        B --> D(preprocessing.py);
        D --> E(image_segmentation.py);
        E --> F(ocr_processing.py);
        F --> G(language_detection.py);
        G --> H(structured_ocr.py);
    end
    subgraph Input["Input Handling"]
        I[pdf_ocr.py] --> B;
        J[Image Input] --> B;
    end
    subgraph Utils["Utilities"]
        K[utils/];
        L[error_handler.py];
    end
    A --> C;
    B --> C;
    D --> K;
    E --> K;
    F --> K;
    G --> K;
    H --> K;
    I --> K;
    B --> L;
    style UI fill:#f9f,stroke:#333,stroke-width:2px
    style Config fill:#ccf,stroke:#333,stroke-width:2px
    style Core fill:#cfc,stroke:#333,stroke-width:2px
    style Input fill:#ffc,stroke:#333,stroke-width:2px
    style Utils fill:#eee,stroke:#333,stroke-width:2px
4. Critical Implementation Paths
- Image Input -> Preprocessing -> Segmentation -> OCR -> Language Detection -> Structured Output: The main flow for image files, following the stage order shown in the diagram.
- PDF Input -> PDF Extraction -> Image Conversion (per page) -> [Main Flow] -> Aggregated Output: The likely path for PDF documents.
- Configuration Loading -> Pipeline Execution: How settings influence the process.
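The image and PDF paths above differ only in how input reaches the main flow. A minimal sketch of that routing follows, under heavy assumptions: route_input, split_pdf_into_page_images, and run_main_flow are hypothetical names, the PDF splitter is a fake that returns labels rather than rendering pages, and the fixed page count is purely illustrative.

```python
from pathlib import Path
from typing import Dict, List


def split_pdf_into_page_images(path: str, page_count: int = 3) -> List[str]:
    # Fake stand-in: a real implementation would render each PDF page
    # to an image. Here each "page image" is just a label.
    return [f"{path}#page={n}" for n in range(1, page_count + 1)]


def run_main_flow(image: str) -> str:
    # Fake stand-in for preprocessing -> segmentation -> OCR -> output.
    return f"text from {image}"


def route_input(path: str) -> Dict:
    """Send PDFs through per-page conversion, images straight to the main flow."""
    if Path(path).suffix.lower() == ".pdf":
        pages = split_pdf_into_page_images(path)
        # Aggregate per-page results in page order.
        return {"source": path, "pages": [run_main_flow(p) for p in pages]}
    # A single image is treated as a one-page document.
    return {"source": path, "pages": [run_main_flow(path)]}


print(route_input("scan.pdf")["pages"])
print(route_input("photo.png")["pages"])
```

Returning the same shape for both input types (a list of page results) would let downstream formatting code ignore where the pages came from.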
(This document outlines the observed structure. It will be refined as the codebase is analyzed in more detail.)