Image Data Extractor
Overview:
The Image Data Extractor is a Python-based tool designed to extract and structure text data from images of visiting cards using PaddleOCR. The extracted text is processed to identify and organize key information such as name, designation, contact number, address, and company name. The Mistral 7B model is used for advanced text analysis, and if it becomes unavailable, the system falls back to the Gliner urchade/gliner_mediumv2.1 model. Both Mistral 7B and Gliner urchade/gliner_mediumv2.1 models are used under the Apache 2.0 license.
Installation Guide:
Create and Activate a Virtual Environment
python -m venv venv
source venv/bin/activate # For Linux/Mac
# or
venv\Scripts\activate # For Windows
Install Required Libraries
pip install -r requirements.txt
Run the Application
- If Docker is being used:
docker-compose up --build
- Without Docker:
python app.py
Set up Hugging Face Token
- Add your Hugging Face token in the .env file:
HF_TOKEN=<your_huggingface_token>
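One way the application might read this token at startup is sketched below. It is a minimal, standard-library-only illustration: the helper names `read_env_file` and `get_hf_token` are hypothetical and not taken from the project, which may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from an env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path=".env"):
    """Return HF_TOKEN from the process environment, else from the env file."""
    token = os.environ.get("HF_TOKEN") or read_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the .env file.")
    return token
```

Checking the process environment first lets a Docker deployment override the file without editing it.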
File Structure Overview:
ImageDataExtractor/
│
├── app.py # Main Flask app
├── requirements.txt # Dependencies
├── Dockerfile # Docker container setup
├── docker-compose.yml # Docker Compose setup
│
├── utility/
│   └── utils.py # PaddleOCR integration, image preprocessing, and Mistral model processing
│
├── template/
│   ├── index.html # UI for image uploads
│   └── result.html # Display extracted results
│
├── Backup/
│   ├── modules/ # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py # Backup handling Gliner Model integration and backup logic
│   ├── model.py
│   ├── save_load.py
│   └── train.py
│
└── .env # Environment variables (includes Hugging Face token)
Program Overview:
PaddleOCR Integration (utility/utils.py):
- Text Extraction: The tool utilizes PaddleOCR to extract text from image-based inputs (PNG, JPG, JPEG) of visiting cards.
- Preprocessing: Handles basic image preprocessing to enhance text recognition for OCR.
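The text extraction step could look roughly like the sketch below. It assumes the PaddleOCR 2.x Python API (`PaddleOCR(...)` and `ocr.ocr(path, cls=True)`) and its nested result layout of `[bounding_box, (text, confidence)]` entries per image; newer PaddleOCR releases expose a different interface. The function names and the `min_confidence` threshold are hypothetical and not taken from utility/utils.py.

```python
def flatten_ocr_result(result, min_confidence=0.5):
    """Collect recognized text lines from a PaddleOCR 2.x style result.

    Assumed shape: one list per image, each entry being
    [bounding_box, (text, confidence)]. An image with no detected
    text may be represented as None.
    """
    lines = []
    for page in result or []:
        for entry in page or []:
            text, confidence = entry[1]
            # Drop low-confidence and empty detections.
            if confidence >= min_confidence and text.strip():
                lines.append(text.strip())
    return lines


def extract_text(image_path):
    """Run OCR on one image and return its text lines (requires paddleocr)."""
    from paddleocr import PaddleOCR  # imported lazily; heavy dependency

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return flatten_ocr_result(ocr.ocr(image_path, cls=True))
```

Keeping the flattening logic separate from the OCR call makes it possible to test the filtering on hand-built results without loading the OCR models.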
Mistral 7B Integration (utility/utils.py):
- Data Structuring: After text extraction, the Mistral 7B model processes the extracted data, structuring it into fields such as name, designation, contact number, address, and company name.
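As an illustration of this structuring step, the sketch below builds a prompt from the OCR lines and parses a JSON reply into the five fields. The prompt wording, the key names in `FIELDS`, and both function names are assumptions made for this example; the project's actual prompt and the way it calls Mistral 7B are not shown here.

```python
import json

# Hypothetical key names for the five fields listed in this README.
FIELDS = ["name", "designation", "contact_number", "address", "company_name"]


def build_prompt(ocr_lines):
    """Build an instruction prompt asking for the card fields as JSON."""
    card_text = "\n".join(ocr_lines)
    return (
        "Extract the following fields from the visiting card text and reply "
        "with a single JSON object using exactly these keys: "
        + ", ".join(FIELDS)
        + ". Use null for any field that is missing.\n\n"
        + "Card text:\n"
        + card_text
    )


def parse_model_reply(reply):
    """Pull the first JSON object out of a model reply and keep known keys."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            data, _ = decoder.raw_decode(reply, start)
            return {field: data.get(field) for field in FIELDS}
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply.find("{", start + 1)
    # No parsable object found: report every field as missing.
    return {field: None for field in FIELDS}
```

Scanning for the first decodable object tolerates replies that wrap the JSON in extra prose, which instruction-tuned models often do.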
Fallback Mechanism (Backup/backup.py):
- Gliner urchade/gliner_mediumv2.1 Model: If the Mistral model is unavailable, the system uses the Gliner urchade/gliner_mediumv2.1 model to perform the same task, ensuring continuous service.
- Error Handling: Manages failures in model availability and ensures smooth fallback.
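The fallback behaviour described above can be sketched as a small wrapper. This is an assumption about the control flow, not the code in Backup/backup.py: `structure_with_fallback`, `primary`, and `fallback` are hypothetical names, with the two callables standing in for the Mistral 7B and Gliner extractors.

```python
import logging

logger = logging.getLogger(__name__)


def structure_with_fallback(text, primary, fallback):
    """Try the primary extractor; on failure or empty output, use the fallback.

    `primary` and `fallback` are callables that take the OCR text and
    return a dict of fields. Returns (fields, source_label).
    """
    try:
        result = primary(text)
        if result:  # treat an empty result as a failure too
            return result, "primary"
        logger.warning("Primary model returned no data; using fallback.")
    except Exception as exc:  # e.g. network errors, rate limits, timeouts
        logger.warning("Primary model failed (%s); using fallback.", exc)
    return fallback(text), "fallback"
```

Returning the source label alongside the fields lets the caller log or display which model produced the result.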
Web Interface (app.py):
- Flask API: Provides endpoints for image uploads and displays the results in a structured manner.
- HTML Interface: A frontend for users to upload images of visiting cards and view the parsed results.
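An upload endpoint would typically validate the file type before passing the image to OCR. The helper below is a minimal sketch of such a check for the supported formats (PNG, JPG, JPEG); `allowed_file` and `ALLOWED_EXTENSIONS` are hypothetical names, and the actual validation in app.py may differ.

```python
import os

# Image formats listed as supported in this README.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}


def allowed_file(filename):
    """Return True if the filename has one of the supported image extensions."""
    _, ext = os.path.splitext(filename or "")
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS
```

A Flask route could call this on the uploaded file's name and reject the request early when it returns False.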
Tree Map of the Program:
app.py
├── Handles Flask API and web interface
├── Manages file upload
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results
utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring
Backup/backup.py
├── Gliner urchade/gliner_mediumv2.1 as fallback
└── Backup and error handling
Licensing:
- Mistral 7B model is used under the Apache 2.0 license.
- Gliner urchade/gliner_mediumv2.1 model is used under the Apache 2.0 license.
Main Task:
The primary objective is to extract and structure data from visiting cards. The system identifies and organizes:
- Name
- Designation
- Phone Number
- Address
- Company Name