
Image Data Extractor


Overview:

The Image Data Extractor is a Python tool that uses PaddleOCR to extract text from images of visiting cards and organizes it into key fields: name, designation, contact number, address, and company name. The Mistral 7B model structures the extracted text; if it is unavailable, the system falls back to the Gliner urchade/gliner_mediumv2.1 model. Both models are used under the Apache 2.0 license.


Installation Guide:

  1. Create and Activate a Virtual Environment

    python -m venv venv
    source venv/bin/activate  # For Linux/Mac
    # or
    venv\Scripts\activate  # For Windows
    
  2. Install Required Libraries

    pip install -r requirements.txt
    
  3. Run the Application

    • If Docker is being used:
    docker-compose up --build
    
    • Without Docker:
    python app.py
    
  4. Set up Hugging Face Token

    • Add your Hugging Face token in the .env file:
    HF_TOKEN=<your_huggingface_token>
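
    How the application reads this token is not shown in this README. A minimal sketch, assuming the .env file contains plain KEY=VALUE lines; load_env_file is a hypothetical helper written for illustration, and a package such as python-dotenv could serve the same purpose:

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE pairs from a .env-style file into os.environ.

    Hypothetical helper for illustration only. Variables that are
    already set in the environment are not overwritten.
    """
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip("'\"")
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    if os.path.exists(".env"):
        load_env_file(".env")
    # Report only whether the token is present; never print the token itself.
    print("HF_TOKEN set:", "HF_TOKEN" in os.environ)
```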
    

File Structure Overview:

ImageDataExtractor/
│
├── app.py                       # Main Flask app
├── requirements.txt             # Dependencies
├── Dockerfile                   # Docker container setup
├── docker-compose.yml           # Docker Compose setup
│
├── utility/
│   └── utils.py                 # PaddleOCR integration, Image preprocessing and Mistral model processing
│
├── template/
│   ├── index.html               # UI for image uploads
│   └── result.html              # Display extracted results
│
├── Backup/
│   ├── modules/                 # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py                # Backup handling Gliner Model integration and backup logic
│   ├── model.py
│   ├── save_load.py
│   └── train.py
│
└── .env                         # Environment variables (includes Hugging Face token)

Program Overview:

PaddleOCR Integration (utility/utils.py):

  • Text Extraction: The tool utilizes PaddleOCR to extract text from image-based inputs (PNG, JPG, JPEG) of visiting cards.
  • Preprocessing: Handles basic image preprocessing to enhance text recognition for OCR.
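
The text extraction step can be sketched as follows. This is an illustration, not the code in utility/utils.py: it is written against the PaddleOCR 2.x interface (newer releases may differ), and the function names and the 0.5 confidence threshold are assumptions.

```python
def lines_from_ocr_result(result, min_confidence=0.5):
    """Flatten a PaddleOCR 2.x-style result into a list of text lines.

    Assumes the classic layout: one list per image, each entry being
    [bounding_box, (text, confidence)]. The threshold is an arbitrary
    illustrative default.
    """
    lines = []
    for page in result or []:
        for entry in page or []:  # a page can be None when nothing is detected
            text, confidence = entry[1]
            if confidence >= min_confidence and text.strip():
                lines.append(text.strip())
    return lines


def extract_text(image_path):
    """Run OCR on one image file and return the recognized lines."""
    # Imported lazily so the helper above works without PaddleOCR installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return lines_from_ocr_result(ocr.ocr(image_path, cls=True))
```

Keeping the result flattening separate from the OCR call makes it possible to check the filtering logic with hand-written data, without loading the OCR models.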

Mistral 7B Integration (utility/utils.py):

  • Data Structuring: After text extraction, the Mistral 7B model processes the extracted data, structuring it into fields such as name, designation, contact number, address, and company name.
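
One way to picture the structuring step is a prompt that asks for JSON, plus a tolerant parser for the reply. The sketch below is an assumption about how such a step could look, not the project's actual prompt or parsing code; the key names and both function names are hypothetical, and the call that sends the prompt to Mistral 7B is deliberately left out.

```python
import json

# Hypothetical key names for the five fields listed in this README.
FIELDS = ("name", "designation", "contact_number", "address", "company_name")


def build_prompt(ocr_text):
    """Ask the model for a JSON object holding the five card fields."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from this visiting card text and "
        f"reply with a single JSON object using the keys {keys}. "
        "Use null for anything that is missing.\n\n"
        f"Card text:\n{ocr_text}"
    )


def parse_reply(reply):
    """Parse the outermost {...} span of a model reply into the five fields."""
    start, end = reply.find("{"), reply.rfind("}")
    data = {}
    if start != -1 and end > start:
        try:
            data = json.loads(reply[start : end + 1])
        except json.JSONDecodeError:
            data = {}  # malformed JSON: fall through to an all-None record
    # Keep only the expected keys; missing ones become None.
    return {field: data.get(field) for field in FIELDS}
```

Returning a record with every field present, even when the reply is unusable, keeps the downstream display code free of key checks.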

Fallback Mechanism (Backup/backup.py):

  • Gliner urchade/gliner_mediumv2.1 Model: If the Mistral model is unavailable, the system uses the Gliner urchade/gliner_mediumv2.1 model to perform the same task, ensuring continuous service.
  • Error Handling: Manages failures in model availability and ensures smooth fallback.
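
The fallback behaviour described above amounts to a try/except around the primary model. A minimal sketch, assuming both extractors are plain callables that take the OCR text and return a dict of fields; extract_with_fallback and its arguments are hypothetical stand-ins for the Mistral and Gliner code paths:

```python
import logging

logger = logging.getLogger(__name__)


def extract_with_fallback(text, primary, fallback):
    """Try the primary extractor; on failure, use the fallback instead.

    Returns a (fields, source) tuple, where source is "primary" or
    "fallback" so callers can see which model produced the result.
    """
    try:
        result = primary(text)
        if result:  # treat an empty result as a failure as well
            return result, "primary"
        logger.warning("Primary model returned nothing; using fallback.")
    except Exception as exc:  # e.g. timeouts, HTTP errors, missing token
        logger.warning("Primary model failed (%s); using fallback.", exc)
    return fallback(text), "fallback"
```

Errors raised by the fallback itself are not caught here, so a failure of both models still surfaces to the caller.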

Web Interface (app.py):

  • Flask API: Provides endpoints for image uploads and displays the results in a structured manner.
  • HTML Interface: A frontend for users to upload images of visiting cards and view the parsed results.
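
A hedged sketch of how the upload flow could be wired in Flask. The route paths, the form field name image, the template variable data, and the process_image callable are all assumptions made for illustration; only the template file names and the accepted image types come from this README.

```python
import os

ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}


def allowed_file(filename):
    """Accept only the image types the OCR step is described as handling."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    return extension in ALLOWED_EXTENSIONS


def create_app(process_image):
    """Build a Flask app; process_image maps an uploaded file to a dict of fields."""
    # Imported lazily so allowed_file can be used without Flask installed.
    from flask import Flask, render_template, request

    # The README lists a "template/" folder rather than Flask's default "templates/".
    app = Flask(__name__, template_folder="template")

    @app.route("/", methods=["GET"])
    def index():
        return render_template("index.html")

    @app.route("/upload", methods=["POST"])
    def upload():
        file = request.files.get("image")
        if file is None or not allowed_file(file.filename or ""):
            return "Please upload a PNG, JPG, or JPEG image.", 400
        return render_template("result.html", data=process_image(file))

    return app
```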

Tree Map of the Program:

app.py
├── Handles Flask API and web interface
├── Manages file upload
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results

utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring

Backup/backup.py
├── Gliner urchade/gliner_mediumv2.1 as fallback
└── Backup and error handling

Licensing:


Main Task:

The primary objective is to extract and structure data from visiting cards. The system identifies and organizes:

  • Name
  • Designation
  • Phone Number
  • Address
  • Company Name

References: