--- license: mit tags: - donut - image-to-text - vision datasets: - shreyanshu09/Block_Diagram - shreyanshu09/BD-EnKo language: - en - ko --- # Block Diagram Global Information Extractor It was introduced in the paper **"Unveiling the Power of Integration: Block Diagram Summarization through Local-Global Fusion"** accepted at ACL 2024. ## Model description This model is trained using a transformer encoder and decoder architecture, based on the configuration specified in [Donut](https://arxiv.org/abs/2111.15664), to extract the overall summary of block diagram images. It supports both English and Korean languages. The straightforward architecture comprises a visual encoder module and a text decoder module, both based on the Transformer architecture. ## Training dataset - 41,933 samples from the synthetic and real-world block diagrams in English language (BD-EnKo) - 33,101 samples from the synthetic and real-world block diagrams in Korean language (BD-EnKo) - 396 samples from real-world English block diagram dataset (CBD) - 357 samples from handwritten English block diagram dataset (FC_A) - 476 samples from handwritten English block diagram dataset (FC_B) ## How to use Here is how to use this model in PyTorch: ```python import os from PIL import Image import torch from donut import DonutModel # Load the pre-trained model model = DonutModel.from_pretrained("shreyanshu09/block_diagram_global_information") # Move the model to GPU if available if torch.cuda.is_available(): model.half() device = torch.device("cuda:0") model.to(device) # Function to process a single image def process_image(image_path): # Load and process the image image = Image.open(image_path) task_name = os.path.basename('/block_diagram_global_information/dataset/c2t_data/') # Create empty folder anywhere result = model.inference(image=image, prompt=f"")["predictions"][0] # Extract the relevant information from the result if 'c2t' in result: return result['c2t'] else: return result['text_sequence'] # Example usage image_path = 'image.png' # Input image file result = process_image(image_path) ```