---
license: apache-2.0
language:
- en
metrics:
- cer
pipeline_tag: image-to-text
---

# OCR with Hugging Face Transformers

This repository demonstrates how to perform Optical Character Recognition (OCR) with the Hugging Face Transformers library. The code loads a pretrained `VisionEncoderDecoderModel` (`vanshp123/ocrmnist`) together with a TrOCR processor and uses them to recognize text in images.

## Prerequisites

Before you can run the code, install the required libraries with `pip`. The model runs on PyTorch, so `torch` is needed as well:

```bash
pip install transformers pillow torch
```

## Usage

You can use the code below to perform OCR on images. The basic steps are:

1. Import the necessary libraries:

   ```python
   from transformers import TrOCRProcessor, VisionEncoderDecoderModel
   from PIL import Image
   ```

2. Load the pretrained OCR model and processor. The model weights come from `vanshp123/ocrmnist`, while the processor (image preprocessing and tokenizer) comes from `microsoft/trocr-base-stage1`:

   ```python
   model = VisionEncoderDecoderModel.from_pretrained("vanshp123/ocrmnist")
   processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
   ```

3. Load an image for OCR. Replace `"/content/left_digit_section_4.png"` with the path to your image. The image is converted to RGB so that grayscale or RGBA files have the three channels the processor expects:

   ```python
   image = Image.open("/content/left_digit_section_4.png").convert("RGB")
   ```

4. Preprocess the image with the processor, generate token IDs with the model, and decode them into text:

   ```python
   pixel_values = processor(images=image, return_tensors="pt").pixel_values
   generated_ids = model.generate(pixel_values)
   generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
   ```

5. `generated_text` now contains the text recognized in the image.

## Example

You can use this code as a starting point for your own OCR projects. Adapt it to your use case, for example by changing the image path or by passing a list of images to the processor and reading every entry returned by `batch_decode` instead of only the first.

## License

This repository is released under the Apache 2.0 license. The code also downloads pretrained models and processors from the Hugging Face Hub (such as `microsoft/trocr-base-stage1`); review the license and usage terms of each of those models before use.
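## Measuring character error rate (CER)

The metadata lists character error rate (CER) as the evaluation metric. As a minimal sketch (not necessarily how this model was evaluated), CER can be computed as the character-level Levenshtein edit distance between a reference string and the model's `generated_text`, divided by the length of the reference. The function names `levenshtein`, `character_error_rate`, and `corpus_cer` are illustrative helpers, not part of Transformers.

```python
from typing import Sequence


def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn `reference` into `hypothesis`."""
    # prev[j] = edit distance between the reference prefix processed so far
    # and hypothesis[:j] (one row of the dynamic-programming table).
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref_char
                curr[j - 1] + 1,     # insert hyp_char
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

For example, `character_error_rate("12345", "12045")` is `0.2` (one substitution over five reference characters), and a CER of `0.0` means an exact match. `corpus_cer` sums edits and reference lengths across the whole dataset, which weights longer references more heavily than a plain mean of per-sample rates would.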