---
license: apache-2.0
language:
- en
metrics:
- cer
pipeline_tag: image-to-text
---


# OCR with Hugging Face Transformers

This repository demonstrates how to perform Optical Character Recognition (OCR) on images with the Hugging Face Transformers library, using a pretrained `VisionEncoderDecoderModel` checkpoint together with a TrOCR processor.

## Prerequisites

Before you can run the code, you'll need to install the required libraries. You can do this with `pip`:

```bash
pip install transformers
pip install pillow
pip install torch  # PyTorch backend, needed for return_tensors="pt"
```
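
After installing, a quick check that the packages are importable can save a confusing traceback later. The sketch below is only an illustration: `missing_packages` and `REQUIRED` are hypothetical names, not part of this repository, and `torch` is listed on the assumption that the PyTorch backend is used (the usage steps request `"pt"` tensors).

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Pillow is imported as "PIL"; "torch" is an assumption based on return_tensors="pt".
REQUIRED = ["transformers", "PIL", "torch"]
```

Calling `missing_packages(REQUIRED)` should return an empty list once everything is installed; any names it returns still need a `pip install`.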

## Usage

You can use the provided code to perform OCR on images. Here are the basic steps:

1. Import the necessary libraries:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
```

2. Load the pretrained OCR model and processor:

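```python
# Model weights come from the vanshp123/ocrmnist checkpoint.
model = VisionEncoderDecoderModel.from_pretrained("vanshp123/ocrmnist")
# Image preprocessing and the tokenizer come from the microsoft/trocr-base-stage1 processor.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
```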

3. Load an image for OCR. You can replace `"/content/left_digit_section_4.png"` with the path to your image:

```python
image = Image.open("/content/left_digit_section_4.png").convert("RGB")
```

4. Process the image using the OCR processor and generate the text:

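```python
# Preprocess the image into the pixel tensor the encoder expects.
pixel_values = processor(images=image, return_tensors="pt").pixel_values
# Generate token ids for the recognized text.
generated_ids = model.generate(pixel_values)
# Decode the ids to a string, dropping special tokens.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```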

5. `generated_text` will contain the text recognized from the image.
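
The steps above can be collected into two small helpers so the model is loaded once and reused across images. This is a minimal sketch rather than the repository's own API: the names `recognize_text` and `load_ocr` are hypothetical, and the `transformers` import is deferred into `load_ocr` so the OCR function itself has no import-time dependencies.

```python
def recognize_text(image, model, processor):
    """Run OCR on one RGB image and return the decoded string."""
    # Preprocess the image into the pixel tensor the encoder expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Generate token ids for the recognized text.
    generated_ids = model.generate(pixel_values)
    # Decode the first (and only) sequence, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


def load_ocr(model_id="vanshp123/ocrmnist",
             processor_id="microsoft/trocr-base-stage1"):
    """Load the model and processor used in the steps above."""
    # Imported here so recognize_text can be reused without importing transformers.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = TrOCRProcessor.from_pretrained(processor_id)
    return model, processor
```

With these in place, `model, processor = load_ocr()` followed by `recognize_text(Image.open(path).convert("RGB"), model, processor)` reproduces steps 2 to 4 for any image path.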

## Example

You can use this code as a starting point for your own OCR projects. Adapt the model checkpoint and image path to your specific use case.
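
When adapting the code, it helps to measure how well it reads your own images. The card metadata lists `cer` (character error rate) as the metric, so the sketch below shows one common way to compute it: the Levenshtein edit distance between the reference text and the OCR output, divided by the reference length. The function names are hypothetical, and this is not necessarily the exact implementation used to evaluate the model.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, comparing a ground-truth label with `generated_text` via `character_error_rate(label, generated_text)` gives 0.0 for a perfect match, and higher values as more characters are wrong.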

## License

This model card declares the Apache 2.0 license. The code also relies on pretrained models loaded through the Hugging Face Transformers library, so review the licensing and usage terms of those models before using them.
