---
license: apache-2.0
language:
- en
metrics:
- cer
pipeline_tag: image-to-text
---
# OCR with Hugging Face Transformers
This repository demonstrates Optical Character Recognition (OCR) with the Hugging Face Transformers library, using a pretrained model to recognize text in images.
## Prerequisites
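Before running the code, install the required libraries with `pip`. The model runs on PyTorch (the usage code requests `"pt"` tensors), so `torch` is needed alongside `transformers` and `pillow`:
```bash
pip install transformers
pip install pillow
pip install torch
```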
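As a quick sanity check after installation, the minimal sketch below reports which modules Python cannot find. The helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative, not part of this repository, and `torch` is listed on the assumption that the PyTorch backend is used. Note that the `pillow` package is imported under the name `PIL`.

```python
import importlib.util

# Import names, not pip names: the "pillow" package is imported as "PIL".
# "torch" is an assumption based on the PyTorch tensors used in the usage steps.
REQUIRED_MODULES = ["transformers", "PIL", "torch"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules from `modules` that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the script lists any module, install the matching package with `pip` before continuing.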
## Usage
To run OCR on an image, follow these steps:
1. Import the necessary libraries:
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
```
2. Load the pretrained OCR model and processor:
```python
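# OCR model weights hosted in this repository
model = VisionEncoderDecoderModel.from_pretrained("vanshp123/ocrmnist")
# Image preprocessing and tokenizer come from the base TrOCR checkpoint
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")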
```
3. Load an image for OCR, replacing `"/content/left_digit_section_4.png"` with the path to your own image:
```python
image = Image.open("/content/left_digit_section_4.png").convert("RGB")
```
4. Process the image using the OCR processor and generate the text:
```python
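# Resize and normalize the image into a batch of pixel tensors
pixel_values = processor(images=image, return_tensors="pt").pixel_values
# Generate the output token IDs
generated_ids = model.generate(pixel_values)
# Decode the token IDs to a string, dropping special tokens
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```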
```
5. `generated_text` will contain the text recognized from the image.
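The steps above handle one image at a time. As a minimal sketch, they can be wrapped in a helper that processes several images in one call. The function name `recognize_batch` is hypothetical and not part of this repository; it expects a `model` and `processor` loaded as in step 2 and assumes the processor accepts a list of images.

```python
def recognize_batch(images, model, processor):
    """Run OCR on a list of RGB PIL images and return one string per image.

    `model` and `processor` are expected to be the objects loaded in step 2.
    """
    if not images:
        return []
    # Preprocess all images into a single batch of pixel tensors
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    # Generate token IDs for the whole batch in one call
    generated_ids = model.generate(pixel_values)
    # Decode every sequence in the batch, dropping special tokens
    return processor.batch_decode(generated_ids, skip_special_tokens=True)


# Usage, assuming `model`, `processor`, and `Image` from the steps above
# (the file names are placeholders):
# images = [Image.open(p).convert("RGB") for p in ["digit_a.png", "digit_b.png"]]
# print(recognize_batch(images, model, processor))
```

Passing the model and processor as arguments keeps the helper free of global state, so the same loaded objects can be reused across calls.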
## Example
Treat the code above as a starting point for your own OCR projects and adapt it to your use case, such as your own image paths and preprocessing.
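The model card metadata lists `cer` (character error rate) as its metric. The sketch below shows one common way to compute it in pure Python: the Levenshtein edit distance between the prediction and the reference, divided by the reference length. The function names are illustrative, and this is not necessarily how the reported metric was computed.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings `a` and `b`."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j]
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(prediction, reference):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be a non-empty string")
    return edit_distance(prediction, reference) / len(reference)


# One wrong digit out of four characters
print(cer("1284", "1234"))  # 0.25
```

Here `prediction` would be the `generated_text` produced in the usage steps, and `reference` the known ground-truth text for the same image.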
## License
The model card metadata lists the license as `apache-2.0`. The code also loads pretrained models through Hugging Face Transformers, such as `microsoft/trocr-base-stage1`; review their licensing and usage terms before use.