arxiv:2109.10282

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Published on Sep 21, 2021

AI-generated summary

TrOCR, an end-to-end text recognition system using pre-trained Transformer models, achieves superior performance in printed, handwritten, and scene text recognition by leveraging the Transformer architecture for both image understanding and wordpiece-level text generation.

Abstract

Text recognition is a long-standing research problem in document digitization. Existing approaches are usually built on CNNs for image understanding and RNNs for character-level text generation. In addition, a separate language model is often needed as a post-processing step to improve overall accuracy. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on printed, handwritten, and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.
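The pipeline described in the abstract has two stages: the image is split into fixed-size patches for a ViT-style image Transformer encoder, and text is then generated autoregressively at the wordpiece level by a text Transformer decoder. The toy sketch below illustrates that data flow only; the decoder is a stub, and names like `toy_decode_step` are illustrative, not part of the TrOCR codebase.

```python
from typing import Callable, List

PATCH = 4  # toy patch size (TrOCR's ViT-style encoder uses larger, e.g. 16x16, patches)

def image_to_patches(image: List[List[int]], patch: int = PATCH) -> List[List[int]]:
    """Flatten a 2-D grayscale image into a sequence of patch vectors,
    the input sequence a ViT-style encoder embeds and attends over."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches

def greedy_wordpiece_decode(step_fn: Callable[[List[str]], str],
                            bos: str = "[BOS]", eos: str = "[EOS]",
                            max_len: int = 10) -> List[str]:
    """Autoregressive greedy decoding: feed the growing prefix back into
    the decoder until it emits the end-of-sequence token."""
    tokens = [bos]
    for _ in range(max_len):
        nxt = step_fn(tokens)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens[1:]

# Stub decoder that "recognizes" the word 'hello' as two wordpieces.
def toy_decode_step(prefix: List[str]) -> str:
    script = {1: "hel", 2: "##lo", 3: "[EOS]"}
    return script[len(prefix)]

img = [[p % 256 for p in range(8)] for _ in range(8)]  # 8x8 toy image
patches = image_to_patches(img)
print(len(patches), len(patches[0]))             # 4 patches of 16 pixels each
print(greedy_wordpiece_decode(toy_decode_step))  # ['hel', '##lo']
```

In the real model the stub is replaced by a Transformer decoder conditioned (via cross-attention) on the encoded patch sequence, which is what lets TrOCR drop both the CNN backbone and the external language model.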


Get this paper in your agent:

hf papers read 2109.10282
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 51

Datasets citing this paper 1

Spaces citing this paper 737

Collections including this paper 15