---
language:
  - ko
tags:
  - ocr
widget:
  - src: https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/g.jpg
    example_title: word1
  - src: https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/khs.jpg
    example_title: word2
  - src: https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/m.jpg
    example_title: word3
pipeline_tag: image-to-text
license: apache-2.0
---

# korean trocr model

  • trocr λͺ¨λΈμ€ λ””μ½”λ”μ˜ ν† ν¬λ‚˜μ΄μ €μ— μ—†λŠ” κΈ€μžλŠ” ocr ν•˜μ§€ λͺ»ν•˜κΈ° λ•Œλ¬Έμ—, μ΄ˆμ„±μ„ μ‚¬μš©ν•˜λŠ” ν† ν¬λ‚˜μ΄μ €λ₯Ό μ‚¬μš©ν•˜λŠ” 디코더 λͺ¨λΈμ„ μ‚¬μš©ν•˜μ—¬ μ΄ˆμ„±λ„ UNK둜 λ‚˜μ˜€μ§€ μ•Šκ²Œ λ§Œλ“  trocr λͺ¨λΈμž…λ‹ˆλ‹€.
  • 2023 ꡐ원그룹 AI OCR μ±Œλ¦°μ§€ μ—μ„œ μ–»μ—ˆλ˜ λ…Έν•˜μš°λ₯Ό ν™œμš©ν•˜μ—¬ μ œμž‘ν•˜μ˜€μŠ΅λ‹ˆλ‹€.
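A quick way to see the choseong coverage claimed above is to run bare initial consonants through the decoder tokenizer and check for the unknown token. This is a minimal sketch, not part of the original card; the sample string "ㄱㅋ" is an arbitrary choice.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ddobokki/ko-trocr")

# "ㄱㅋ" consists of bare choseong characters; with this tokenizer they
# should not fall back to the unknown token.
ids = tokenizer("ㄱㅋ", add_special_tokens=False)["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
print(tokenizer.unk_token_id in ids)  # expected: False
```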

## train datasets

AI Hub

## model structure
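The card does not spell out the encoder/decoder pair here; since the checkpoint loads as a standard `VisionEncoderDecoderModel` (see "how to use" below), the structure can be inspected directly. A minimal sketch:

```python
from transformers import VisionEncoderDecoderModel

# Print the classes backing the vision encoder and the text decoder
# of this checkpoint.
model = VisionEncoderDecoderModel.from_pretrained("ddobokki/ko-trocr")
print(type(model.encoder).__name__)
print(type(model.decoder).__name__)
```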

## how to use

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel, AutoTokenizer
import requests
import unicodedata
from io import BytesIO
from PIL import Image

# Load the image processor, the encoder-decoder model, and the decoder tokenizer.
processor = TrOCRProcessor.from_pretrained("ddobokki/ko-trocr")
model = VisionEncoderDecoderModel.from_pretrained("ddobokki/ko-trocr")
tokenizer = AutoTokenizer.from_pretrained("ddobokki/ko-trocr")

# Download an example word image.
url = "https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/g.jpg"
response = requests.get(url)
img = Image.open(BytesIO(response.content))

# Preprocess the image, generate token ids, and decode them to text.
pixel_values = processor(img, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values, max_length=64)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Recompose any decomposed jamo into complete Hangul syllables.
generated_text = unicodedata.normalize("NFC", generated_text)
print(generated_text)
```
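For several word crops at once, the processor also accepts a list of images. The following batch sketch is not from the original card; it reuses the objects loaded above and the widget image URLs.

```python
# Batch inference sketch: OCR two of the widget example images in one pass.
urls = [
    "https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/g.jpg",
    "https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/khs.jpg",
]
images = [Image.open(BytesIO(requests.get(u).content)) for u in urls]

pixel_values = processor(images, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values, max_length=64)
texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
texts = [unicodedata.normalize("NFC", t) for t in texts]
print(texts)
```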