deplot_kr

deplot_kr is an image-to-data model based on Google's pix2struct architecture: it takes a chart image and outputs the underlying data table as text. It was fine-tuned from DePlot on a dataset of 300,000 Korean chart image-text pairs.

How to use

To run a prediction, pass a chart image to the model. The model predicts the data table shown in the image and returns it in text form.

from PIL import Image
from transformers import (
    AutoTokenizer,
    Pix2StructForConditionalGeneration,
    Pix2StructImageProcessor,
    Pix2StructProcessor,
)

# Build the processor from a default image processor and the model's tokenizer
image_processor = Pix2StructImageProcessor()
tokenizer = AutoTokenizer.from_pretrained("brainventures/deplot_kr")
processor = Pix2StructProcessor(image_processor=image_processor, tokenizer=tokenizer)

model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")

image_path = "IMAGE_PATH"  # replace with the path to a chart image
image = Image.open(image_path)

# The processor output holds the flattened image patches and their attention mask
inputs = processor(images=image, return_tensors="pt")
pred = model.generate(
    flattened_patches=inputs["flattened_patches"],
    attention_mask=inputs["attention_mask"],
    max_length=1024,
)
# Decode the generated token ids into the predicted table text
print(processor.batch_decode(pred, skip_special_tokens=True)[0])

Preprocessing

According to Liu et al. (2023)...

  • markdown format
  • | : separates cells (columns)
  • \n : separates rows
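
The table format above can be turned back into rows and cells with a few lines of Python. This is a minimal sketch, not part of the model's code: the function name `parse_table` and the sample string are hypothetical, and it assumes the decoded output uses a literal `|` between cells and a literal newline between rows. If the decoded text uses a different row token, pass it as `row_sep`.

```python
def parse_table(text: str, row_sep: str = "\n", cell_sep: str = "|") -> list[list[str]]:
    """Split linearized table text into a list of rows, each a list of cell strings."""
    rows = []
    for line in text.split(row_sep):
        if not line.strip():
            continue  # skip empty lines, e.g. a trailing row separator
        # Split on the cell separator and trim the padding around each cell
        rows.append([cell.strip() for cell in line.split(cell_sep)])
    return rows


# Hypothetical model output, used only for illustration
sample = "year | sales\n2021 | 100\n2022 | 120\n"
for row in parse_table(sample):
    print(row)
```

Cells are kept as strings here; converting numeric cells to `float` is left to the caller, since chart values may carry units or thousands separators.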

Train

The model was trained in a TPU environment.

  • num_warmup_steps : 1,000
  • num_training_steps : 40,000
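
The two step counts above describe a learning-rate schedule with a warmup phase. The sketch below shows how such a schedule could scale the learning rate over training. It is an illustration only: the README does not state the optimizer, the peak learning rate, or the decay shape, so the linear decay after warmup and the function name `lr_multiplier` are assumptions.

```python
def lr_multiplier(step: int,
                  num_warmup_steps: int = 1_000,
                  num_training_steps: int = 40_000) -> float:
    """Factor applied to the peak learning rate at a given step.

    Linear warmup from 0 to 1, then linear decay back to 0.
    The decay shape is an assumption; the README gives only the step counts.
    """
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


for step in (0, 500, 1_000, 20_500, 40_000):
    print(step, lr_multiplier(step))
```

Under these assumptions the multiplier reaches 1.0 at step 1,000 and falls to 0.0 at step 40,000.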

Evaluation Results

This model achieves the following results:

| Metric | Score (%) |
| --- | --- |
| RNSS (Relative Number Set Similarity) | 99.5483 |
| RMS F1 (Relative Mapping Similarity) | 16.6401 |
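
To make the RNSS figure easier to interpret, here is a small sketch of how the metric can be computed on two sets of numbers. It is not the evaluation code used for this model. It assumes the definition RNSS = 1 - (minimum total matching cost) / max(N, M), where the cost of pairing a predicted number p with a target number t is min(1, |p - t| / |t|). The function names are hypothetical, and the brute-force search over matchings is only practical for a handful of numbers.

```python
from itertools import permutations


def relative_distance(pred: float, target: float) -> float:
    """Relative error of pred against target, capped at 1."""
    if target == 0:
        return 0.0 if pred == 0 else 1.0
    return min(1.0, abs(pred - target) / abs(target))


def rnss(predicted: list[float], target: list[float]) -> float:
    """Relative Number Set Similarity under the assumed definition above."""
    if not predicted and not target:
        return 1.0
    if not predicted or not target:
        return 0.0
    n, m = len(predicted), len(target)
    if n <= m:
        # Assign every predicted number to a distinct target number
        costs = (
            sum(relative_distance(predicted[i], target[j]) for i, j in enumerate(perm))
            for perm in permutations(range(m), n)
        )
    else:
        # Assign every target number to a distinct predicted number
        costs = (
            sum(relative_distance(predicted[i], target[j]) for j, i in enumerate(perm))
            for perm in permutations(range(n), m)
        )
    return 1.0 - min(costs) / max(n, m)


print(rnss([100.0, 120.0], [120.0, 100.0]))  # same numbers, different order
print(rnss([110.0], [100.0]))                # one number off by 10%
```

Under this definition the score compares only the numbers and ignores where they sit in the table. Table structure is what RMS is meant to capture, which may be why the two scores above differ so much.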

Contact

For questions and comments, please use the discussion tab or email gloria@brainventur.com