ko-deplot

ko-deplot is a Korean Visual-QA model based on Google's Pix2Struct architecture. It was fine-tuned from DePlot using a dataset of Korean chart image-text pairs.

Developed by: NUUA
Model type: Visual Question Answering
License: apache-2.0
Finetuned from model: google/deplot

Model Usage

You can run a prediction by passing an input image together with a question to the model, as follows:

from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration
from PIL import Image

# Load the processor and the fine-tuned model from the Hugging Face Hub
processor = Pix2StructProcessor.from_pretrained('nuua/ko-deplot')
model = Pix2StructForConditionalGeneration.from_pretrained('nuua/ko-deplot')

# Replace with the path to a local chart image
IMAGE_PATH = "LOCAL_PATH_TO_IMAGE"
image = Image.open(IMAGE_PATH)

# Encode the image together with the question, then generate the data table
inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
# Decode the generated token IDs into text
print(processor.decode(predictions[0], skip_special_tokens=True))
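
The decoded prediction is a linearized table. Assuming ko-deplot keeps the output convention of google/deplot, where rows are separated by a literal `<0x0A>` token (or a newline) and cells by `|`, a small helper such as the hypothetical `parse_table` below could turn it into rows. The separator convention and the sample string are assumptions for illustration, not actual model output.

```python
def parse_table(text: str) -> list[list[str]]:
    """Split a linearized table string into rows of stripped cell strings."""
    # Assumption: rows are separated by the literal "<0x0A>" token or a newline.
    normalized = text.replace("<0x0A>", "\n")
    rows = []
    for line in normalized.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip empty rows
        # Assumption: cells within a row are separated by "|".
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


# Hypothetical decoded string, written by hand for illustration.
sample = "연도 | 매출 <0x0A> 2021 | 120 <0x0A> 2022 | 150"
table = parse_table(sample)
print(table)
```

The first row can then be treated as the header and the remaining rows as data, for example when loading the result into a spreadsheet or data frame.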

Tokenizer Details

The model's tokenizer vocabulary was extended from 50,344 to 65,536 tokens before training, using the following:

Complete Korean Jamo
Additional Korean Jamo
Ko-Electra Korean tokens
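
The size of the extension follows from simple arithmetic: 65,536 - 50,344 = 15,192 new tokens. The sketch below assumes that "Complete Korean Jamo" refers to precomposed Hangul syllables in the Unicode range U+AC00 to U+D7A3. That reading, the helper names, and the toy vocabulary are assumptions for illustration; this is not necessarily the procedure used for ko-deplot.

```python
OLD_VOCAB_SIZE = 50_344
NEW_VOCAB_SIZE = 65_536


def hangul_syllables() -> list[str]:
    """Return every precomposed Hangul syllable (U+AC00..U+D7A3)."""
    return [chr(cp) for cp in range(0xAC00, 0xD7A3 + 1)]


def missing_tokens(candidates: list[str], vocab: set[str]) -> list[str]:
    """Keep candidates that are not yet in the vocabulary, preserving order."""
    seen = set(vocab)
    new = []
    for tok in candidates:
        if tok not in seen:
            seen.add(tok)
            new.append(tok)
    return new


added = NEW_VOCAB_SIZE - OLD_VOCAB_SIZE
syllables = hangul_syllables()
toy_vocab = {"가", "a", "b"}  # hypothetical stand-in for the original vocab
to_add = missing_tokens(syllables, toy_vocab)
print(added, len(syllables), len(to_add))
```

In the transformers library, a list like `to_add` is typically registered with `tokenizer.add_tokens(...)`, after which the embedding matrix is resized with `model.resize_token_embeddings(len(tokenizer))`.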

Training Details

Training Data

The training data consists of synthetic chart data generated with three libraries.
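
For illustration, the text half of a chart image-text pair could be produced as sketched below: random values for a few Korean category labels are linearized into a table string. The labels, value range, separator format, and function names are all hypothetical. The three libraries and the exact target format are not specified here, and rendering the chart image itself is left out.

```python
import random


def make_table(labels: list[str], seed: int) -> list[tuple[str, int]]:
    """Draw one random integer value per label, reproducibly."""
    rng = random.Random(seed)
    return [(label, rng.randint(0, 100)) for label in labels]


def linearize(header: tuple[str, str], rows: list[tuple[str, int]]) -> str:
    """Join cells with ' | ' and rows with newlines (assumed target format)."""
    lines = [" | ".join(header)]
    lines += [f"{label} | {value}" for label, value in rows]
    return "\n".join(lines)


labels = ["서울", "부산", "대구"]  # hypothetical category labels
rows = make_table(labels, seed=0)
target_text = linearize(("도시", "값"), rows)
print(target_text)
```

The same `rows` would be passed to a plotting library to render the chart image, so that the image and `target_text` describe identical data.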

Training Procedure

Following the original paper, the model first went through a short "warmup" stage, in which it learned Korean text. It was then trained on the chart data for 50,000 steps.

Technical Specifications

Hardware

ko-deplot was trained on A100 80G GPU hardware.

Contact

For questions and suggestions, please use the discussion tab. To contact us directly, email robin@nuua.ai.
