deplot_kr
deplot_kr is a Korean image-to-data (text) model based on Google's pix2struct architecture. It was fine-tuned from DePlot on a dataset of 300,000 Korean chart image-text pairs. Given a chart image, it produces the chart's data table in text form.
How to use
To run a prediction, pass a chart image to the model.
The model predicts the data table shown in the image and returns it as text.
from transformers import Pix2StructForConditionalGeneration, Pix2StructImageProcessor, AutoTokenizer, Pix2StructProcessor
from PIL import Image
# Build the processor from a default image processor and the model's tokenizer
image_processor = Pix2StructImageProcessor()
tokenizer = AutoTokenizer.from_pretrained("brainventures/deplot_kr")
processor = Pix2StructProcessor(image_processor=image_processor, tokenizer=tokenizer)
model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")
# Load the chart image (replace IMAGE_PATH with the path to your file)
image_path = "IMAGE_PATH"
image = Image.open(image_path)
# Convert the image into flattened patches and an attention mask
inputs = processor(images=image, return_tensors="pt")
# Generate the token ids of the predicted data table
pred = model.generate(flattened_patches=inputs["flattened_patches"], attention_mask=inputs["attention_mask"], max_length=1024)
# Decode the generated ids back into text
print(processor.batch_decode(pred, skip_special_tokens=True)[0])
Preprocessing
According to Liu et al. (2023)...
- markdown format
- | : separates cells (column separator)
- \n : separates rows (row separator)
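
The format above can be read back into rows and cells with a few lines of Python. This is a minimal sketch, assuming the decoded output uses exactly the two separators listed; the function name `parse_linearized_table` and the sample string are hypothetical and only serve as illustration.

```python
def parse_linearized_table(text, cell_sep="|", row_sep="\n"):
    """Split a linearized table string into a list of rows of stripped cells."""
    rows = []
    for line in text.split(row_sep):
        if not line.strip():
            continue  # skip empty rows, e.g. from a trailing newline
        rows.append([cell.strip() for cell in line.split(cell_sep)])
    return rows


# Made-up sample in the format described above (not real model output)
sample = "Year | Sales\n2021 | 100\n2022 | 120\n"
table = parse_linearized_table(sample)
header, body = table[0], table[1:]
print(header)
print(body)
```

Cells stay as strings here. Converting numeric cells to `float` is left to the caller, since charts may contain units or thousands separators.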
Train
The model was trained in a TPU environment.
- num_warmup_steps : 1,000
- num_training_steps : 40,000
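
To make the two step counts concrete, the sketch below computes a learning-rate multiplier that rises linearly during warmup and then decays linearly to zero. The card does not say which schedule was actually used, so the linear shape is an assumption (it is the shape produced by `get_linear_schedule_with_warmup` in transformers), and `linear_warmup_decay` is a hypothetical helper.

```python
def linear_warmup_decay(step, num_warmup_steps=1_000, num_training_steps=40_000):
    """Return the learning-rate multiplier (0.0 to 1.0) at a given step.

    Assumed shape: linear warmup followed by linear decay to zero.
    """
    if step < num_warmup_steps:
        # Warmup: ramp from 0 up to 1 over the first num_warmup_steps steps
        return step / max(1, num_warmup_steps)
    # Decay: ramp from 1 down to 0 over the remaining steps
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


for step in (0, 500, 1_000, 20_500, 40_000):
    print(step, linear_warmup_decay(step))
```

The effective learning rate at each step is the base learning rate multiplied by this factor.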
Evaluation Results
This model achieves the following results:
| Metric | Score (%) |
|---|---|
| RNSS (Relative Number Set Similarity) | 99.5483 |
| RMS F1 (Relative Mapping Similarity) | 16.6401 |
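
For readers unfamiliar with RNSS, the sketch below follows the definition in the DePlot paper as I understand it: each predicted number p is compared with a target number t through the relative distance D(p, t) = min(1, |p - t| / |t|), a minimum-cost matching between the two sets is found, and the score is 1 minus the matched cost divided by the size of the larger set. This is not the evaluation script used for the numbers above. The brute-force matching and the handling of a zero target are assumptions made to keep the example dependency-free.

```python
from itertools import permutations


def relative_distance(p, t):
    """D(p, t) = min(1, |p - t| / |t|); the t == 0 case is an assumed convention."""
    if t == 0:
        return 0.0 if p == 0 else 1.0
    return min(1.0, abs(p - t) / abs(t))


def rnss(predicted, target):
    """Relative Number Set Similarity via brute-force minimum-cost matching."""
    n, m = len(predicted), len(target)
    if max(n, m) == 0:
        return 1.0  # two empty sets are treated as identical
    if n <= m:
        # Assign every predicted number to a distinct target number
        costs = (
            sum(relative_distance(p, target[j]) for p, j in zip(predicted, perm))
            for perm in permutations(range(m), n)
        )
    else:
        # Assign every target number to a distinct predicted number
        costs = (
            sum(relative_distance(predicted[i], t) for i, t in zip(perm, target))
            for perm in permutations(range(n), m)
        )
    return 1.0 - min(costs) / max(n, m)


print(rnss([1.0, 2.0], [2.0, 1.0]))
print(rnss([110.0], [100.0]))
```

Enumerating permutations is only practical for a handful of numbers. For full tables, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.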
Contact
For questions and comments, please use the discussion tab or email gloria@brainventur.com