---
library_name: transformers
tags: []
---

Model Card for mychen76/paligemma-receipt-json-3b-mix-448-v2b

Extracts POS receipt image data into a JSON record.

Model Details

Fine-tuned from Google's PaliGemma model to extract receipt image data into JSON records.

Gradio demo app: https://github.com/minyang-chen/paligemma-receipt-json-v2

Model Usage

Setup Environment

pip install transformers==4.42.2
pip install datasets
pip install peft accelerate bitsandbytes
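
Optionally, verify the setup before proceeding. This minimal check confirms the pinned transformers version and whether a CUDA device is visible:

import torch
import transformers

# Confirm the pinned transformers version and CUDA availability
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())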

Specify Device

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device_map = {"": 0}

Step-1 Load Image Processor

from transformers import AutoProcessor

FINETUNED_MODEL_ID = "mychen76/paligemma-receipt-json-3b-mix-448-v2b"
processor = AutoProcessor.from_pretrained(FINETUNED_MODEL_ID)

Step-2 Set Task Prompt and Prepare Inputs

TASK_PROMPT = "EXTRACT_JSON_RECEIPT"
MAX_LENGTH = 512

# Load a test receipt image (illustrative path; substitute your own receipt image)
from PIL import Image
test_image = Image.open("receipt.png").convert("RGB")

inputs = processor(text=TASK_PROMPT, images=test_image, return_tensors="pt").to(device)
for k, v in inputs.items():
    print(k, v.shape)

Step-3 Load Model

import torch
from transformers import PaliGemmaForConditionalGeneration, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig

# Load the full-precision model
model = PaliGemmaForConditionalGeneration.from_pretrained(FINETUNED_MODEL_ID, device_map={"": 0})

Or load a quantized (4-bit) version:

# Q-LoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM"
)
model = PaliGemmaForConditionalGeneration.from_pretrained(FINETUNED_MODEL_ID, quantization_config=bnb_config, device_map={"": 0})
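
If you intend to continue fine-tuning on top of the quantized weights, the lora_config above can be attached with PEFT. A minimal sketch; this step is not required for inference:

# Wrap the quantized model with LoRA adapters for further fine-tuning
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()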

Step-4 Inference

# Autoregressively generate; greedy decoding is used here. For fancier methods see https://huggingface.co/blog/how-to-generate
generated_ids = model.generate(**inputs, max_new_tokens=MAX_LENGTH)

# Next, turn each predicted token ID back into a string using the decode method.
# Chop off the prompt, which consists of the image tokens and the text prompt.
image_token_index = model.config.image_token_index
num_image_tokens = len(generated_ids[generated_ids == image_token_index])
num_text_tokens = len(processor.tokenizer.encode(TASK_PROMPT))
num_prompt_tokens = num_image_tokens + num_text_tokens + 2
generated_text = processor.batch_decode(generated_ids[:, num_prompt_tokens:], skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(generated_text)
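
An equivalent, simpler way to strip the prompt is to slice at the input length; this is a minimal sketch that assumes the processor already prepended the image tokens to input_ids (the behavior of recent transformers versions):

# Alternative: drop the prompt by slicing at the total input length
input_len = inputs["input_ids"].shape[-1]
alt_text = processor.decode(generated_ids[0][input_len:], skip_special_tokens=True)
print(alt_text)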

Result Tokens

'<s_total></s_total><s_tips></s_tips><s_time></s_time><s_telephone>(718)308-1118</s_telephone><s_tax></s_tax><s_subtotal></s_subtotal><s_store_name></s_store_name><s_store_addr>Brooklyn,NY11211</s_store_addr><s_line_items><s_item_value>2.98</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>NORI</s_item_name><s_item_key></s_item_key><sep/><s_item_value>2.35</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>TOMATOESPLUM</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.97</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>ONIONSVIDALIA</s_item_name><s_item_key></s_item_key><sep/><s_item_value>2.48</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>HAMBURRN</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>FTRAWBERRY</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>FTRAWBERRY</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.57</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>PILSNER</'

Step-5 Convert Result to JSON (adapted from the Donut model)

import re

# let's turn that into JSON
def token2json(tokens, is_inner_value=False, added_vocab=None):
    """
    Convert a (generated) token sequence into an ordered JSON format.
    """
    if added_vocab is None:
        added_vocab = processor.tokenizer.get_added_vocab()

    output = {}

    while tokens:
        start_token = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
        if start_token is None:
            break
        key = start_token.group(1)
        key_escaped = re.escape(key)

        end_token = re.search(rf"</s_{key_escaped}>", tokens, re.IGNORECASE)
        start_token = start_token.group()
        if end_token is None:
            tokens = tokens.replace(start_token, "")
        else:
            end_token = end_token.group()
            start_token_escaped = re.escape(start_token)
            end_token_escaped = re.escape(end_token)
            content = re.search(
                f"{start_token_escaped}(.*?){end_token_escaped}", tokens, re.IGNORECASE | re.DOTALL
            )
            if content is not None:
                content = content.group(1).strip()
                if r"<s_" in content and r"</s_" in content:  # non-leaf node
                    value = token2json(content, is_inner_value=True, added_vocab=added_vocab)
                    if value:
                        if len(value) == 1:
                            value = value[0]
                        output[key] = value
                else:  # leaf nodes
                    output[key] = []
                    for leaf in content.split(r"<sep/>"):
                        leaf = leaf.strip()
                        if leaf in added_vocab and leaf[0] == "<" and leaf[-2:] == "/>":
                            leaf = leaf[1:-2]  # for categorical special tokens
                        output[key].append(leaf)
                    if len(output[key]) == 1:
                        output[key] = output[key][0]

            tokens = tokens[tokens.find(end_token) + len(end_token) :].strip()
            if tokens[:6] == r"<sep/>":  # non-leaf nodes
                return [output] + token2json(tokens[6:], is_inner_value=True, added_vocab=added_vocab)

    if len(output):
        return [output] if is_inner_value else output
    else:
        return [] if is_inner_value else {"text_sequence": tokens}


# Convert the generated token sequence into JSON
generated_json = token2json(generated_text)
print(generated_json)
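
To serialize the extracted record as standard JSON text (for storage or downstream use), a minimal sketch:

import json

# Serialize the extracted record as a JSON string
print(json.dumps(generated_json, indent=2, ensure_ascii=False))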

Final Result in JSON

[{'total': '',
  'tips': '',
  'time': '',
  'telephone': '(718)308-1118',
  'tax': '',
  'subtotal': '',
  'store_name': '',
  'store_addr': 'Brooklyn,NY11211',
  'item_value': '2.98',
  'item_quantity': '1',
  'item_name': 'NORI',
  'item_key': ''},
 {'item_value': '2.35',
  'item_quantity': '1',
  'item_name': 'TOMATOESPLUM',
  'item_key': ''},
 {'item_value': '0.97',
  'item_quantity': '1',
  'item_name': 'ONIONSVIDALIA',
  'item_key': ''},
 {'item_value': '2.48',
  'item_quantity': '1',
  'item_name': 'HAMBURRN',
  'item_key': ''},
 {'item_value': '0.99',
  'item_quantity': '1',
  'item_name': 'FTRAWBERRY',
  'item_key': ''},
 {'item_value': '0.99',
  'item_quantity': '1',
  'item_name': 'FTRAWBERRY',
  'item_key': ''},
 {'item_value': '0.57', 'item_quantity': '1'}]
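
The record can then be consumed like any Python structure. For example, a minimal sketch that totals the parsed line-item values, assuming the list-of-dicts form shown above:

# Sum the parsed line-item values (skips entries without an item_value)
computed_total = sum(
    float(item["item_value"]) for item in generated_json if item.get("item_value")
)
print(f"computed line-item total: {computed_total:.2f}")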

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: mychen76@gmail.com
  • Model type: Vision Model for Receipt Image Data Extraction
  • Language(s) (NLP): [More Information Needed]
  • License: [More Information Needed]
  • Finetuned from model [optional]: PaliGemma-3b-pt-224

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

Training Details

Training Data

See the dataset: mychen76/invoices-and-receipts_ocr_v1
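
The dataset can be pulled directly with the datasets library; a minimal sketch, assuming the dataset ID above is publicly available on the Hub:

from datasets import load_dataset

# Load the OCR invoices/receipts dataset referenced above
ds = load_dataset("mychen76/invoices-and-receipts_ocr_v1")
print(ds)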

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]