Model Card for paligemma-receipt-json-3b-mix-448-v2b
Extract POS Receipt Image Data To JSON Record
Model Details
A fine-tuned version of Google's PaliGemma model that extracts data from receipt images into a JSON record.
Gradio demo app: https://github.com/minyang-chen/paligemma-receipt-json-v2
Model Usage
Setup Environment
pip install transformers==4.42.2
pip install datasets
pip install peft accelerate bitsandbytes
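Only transformers is pinned in the commands above, so it can help to record which versions actually ended up in the environment. The following is a minimal sketch using the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages named in the install commands above
wanted = ["transformers", "datasets", "peft", "accelerate", "bitsandbytes"]
for name, ver in installed_versions(wanted).items():
    print(f"{name}: {ver or 'not installed'}")
```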
Specify Device
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device_map={"":0}
Step-1 Load Image Processor
from transformers import AutoProcessor
FINETUNED_MODEL_ID = "mychen76/paligemma-receipt-json-3b-mix-448-v2b"
processor = AutoProcessor.from_pretrained(FINETUNED_MODEL_ID)
Step-2 Set Task Prompt
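TASK_PROMPT = "EXTRACT_JSON_RECEIPT"
MAX_LENGTH = 512
# test_image is not defined elsewhere in this card; load your own receipt as a PIL image
from PIL import Image
test_image = Image.open("receipt.jpg").convert("RGB")  # hypothetical path, replace with your own file
inputs = processor(text=TASK_PROMPT, images=test_image, return_tensors="pt").to(device)
for k, v in inputs.items():
    print(k, v.shape)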
Step-3 Load Model
import torch
from transformers import PaliGemmaForConditionalGeneration
from transformers import BitsAndBytesConfig
from peft import get_peft_model, LoraConfig
# Load the full model
model = PaliGemmaForConditionalGeneration.from_pretrained(FINETUNED_MODEL_ID, device_map={"":0})
OR Load Quantized
# QLoRA-style 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
# LoRA settings; note that lora_config is not passed to the loading call below
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM"
)
model = PaliGemmaForConditionalGeneration.from_pretrained(FINETUNED_MODEL_ID, quantization_config=bnb_config, device_map={"":0})
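As a rough guide to choosing between the two loading options, weight memory scales with the number of bits stored per parameter. The sketch below is back-of-the-envelope arithmetic only: it assumes a round 3 billion parameters (read off the "3b" in the model ID, not a measured count) and ignores quantization metadata, activations, and the key/value cache. The function name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


ASSUMED_PARAMS = 3e9  # assumption: round figure implied by "3b" in the model ID

for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit nf4", 4)]:
    print(f"{label:>10}: ~{weight_memory_gb(ASSUMED_PARAMS, bits):.1f} GB")
```

Actual GPU usage will be higher than these figures, so treat them as a lower bound when sizing hardware.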
Step-4 Inference
# Autoregressively generate with greedy decoding; for other decoding methods see https://huggingface.co/blog/how-to-generate
generated_ids = model.generate(**inputs, max_new_tokens=MAX_LENGTH)
# Next, turn the predicted token IDs back into a string using the batch_decode method.
# Chop off the prompt, which consists of the image tokens and the text prompt.
image_token_index = model.config.image_token_index
num_image_tokens = len(generated_ids[generated_ids==image_token_index])
num_text_tokens = len(processor.tokenizer.encode(TASK_PROMPT))
num_prompt_tokens = num_image_tokens + num_text_tokens + 2
generated_text = processor.batch_decode(generated_ids[:, num_prompt_tokens:], skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(generated_text)
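An alternative to counting image and text tokens separately is to slice at the input length, since the sequences returned by `generate` here start with the prompt tokens followed by the newly generated ones. The sketch below shows the idea on plain Python lists with made-up token IDs; with real tensors the equivalent slice would be `generated_ids[:, inputs["input_ids"].shape[1]:]`. The helper name `strip_prompt` is hypothetical.

```python
def strip_prompt(sequences, prompt_length):
    """Drop the first `prompt_length` token IDs from every sequence."""
    return [seq[prompt_length:] for seq in sequences]


# Made-up IDs: three image tokens (9), a two-token text prompt, then the output.
prompt = [9, 9, 9, 101, 102]
generated = [prompt + [7, 8, 1]]

print(strip_prompt(generated, len(prompt)))
```

Slicing by input length avoids the hard-coded `+ 2` offset, at the cost of relying on the prompt being returned unchanged at the start of each sequence.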
Result Tokens
'<s_total></s_total><s_tips></s_tips><s_time></s_time><s_telephone>(718)308-1118</s_telephone><s_tax></s_tax><s_subtotal></s_subtotal><s_store_name></s_store_name><s_store_addr>Brooklyn,NY11211</s_store_addr><s_line_items><s_item_value>2.98</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>NORI</s_item_name><s_item_key></s_item_key><sep/><s_item_value>2.35</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>TOMATOESPLUM</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.97</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>ONIONSVIDALIA</s_item_name><s_item_key></s_item_key><sep/><s_item_value>2.48</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>HAMBURRN</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>FTRAWBERRY</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.99</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>FTRAWBERRY</s_item_name><s_item_key></s_item_key><sep/><s_item_value>0.57</s_item_value><s_item_quantity>1</s_item_quantity><s_item_name>PILSNER</'
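The sample output above ends mid-tag (`</`), which suggests generation stopped before the sequence was complete, possibly because it reached `max_new_tokens`. A minimal sketch for flagging such output before parsing follows; `unclosed_tags` is a hypothetical helper and not part of transformers.

```python
import re

# Matches opening tags like <s_total> and closing tags like </s_total>
TAG_RE = re.compile(r"<(/?)s_([^<>]+)>")


def unclosed_tags(text):
    """Return the names of <s_...> tags that were opened but never closed."""
    stack = []
    for match in TAG_RE.finditer(text):
        is_closing, name = match.group(1), match.group(2)
        if not is_closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
    return stack


# Shortened stand-in for a truncated generation
sample = (
    "<s_total></s_total><s_line_items>"
    "<s_item_value>0.57</s_item_value><s_item_name>PILSNER</"
)
print(unclosed_tags(sample))  # ['line_items', 'item_name']
```

A non-empty result is a hint to retry with a larger `max_new_tokens` or to treat the last line item as incomplete.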
Step-5 Convert Result to JSON (borrowed from the Donut model)
import re
# let's turn that into JSON
def token2json(tokens, is_inner_value=False, added_vocab=None):
    """
    Convert a (generated) token sequence into an ordered JSON format.
    """
    if added_vocab is None:
        added_vocab = processor.tokenizer.get_added_vocab()
    output = {}
    while tokens:
        start_token = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
        if start_token is None:
            break
        key = start_token.group(1)
        key_escaped = re.escape(key)
        end_token = re.search(rf"</s_{key_escaped}>", tokens, re.IGNORECASE)
        start_token = start_token.group()
        if end_token is None:
            tokens = tokens.replace(start_token, "")
        else:
            end_token = end_token.group()
            start_token_escaped = re.escape(start_token)
            end_token_escaped = re.escape(end_token)
            content = re.search(
                f"{start_token_escaped}(.*?){end_token_escaped}", tokens, re.IGNORECASE | re.DOTALL
            )
            if content is not None:
                content = content.group(1).strip()
                if r"<s_" in content and r"</s_" in content:  # non-leaf node
                    value = token2json(content, is_inner_value=True, added_vocab=added_vocab)
                    if value:
                        if len(value) == 1:
                            value = value[0]
                        output[key] = value
                else:  # leaf nodes
                    output[key] = []
                    for leaf in content.split(r"<sep/>"):
                        leaf = leaf.strip()
                        if leaf in added_vocab and leaf[0] == "<" and leaf[-2:] == "/>":
                            leaf = leaf[1:-2]  # for categorical special tokens
                        output[key].append(leaf)
                    if len(output[key]) == 1:
                        output[key] = output[key][0]
            tokens = tokens[tokens.find(end_token) + len(end_token) :].strip()
            if tokens[:6] == r"<sep/>":  # non-leaf nodes
                return [output] + token2json(tokens[6:], is_inner_value=True, added_vocab=added_vocab)
    if len(output):
        return [output] if is_inner_value else output
    else:
        return [] if is_inner_value else {"text_sequence": tokens}
# Convert the generated text
generated_json = token2json(generated_text)
print(generated_json)
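`print` shows the Python representation of the result (single-quoted strings), which is not valid JSON text. If a JSON string is needed, for example to write the record to a file, the standard `json` module can serialize it. A small sketch, using a hand-written sample in place of `generated_json`:

```python
import json

# Hand-written sample standing in for the output of token2json
sample = [
    {"telephone": "(718)308-1118", "item_name": "NORI", "item_value": "2.98"},
    {"item_name": "TOMATOESPLUM", "item_value": "2.35"},
]

# indent makes the output readable; ensure_ascii=False keeps non-ASCII text as-is
json_text = json.dumps(sample, indent=2, ensure_ascii=False)
print(json_text)
```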
Final Result in JSON
[{'total': '',
'tips': '',
'time': '',
'telephone': '(718)308-1118',
'tax': '',
'subtotal': '',
'store_name': '',
'store_addr': 'Brooklyn,NY11211',
'item_value': '2.98',
'item_quantity': '1',
'item_name': 'NORI',
'item_key': ''},
{'item_value': '2.35',
'item_quantity': '1',
'item_name': 'TOMATOESPLUM',
'item_key': ''},
{'item_value': '0.97',
'item_quantity': '1',
'item_name': 'ONIONSVIDALIA',
'item_key': ''},
{'item_value': '2.48',
'item_quantity': '1',
'item_name': 'HAMBURRN',
'item_key': ''},
{'item_value': '0.99',
'item_quantity': '1',
'item_name': 'FTRAWBERRY',
'item_key': ''},
{'item_value': '0.99',
'item_quantity': '1',
'item_name': 'FTRAWBERRY',
'item_key': ''},
{'item_value': '0.57', 'item_quantity': '1'}]
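In the result above, the receipt-level fields and the first line item share one dictionary, and the remaining items follow as separate dictionaries. One possible post-processing step is to split these into receipt fields and a list of line items. The sketch below assumes that every line-item field name starts with `item_` (true for the sample shown, but an assumption in general); `regroup` is a hypothetical helper.

```python
def regroup(parsed):
    """Split token2json-style output into receipt fields plus a line_items list."""
    # Output that already has a nested line_items entry is returned unchanged
    if isinstance(parsed, dict) and "line_items" in parsed:
        return parsed
    records = parsed if isinstance(parsed, list) else [parsed]
    header, line_items = {}, []
    for record in records:
        item = {k: v for k, v in record.items() if k.startswith("item_")}
        # Non-item fields are merged into a single receipt-level header
        header.update({k: v for k, v in record.items() if not k.startswith("item_")})
        if item:
            line_items.append(item)
    return {**header, "line_items": line_items}


# Shortened stand-in for the result shown above
sample = [
    {"telephone": "(718)308-1118", "store_addr": "Brooklyn,NY11211",
     "item_value": "2.98", "item_quantity": "1", "item_name": "NORI", "item_key": ""},
    {"item_value": "2.35", "item_quantity": "1", "item_name": "TOMATOESPLUM", "item_key": ""},
]
receipt = regroup(sample)
print(receipt["telephone"], len(receipt["line_items"]))
```

Incomplete trailing items, such as the last entry above with no `item_name`, are kept as they are so that the caller can decide whether to discard them.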
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: mychen76@gmail.com
- Model type: Vision Model for Receipt Image Data Extraction
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: PaliGemma-3b-pt-224
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
See the Model Usage section above for code to get started with the model.
Training Details
Training Data
Dataset: mychen76/invoices-and-receipts_ocr_v1
[More Information Needed]
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]