license: mit
language:
  - en
base_model:
  - google/paligemma-3b-pt-896
pipeline_tag: image-to-text

google/paligemma-3b-pt-896 model fine-tuned for US IRS Form 1040 (2023) data parsing and extraction

This repository provides only the PEFT LoRA weights. The LoRA layers were fine-tuned to parse and extract data from the first page only of US IRS tax Form 1040 (year 2023). The model performs OCR and returns the extracted data in JSON format in response to a zero-shot prompt.



from PIL import Image
import torch
import json

from transformers import PaliGemmaForConditionalGeneration, AutoProcessor
from peft import PeftModel


model_id = 'google/paligemma-3b-pt-896'
peft_model_id = 'hsarfraz/google-paligemma-irs-form-1040-2023-parser-pg1'

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# load the base model and its processor on the selected device
processor = AutoProcessor.from_pretrained(model_id, padding_side="right", add_eos_token=True)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map={"": device}, torch_dtype=torch.bfloat16)

# load the fine-tuned PEFT (LoRA) weights on top of the base model
fine_tuned_model = PeftModel.from_pretrained(model, peft_model_id)
fine_tuned_model.to(device)

# prompt for OCR
prompt = "<image>extract data in JSON format"

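# path to a local image of page 1 of the form
image_file = '<replace with path to input image>'
image = Image.open(image_file).convert("RGB")

# tokenize the prompt and preprocess the image; record the prompt length
# so the prompt tokens can be stripped from the generated sequence
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
prefix_length = inputs["input_ids"].shape[-1]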

# switch to inference mode
with torch.inference_mode():
    generation = fine_tuned_model.generate(**inputs, max_new_tokens=1152)
    # keep only the newly generated tokens (drop the prompt prefix)
    generation = generation[0][prefix_length:]
    decoded = processor.decode(generation, skip_special_tokens=True)

# parse the output as JSON; fall back to the raw text if it is not valid JSON
try:
    output_json = json.dumps(json.loads(decoded), indent=4)
except json.JSONDecodeError as error:
    print('Error: %s' % error)
    output_json = decoded

# display the parsed JSON
print(output_json)
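
The snippet above falls back to the raw text when `json.loads` fails. If the decoded text ever contains extra characters around the JSON object (an assumption, not something reported for this model), a more tolerant parser could be sketched as follows. The helper name `extract_json_object` is hypothetical and not part of this repository; it only uses the standard-library `json.JSONDecoder.raw_decode`.

```python
import json


def extract_json_object(decoded: str):
    """Return the first JSON object found in `decoded`, or None if none parses.

    Hypothetical helper: it scans for '{' and tries to decode a JSON value
    starting at each candidate position, ignoring any surrounding text.
    """
    decoder = json.JSONDecoder()
    start = decoded.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(decoded, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # try the next opening brace, if any
        start = decoded.find("{", start + 1)
    return None


if __name__ == "__main__":
    sample = 'Result: {"lbl_0_12": "IA", "lbl_0_13": "16560"} <end>'
    print(extract_json_object(sample))
```

Returning `None` instead of raising keeps the caller's fallback path (printing the raw decoded text) simple.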


Synthetic (fake) data for IRS Form 1040 (2023), page 1

[Image: fake form]

Parsed output in JSON

{
    "lbl_0_03": "Andrew Huffman",
    "lbl_0_04": "Phillips",
    "lbl_0_05": "247-27-3525",
    "lbl_0_06": "Martin",
    "lbl_0_08": "797-83-3491",
    "lbl_0_09": "PSC 8861, Box 7908 APO AE 15945",
    "lbl_0_11": "Andrewhaven",
    "lbl_0_12": "IA",
    "lbl_0_13": "16560",
    "lbl_0_55": "504583.65",
    "lbl_0_66": "473782.31",
    "lbl_0_67": "626674.66",
    "lbl_0_79": "559436.54"
}
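
All values in the parsed output are strings. For downstream use it may help to convert them by shape. The sketch below is a hypothetical post-processing step, not part of this repository: it assumes that values such as "504583.65" are dollar amounts with two decimal places and that values shaped like "247-27-3525" are SSN-style identifiers. It does not interpret the `lbl_0_XX` keys, whose meaning is not documented here.

```python
import re
from decimal import Decimal

# Assumed value shapes, inferred only from the sample output above.
AMOUNT_RE = re.compile(r"-?\d+\.\d{2}")
SSN_LIKE_RE = re.compile(r"\d{3}-\d{2}-\d{4}")


def classify_fields(parsed: dict) -> dict:
    """Tag each value as 'amount', 'ssn_like', or 'text'.

    Amounts are converted to Decimal so they can be summed without
    floating-point rounding; everything else stays a string.
    """
    result = {}
    for key, value in parsed.items():
        text = str(value)
        if AMOUNT_RE.fullmatch(text):
            result[key] = ("amount", Decimal(text))
        elif SSN_LIKE_RE.fullmatch(text):
            result[key] = ("ssn_like", text)
        else:
            result[key] = ("text", text)
    return result


if __name__ == "__main__":
    sample = {"lbl_0_05": "247-27-3525", "lbl_0_12": "IA", "lbl_0_55": "504583.65"}
    for key, (kind, value) in classify_fields(sample).items():
        print(key, kind, value)
```

`Decimal` is used instead of `float` because these are currency values, where exact decimal arithmetic is usually preferable.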