Update README.md

0b73710 about 1 year ago

8.3 kB

	---
	thumbnail: "url to a thumbnail used in social sharing"
	tags:
	- tag1
	- tag2
	license: apache-2.0
	datasets:
	- dataset1
	- dataset2
	metrics:
	- metric1
	- metric2
	---

	Model Architecture
	The mychen76/mistral7b_ocr_to_json_v1 (LLM) is a finetuned for convert OCR text to Json object task. this experimental model is based on Mistral-7B-v0.1 which outperforms Llama 2 13B on all benchmarks tested.

	Motivation
	current OCR engines are well tested on image detection and text recognition. LLM model are well train for text process and generation.
	Hence, leveraging output from OCR engine could save LLM training time for image-to-text use case such as invoice or receipt image to json object convertion task.

	Model Usage:
	Take a random receipt image picture, perform Image OCR to get text boxes then feed into LLM model to generate as well-pformed receipt json object.

	```
	### Instruction:
	You are POS receipt data expert, parse, detect, recognize and convert following receipt OCR image result into structure receipt data object.
	Don't make up value not in the Input. Output must be a well-formed JSON object.```json

	### Input:
	[[[[184.0, 42.0], [278.0, 45.0], [278.0, 62.0], [183.0, 59.0]], ('BAJA FRESH', 0.9551795721054077)], [[[242.0, 113.0], [379.0, 118.0], [378.0, 136.0], [242.0, 131.0]], ('GENERAL MANAGER:', 0.9462024569511414)], [[[240.0, 133.0], [300.0, 135.0], [300.0, 153.0], [240.0, 151.0]], ('NORMAN', 0.9913229942321777)], [[[143.0, 166.0], [234.0, 171.0], [233.0, 192.0], [142.0, 187.0]], ('176 Rosa C', 0.9229503870010376)], [[[130.0, 207.0], [206.0, 210.0], [205.0, 231.0], [129.0, 228.0]], ('Chk 7545', 0.9349349141120911)], [[[283.0, 215.0], [431.0, 221.0], [431.0, 239.0], [282.0, 233.0]], ("Dec26'0707:26PM", 0.9290117025375366)], [[[440.0, 221.0], [489.0, 221.0], [489.0, 239.0], [440.0, 239.0]], ('Gst0', 0.9164432883262634)], [[[164.0, 252.0], [308.0, 256.0], [308.0, 276.0], [164.0, 272.0]], ('TAKE OUT', 0.9367803335189819)], [[[145.0, 274.0], [256.0, 278.0], [255.0, 296.0], [144.0, 292.0]], ('1 BAJA STEAK', 0.9167789816856384)], [[[423.0, 282.0], [465.0, 282.0], [465.0, 304.0], [423.0, 304.0]], ('6.95', 0.9965073466300964)], [[[180.0, 296.0], [292.0, 299.0], [292.0, 319.0], [179.0, 316.0]], ('NO GUACAMOLE', 0.9631438255310059)], [[[179.0, 317.0], [319.0, 322.0], [318.0, 343.0], [178.0, 338.0]], ('ENCHILADO STYLE', 0.9704310894012451)], [[[423.0, 325.0], [467.0, 325.0], [467.0, 347.0], [423.0, 347.0]], ('1.49', 0.988395631313324)], [[[159.0, 339.0], [201.0, 341.0], [200.0, 360.0], [158.0, 358.0]], ('CASH', 0.9982023239135742)], [[[417.0, 348.0], [466.0, 348.0], [466.0, 367.0], [417.0, 367.0]], ('20.00', 0.9921982884407043)], [[[156.0, 380.0], [200.0, 382.0], [198.0, 404.0], [155.0, 402.0]], ('FOOD', 0.9906187057495117)], [[[426.0, 390.0], [468.0, 390.0], [468.0, 409.0], [426.0, 409.0]], ('8.44', 0.9963030219078064)], [[[154.0, 402.0], [190.0, 405.0], [188.0, 427.0], [152.0, 424.0]], ('TAX', 0.9963871836662292)], [[[427.0, 413.0], [468.0, 413.0], [468.0, 432.0], [427.0, 432.0]], ('0.61', 0.9934712648391724)], [[[153.0, 427.0], [224.0, 429.0], [224.0, 450.0], [153.0, 448.0]], ('PAYMENT', 0.9948703646659851)], [[[428.0, 436.0], [470.0, 436.0], [470.0, 455.0], [428.0, 455.0]], ('9.05', 0.9961490631103516)], [[[152.0, 450.0], [251.0, 453.0], [250.0, 475.0], [152.0, 472.0]], ('Change Due', 0.9556287527084351)], [[[420.0, 458.0], [471.0, 458.0], [471.0, 480.0], [420.0, 480.0]], ('10.95', 0.997236430644989)], [[[209.0, 498.0], [382.0, 503.0], [381.0, 524.0], [208.0, 519.0]], ('$2.000FF', 0.9757758378982544)], [[[169.0, 522.0], [422.0, 528.0], [421.0, 548.0], [169.0, 542.0]], ('NEXT PURCHASE', 0.962527871131897)], [[[167.0, 546.0], [365.0, 552.0], [365.0, 570.0], [167.0, 564.0]], ('CALL800 705 5754or', 0.926964521408081)], [[[146.0, 570.0], [416.0, 577.0], [415.0, 597.0], [146.0, 590.0]], ('Go www.mshare.net/bajafresh', 0.9759786128997803)], [[[147.0, 594.0], [356.0, 601.0], [356.0, 621.0], [146.0, 614.0]], ('Take our brief survey', 0.9390400648117065)], [[[143.0, 620.0], [410.0, 626.0], [409.0, 647.0], [143.0, 641.0]], ('When Prompted, Enter Store', 0.9385656118392944)], [[[142.0, 646.0], [408.0, 653.0], [407.0, 673.0], [142.0, 666.0]], ('Write down redemption code', 0.9536812901496887)], [[[141.0, 672.0], [409.0, 679.0], [408.0, 699.0], [141.0, 692.0]], ('Use this receipt as coupon', 0.9658807516098022)], [[[138.0, 697.0], [448.0, 701.0], [448.0, 725.0], [138.0, 721.0]], ('Discount on purchases of $5.00', 0.9624248743057251)], [[[139.0, 726.0], [466.0, 729.0], [466.0, 750.0], [139.0, 747.0]], ('or more,Offer expires in 30 day', 0.9263916611671448)], [[[137.0, 750.0], [459.0, 755.0], [459.0, 778.0], [137.0, 773.0]], ('Good at participating locations', 0.963909924030304)]]

	### Output:
	```
	```json
	{
	"receipt": {
	"store": "BAJA FRESH",
	"manager": "GENERAL MANAGER: NORMAN",
	"address": "176 Rosa C",
	"check": "Chk 7545",
	"date": "Dec26'0707:26PM",
	"tax": "Gst0",
	"total": "20.00",
	"payment": "CASH",
	"change": "0.61",
	"discount": "Discount on purchases of $5.00 or more,Offer expires in 30 day",
	"coupon": "Use this receipt as coupon",
	"survey": "Take our brief survey",
	"redemption": "Write down redemption code",
	"prompt": "When Prompted, Enter Store Write down redemption code Use this receipt as coupon",
	"items": [
	{
	"name": "1 BAJA STEAK",
	"price": "6.95",
	"modifiers": [
	"NO GUACAMOLE",
	"ENCHILADO STYLE"
	]
	},
	{
	"name": "TAKE OUT",
	"price": "1.49"
	}
	]
	}
	}
	```

	# Load model directly
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	tokenizer = AutoTokenizer.from_pretrained("mychen76/mistral7b_ocr_to_json_v1")
	model = AutoModelForCausalLM.from_pretrained("mychen76/mistral7b_ocr_to_json_v1")

	prompt=f"""### Instruction:
	You are POS receipt data expert, parse, detect, recognize and convert following receipt OCR image result into structure receipt data object.
	Don't make up value not in the Input. Output must be a well-formed JSON object.```json

	### Input:
	{receipt_boxes}

	### Output:
	"""
	with torch.inference_mode():
	inputs = tokenizer(prompt,return_tensors="pt",truncation=True).to(device)
	outputs = model.generate(**inputs, max_new_tokens=512)
	result_text = tokenizer.batch_decode(outputs)[0]
	print(result_text)
	```

	## Get OCR Image boxes
	```python
	from paddleocr import PaddleOCR, draw_ocr
	from ast import literal_eval
	import json

	paddleocr = PaddleOCR(lang="en",ocr_version="PP-OCRv4",show_log = False,use_gpu=True)

	def paddle_scan(paddleocr,img_path_or_nparray):
	result = paddleocr.ocr(img_path_or_nparray,cls=True)
	result = result[0]
	boxes = [line[0] for line in result] #boundign box
	txts = [line[1][0] for line in result] #raw text
	scores = [line[1][1] for line in result] # scores
	return txts, result

	# perform ocr scan
	receipt_texts, receipt_boxes = paddle_scan(paddleocr,receipt_image_array)
	print(50*"--","\ntext only:\n",receipt_texts)
	print(50*"--","\nocr boxes:\n",receipt_boxes)

	```

	# Load model in 4bits
	```python

	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, BitsAndBytesConfig

	# quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)
	bnb_config = BitsAndBytesConfig(
	llm_int8_enable_fp32_cpu_offload=True,
	load_in_4bit=True,
	bnb_4bit_use_double_quant=True,
	bnb_4bit_quant_type="nf4",
	bnb_4bit_compute_dtype=torch.bfloat16,
	)
	# control model memory allocation between devices for low GPU resource (0,cpu)
	device_map = {
	"transformer.word_embeddings": 0,
	"transformer.word_embeddings_layernorm": 0,
	"lm_head": 0,
	"transformer.h": 0,
	"transformer.ln_f": 0,
	"model.embed_tokens": 0,
	"model.layers":0,
	"model.norm":0
	}
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# model use for inference
	model_id="mychen76/mistral7b_ocr_to_json_v1"
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	trust_remote_code=True,
	torch_dtype=torch.float16,
	quantization_config=bnb_config,
	device_map=device_map)
	# tokenizer
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	```