---
thumbnail: "url to a thumbnail used in social sharing"
tags:
- tag1
- tag2
license: apache-2.0
datasets:
- dataset1
- dataset2
metrics:
- metric1
- metric2
---
# Model Architecture
The mychen76/mistral7b_ocr_to_json_v1 model is an LLM fine-tuned for the task of converting OCR output into a JSON object. This experimental model is based on Mistral-7B-v0.1, which outperforms Llama 2 13B on all benchmarks tested.
# Motivation
Current OCR engines are well tested for image detection and text recognition, and LLMs are well trained for text processing and generation.
Leveraging the output of an OCR engine can therefore save LLM training time for image-to-text use cases such as converting an invoice or receipt image into a JSON object.
# Model Usage
Take a random receipt image, run image OCR on it to get the text boxes, then feed them into the LLM to generate a well-formed receipt JSON object.
```
### Instruction:
You are POS receipt data expert, parse, detect, recognize and convert following receipt OCR image result into structure receipt data object.
Don't make up value not in the Input. Output must be a well-formed JSON object.```json
### Input:
[[[[184.0, 42.0], [278.0, 45.0], [278.0, 62.0], [183.0, 59.0]], ('BAJA FRESH', 0.9551795721054077)], [[[242.0, 113.0], [379.0, 118.0], [378.0, 136.0], [242.0, 131.0]], ('GENERAL MANAGER:', 0.9462024569511414)], [[[240.0, 133.0], [300.0, 135.0], [300.0, 153.0], [240.0, 151.0]], ('NORMAN', 0.9913229942321777)], [[[143.0, 166.0], [234.0, 171.0], [233.0, 192.0], [142.0, 187.0]], ('176 Rosa C', 0.9229503870010376)], [[[130.0, 207.0], [206.0, 210.0], [205.0, 231.0], [129.0, 228.0]], ('Chk 7545', 0.9349349141120911)], [[[283.0, 215.0], [431.0, 221.0], [431.0, 239.0], [282.0, 233.0]], ("Dec26'0707:26PM", 0.9290117025375366)], [[[440.0, 221.0], [489.0, 221.0], [489.0, 239.0], [440.0, 239.0]], ('Gst0', 0.9164432883262634)], [[[164.0, 252.0], [308.0, 256.0], [308.0, 276.0], [164.0, 272.0]], ('TAKE OUT', 0.9367803335189819)], [[[145.0, 274.0], [256.0, 278.0], [255.0, 296.0], [144.0, 292.0]], ('1 BAJA STEAK', 0.9167789816856384)], [[[423.0, 282.0], [465.0, 282.0], [465.0, 304.0], [423.0, 304.0]], ('6.95', 0.9965073466300964)], [[[180.0, 296.0], [292.0, 299.0], [292.0, 319.0], [179.0, 316.0]], ('NO GUACAMOLE', 0.9631438255310059)], [[[179.0, 317.0], [319.0, 322.0], [318.0, 343.0], [178.0, 338.0]], ('ENCHILADO STYLE', 0.9704310894012451)], [[[423.0, 325.0], [467.0, 325.0], [467.0, 347.0], [423.0, 347.0]], ('1.49', 0.988395631313324)], [[[159.0, 339.0], [201.0, 341.0], [200.0, 360.0], [158.0, 358.0]], ('CASH', 0.9982023239135742)], [[[417.0, 348.0], [466.0, 348.0], [466.0, 367.0], [417.0, 367.0]], ('20.00', 0.9921982884407043)], [[[156.0, 380.0], [200.0, 382.0], [198.0, 404.0], [155.0, 402.0]], ('FOOD', 0.9906187057495117)], [[[426.0, 390.0], [468.0, 390.0], [468.0, 409.0], [426.0, 409.0]], ('8.44', 0.9963030219078064)], [[[154.0, 402.0], [190.0, 405.0], [188.0, 427.0], [152.0, 424.0]], ('TAX', 0.9963871836662292)], [[[427.0, 413.0], [468.0, 413.0], [468.0, 432.0], [427.0, 432.0]], ('0.61', 0.9934712648391724)], [[[153.0, 427.0], [224.0, 429.0], [224.0, 450.0], [153.0, 448.0]], ('PAYMENT', 0.9948703646659851)], [[[428.0, 436.0], [470.0, 436.0], [470.0, 455.0], [428.0, 455.0]], ('9.05', 0.9961490631103516)], [[[152.0, 450.0], [251.0, 453.0], [250.0, 475.0], [152.0, 472.0]], ('Change Due', 0.9556287527084351)], [[[420.0, 458.0], [471.0, 458.0], [471.0, 480.0], [420.0, 480.0]], ('10.95', 0.997236430644989)], [[[209.0, 498.0], [382.0, 503.0], [381.0, 524.0], [208.0, 519.0]], ('$2.000FF', 0.9757758378982544)], [[[169.0, 522.0], [422.0, 528.0], [421.0, 548.0], [169.0, 542.0]], ('NEXT PURCHASE', 0.962527871131897)], [[[167.0, 546.0], [365.0, 552.0], [365.0, 570.0], [167.0, 564.0]], ('CALL800 705 5754or', 0.926964521408081)], [[[146.0, 570.0], [416.0, 577.0], [415.0, 597.0], [146.0, 590.0]], ('Go www.mshare.net/bajafresh', 0.9759786128997803)], [[[147.0, 594.0], [356.0, 601.0], [356.0, 621.0], [146.0, 614.0]], ('Take our brief survey', 0.9390400648117065)], [[[143.0, 620.0], [410.0, 626.0], [409.0, 647.0], [143.0, 641.0]], ('When Prompted, Enter Store', 0.9385656118392944)], [[[142.0, 646.0], [408.0, 653.0], [407.0, 673.0], [142.0, 666.0]], ('Write down redemption code', 0.9536812901496887)], [[[141.0, 672.0], [409.0, 679.0], [408.0, 699.0], [141.0, 692.0]], ('Use this receipt as coupon', 0.9658807516098022)], [[[138.0, 697.0], [448.0, 701.0], [448.0, 725.0], [138.0, 721.0]], ('Discount on purchases of $5.00', 0.9624248743057251)], [[[139.0, 726.0], [466.0, 729.0], [466.0, 750.0], [139.0, 747.0]], ('or more,Offer expires in 30 day', 0.9263916611671448)], [[[137.0, 750.0], [459.0, 755.0], [459.0, 778.0], [137.0, 
773.0]], ('Good at participating locations', 0.963909924030304)]]
### Output:
```
```json
{
  "receipt": {
    "store": "BAJA FRESH",
    "manager": "GENERAL MANAGER: NORMAN",
    "address": "176 Rosa C",
    "check": "Chk 7545",
    "date": "Dec26'0707:26PM",
    "tax": "Gst0",
    "total": "20.00",
    "payment": "CASH",
    "change": "0.61",
    "discount": "Discount on purchases of $5.00 or more,Offer expires in 30 day",
    "coupon": "Use this receipt as coupon",
    "survey": "Take our brief survey",
    "redemption": "Write down redemption code",
    "prompt": "When Prompted, Enter Store Write down redemption code Use this receipt as coupon",
    "items": [
      {
        "name": "1 BAJA STEAK",
        "price": "6.95",
        "modifiers": [
          "NO GUACAMOLE",
          "ENCHILADO STYLE"
        ]
      },
      {
        "name": "TAKE OUT",
        "price": "1.49"
      }
    ]
  }
}
```
# Load model directly
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mychen76/mistral7b_ocr_to_json_v1")
model = AutoModelForCausalLM.from_pretrained("mychen76/mistral7b_ocr_to_json_v1")

# receipt_boxes comes from the OCR step in "Get OCR Image boxes" below
prompt = f"""### Instruction:
You are POS receipt data expert, parse, detect, recognize and convert following receipt OCR image result into structure receipt data object.
Don't make up value not in the Input. Output must be a well-formed JSON object.```json
### Input:
{receipt_boxes}
### Output:
"""

with torch.inference_mode():
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    result_text = tokenizer.batch_decode(outputs)[0]
    print(result_text)
```
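The decoded text still contains the prompt and the surrounding markers, so in practice you usually want to pull out just the generated JSON object. Below is a minimal, hypothetical post-processing sketch (`extract_receipt_json` is not part of the model card); it assumes the generated JSON appears after the `### Output:` marker, as in the example above.
```python
import json
import re

def extract_receipt_json(result_text: str) -> dict:
    # Hypothetical helper: keep everything after the Output marker,
    # then grab the first {...} block and parse it as JSON.
    tail = result_text.split("### Output:")[-1]
    match = re.search(r"\{.*\}", tail, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

receipt = extract_receipt_json(result_text)
print(receipt["receipt"]["store"])
```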
## Get OCR Image boxes
```python
from paddleocr import PaddleOCR, draw_ocr
from ast import literal_eval
import json

paddleocr = PaddleOCR(lang="en", ocr_version="PP-OCRv4", show_log=False, use_gpu=True)

def paddle_scan(paddleocr, img_path_or_nparray):
    result = paddleocr.ocr(img_path_or_nparray, cls=True)
    result = result[0]
    boxes = [line[0] for line in result]      # bounding boxes
    txts = [line[1][0] for line in result]    # raw text
    scores = [line[1][1] for line in result]  # confidence scores
    return txts, result

# perform ocr scan
receipt_texts, receipt_boxes = paddle_scan(paddleocr, receipt_image_array)
print(50*"--", "\ntext only:\n", receipt_texts)
print(50*"--", "\nocr boxes:\n", receipt_boxes)
```
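The snippet above assumes the receipt image has already been loaded as a NumPy array in `receipt_image_array`. A minimal sketch of that step, assuming a local file (the path `receipt_image.jpg` is a placeholder); PaddleOCR also accepts a file path directly if you prefer to skip this:
```python
import numpy as np
from PIL import Image

# Placeholder path; substitute your own receipt image.
receipt_image = Image.open("receipt_image.jpg").convert("RGB")
receipt_image_array = np.array(receipt_image)
```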
# Load model in 4bits
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, BitsAndBytesConfig

# quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)
bnb_config = BitsAndBytesConfig(
    llm_int8_enable_fp32_cpu_offload=True,
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# control model memory allocation between devices for low GPU resource (0, cpu)
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": 0,
    "transformer.h": 0,
    "transformer.ln_f": 0,
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": 0,
}

device = "cuda" if torch.cuda.is_available() else "cpu"

# model used for inference
model_id = "mychen76/mistral7b_ocr_to_json_v1"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
    device_map=device_map,
)

# tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```
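The 4-bit model is then used exactly like the full-precision one. A sketch that reuses the prompt format from the "Load model directly" section (again assuming `receipt_boxes` comes from the OCR step above):
```python
# Run the same receipt prompt against the 4-bit quantized model.
prompt = f"""### Instruction:
You are POS receipt data expert, parse, detect, recognize and convert following receipt OCR image result into structure receipt data object.
Don't make up value not in the Input. Output must be a well-formed JSON object.```json
### Input:
{receipt_boxes}
### Output:
"""

with torch.inference_mode():
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.batch_decode(outputs)[0])
```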