Shopping Receipts

by douglasg14b - opened Feb 25

Discussion

douglasg14b

Feb 25

This seems to do incredibly poorly on shopping receipts from grocery stores, big box stores, hardware stores, etc.

Is this to be expected? Is there a good dataset of receipts that are focused on these sorts of businesses?

Just mulling

AdamCodd

Owner Feb 25

Yes, the results are not satisfactory because there are many errors in the original dataset (which was produced with OCR without verification). I am gradually correcting the errors manually, but it takes a long time to correct ~1500 receipts. On HuggingFace, aside from the original CORD-v2 dataset and the one I use, there is not much else, at least not for free.

douglasg14b

Feb 25

I wonder how effective something like https://labelstud.io might be in assisting with re-tagging. Combine that with Mechanical Turk 🤔

Kind of related: any ideas on how to get hold of more receipts (classified or not)? Non-restaurants are definitely under-represented in the dataset, and some businesses, Safeway for example, have additional receipt syntax that throws off document extraction software. More samples, more better.

AdamCodd

Owner Feb 25

That could be a good idea. There are a few vision models that could help with a proper prompt (I also tried GPT-4V), but ultimately you'll need to correct some mistakes here and there, so I don't think this process can be fully automated. To get more receipts, I can think of Roboflow and also Kaggle. I didn't go for them because their receipts are not tagged at all (or badly tagged), compared to the ones on HF.

hauthorn

Feb 26

I can recommend the SROIE dataset, although it's not tagged as well as CORD. I guess it also depends on what data you need from the receipts.

We get around 85% accuracy (text similarity) when trying out GPT-4 Vision on our dataset at work, so I wouldn't suggest simply using its output as ground truth.

But then again - I still find mistakes in our test data every other time I sample it, so who knows 🤷
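
For anyone wanting to reproduce a "text similarity" style score on their own receipts, here is a minimal sketch. The exact metric behind the 85% figure isn't stated in this thread, so this assumes a plain character-level similarity ratio from Python's standard `difflib`, averaged over (predicted, expected) pairs. The function names `text_similarity` and `mean_similarity` are made up for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(predicted: str, expected: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and repeated whitespace.

    Assumption: a character-level ratio; the metric used in the
    thread above is not specified.
    """
    a = " ".join(predicted.lower().split())
    b = " ".join(expected.lower().split())
    if not a and not b:
        # Two empty fields count as a perfect match.
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity(pairs) -> float:
    """Average similarity over an iterable of (predicted, expected) pairs."""
    scores = [text_similarity(p, e) for p, e in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Low-scoring pairs are also a cheap way to pick which receipts to review by hand first, since a low score can mean either a model mistake or a labeling mistake.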
