
Prasanna Iyer
prasiyer's activity
Open source olmOCR just dropped and the results are impressive.
I tested the free demo with various documents, including a handwritten Claes Oldenburg letter. It is fast: 3000 tokens/second on your own GPU, which works out to about $190 per million pages, or 1/32 the cost of GPT-4o. A game-changer for content extraction and digital archives.
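For a rough sense of what those numbers mean in practice, here is a minimal back-of-the-envelope sketch. It assumes the $190 per million pages figure is olmOCR's own cost and that the 1/32 ratio is exact; the 50,000-page archive is a hypothetical example, not something from the announcement.

```python
# Figures quoted in the post; everything below is derived from them.
OLMOCR_USD_PER_MILLION_PAGES = 190.0
COST_RATIO_VS_GPT4O = 1 / 32  # olmOCR cost as a fraction of GPT-4o cost


def implied_baseline_cost(cost: float, ratio: float) -> float:
    """Return the baseline cost implied by 'cost = ratio * baseline'."""
    return cost / ratio


def cost_for_pages(pages: int, usd_per_million: float) -> float:
    """Scale a per-million-pages price down to a given page count."""
    return pages * usd_per_million / 1_000_000


# GPT-4o cost per million pages implied by the two quoted figures.
gpt4o_usd_per_million = implied_baseline_cost(
    OLMOCR_USD_PER_MILLION_PAGES, COST_RATIO_VS_GPT4O
)

# Hypothetical archive size, chosen only for illustration.
archive_pages = 50_000
olmocr_cost = cost_for_pages(archive_pages, OLMOCR_USD_PER_MILLION_PAGES)
gpt4o_cost = cost_for_pages(archive_pages, gpt4o_usd_per_million)

print(f"Implied GPT-4o cost: ${gpt4o_usd_per_million:,.0f} per million pages")
print(f"{archive_pages:,} pages: olmOCR ${olmocr_cost:,.2f} vs GPT-4o ${gpt4o_cost:,.2f}")
```

Treat the GPT-4o figure as an estimate implied by the post's ratio, not a quoted price.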
To achieve this, Ai2 trained a 7B vision language model on 260K pages from 100K PDFs using "document anchoring" - combining PDF metadata with page images.
Best part: it actually understands document structure (columns, tables, equations) instead of just jumbling everything together like most OCR tools. Their human eval results back this up.
Try the demo: https://olmocr.allenai.org
Going right into the AI toolkit: JournalistsonHF/ai-toolkit
I have attached the image and the prompt. This is the response from the chatbot:
The image depicts a table comparing various companies and their carbon footprints. The table lists the top 10 largest greenhouse gas emitters in the world, with their respective carbon footprints expressed in millions of metric tons of carbon dioxide equivalent (MtCO2e). The first column features the company name, while the second column displays the year in which the carbon footprint was measured. The third column indicates the sector in which the company operates, and the fourth column provides the company's carbon footprint. The last column shows the company's market capitalization, which is the total value of all outstanding shares of a company's stock. The table also includes a row labeled "Total" that shows the total carbon footprint of all the companies listed in the table.
Thanks for the post and your efforts to share the knowledge.
https://huggingface.co/spaces/merve/chameleon-7b -- The Space does not seem to work. When I ask for a summary of an image, the result is a summary of some random table, not of the image I uploaded. Please check when you can.