Ahmed Masry

ahmed-masry

AI & ML interests

Multimodal Chart Understanding, Multimodal Document AI, Multimodal Vision-Language Models

Organizations

None yet

Posts 2

📢 Exciting news! Our latest paper, "ChartGemma", is out! 📊

🧵 1/3: ChartGemma overcomes a key limitation of existing chart models, which rely too heavily on the charts' underlying data tables. Instead, it is trained on data generated directly from chart images, capturing crucial visual trends. 📸🔍

🧵 2/3: ChartGemma builds on PaliGemma from Google Research and is fine-tuned on a high-quality visual instruction-tuning dataset generated with Gemini Flash 1.5. 🌟📊

🧵 3/3: ChartGemma achieves state-of-the-art results on chart summarization, question answering, and fact-checking tasks. 🏅📊 It also generates more accurate and realistic chart summaries. 📝🔍

Our model and data are publicly available. We also have a cool web demo. Check it out! 🚀
Demo: ahmed-masry/ChartGemma
Code: https://github.com/vis-nlp/ChartGemma
Paper: ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild (2407.04172)