Ahmed Masry PRO

ahmed-masry

AI & ML interests

Multimodal Chart Understanding, Multimodal Document AI, Multimodal Vision-Language Models

Organizations

None yet

ahmed-masry's activity

New activity in ahmed-masry/unichart-qa-data 6 days ago

Dataset Source?

4
#3 opened 6 days ago by veason
New activity in ahmed-masry/ColFlor 2 months ago

Update README.md

1
#1 opened 2 months ago by omkar334
reacted to merve's post with 🚀 3 months ago
It's raining depth estimation models ☔️
DepthPro is a zero-shot depth estimation model by Apple, it's fast, sharp and accurate 🔥
Demo: akhaliq/depth-pro
Model: apple/DepthPro
Paper page: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second (2410.02073)

The model consists of two encoders: an encoder for patches and an image encoder 🖼️. The outputs of both are merged, then decoded into depth maps and used to estimate the focal length.
The model outperforms the previous state-of-the-art models on average across various benchmarks.
posted an update 3 months ago
🚀 Introducing ColFlor: An Efficient, OCR-Free Vision-Language Document Retrieval Model 🌟

Earlier this year, ColPali revolutionized document retrieval by eliminating the need for error-prone OCR pipelines. Instead, it directly processes the document images. However, with its 3 billion parameters, ColPali is computationally heavy for large-scale applications.

That's where ColFlor comes in: a smaller, faster alternative! 🎉 At 17x smaller than ColPali, ColFlor offers a more efficient, OCR-free document retrieval solution, making it ideal for users with limited computing resources (GPU Poor). 💡

Key Highlights:
🧠 174M parameters (vs. 3B for ColPali)
⚡ 9.8x faster query encoding, 5.25x faster image encoding
📉 Only 1.8% performance drop on text-rich English documents

Check out the full blog post for more insights on modeling, training, and evaluations across various document retrieval tasks! 🚀
Also, feel free to try our demo on Hugging Face 🤗

🔗 Resources:
📄 Blog post: https://huggingface.co/blog/ahmed-masry/colflor
🧠 Model: ahmed-masry/ColFlor
💻 Demo: ahmed-masry/ColFlor-Demo
🏋️‍♂️ Training code: https://github.com/AhmedMasryKU/colflor
📊 Evaluation code: https://github.com/AhmedMasryKU/vidore-benchmark-colflor
updated a Space 3 months ago
published an article 3 months ago

ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models

By ahmed-masry • 16
New activity in ahmed-masry/ChartGemma 4 months ago
New activity in ahmed-masry/unichart-table-data 5 months ago

License

1
#2 opened 5 months ago by OlenaVyn
liked a Space 5 months ago
New activity in ahmed-masry/chartgemma 6 months ago

Missing license

1
#1 opened 6 months ago by maskedviper