It's raining depth estimation models. DepthPro is a zero-shot depth estimation model by Apple; it's fast, sharp, and accurate 🔥
Demo: akhaliq/depth-pro
Model: apple/DepthPro
Paper page: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second (2410.02073)
The model consists of two encoders: a patch encoder and an image encoder 🖼️ The outputs of both are merged, then decoded into a depth map and used to estimate the focal length. On average across various benchmarks, the model outperforms the previous state-of-the-art models.
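To make the two-encoder idea concrete, here is a toy data-flow sketch in plain Python. It is not Depth Pro's code or API: every function name is hypothetical, simple averaging stands in for the learned encoders, and the decoding step to depth and focal length is left out because the post does not describe it.

```python
# Toy data-flow sketch only. Every function below is a hypothetical stand-in,
# not part of Depth Pro's real code or API.

def split_into_patches(image, size):
    """Cut a 2D grid (list of rows) into non-overlapping size x size tiles."""
    return [
        [row[x:x + size] for row in image[y:y + size]]
        for y in range(0, len(image), size)
        for x in range(0, len(image[0]), size)
    ]

def average(grid):
    """Mean of all values in a 2D grid; stands in for a learned encoder."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)

def patch_encoder(image, size):
    """Local branch: one feature per patch."""
    return [average(tile) for tile in split_into_patches(image, size)]

def image_encoder(image):
    """Global branch: one feature for the whole image."""
    return average(image)

def merge(patch_features, image_feature):
    """Fuse the branches by pairing each local feature with the global one."""
    return [(p, image_feature) for p in patch_features]

# A made-up 4x4 "image" with four uniform 2x2 regions.
image = [
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 3.0, 3.0],
    [5.0, 5.0, 7.0, 7.0],
    [5.0, 5.0, 7.0, 7.0],
]

merged = merge(patch_encoder(image, size=2), image_encoder(image))
print(merged)
```

Each merged entry carries both local detail (the patch feature) and global context (the image feature), which is the intuition behind combining the two encoders before decoding.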
Introducing ColFlor: An Efficient, OCR-Free Vision-Language Document Retrieval Model
Earlier this year, ColPali revolutionized document retrieval by eliminating the need for error-prone OCR pipelines. Instead, it directly processes the document images. However, with its 3 billion parameters, ColPali is computationally heavy for large-scale applications.
That's where ColFlor comes in: a smaller, faster alternative! At 17x smaller than ColPali, ColFlor offers a more efficient, OCR-free document retrieval solution, making it ideal for users with limited computing resources (GPU Poor).
Key Highlights:
- 174M parameters (vs. 3B for ColPali)
- 9.8x faster query encoding, 5.25x faster image encoding
- Only 1.8% performance drop on text-rich English documents
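As a quick sanity check on the size claim, the ratio can be recomputed from the rounded parameter counts quoted above. This is a minimal sketch: the `scaled_time` helper and the 1-second baseline are hypothetical and only illustrate how a relative speedup translates into time.

```python
# Parameter counts as quoted in the post (rounded figures).
colpali_params = 3_000_000_000  # "3B"
colflor_params = 174_000_000    # "174M"

size_ratio = colpali_params / colflor_params
print(f"ColPali / ColFlor parameter ratio: {size_ratio:.1f}x")

def scaled_time(baseline_seconds, speedup):
    """Hypothetical helper: time taken after applying a relative speedup."""
    return baseline_seconds / speedup

# Assumed baseline of 1.0 s per query, purely for illustration.
print(f"Query encoding at 9.8x speedup: {scaled_time(1.0, 9.8):.3f} s")
```

The ratio comes out at roughly 17.2, consistent with the "17x smaller" figure.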
Check out the full blog post for more insights on modeling, training, and evaluations across various document retrieval tasks! Also, feel free to try our demo on Hugging Face 🤗
📢 Exciting News! Our latest paper "ChartGemma" is out!
🧵1/3: ChartGemma overcomes a key limitation of existing chart models, which rely too heavily on the underlying data tables. Instead, it is trained on data generated directly from chart images, capturing crucial visual trends.
🧵2/3: ChartGemma builds upon PaliGemma from Google Research and is fine-tuned on a high-quality visual instruction tuning dataset generated from Gemini Flash 1.5.
🧵3/3: It achieves state-of-the-art results in chart summarization, question answering, and fact-checking tasks. It can also generate more accurate and realistic chart summaries.
I'm excited to share a new demo for my ChartInstruct model from our ACL 2024 paper. It excels at various chart understanding tasks like QA, captioning, open-ended QA, fact checking and more! Thanks to Hugging Face's ZeroGPU program, the demo runs smoothly even with the model's 7B parameters!