s k

madstuntman11

AI & ML interests

None yet

Organizations

None yet

madstuntman11's activity

Reacted to merve's post with ❤️ about 1 month ago

Post

3810

If your documents contain more than just text and you're doing retrieval or RAG with OCR and LLMs, give that up and give ColPali and vision language models a try 🤗

Why? Documents consist of multiple modalities: layout, tables, text, charts, images. Document processing pipelines often chain multiple models, which makes them immensely brittle and slow. 🥲

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma. It operates over image patches directly, and indexing takes far less time with more accuracy. You can use it for retrieval. If you want to do retrieval augmented generation, find the closest document and, without processing it, give it directly to a VLM like Qwen2-VL (as image input) along with your text query. 🤝
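To make "ColBERT-like" concrete, here is a minimal sketch of late-interaction (MaxSim) scoring, the scoring rule ColBERT-style retrievers use: each query token is matched to its most similar page patch, and those maxima are summed. The function names and the toy 2-d embeddings are assumptions for illustration only; this is not ColPali's own code, and real embeddings come from the model.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token take the best-matching
    patch similarity, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_tokens, patches) for patches in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy data (hypothetical): two query-token embeddings, two pages of two patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[0.9, 0.1], [0.2, 0.1]],  # page 0: only the first query token finds a good patch
    [[0.8, 0.0], [0.1, 0.9]],  # page 1: both query tokens find a good patch
]
ranking = rank_pages(query, pages)
```

Because every query token only needs one good patch somewhere on the page, a page that covers all parts of the query outranks a page that matches a single token very strongly.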

This is much faster + you do not lose out on any information + much easier to maintain too! 🥳

Multimodal RAG: https://huggingface.co/collections/merve/multimodal-rag-66d97602e781122aae0a5139 💬
Document AI (made earlier, for folks who want structured input/output and can fine-tune a model): https://huggingface.co/collections/merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e 📖

Reacted to merve's post with ❤️ about 1 month ago

Post

5514

I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval 📖 it doesn't need image-text pairs for indexing, just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬 images are fed as-is to a vision language model, with no conversion to text!
I used the ColPali implementation in the new 🐭 Byaldi library by @bclavie 🤗
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
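The flow described in this post can be sketched roughly as follows. This is not the notebook's code: `Page`, `answer_from_pages`, `fake_scorer`, and `fake_vlm` are hypothetical names, and the two callables are stand-ins for a ColPali-style retriever and a VLM such as Qwen2-VL. The point is only the shape of the pipeline: score page images, pick the best one, and hand the untouched image plus the text query to the generator.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Page:
    doc_id: str
    page_num: int
    image: bytes  # raw page image; never OCR'd or converted to text


# Stand-in signatures (assumptions): a retriever scores every page image for a
# query, and a generator answers from the query plus a single page image.
ScorePages = Callable[[str, Sequence[Page]], List[float]]
Generate = Callable[[str, bytes], str]


def answer_from_pages(
    query: str, pages: Sequence[Page], score_pages: ScorePages, generate: Generate
) -> Tuple[Page, str]:
    """Retrieve the best-scoring page and pass its image directly to the generator."""
    if not pages:
        raise ValueError("no pages to search")
    scores = score_pages(query, pages)
    best = max(range(len(pages)), key=lambda i: scores[i])
    page = pages[best]
    return page, generate(query, page.image)


def fake_scorer(query: str, pages: Sequence[Page]) -> List[float]:
    """Placeholder for a real retriever: returns fixed scores."""
    return [0.2, 0.9][: len(pages)]


def fake_vlm(query: str, image: bytes) -> str:
    """Placeholder for a real vision language model."""
    return f"answer to {query!r} from a {len(image)}-byte page image"


pages = [Page("report.pdf", 1, b"\x89PNG-1"), Page("report.pdf", 2, b"\x89PNG-page-2")]
page, reply = answer_from_pages("What was Q2 revenue?", pages, fake_scorer, fake_vlm)
```

Keeping the retriever and generator behind plain callables means either side can be swapped without touching the other, which is part of why this setup is easier to maintain than a multi-model OCR pipeline.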

upvoted an article about 1 month ago

Article

Document Similarity Search with ColPali

Sep 21 • 47

liked a Space about 2 months ago

Running

🔅

Diffusers Image Outpaint

upvoted an article 2 months ago

Article

🤗 PEFT welcomes new merging methods

Feb 19 • 13

liked a model 4 months ago

manu/colpali-3b-mix-448-docmatix-only-mined-ib

Updated Jul 31 • 3 • 2

liked a Space 4 months ago

Running

🥇

Vidore Leaderboard

liked a model 4 months ago

nvidia/MambaVision-T-1K

Image Feature Extraction • Updated Jul 25 • 5.37k • 26

New activity in Tevatron/dse-phi3-docmatix-v1 4 months ago

Rename Tevatron-DSE-Phi3-Docmatix-V1(ZeroShot)_metrics.json to results.json

#1 opened 4 months ago by manu

liked a Space 4 months ago

Running on Zero

📉

Florence 2

liked a dataset 4 months ago

Tevatron/docmatix-ir

Viewer • Updated Aug 12 • 5.61M • 5.08k • 12

upvoted an article 4 months ago

Article

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Jul 5 • 164

liked 2 models 4 months ago

Tevatron/dse-phi3-docmatix-v1

Updated Aug 12 • 55 • 9

manu/colpali-3b-mix-448-docmatix

Updated Jul 23 • 7

upvoted a paper 4 months ago

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27 • 41

liked a Space 5 months ago

Running

3.72k

🏆🤖

Chatbot Arena Leaderboard

Reacted to smangrul's post with ❤️ 7 months ago

Post

🚀 Exciting news from 🤗 PEFT!

We are introducing new merging methods for LoRA adapters. These methods allow for retaining the unique capabilities of individual LoRAs while enabling them to combine their strengths: https://huggingface.co/blog/peft_merging

We explored the application of merging LoRA adapters in the context of personal code copilot before 🚀👾✨. Please go through the below thread on it: https://x.com/sourab_m/status/1718008115726283004?s=20

The new merging methods ties, dare, and magnitude_prune are introduced alongside the existing methods cat, linear, and svd. The blog post details each method. These methods can be applied on the fly at inference time instead of merging offline, which makes for a great developer UX. ✨

How do I merge my LoRA adapters?
Easy: use the add_weighted_adapter() method. For example, three LoRA adapters can be combined using the ties method, and the merged adapter retains the capabilities of the individual adapters!
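For intuition about what the ties method does, here is a simplified pure-Python sketch of TIES-style merging on flat weight deltas: trim each delta to its largest-magnitude entries, elect a sign per position from the summed values, then average only the entries that agree with that sign. This is an illustration under stated assumptions, not PEFT's implementation; `trim`, `ties_merge`, the `density` and `weights` parameters as used here, and the toy deltas are all hypothetical.

```python
from typing import List, Sequence


def trim(delta: Sequence[float], density: float) -> List[float]:
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = int(round(density * len(delta)))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(delta)]


def ties_merge(
    deltas: Sequence[Sequence[float]], weights: Sequence[float], density: float
) -> List[float]:
    """Trim, elect a sign per position, and average the entries that agree with it."""
    trimmed = [[w * v for v in trim(d, density)] for d, w in zip(deltas, weights)]
    merged = []
    for column in zip(*trimmed):
        # Sign election: the sign of the summed (weighted) values wins.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Disjoint merge: average only the non-zero entries that share the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


# Three toy "adapter" deltas over four parameters (hypothetical values).
adapter_a = [0.5, -0.1, 0.3, 0.0]
adapter_b = [-0.4, 0.2, 0.3, 0.1]
adapter_c = [0.6, 0.05, -0.2, 0.0]
merged = ties_merge([adapter_a, adapter_b, adapter_c], weights=[1.0, 1.0, 1.0], density=0.5)
```

In this toy run, adapter_b's negative value at the first position and adapter_c's negative value at the third are dropped because they disagree with the elected sign, so they do not cancel out the other adapters' contributions.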

Now that we have seen that merged adapters can retain the capabilities of the individual LoRAs, what about use cases that require the capabilities of multiple LoRAs combined? One application is in the text-to-image domain. 🖼️

Kudos to @prateeky2806 (TIES author) and Le Yu (DARE author) for their kind and generous guidance on the PRs! Also, if you want to explore full model merging, refer to super cool projects like https://github.com/arcee-ai/mergekit/tree/main, https://github.com/Gryphe/BlockMerge_Gradient and https://github.com/yule-BUAA/MergeLM/tree/main.

Excited to see what the community creates on top of this! 🚀✨ #LetsBuildTogether

updated 3 models 9 months ago