
IndustryAI

AI4Industry

AI & ML interests

None yet

Recent Activity

new activity 6 days ago

ds4sd/SubGrapher:Similar findings in MolParser

liked a dataset 7 days ago

allenai/olmOCR-mix-0225

updated a dataset about 1 month ago

AI4Industry/MolParser-7M


Organizations

None yet

AI4Industry's activity

New activity in ds4sd/SubGrapher 6 days ago

Similar findings in MolParser

#1 opened 6 days ago by AI4Industry

liked a dataset 7 days ago

allenai/olmOCR-mix-0225

Updated Feb 25 • 259k • 2.81k • 116

updated a dataset about 1 month ago

AI4Industry/MolParser-7M

Updated Mar 9 • 7.83M • 698 • 10

upvoted an article about 2 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

Feb 20 • 232

liked a model about 2 months ago

ibm-granite/granite-vision-3.1-2b-preview

Image-Text-to-Text • Updated Feb 26 • 4.18k • 95

upvoted an article 2 months ago

Article

Open-source DeepResearch – Freeing our search agents

Feb 4 • 1.22k

upvoted an article 3 months ago

Article

Timm ❤️ Transformers: Use any timm model with transformers

Jan 16 • 46

commented a paper 4 months ago

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild

Paper • 2411.11098 • Published Nov 17, 2024 • 1

upvoted a paper 4 months ago

MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild

Paper • 2411.11098 • Published Nov 17, 2024 • 1

liked a dataset 5 months ago

AI4Industry/MolParser-7M

Updated Mar 9 • 7.83M • 698 • 10

upvoted a collection 7 months ago

Qwen2-VL

Collection

Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 211

reacted to rwightman's post with ❤️ 7 months ago

Post

2574

A 'small' MobileNet-V4 update: I just pushed weights for the smallest model I've trained in the series, a 0.5 width multiplier version of MobileNet-V4 Conv Small.

Now you may look at this and say, hey, why is this impressive? 64.8% top-1 and 2.2M params? MobileNetV3-Small 0.75 and MobileNet-V2 0.5 both have fewer params (~2M) and over 65% top-1, so what gives? Well, this is where MobileNet-V4 differs from previous versions of the model family: it trades off (gives up) a little parameter efficiency for some computational efficiency.

So, let's look at the speed. On an RTX 4090 with torch.compile:
* 98K img/sec - timm/mobilenetv4_conv_small_050.e3000_r224_in1k
* 58K img/sec - timm/mobilenetv3_small_075.lamb_in1k
* 37K img/sec - timm/mobilenetv2_050.lamb_in1k

And there you go: if you have a need for speed, MNV4 is the better option.
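The throughput figures above can be turned into relative speedups with a little arithmetic. The sketch below only reuses the numbers quoted in the post; it does not re-run any benchmark, and the helper name `speedups` is illustrative rather than part of timm.

```python
# Throughput figures quoted in the post (RTX 4090, torch.compile), in images/second.
THROUGHPUT = {
    "mobilenetv4_conv_small_050.e3000_r224_in1k": 98_000,
    "mobilenetv3_small_075.lamb_in1k": 58_000,
    "mobilenetv2_050.lamb_in1k": 37_000,
}


def speedups(baseline: str, table: dict[str, int]) -> dict[str, float]:
    """Return how many times faster `baseline` is than each other model in `table`."""
    base = table[baseline]
    return {name: base / ips for name, ips in table.items() if name != baseline}


if __name__ == "__main__":
    ratios = speedups("mobilenetv4_conv_small_050.e3000_r224_in1k", THROUGHPUT)
    for name, ratio in ratios.items():
        print(f"MNV4 0.5 is {ratio:.2f}x faster than {name}")
```

Running it prints a speedup of 1.69x over `mobilenetv3_small_075.lamb_in1k` and 2.65x over `mobilenetv2_050.lamb_in1k`, which is the compute-for-parameters trade-off the post describes, expressed as ratios.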

upvoted a collection 7 months ago

MobileNetV4 pretrained weights

Collection

Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22, 2024 • 18

reacted to merve's post with 👍 7 months ago

Post

3879

If you have documents that contain more than just text and you're doing retrieval or RAG using OCR and LLMs, give that up and give ColPali and vision language models a try 🤗

Why? Documents consist of multiple modalities: layout, tables, text, charts, and images. Document processing pipelines often chain multiple models, and they're immensely brittle and slow. 🥲

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma. It operates directly over image patches, and indexing takes far less time while being more accurate. You can use it for retrieval, and if you want retrieval-augmented generation, find the closest document and, instead of processing it, give it directly to a VLM like Qwen2-VL as image input together with your text query. 🤝
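"ColBERT-like" refers to late-interaction scoring: each query token embedding is compared against every patch embedding of a page, the best match per query token is kept (MaxSim), and those maxima are summed into the page score. Below is a minimal pure-Python sketch of that scoring and ranking step. It assumes the query-token and page-patch embeddings have already been produced by the model, does not call any ColPali library API, and the names `maxsim_score`, `rank_pages`, and the toy vectors are hypothetical illustrations.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length embeddings."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late interaction: for each query token, take its best-matching patch
    embedding, then sum those per-token maxima over the whole query."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(
    query_tokens: Sequence[Vector], pages: dict[str, Sequence[Vector]]
) -> list[tuple[str, float]]:
    """Score every indexed page against the query and sort best-first."""
    scored = [(page_id, maxsim_score(query_tokens, patches)) for page_id, patches in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings; real ColPali embeddings are much higher-dimensional.
    query = [[1.0, 0.0], [0.0, 1.0]]
    index = {
        "invoice.png": [[0.9, 0.1], [0.2, 0.8]],
        "chart.png": [[0.1, 0.2], [0.3, 0.1]],
    }
    best_page, best_score = rank_pages(query, index)[0]
    print(best_page, round(best_score, 2))  # invoice.png 1.7
```

In the RAG setup described above, the top-ranked page image would then be passed unprocessed to a VLM such as Qwen2-VL together with the text query; that generation call is not shown here.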

This is much faster + you do not lose out on any information + much easier to maintain too! 🥳

Multimodal RAG merve/multimodal-rag-66d97602e781122aae0a5139 💬
Document AI (made it way before, for folks who want structured input/output and can fine-tune a model) merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e 📖