Frank Sommers PRO

fsommers

AI & ML interests

None yet

Recent Activity

upvoted a paper about 8 hours ago

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

upvoted a paper 2 days ago

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

liked a model 2 days ago

Qwen/Qwen2.5-VL-32B-Instruct

fsommers's activity

upvoted a paper about 8 hours ago

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Paper • 2503.13964 • Published 9 days ago • 16

upvoted a paper 2 days ago

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 15

upvoted a paper 6 days ago

TULIP: Towards Unified Language-Image Pretraining

Paper • 2503.15485 • Published 8 days ago • 43

upvoted a paper 7 days ago

Aligning Multimodal LLM with Human Preference: A Survey

Paper • 2503.14504 • Published 9 days ago • 21

upvoted a collection 13 days ago

Gemma 3 Release

Collection

9 items • Updated 13 days ago • 296

upvoted a paper 20 days ago

NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering

Paper • 2502.10868 • Published Feb 15 • 2

upvoted a paper 23 days ago

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Paper • 2502.18017 • Published about 1 month ago • 19

upvoted 2 articles 27 days ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

Feb 20 • 218

Article

SigLIP 2: A better multilingual vision language encoder

Feb 21 • 145

upvoted a paper 28 days ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 105

upvoted 2 papers about 1 month ago

Scalable Vision Language Model Training via High Quality Data Curation

Paper • 2501.05952 • Published Jan 10 • 1

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 173

upvoted 2 collections about 1 month ago

ColQwen2 Models

Collection

Pre-trained checkpoints for the ColQwen2 model. • 4 items • Updated Jan 23 • 4

Qwen2.5-VL

Collection

Vision-language model series based on Qwen2.5 • 10 items • Updated 3 days ago • 417

upvoted an article about 1 month ago

Article

We now support VLMs in smolagents!

Jan 24 • 98

upvoted a paper about 2 months ago

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3 • 17

upvoted an article about 2 months ago

Article

Visualize and understand GPU memory in PyTorch

Dec 24, 2024 • 200

upvoted 2 papers about 2 months ago

Question Answering on Patient Medical Records with Private Fine-Tuned LLMs

Paper • 2501.13687 • Published Jan 23 • 9

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 100

upvoted an article 2 months ago

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Jan 23 • 165