Andres Marafioti

andito

AI & ML interests

Multimodal models, VLM and TTS

Recent Activity

updated a Space about 8 hours ago

HuggingFaceTB/smolvlm-web-benchmarking-all

published a Space about 8 hours ago

HuggingFaceTB/smolvlm-web-benchmarking-all

new activity about 9 hours ago

HuggingFaceTB/SmolVLM-Instruct: How many parameters are there in the model?

andito's activity

upvoted a paper about 14 hours ago

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Paper • 2503.22952 • Published 6 days ago • 17

upvoted a paper 1 day ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published 9 days ago • 113

upvoted a paper 18 days ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published 20 days ago • 81

upvoted 2 articles about 1 month ago

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

Article • Mar 4 • 71

SmolVLM2: Bringing Video Understanding to Every Device

Article • Feb 20 • 223

upvoted a paper about 2 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 216

upvoted 2 articles about 2 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.2k

Fixing Gradient Accumulation

Article • Oct 16, 2024 • 53

upvoted 2 articles 2 months ago

We now support VLMs in smolagents!

Article • Jan 24 • 99

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • Jan 23 • 168

upvoted a collection 2 months ago

SmolVLM 256M & 500M

Collection • Collection for models & demos for even smoller SmolVLM release • 12 items • Updated Feb 20 • 72

upvoted a paper 2 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 150

upvoted an article 2 months ago

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Article • Aug 22, 2023 • 31

upvoted 2 papers 4 months ago

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Paper • 2412.10302 • Published Dec 13, 2024 • 17

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 146

upvoted a collection 4 months ago

Nov 29 Releases 🌲🌲

Collection • 25 items • Updated Dec 2, 2024 • 10

upvoted an article 5 months ago

Llama 3.2 in Keras

Article • Oct 21, 2024 • 12