Emanuele Vivoli

emanuelevivoli

https://emanuelevivoli.github.io

AI & ML interests

I work on Comics/Manga :)

Recent Activity

liked a model 7 days ago

andreagemelli/Phi-3.5-mini-thinking-function_calling-V0

updated a dataset 9 days ago

VLR-CVC/ComicsPAP

emanuelevivoli's activity

upvoted a paper 2 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 3 days ago • 139

upvoted a paper 9 days ago

Your ViT is Secretly an Image Segmentation Model

Paper • 2503.19108 • Published 17 days ago • 19

upvoted a paper 12 days ago

Video-R1: Reinforcing Video Reasoning in MLLMs

Paper • 2503.21776 • Published 14 days ago • 76

upvoted a paper 13 days ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published 16 days ago • 43

upvoted a paper 23 days ago

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published 27 days ago • 18

upvoted a collection 24 days ago

Comics Pick-A-Panel

Collection • Dataset, Models and Paper from ComicsPAP: understanding comic strips by picking the correct panel • 4 items • Updated 27 days ago • 3

upvoted 2 papers 30 days ago

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published Mar 7 • 34

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

Paper • 2503.05132 • Published Mar 7 • 55

upvoted 2 papers about 1 month ago

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published Mar 9 • 25

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3 • 83

upvoted 3 papers about 2 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 141

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 180

LM2: Large Memory Models

Paper • 2502.06049 • Published Feb 9 • 30

upvoted an article 2 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 837

upvoted 5 papers 3 months ago

TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space

Paper • 2501.12224 • Published Jan 21 • 48

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 86

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Paper • 2501.05452 • Published Jan 9 • 15

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 92

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

Paper • 2501.01427 • Published Jan 2 • 55

upvoted a paper 4 months ago

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published Dec 19, 2024 • 54