Federico Minutoli

DiTo97

AI & ML interests

Anything machine learning. I am strongly passionate about computer vision and robotics, and about how machine learning will help achieve autonomous behavior, perception, and continuous learning.

Recent Activity

liked a dataset about 2 months ago

faweigend/wearmocap

DiTo97's activity

upvoted a paper 5 days ago

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published 7 days ago • 60

upvoted an article 3 months ago

Deriving DPO's Loss

Article • Dec 24, 2024 • 26

upvoted a paper 3 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140

upvoted a paper 4 months ago

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Paper • 2411.02337 • Published Nov 4, 2024 • 35

upvoted a paper 7 months ago

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16, 2024 • 99

upvoted an article 10 months ago

License to Call: Introducing Transformers Agents 2.0

Article • May 13, 2024 • 130

upvoted a paper 11 months ago

LEGENT: Open Platform for Embodied Agents

Paper • 2404.18243 • Published Apr 28, 2024 • 22

upvoted 8 papers about 1 year ago

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Paper • 2401.15071 • Published Jan 26, 2024 • 37

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

Paper • 2401.12963 • Published Jan 23, 2024 • 12

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8, 2024 • 71

upvoted 2 papers over 1 year ago

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Paper • 2312.03491 • Published Dec 6, 2023 • 35

Merlin: Empowering Multimodal LLMs with Foresight Minds

Paper • 2312.00589 • Published Nov 30, 2023 • 27

upvoted a collection over 1 year ago

Seamless Communication

Collection • A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16, 2024 • 153

upvoted 2 papers over 1 year ago

SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering

Paper • 2311.12775 • Published Nov 21, 2023 • 28

Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs

Paper • 2311.05657 • Published Nov 9, 2023 • 32