Umberto Cappellazzo

hisoka94

https://umbertocappellazzo.github.io/

AI & ML interests

Multimodal Large Language Models and audio-visual speech processing at Imperial College London.

Recent Activity

upvoted a paper about 1 month ago

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

upvoted a paper about 1 month ago

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

authored a paper about 1 month ago

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

Organizations

None yet

hisoka94's activity

upvoted 2 papers about 1 month ago

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published Mar 9, 2025 • 29

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

Paper • 2503.06362 • Published Mar 9, 2025 • 3

upvoted 3 papers about 1 year ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6, 2024 • 189

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 130

FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 49

upvoted a collection about 1 year ago

MoEs papers reading list

60 items • Updated Nov 4, 2024 • 140

upvoted a paper over 1 year ago

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Paper • 2312.07987 • Published Dec 13, 2023 • 41