Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 26 days ago • 15
Talking Face Generation with Multilingual TTS: Generate a talking face video from text