Look Once to Hear: Target Speech Hearing with Noisy Examples • Paper • 2405.06289 • Published 19 days ago • 3
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner • Paper • 2405.14979 • Published 6 days ago • 11
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone • Paper • 2404.14219 • Published Apr 22 • 238
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs • Paper • 2402.15627 • Published Feb 23 • 31
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models • Paper • 2404.07973 • Published Apr 11 • 28
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs • Paper • 2404.05719 • Published Apr 8 • 57
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • Paper • 2402.17177 • Published Feb 27 • 87
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs • Paper • 2401.11708 • Published Jan 22 • 27
InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes • Paper • 2401.05335 • Published Jan 10 • 26
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces • Paper • 2312.15715 • Published Dec 25, 2023 • 19
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • Paper • 2312.03641 • Published Dec 6, 2023 • 19
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models • Paper • 2312.05107 • Published Dec 8, 2023 • 32