6 OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning · 6 authors
5 CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models · 8 authors