CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models • Paper 2306.09635 • Published Jun 16, 2023