Model Checkpoints

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation.

GitHub

The repository's model_checkpoints directory contains checkpoints for both student and teacher models. Each model is available in three variants:

1. ac_v1_iclr

  • Training Data: AudioCaps
  • Conditioning: Uses the last layer of the CLAP text branch.
  • Details: This variant corresponds to the checkpoint used in the ICLR'25 publication.

2. ac_v2

  • Training Data: AudioCaps
  • Conditioning: Uses the second-to-last layer of the CLAP text branch.

3. as_ac_v2

  • Training Data: AudioSet and AudioCaps
  • Conditioning: Uses the second-to-last layer of the CLAP text branch.
  • Additional Information: For training, we use the AudioSet text descriptions provided here.
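
For quick reference, the three variants can be written down as a small lookup table. This is a minimal sketch and not part of the repository: the `Variant` class, the `VARIANTS` mapping, and the `variants_using` helper are hypothetical names introduced for illustration. Only the variant names, training data, and CLAP layer choices are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record describing one checkpoint variant."""
    name: str
    training_data: tuple   # dataset names as listed in this model card
    clap_text_layer: str   # "last" or "second-to-last"


# Values copied from the variant list; the structure itself is an assumption.
VARIANTS = {
    "ac_v1_iclr": Variant("ac_v1_iclr", ("AudioCaps",), "last"),
    "ac_v2": Variant("ac_v2", ("AudioCaps",), "second-to-last"),
    "as_ac_v2": Variant("as_ac_v2", ("AudioSet", "AudioCaps"), "second-to-last"),
}


def variants_using(layer: str) -> list:
    """Return the sorted names of variants conditioned on the given CLAP text layer."""
    return sorted(v.name for v in VARIANTS.values() if v.clap_text_layer == layer)


if __name__ == "__main__":
    for variant in VARIANTS.values():
        print(variant.name, variant.training_data, variant.clap_text_layer)
```

A table like this can help keep the text-conditioning setup consistent with the chosen checkpoint, since the v1 and v2 variants read from different CLAP layers.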

Auxiliary Checkpoints

The utils_checkpoint directory includes additional checkpoints for auxiliary components, such as the audio compression model.
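
To fetch only the two checkpoint directories, one option is the `allow_patterns` filter of `huggingface_hub.snapshot_download`. The sketch below is an assumption-laden illustration, not the authors' documented workflow: `checkpoint_patterns` and `download_checkpoints` are hypothetical helpers, the repository ID must be supplied by the caller, and the file layout inside each directory is not assumed.

```python
from fnmatch import fnmatch

# Directory names as given in this model card.
MODEL_DIR = "model_checkpoints"
UTILS_DIR = "utils_checkpoint"


def checkpoint_patterns(include_auxiliary: bool = True) -> list:
    """Build glob patterns selecting the checkpoint directories."""
    patterns = [f"{MODEL_DIR}/*"]
    if include_auxiliary:
        patterns.append(f"{UTILS_DIR}/*")
    return patterns


def download_checkpoints(repo_id: str, local_dir: str,
                         include_auxiliary: bool = True) -> str:
    """Download the selected directories; repo_id is supplied by the caller."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=checkpoint_patterns(include_auxiliary),
    )


if __name__ == "__main__":
    # Show which patterns would be requested, without touching the network.
    print(checkpoint_patterns())
```

Skipping the auxiliary directory (`include_auxiliary=False`) only makes sense if the audio compression model and other auxiliary components are already available locally.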

Citation

@inproceedings{saito2025soundctm,
  title={Sound{CTM}: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation},
  author={Koichi Saito and Dongjun Kim and Takashi Shibuya and Chieh-Hsin Lai and Zhi Zhong and Yuhta Takida and Yuki Mitsufuji},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=KrK6zXbjfO}
}