Model Checkpoints
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation.
The repository's model_checkpoints directory contains checkpoints for both the student and teacher models. Each model is available in three variants:
1. ac_v1_iclr
- Training Data: AudioCaps
- Conditioning: Uses the last layer of the CLAP text branch.
- Details: This variant corresponds to the checkpoint used in the ICLR'25 publication.
2. ac_v2
- Training Data: AudioCaps
- Conditioning: Uses the second-to-last layer of the CLAP text branch.
3. as_ac_v2
- Training Data: AudioSet and AudioCaps
- Conditioning: Uses the second-to-last layer of the CLAP text branch.
- Additional Information: For training, we use the text descriptions of AudioSet provided here.
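The three variants listed above can be summarized as a small lookup table, which may help when choosing a checkpoint programmatically. This is only an illustrative sketch: the `Variant` class, the `find_variants` helper, and the `"last"` / `"second_to_last"` labels are hypothetical names introduced here, not part of the SoundCTM codebase. Only the variant names, training data, and CLAP layer choices come from the list above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical record describing one checkpoint variant."""
    name: str
    training_data: Tuple[str, ...]
    clap_layer: str  # "last" or "second_to_last" (labels chosen for this sketch)


# Values taken from the variant list in this document.
VARIANTS = (
    Variant("ac_v1_iclr", ("AudioCaps",), "last"),
    Variant("ac_v2", ("AudioCaps",), "second_to_last"),
    Variant("as_ac_v2", ("AudioSet", "AudioCaps"), "second_to_last"),
)


def find_variants(training_data: Optional[str] = None,
                  clap_layer: Optional[str] = None):
    """Return variants matching the given filters; None means 'no filter'."""
    matches = []
    for variant in VARIANTS:
        if training_data is not None and training_data not in variant.training_data:
            continue
        if clap_layer is not None and variant.clap_layer != clap_layer:
            continue
        matches.append(variant)
    return matches


if __name__ == "__main__":
    for variant in find_variants(clap_layer="second_to_last"):
        print(variant.name, variant.training_data)
```

For example, `find_variants(training_data="AudioSet")` returns only the `as_ac_v2` entry, since it is the only variant trained on AudioSet.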
Auxiliary Checkpoints
The utils_checkpoint directory includes additional checkpoints for auxiliary components, such as the audio compression model.
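A minimal sketch of fetching only the two checkpoint directories, assuming the repository is hosted on the Hugging Face Hub and the `huggingface_hub` package is installed. The helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical, and the repository ID is left as a parameter because it is not stated in this document. The internal layout of each directory is likewise not assumed; the patterns simply match everything under `model_checkpoints` and `utils_checkpoint`.

```python
def checkpoint_patterns(include_auxiliary=True):
    """Build glob patterns for the checkpoint directories named in this document."""
    patterns = ["model_checkpoints/*"]
    if include_auxiliary:
        # Auxiliary components, such as the audio compression model.
        patterns.append("utils_checkpoint/*")
    return patterns


def download_checkpoints(repo_id, local_dir, include_auxiliary=True):
    """Download the selected directories; repo_id must be supplied by the caller."""
    # Imported lazily so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=checkpoint_patterns(include_auxiliary),
    )
```

Passing `include_auxiliary=False` skips the `utils_checkpoint` directory, which may be useful if the auxiliary checkpoints are already available locally.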
Citation
@inproceedings{saito2025soundctm,
title={Sound{CTM}: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation},
author={Koichi Saito and Dongjun Kim and Takashi Shibuya and Chieh-Hsin Lai and Zhi Zhong and Yuhta Takida and Yuki Mitsufuji},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=KrK6zXbjfO}
}