AEmotionStudio/prismaudio-models

+---
+license: mit
+base_model:
+  - FunAudioLLM/PrismAudio
+tags:
+  - audio
+  - video2audio
+  - generation
+  - safetensors
+pipeline_tag: text-to-audio
+---
+# PrismAudio Models (SafeTensors Mirror)
+Mirrored and converted from [FunAudioLLM/PrismAudio](https://huggingface.co/FunAudioLLM/PrismAudio).
+All weights have been converted from PyTorch `.ckpt`/`.pth` to **SafeTensors** format for:
+- ✅ Faster loading
+- ✅ Memory-mapped I/O
+- ✅ No arbitrary code execution on load (unlike pickle-based `.ckpt`/`.pth` checkpoints)
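
The points above follow from the SafeTensors file layout: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. The following is a minimal standard-library sketch of that layout, not the official `safetensors` reader. The helper names (`write_minimal_safetensors`, `read_safetensors_header`, `count_parameters`) and the demo tensors are hypothetical and exist only for illustration.

```python
import json
import math
import struct
import tempfile
from pathlib import Path


def write_minimal_safetensors(path, tensors):
    """Write a tiny SafeTensors file; tensors maps name -> (dtype, shape, raw bytes)."""
    header, payload = {}, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # Layout: u64 little-endian header length, JSON header, raw tensor bytes.
    Path(path).write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + payload)


def read_safetensors_header(path):
    """Parse only the JSON header; no tensor data is read and no code is executed."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header


def count_parameters(header):
    """Sum the element counts of all tensors described by the header."""
    return sum(math.prod(entry["shape"]) for entry in header.values())


# Demo on a throwaway file with two zero-filled float32 tensors (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.safetensors"
    write_minimal_safetensors(demo_path, {
        "layer.weight": ("F32", [2, 3], bytes(24)),
        "layer.bias": ("F32", [2], bytes(8)),
    })
    demo_header = read_safetensors_header(demo_path)
    demo_param_count = count_parameters(demo_header)
```

Because the header lists every tensor's shape and byte range, a loader can memory-map the data section and read tensors on demand. The same header-only pass could be pointed at a downloaded weight file to check its parameter count without loading any tensors.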
+## Files
+| File | Description |
+|------|-------------|
+| `prismaudio.safetensors` | Main PrismAudio model weights (518M params) |
+| `synchformer_state_dict.safetensors` | Synchformer temporal alignment encoder |
+| `vae.safetensors` | Oobleck VAE decoder |
+## Usage
+These weights are used by the MAESTRO AI Workstation's PrismAudio panel for
+decomposed Chain-of-Thought video-to-audio generation.
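
For use outside MAESTRO, the files can presumably be fetched and opened with the `huggingface_hub` and `safetensors` packages. This is a hedged sketch, not MAESTRO's actual loading code. It assumes the repo id `AEmotionStudio/prismaudio-models` (taken from the repository name shown with this card), that both packages and PyTorch are installed, and the helper names are hypothetical.

```python
# Assumption: repo id inferred from the repository name shown with this card.
REPO_ID = "AEmotionStudio/prismaudio-models"

# File names as listed in the Files table.
FILES = [
    "prismaudio.safetensors",
    "synchformer_state_dict.safetensors",
    "vae.safetensors",
]


def download_weights(repo_id=REPO_ID, filenames=FILES):
    """Download each file into the local Hugging Face cache and return name -> path."""
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import hf_hub_download

    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in filenames}


def load_state_dict(path):
    """Load one SafeTensors file as a dict of CPU torch tensors."""
    from safetensors.torch import load_file

    return load_file(path, device="cpu")
```

Calling `download_weights()` and passing one of the returned paths to `load_state_dict` yields a plain state dict. Building the matching model classes is left to the upstream PrismAudio code and is not shown here.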
+## Citation
+```bibtex
+@misc{liu2025thinksound,
+  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
+  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
+  year={2025},
+  eprint={2506.21448},
+  archivePrefix={arXiv},
+}
+```