---
license: mit
base_model:
  - FunAudioLLM/PrismAudio
tags:
  - audio
  - video2audio
  - generation
  - safetensors
pipeline_tag: text-to-audio
---

PrismAudio Models (SafeTensors Mirror)

Mirrored and converted from FunAudioLLM/PrismAudio.

All weights have been converted from PyTorch .ckpt/.pth checkpoints to SafeTensors format, which offers:

  • ✅ Faster loading
  • ✅ Memory-mapped I/O
  • ✅ No arbitrary code execution on load (unlike pickle-based .ckpt/.pth files)
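The properties listed above follow from how a SafeTensors file is laid out: an 8-byte little-endian header length, a JSON header mapping each tensor name to its dtype, shape, and byte offsets, and then one raw data buffer. Nothing in the file is executable, and a loader can memory-map the buffer and read only the tensors it needs. The sketch below illustrates this with Python's standard library alone. `read_header` and `tensor_bytes` are hypothetical helpers written for this example, and the example builds a toy file rather than touching the real weights. For actual loading, use the `safetensors` package.

```python
import json
import mmap
import os
import struct
import tempfile


def read_header(path):
    """Parse the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string-to-string metadata
    return header, 8 + header_len     # data buffer starts right after the header


def tensor_bytes(path, name):
    """Memory-map the file and copy out the raw bytes of a single tensor."""
    header, data_start = read_header(path)
    begin, end = header[name]["data_offsets"]  # relative to the data buffer
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The OS only needs to page in the part of the file backing this slice.
        return mm[data_start + begin : data_start + end]


# Toy example (hypothetical data): one 2x2 float32 tensor named "w".
payload = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
hdr = json.dumps({"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}).encode()
path = os.path.join(tempfile.mkdtemp(), "toy.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(hdr)) + hdr + payload)

print(read_header(path)[0]["w"]["shape"])             # [2, 2]
print(struct.unpack("<4f", tensor_bytes(path, "w")))  # (1.0, 2.0, 3.0, 4.0)
```

Parsing the header is plain JSON decoding, so there is no step at which the file can run code, in contrast to unpickling a .ckpt/.pth checkpoint.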

Files

| File | Description |
|------|-------------|
| prismaudio.safetensors | Main PrismAudio model weights (518M params) |
| synchformer_state_dict.safetensors | Synchformer temporal alignment encoder |
| vae.safetensors | Oobleck VAE decoder |
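Because every tensor's shape is recorded in the file header, you can inspect each file's contents without loading any weights. For example, you can check the parameter count quoted for prismaudio.safetensors. The following is a minimal standard-library sketch. `count_params` is a hypothetical helper, and the `__main__` block assumes the three files listed in the table have been downloaded into the current working directory.

```python
import json
import math
import os
import struct
from collections import Counter


def count_params(path):
    """Sum element counts and tally dtypes using only a .safetensors header."""
    with open(path, "rb") as f:
        # 8-byte little-endian header length, followed by the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    # math.prod([]) == 1, so scalar tensors (shape []) count as one element.
    total = sum(math.prod(t["shape"]) for t in header.values())
    dtypes = Counter(t["dtype"] for t in header.values())
    return total, dtypes


if __name__ == "__main__":
    # Assumption: the files sit in the current directory; missing ones are skipped.
    for name in (
        "prismaudio.safetensors",
        "synchformer_state_dict.safetensors",
        "vae.safetensors",
    ):
        if os.path.exists(name):
            total, dtypes = count_params(name)
            print(f"{name}: {total / 1e6:.1f}M params, dtypes={dict(dtypes)}")
```

Only the header is read, so this runs in milliseconds even on the largest file.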

Usage

These weights are used by the MAESTRO AI Workstation's PrismAudio panel for decomposed Chain-of-Thought video-to-audio generation.

Citation

@misc{liu2025thinksound,
  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2506.21448},
  archivePrefix={arXiv},
}