AEmotionStudio committed ba722b4 (verified; parent f155eaa): Upload README.md with huggingface_hub

Files changed (1): `README.md` added (+45 -0)
 
---
license: mit
base_model:
- FunAudioLLM/PrismAudio
tags:
- audio
- video2audio
- generation
- safetensors
pipeline_tag: text-to-audio
---

# PrismAudio Models (SafeTensors Mirror)

Mirrored and converted from [FunAudioLLM/PrismAudio](https://huggingface.co/FunAudioLLM/PrismAudio).

All weights have been converted from PyTorch `.ckpt`/`.pth` checkpoints to **SafeTensors** format, which provides:
- ✅ Faster loading
- ✅ Memory-mapped I/O
- ✅ No arbitrary code execution on load (unlike pickle-based PyTorch checkpoints)
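
To illustrate the points above, here is a minimal, stdlib-only Python sketch of how a `.safetensors` file can be inspected. It assumes the published SafeTensors layout (an 8-byte little-endian header length, then a UTF-8 JSON header, then raw tensor bytes); `read_header` and `tensor_bytes` are hypothetical helpers, not part of the `safetensors` library. Reading a file only parses JSON and slices bytes, so no code can run, and tensor data can be pulled from a memory map rather than loaded wholesale.

```python
import json
import mmap
import struct
from typing import Any, Dict, Tuple


def read_header(path: str) -> Tuple[Dict[str, Any], int]:
    """Parse a .safetensors header; return (tensor entries, offset of the data buffer).

    Assumed layout: 8-byte little-endian unsigned length N, then N bytes of
    UTF-8 JSON, then raw tensor bytes. Parsing is plain JSON: no pickle involved.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n).decode("utf-8"))
    header.pop("__metadata__", None)  # optional str->str metadata, not a tensor
    return header, 8 + n


def tensor_bytes(path: str, name: str) -> bytes:
    """Copy one tensor's raw bytes out of a read-only memory map of the file."""
    header, data_start = read_header(path)
    begin, end = header[name]["data_offsets"]  # offsets are relative to the data buffer
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return mm[data_start + begin : data_start + end]
```

For example, `read_header("vae.safetensors")[0]` would list the VAE's tensor names, dtypes and shapes without reading any weights. For real use, the official `safetensors` package (`safetensors.torch.load_file`, `safe_open`) implements this with proper dtype handling.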

## Files

| File | Description |
|------|-------------|
| `prismaudio.safetensors` | Main PrismAudio model weights (518M params) |
| `synchformer_state_dict.safetensors` | Synchformer temporal-alignment encoder |
| `vae.safetensors` | Oobleck VAE decoder |
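
The parameter count in the table can be sanity-checked from the file header alone. The sketch below assumes the standard SafeTensors layout (8-byte little-endian header length followed by a JSON header); `count_params` is a hypothetical helper, not a library function, and it simply sums the element counts of all stored tensors.

```python
import json
import math
import struct


def count_params(path: str) -> int:
    """Sum the element counts of every tensor listed in a .safetensors header.

    Only the JSON header is read, so this stays cheap for large checkpoints.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return sum(
        math.prod(entry["shape"])  # a scalar has shape [] and math.prod([]) == 1
        for name, entry in header.items()
        if name != "__metadata__"  # metadata block is not a tensor
    )
```

Running `count_params("prismaudio.safetensors")` should give a figure close to the 518M listed above. Because it counts every stored tensor, including any non-trainable buffers, a small difference from the published figure would not by itself indicate a problem.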

## Usage

These weights are used by the MAESTRO AI Workstation's PrismAudio panel for decomposed Chain-of-Thought video-to-audio generation.

## Citation

```bibtex
@misc{liu2025thinksound,
  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2506.21448},
  archivePrefix={arXiv},
}
```