ZipVoice.AXERA
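ZipVoice AXERA on-board inference demo.
Features
- Supports Chinese and English speech generation.
- Supports voice cloning.
- Supports ZipVoice and ZipVoice Distill.
Model notes
ZipVoice Distill is the distilled version of ZipVoice. Its main advantage is faster inference at a small cost in quality. In preliminary tests on AX650, ZipVoice Distill is roughly 3x faster than the base model on long text, with an RTF of about 0.3 and no noticeable drop in quality.
The AX630C version currently produces poor inference results, with an RTF of about 1.5, and needs further tuning.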
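RTF (real-time factor) is inference time divided by the duration of the generated audio, so values below 1 mean faster than real time. As a minimal sketch, the snippet below recomputes the RTF and the speedup from the Chinese paragraph timings reported further down in this README; the helper name `rtf` is only illustrative and is not part of the demo script.

```python
def rtf(inference_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: inference time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# Chinese paragraph timings reported in this README (AX650).
base = rtf(40.292, 44.744)     # ZipVoice
distill = rtf(13.457, 44.744)  # ZipVoice Distill
speedup = base / distill       # same audio length, so this equals the time ratio

print(f"base RTF={base:.4f}, distill RTF={distill:.4f}, speedup={speedup:.2f}x")
```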
Model conversion
Model quantization reference:
Supported platforms
- AX650
- AX650 demo board
- M4N-Dock (AXera-Pi Pro)
- M.2 Accelerator Card
Directory structure
ZipVoice.AXERA/
├── assets/
│ ├── moss_prompts/
│ └── paragraphs/
├── models/
│ ├── zipvoice_ax650/
│ ├── zipvoice_distill_ax650/
│ └── zipvoice_distill_ax630C/
├── resources/
│ ├── vocos-mel-24khz/
│ └── zipvoice_hf/
├── scripts/
├── infer_zipvoice_axera.py
├── requirements.txt
└── README.md
Environment
Install pyaxengine:
pip3 install axengine-x.x.x-py3-none-any.whl
Install dependencies:
conda create -n ZipVoice python=3.10
conda activate ZipVoice
pip3 install -r requirements.txt
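After installation, a quick check that the runtime is visible from the active environment can save a failed first run. The sketch below assumes that the pyaxengine wheel installs a module importable as `axengine` (inferred from the wheel file name, not confirmed here); `missing_modules` is a hypothetical helper.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: "axengine" is the import name provided by the pyaxengine wheel.
missing = missing_modules(["axengine"])
print("Python", sys.version.split()[0])
print("missing modules:", missing or "none")
```

If `axengine` is reported as missing, the wheel was probably installed into a different Python environment than the one currently active.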
Inference commands
Enter the directory:
cd ZipVoice.AXERA
AX650 ZipVoice
Chinese sentence:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_ax650 \
--text "今天午后天气很好,我打开窗户,听见远处有人聊天,水杯也轻轻晃了一下。" \
--prompt-text "不管怎么样我和汤姆还是要感谢贝尔卡金的援手" \
--prompt-wav assets/moss_prompts/zh_1_4p5s.wav \
--output-wav outputs/zh_sentence_ax650.wav \
--seed 42
Inference result:
Inference time: 5.781s
Generated audio duration: 6.411s
RTF: 0.9018
Audio: outputs/zh_sentence_ax650.wav
Prompt audio: assets/moss_prompts/zh_1_4p5s.wav
English sentence:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_ax650 \
--text "This morning, a small train left the station, carrying sleepy passengers toward a bright coastal town." \
--prompt-text "This is almost twice the current industry production level per train." \
--prompt-wav assets/moss_prompts/en_4_4p5s.wav \
--output-wav outputs/en_sentence_ax650.wav \
--seed 42
Inference result:
Inference time: 5.711s
Generated audio duration: 6.411s
RTF: 0.8909
Audio: outputs/en_sentence_ax650.wav
Prompt audio: assets/moss_prompts/en_4_4p5s.wav
Chinese paragraph:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_ax650 \
--text-file assets/paragraphs/zh_ginkgo.txt \
--prompt-text "不管怎么样我和汤姆还是要感谢贝尔卡金的援手" \
--prompt-wav assets/moss_prompts/zh_1_4p5s.wav \
--output-wav outputs/zh_long_paragraph_ax650.wav \
--seed 42
Inference result:
Inference time: 40.292s
Generated audio duration: 44.744s
RTF: 0.9005
Audio: outputs/zh_long_paragraph_ax650.wav
Prompt audio: assets/moss_prompts/zh_1_4p5s.wav
English paragraph:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_ax650 \
--text-file assets/paragraphs/en_scavenger.txt \
--prompt-text "This is almost twice the current industry production level per train." \
--prompt-wav assets/moss_prompts/en_4_4p5s.wav \
--output-wav outputs/en_long_paragraph_ax650.wav \
--seed 42
Inference result:
Inference time: 62.161s
Generated audio duration: 64.749s
RTF: 0.9600
Audio: outputs/en_long_paragraph_ax650.wav
Prompt audio: assets/moss_prompts/en_4_4p5s.wav
AX650 ZipVoice Distill
Chinese sentence:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_distill_ax650 \
--text "今天午后天气很好,我打开窗户,听见远处有人聊天,水杯也轻轻晃了一下。" \
--prompt-text "不管怎么样我和汤姆还是要感谢贝尔卡金的援手" \
--prompt-wav assets/moss_prompts/zh_1_4p5s.wav \
--output-wav outputs/zh_sentence_distill_ax650.wav \
--seed 42
Inference result:
Inference time: 1.992s
Generated audio duration: 6.411s
RTF: 0.3107
Audio: outputs/zh_sentence_distill_ax650.wav
Prompt audio: assets/moss_prompts/zh_1_4p5s.wav
English sentence:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_distill_ax650 \
--text "This morning, a small train left the station, carrying sleepy passengers toward a bright coastal town." \
--prompt-text "This is almost twice the current industry production level per train." \
--prompt-wav assets/moss_prompts/en_4_4p5s.wav \
--output-wav outputs/en_sentence_distill_ax650.wav \
--seed 42
Inference result:
Inference time: 2.045s
Generated audio duration: 6.411s
RTF: 0.3189
Audio: outputs/en_sentence_distill_ax650.wav
Prompt audio: assets/moss_prompts/en_4_4p5s.wav
Chinese paragraph:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_distill_ax650 \
--text-file assets/paragraphs/zh_ginkgo.txt \
--prompt-text "不管怎么样我和汤姆还是要感谢贝尔卡金的援手" \
--prompt-wav assets/moss_prompts/zh_1_4p5s.wav \
--output-wav outputs/zh_long_paragraph_distill_ax650.wav \
--seed 42
Inference result:
Inference time: 13.457s
Generated audio duration: 44.744s
RTF: 0.3008
Audio: outputs/zh_long_paragraph_distill_ax650.wav
Prompt audio: assets/moss_prompts/zh_1_4p5s.wav
English paragraph:
python3 infer_zipvoice_axera.py \
--model-name zipvoice_distill_ax650 \
--text-file assets/paragraphs/en_scavenger.txt \
--prompt-text "This is almost twice the current industry production level per train." \
--prompt-wav assets/moss_prompts/en_4_4p5s.wav \
--output-wav outputs/en_long_paragraph_distill_ax650.wav \
--seed 42
Inference result:
Inference time: 19.715s
Generated audio duration: 64.749s
RTF: 0.3045
Audio: outputs/en_long_paragraph_distill_ax650.wav
Prompt audio: assets/moss_prompts/en_4_4p5s.wav
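The commands above differ only in the model name, the input text, and the output path, so they are easy to generate from a small helper when comparing models. This is a sketch that only uses the flags shown in this README; `build_command` is a hypothetical wrapper, not part of the repository, and it prints the commands rather than running them.

```python
import shlex

SCRIPT = "infer_zipvoice_axera.py"


def build_command(model_name, prompt_text, prompt_wav, output_wav,
                  text=None, text_file=None, seed=42):
    """Build the argv list for one run; exactly one of text/text_file is required."""
    if (text is None) == (text_file is None):
        raise ValueError("pass exactly one of text or text_file")
    cmd = ["python3", SCRIPT, "--model-name", model_name]
    cmd += ["--text", text] if text is not None else ["--text-file", text_file]
    cmd += ["--prompt-text", prompt_text,
            "--prompt-wav", prompt_wav,
            "--output-wav", output_wav,
            "--seed", str(seed)]
    return cmd


# Print the Chinese paragraph command for both AX650 models.
for model, suffix in [("zipvoice_ax650", "ax650"),
                      ("zipvoice_distill_ax650", "distill_ax650")]:
    cmd = build_command(
        model_name=model,
        text_file="assets/paragraphs/zh_ginkgo.txt",
        prompt_text="不管怎么样我和汤姆还是要感谢贝尔卡金的援手",
        prompt_wav="assets/moss_prompts/zh_1_4p5s.wav",
        output_wav=f"outputs/zh_long_paragraph_{suffix}.wav",
    )
    print(shlex.join(cmd))
```

To execute a command instead of printing it, pass the list to `subprocess.run(cmd, check=True)` from inside the ZipVoice.AXERA directory.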
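Parameters
- --model-name: selects the model directory. Options: zipvoice_ax650, zipvoice_distill_ax650, zipvoice_distill_ax630C.
- --prompt-wav: reference audio that controls the voice timbre; 3-5 s is recommended.
- --prompt-text: transcript of the reference audio; it should match the content of prompt-wav as closely as possible.
- --num-step: number of sampling steps. By default it is read from runtime_config.json in the model directory.
- --max-feat-len: fixed feature length of the decoder; 1024 for all current models.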
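Since a 3-5 s reference clip is recommended for --prompt-wav, it can help to check a clip's length before using it. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only; both function names are illustrative and the 3-5 s bounds simply mirror the recommendation above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def prompt_length_ok(path, min_s=3.0, max_s=5.0):
    """True if the prompt duration falls within the recommended 3-5 s range."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

For example, `prompt_length_ok("assets/moss_prompts/zh_1_4p5s.wav")` would check one of the bundled prompts when run from the ZipVoice.AXERA directory.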