# Westlake-Omni

<p align="center"><strong style="font-size: 18px;">
Westlake-Omni: Open-Source Chinese Emotional Speech Interaction Large Language Model with Unified Discrete Sequence Modeling
</strong>
</p>

<p align="center">
🤗 <a href="https://huggingface.co/xinchen-ai/Westlake-Omni">Hugging Face</a> | 📖 <a href="https://github.com/xinchen-ai/Westlake-Omni">GitHub</a>
</p>

Westlake-Omni is an open-source large language model for emotional speech interaction in Chinese. It uses discrete representations to process the speech and text modalities in a unified way, and it supports low-latency generation and high-quality Chinese emotional speech interaction.

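As a rough illustration of what "unified discrete sequence" can mean, the sketch below places text token ids and discrete speech codes in one shared id space by offsetting the speech codes past the text vocabulary. This is a generic technique shown under stated assumptions, not a description of Westlake-Omni's actual tokenizer or layout: the vocabulary sizes and all function names are hypothetical.

```python
from typing import List

# Made-up illustration values, NOT Westlake-Omni's real configuration.
TEXT_VOCAB_SIZE = 1000      # hypothetical number of text token ids
AUDIO_CODEBOOK_SIZE = 256   # hypothetical number of discrete speech codes


def audio_code_to_id(code: int) -> int:
    """Map a discrete speech code into the shared id space, after all text ids."""
    if not 0 <= code < AUDIO_CODEBOOK_SIZE:
        raise ValueError(f"audio code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def build_unified_sequence(text_ids: List[int], audio_codes: List[int]) -> List[int]:
    """Concatenate text ids and offset speech codes into one token sequence."""
    for token_id in text_ids:
        if not 0 <= token_id < TEXT_VOCAB_SIZE:
            raise ValueError(f"text id out of range: {token_id}")
    return list(text_ids) + [audio_code_to_id(code) for code in audio_codes]


def modality_of(token_id: int) -> str:
    """Tell which modality a shared-space id belongs to."""
    return "text" if token_id < TEXT_VOCAB_SIZE else "speech"


if __name__ == "__main__":
    sequence = build_unified_sequence([5, 42], [0, 255])
    print(sequence)                             # [5, 42, 1000, 1255]
    print([modality_of(t) for t in sequence])   # ['text', 'text', 'speech', 'speech']
```

With a shared id space like this, a single sequence model can read and emit both modalities; how Westlake-Omni arranges its tokens in practice is defined by the repository code.
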
<p align="center">
<img src="model.jpeg" width="100%"/>
</p>

## Highlights

🎙️ **Uses discrete representations to unify the processing of speech and text modalities.**

🎭 **Trained on a high-quality Chinese emotional speech dataset, enabling native emotional speech interaction in Chinese.**

⚡ **Low-latency speech interaction that generates text and speech responses simultaneously.**

https://github.com/user-attachments/assets/02a71c01-3384-4845-8e7f-4e0dda35d8f3

## Install

Create a new conda environment and install the required packages:

```sh
# Create and activate a fresh environment (the name and Python version are assumptions; adjust as needed)
conda create -n westlake-omni python=3.10
conda activate westlake-omni

conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 -c pytorch

git clone git@github.com:xinchen-ai/Westlake-Omni.git
cd Westlake-Omni
pip install -r requirements.txt
```

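After installing, it can be useful to confirm that the pinned PyTorch packages are the ones actually present in the environment. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes PyTorch is registered in Python's package metadata under the name `torch`, which is the usual case.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions pinned in the install command.
EXPECTED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cu121" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for package, (wanted, found) in problems.items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
    if not problems:
        print("PyTorch packages match the pinned versions.")
```

The `lookup` parameter exists only so the comparison logic can be exercised without PyTorch installed.
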
## Quick start

**Interactive demo**

- Run the Gradio demo:
```sh
python gradio_demo.py
```

**Local test**

- Run from the command line:
```sh
# Audio plus text input (the text means "Hmm, I haven't been in a good mood lately. Can we chat?")
python generate.py --user-audio data/sounds/input.wav --user-text "嗯,最近心情不是很好,能聊聊吗?"
# Audio input only
python generate.py --user-audio data/sounds/input.wav
```

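To run the command-line test over several recordings, the same call can be scripted. The sketch below is a hypothetical wrapper, not part of the repository: it relies only on the `--user-audio` and `--user-text` flags shown in this section and assumes it is started from the repository root, next to `generate.py`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_command(audio_path: str, user_text: Optional[str] = None) -> List[str]:
    """Build the argument list for one generate.py call."""
    command = [sys.executable, "generate.py", "--user-audio", str(audio_path)]
    if user_text:
        # Passed as a single list element, so no shell quoting is needed.
        command += ["--user-text", user_text]
    return command


def run_batch(audio_dir: str, user_text: Optional[str] = None) -> None:
    """Run generate.py once for every .wav file found in audio_dir."""
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        subprocess.run(build_command(str(wav), user_text), check=True)


if __name__ == "__main__":
    # Assumes the current directory is the repository root.
    run_batch("data/sounds")
```

Passing the arguments as a list avoids shell interpretation of characters such as `?` in the text prompt.
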
## Acknowledgements

- [fish-speech](https://github.com/fishaudio/fish-speech): the codebase we built upon.
- [Qwen2](https://github.com/QwenLM/Qwen2/): the LLM backbone.

## License

The code and the VQGAN model weights are released under the CC-BY-NC-SA-4.0 license. The large language model weights are released under the Apache 2.0 license. Parts of the code are based on Fish Speech, which is released under the CC-BY-NC-SA-4.0 license.

## Contact

If you have any questions, please open an issue or contact us at [service@xinchenai.com](mailto:service@xinchenai.com).

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=xinchen-ai/Westlake-Omni&type=Date)](https://star-history.com/#xinchen-ai/Westlake-Omni&Date)