nlper2022 committed on
Commit
eb79ceb
1 Parent(s): 287e0d9

Create README.md

Files changed (1)
  1. README.md +74 -0
README.md ADDED
@@ -0,0 +1,74 @@
1
+
2
+ # Westlake-Omni
3
+
4
+ <p align="center"><strong style="font-size: 18px;">
5
+ Westlake-Omni: Open-Source Chinese Emotional Speech Interaction Large Language Model with Unified Discrete Sequence Modeling
6
+ </strong>
7
+ </p>
8
+
9
+ <p align="center">
10
+ 🤗 <a href="https://huggingface.co/xinchen-ai/Westlake-Omni">Hugging Face</a> | 📖 <a href="https://github.com/xinchen-ai/Westlake-Omni">GitHub</a>
11
+ </p>
12
+
13
+ Westlake-Omni is an open-source Chinese emotional speech interaction large language model that utilizes discrete representations to achieve unified processing of speech and text modalities. The model supports low-latency generation and high-quality Chinese emotional speech interaction.
14
+
15
+ <p align="center">
16
+ <img src="model.jpeg" width="100%"/>
17
+ </p>
18
+
19
+
20
+ ## Highlights
21
+
22
+ 🎙️ **Utilizes discrete representations to unify the processing of speech and text modalities.**
23
+
24
+ 🎭 **Trained on a high-quality Chinese emotional speech dataset, enabling native emotional speech interaction in Chinese.**
25
+
26
+ ⚡ **Low-latency speech interaction, simultaneously generating text and speech responses.**
27
+
28
+ https://github.com/user-attachments/assets/02a71c01-3384-4845-8e7f-4e0dda35d8f3
29
+
30
+
31
+
32
+ ## Install
33
+
34
+ In a new conda environment, install PyTorch, then clone the repository and install the required packages:
35
+
36
+ ```sh
37
+ conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 -c pytorch
38
+
39
+ git clone git@github.com:xinchen-ai/Westlake-Omni.git
40
+ cd Westlake-Omni
41
+ pip install -r requirements.txt
42
+ ```
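After installing, it can help to confirm that the environment actually contains the pinned versions. The following is a minimal sketch, not part of the repository: `check_pins` is a hypothetical helper, and it assumes the conda `pytorch` package is visible to Python under the distribution name `torch`.

```python
from importlib import metadata

# Versions pinned by the conda command above.
# Assumption: the conda "pytorch" package is exposed to Python as "torch".
PINNED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}


def check_pins(pins=PINNED, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # Ignore local build suffixes such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned packages match.")
```

An empty result means every pinned package was found at the expected version; a `found` value of `None` means the package is missing from the active environment.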
43
+
44
+ ## Quick start
45
+
46
+ **Interactive demo**
47
+
48
+ - Run the Gradio demo
49
+ ```sh
50
+ python gradio_demo.py
51
+ ```
52
+
53
+ **Local test**
54
+
55
+ - CLI
56
+ ```sh
57
+ python generate.py --user-audio data/sounds/input.wav --user-text "嗯,最近心情不是很好,能聊聊吗?"
58
+ python generate.py --user-audio data/sounds/input.wav
59
+ ```
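To call the CLI from another Python script (for example, to loop over several recordings), one option is a thin `subprocess` wrapper. This is a sketch under stated assumptions: `build_command` and `run_generate` are hypothetical helpers, not part of the repository. Only the `--user-audio` and `--user-text` flags are taken from the commands above, and the script is assumed to run from the repository root.

```python
import subprocess
import sys


def build_command(audio_path, text=None, script="generate.py"):
    """Build the argument list for the CLI shown above.

    Passing arguments as a list avoids shell quoting issues with the
    (optional) user text.
    """
    cmd = [sys.executable, script, "--user-audio", audio_path]
    if text is not None:
        cmd += ["--user-text", text]
    return cmd


def run_generate(audio_path, text=None):
    """Run generate.py and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(audio_path, text), check=True)
```

For example, `run_generate("data/sounds/input.wav")` mirrors the second command above, and passing `text=...` mirrors the first.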
60
+
61
+
62
+ ## Acknowledgements
63
+
64
+ - [fish-speech](https://github.com/fishaudio/fish-speech) as the codebase we built upon.
65
+ - [Qwen2](https://github.com/QwenLM/Qwen2/) as the LLM backbone.
66
+
67
+ ## License
68
+ The current code and the VQGAN model weights are provided under the CC-BY-NC-SA-4.0 License. The large language model weights are provided under the Apache 2.0 License. Note that parts of this code are based on fish-speech, which is released under the CC-BY-NC-SA-4.0 License.
69
+
70
+ ## Contact
71
+ If you have any questions, please raise an issue or contact us at [service@xinchenai.com](mailto:service@xinchenai.com).
72
+
73
+ ## Star History
74
+ [![Star History Chart](https://api.star-history.com/svg?repos=xinchen-ai/Westlake-Omni&type=Date)](https://star-history.com/#xinchen-ai/Westlake-Omni&Date)