Instructions to use MoYoYoTech/VoiceDialogue with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MoYoYoTech/VoiceDialogue with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="MoYoYoTech/VoiceDialogue")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("MoYoYoTech/VoiceDialogue", dtype="auto")

llama-cpp-python

How to use MoYoYoTech/VoiceDialogue with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="MoYoYoTech/VoiceDialogue",
	filename="assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf",
)

llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "The answer to the universe is 42"}
	]
)
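When it is not streaming, `create_chat_completion` returns a dictionary in the OpenAI chat-completion shape, so the reply text lives at `choices[0]["message"]["content"]`. The sketch below wraps that lookup in a small helper. `reply_text` and the hand-written `fake_completion` are hypothetical and exist only so the example runs without downloading the GGUF file.

```python
from typing import Any, Dict


def reply_text(completion: Dict[str, Any]) -> str:
    """Return the assistant message from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion has no choices")
    return choices[0]["message"]["content"]


# Hypothetical stand-in for the value returned by llm.create_chat_completion(...)
fake_completion = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
}
```

Against the real model, the same helper would be called as `reply_text(llm.create_chat_completion(messages=[...]))`.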

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use MoYoYoTech/VoiceDialogue with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K

Use Docker

docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
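Once any of the `llama-server` commands above is running, other programs can talk to it over HTTP. Here is a minimal standard-library sketch. It assumes the server's default address `http://localhost:8080` and its OpenAI-compatible `/v1/chat/completions` route. The helper names `build_chat_request` and `chat` are hypothetical, and the `model` value is illustrative.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default address, localhost:8080.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    payload = {
        # Illustrative model name (an assumption); adjust if your server expects another.
        "model": "MoYoYoTech/VoiceDialogue:Q6_K",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Who are you?"))` sends a single user turn and prints the reply. Building the request separately from sending it makes the payload easy to inspect without a running server.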

LM Studio
Jan
Ollama
How to use MoYoYoTech/VoiceDialogue with Ollama:
```
ollama run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```
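Ollama also exposes a local REST API, by default on port 11434. The sketch below targets its native `/api/chat` endpoint with streaming turned off. That endpoint returns the reply under `message.content` rather than in an OpenAI-style `choices` list. The function names are hypothetical, and the model tag is the one used in the `ollama run` command above.

```python
import json
import urllib.request

# Assumption: the Ollama server is running locally on its default port 11434.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/MoYoYoTech/VoiceDialogue:Q6_K"


def build_ollama_payload(prompt: str, model: str = MODEL) -> dict:
    """Request body for Ollama's native /api/chat endpoint (non-streaming)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a stream of chunks
    }


def ollama_chat(prompt: str, url: str = OLLAMA_URL) -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_ollama_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    # Native responses carry the reply under "message", not "choices".
    return body["message"]["content"]
```

`ollama_chat("Who are you?")` assumes the model has already been pulled, for example by the `ollama run` command above.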

Unsloth Studio

How to use MoYoYoTech/VoiceDialogue with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MoYoYoTech/VoiceDialogue to start chatting

Pi

How to use MoYoYoTech/VoiceDialogue with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MoYoYoTech/VoiceDialogue:Q6_K"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use MoYoYoTech/VoiceDialogue with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MoYoYoTech/VoiceDialogue:Q6_K

Run Hermes

hermes

Docker Model Runner
How to use MoYoYoTech/VoiceDialogue with Docker Model Runner:
```
docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
```

Lemonade

How to use MoYoYoTech/VoiceDialogue with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MoYoYoTech/VoiceDialogue:Q6_K

Run and chat with the model

lemonade run user.VoiceDialogue-Q6_K

List all available models

lemonade list

liumaolin committed on May 30, 2025

Commit e80f558

1 Parent(s): d691bbc

Add thread readiness checks and is_ready property across services

Files changed (8)

src/VoiceDialogue/main.py +5 -0
src/VoiceDialogue/services/audio/aec_audio_capture.py +2 -0
src/VoiceDialogue/services/audio/audio_answer.py +12 -9
src/VoiceDialogue/services/audio/audio_player.py +11 -8
src/VoiceDialogue/services/core/base.py +12 -0
src/VoiceDialogue/services/speech/asr_service.py +2 -0
src/VoiceDialogue/services/speech/speech_monitor.py +2 -0
src/VoiceDialogue/services/text/text_generator.py +2 -0

src/VoiceDialogue/main.py CHANGED

@@ -1,3 +1,4 @@
+import time
 import typing
 from multiprocessing import Queue
 from pathlib import Path
@@ -79,6 +80,10 @@ def launch_system(
     audio_playing_worker = AudioStreamPlayer(audio_playing_queue=tts_generated_audio_queue)
     audio_playing_worker.start()
     threads.append(audio_playing_worker)
+    while not all([thread.is_ready for thread in threads]):
+        time.sleep(0.1)
     # audio_frame_probe.start_record()
     print(f'{"=" * 80}\n服务启动成功\n{"=" * 80}')
     for thread in threads:
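The new loop in `launch_system` polls `is_ready` every 0.1 s with no upper bound. If a worker raises during model loading, it never sets the flag and startup waits forever. Below is a minimal sketch of a bounded variant. It assumes every thread has already been started, as in `launch_system`, and that a worker which is no longer alive will never become ready. `ReadyThread` and `wait_until_ready` are hypothetical names and are not part of the repository.

```python
import threading
import time
from typing import List


class ReadyThread(threading.Thread):
    """Hypothetical stand-in for a BaseThread worker: sets is_ready after a fake warmup."""

    def __init__(self, warmup_seconds: float = 0.0, fail: bool = False):
        super().__init__(daemon=True)
        self._is_ready_event = threading.Event()
        self._warmup_seconds = warmup_seconds
        self._fail = fail

    @property
    def is_ready(self) -> bool:
        return self._is_ready_event.is_set()

    def run(self):
        time.sleep(self._warmup_seconds)  # stands in for model loading and warmup
        if self._fail:
            raise RuntimeError("warmup failed")  # worker dies before becoming ready
        self._is_ready_event.set()


def wait_until_ready(threads: List[ReadyThread], timeout: float, poll: float = 0.1) -> None:
    """Poll is_ready like launch_system, but fail fast on a dead worker or a timeout.

    Assumes every thread has already been started.
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = [t for t in threads if not t.is_ready]
        if not pending:
            return
        dead = [t.name for t in pending if not t.is_alive()]
        if dead:
            raise RuntimeError(f"worker(s) exited before becoming ready: {dead}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still not ready after {timeout}s: {[t.name for t in pending]}")
        time.sleep(poll)
```

Checking `is_alive()` alongside `is_ready` turns a silent hang into an immediate error that names the worker that died.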

src/VoiceDialogue/services/audio/aec_audio_capture.py CHANGED

@@ -32,6 +32,8 @@ class EchoCancellingAudioCapture(BaseThread):
         audio_recorder.freeAudioData.argtypes = [ctypes.POINTER(ctypes.c_ubyte)]
         audio_recorder.startRecord()
+        self.is_ready = True
         try:
             while not self.stopped():
                 size = ctypes.c_int(0)

src/VoiceDialogue/services/audio/audio_answer.py CHANGED

@@ -23,15 +23,8 @@ class TTSAudioGenerator(BaseThread):
         self.processed_answer_queue: Queue = processed_answer_queue
         self.tts_generated_audio_queue: Queue = tts_generated_audio_queue
-        device = "cpu"  # mps slower 11.66(cpu) vs 39.42(mps)
-        tts_config = self.setup_tts_config(device, voice_role)
-        self.tts_module = TTSModule(tts_config)
-        self.tts_module.setup_inference_params(
-            ref_audio=voice_role.reference_audio_path,
-            parallel_infer=False,
-            **voice_role.inference_parameters
-        )
+        self._device = "cpu"  # mps slower 11.66(cpu) vs 39.42(mps)
+        self._voice_role = voice_role
     def setup_tts_config(self, device, voice_role: VoiceModel):
         config = {
@@ -60,8 +53,18 @@ class TTSAudioGenerator(BaseThread):
     def run(self):
+        tts_config = self.setup_tts_config(self._device, self._voice_role)
+        self.tts_module = TTSModule(tts_config)
+        self.tts_module.setup_inference_params(
+            ref_audio=self._voice_role.reference_audio_path,
+            parallel_infer=False,
+            **self._voice_role.inference_parameters
+        )
         self.warmup()
+        self.is_ready = True
         while not self.stopped():
             try:
                 voice_task: VoiceTask = self.processed_answer_queue.get(block=False, timeout=0.1)
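This hunk moves `TTSModule` construction out of `__init__` and into `run()`. `__init__` executes on whichever thread creates the worker (the launcher). `run()` executes on the worker's own thread after `start()`. As a result, the expensive setup can overlap with the startup of other workers, and `is_ready` is set only once the model has been loaded and warmed up. The sketch below shows the same pattern with a hypothetical `FakeTTSModule`, using a standard `queue.Queue` in place of `multiprocessing.Queue`. It calls `get(timeout=0.1)` rather than `get(block=False, timeout=0.1)`. The Python documentation for both queue types states that `timeout` is ignored when `block` is false, so the non-blocking form returns, or raises `queue.Empty`, immediately.

```python
import queue
import threading
import time


class FakeTTSModule:
    """Hypothetical stand-in for the heavy TTSModule used in the commit."""

    def __init__(self, config: dict):
        time.sleep(0.05)  # pretend to load model weights
        self.config = config

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class LazyTTSWorker(threading.Thread):
    def __init__(self, in_q: queue.Queue, out_q: queue.Queue, device: str = "cpu"):
        super().__init__(daemon=True)
        # Only cheap state here: __init__ runs on the thread that constructs the worker.
        self._device = device
        self.in_q, self.out_q = in_q, out_q
        self.ready = threading.Event()
        self.stop_event = threading.Event()
        self.tts = None  # created lazily in run()

    def run(self):
        # Heavy setup happens on the worker's own thread, then readiness is signalled.
        self.tts = FakeTTSModule({"device": self._device})
        self.ready.set()
        while not self.stop_event.is_set():
            try:
                # Blocks for up to 0.1 s, so stop_event is re-checked regularly.
                text = self.in_q.get(timeout=0.1)
            except queue.Empty:
                continue
            self.out_q.put(self.tts.synthesize(text))
```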

src/VoiceDialogue/services/audio/audio_player.py CHANGED

@@ -1,4 +1,5 @@
 import tempfile
+import time
 from collections import OrderedDict
 from multiprocessing import Queue
 from queue import Empty
@@ -23,6 +24,8 @@ class AudioStreamPlayer(BaseThread):
         self.audio_playing_queue: Queue = audio_playing_queue
     def run(self):
+        self.is_ready = True
         while not self.stopped():
             try:
@@ -54,14 +57,14 @@ class AudioStreamPlayer(BaseThread):
                     if answer_id not in voice_state_manager.waiting_second_answer_mapping:
                         continue
-                # now = time.time()
-                # print(
-                #     f'整体耗时: {(now - voice_task.send_time):.2f}\n'
-                #     f'Whisper 耗时: {(voice_task.whisper_end_time - voice_task.whisper_start_time):.2f}\n'
-                #     f'LLM 耗时: {(voice_task.llm_end_time - voice_task.llm_start_time):.2f}\n'
-                #     f'TTS generate sentence: {voice_task.answer_sentence}\n'
-                #     f'TTS 耗时: {(voice_task.tts_end_time - voice_task.tts_start_time):.2f}\n\n'
-                # )
+                now = time.time()
+                print(
+                    f'整体耗时: {(now - voice_task.send_time):.2f}\n'
+                    f'Whisper/FunASR 耗时: {(voice_task.whisper_end_time - voice_task.whisper_start_time):.2f}\n'
+                    f'LLM 耗时: {(voice_task.llm_end_time - voice_task.llm_start_time):.2f}\n'
+                    f'TTS generate sentence: {voice_task.answer_sentence}\n'
+                    f'TTS 耗时: {(voice_task.tts_end_time - voice_task.tts_start_time):.2f}\n\n'
+                )
                 self.update_chat_history(voice_task)
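For each answered turn, the re-enabled block prints the total time since `send_time` (整体耗时, "overall elapsed time") and the time spent in ASR, the LLM, and TTS (each labelled 耗时, "elapsed time"). The sketch below computes the same four durations. `StageTimes` is a hypothetical stand-in that holds only the `VoiceTask` timestamp fields referenced in the diff.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StageTimes:
    """Hypothetical subset of VoiceTask: only the timestamp fields the print uses."""
    send_time: float
    whisper_start_time: float
    whisper_end_time: float
    llm_start_time: float
    llm_end_time: float
    tts_start_time: float
    tts_end_time: float


def latency_report(task: StageTimes, now: float) -> Dict[str, float]:
    """Per-stage durations in seconds, mirroring the values printed in audio_player.py."""
    return {
        "total": now - task.send_time,
        "asr": task.whisper_end_time - task.whisper_start_time,
        "llm": task.llm_end_time - task.llm_start_time,
        "tts": task.tts_end_time - task.tts_start_time,
    }


def format_report(report: Dict[str, float]) -> str:
    """One 'stage: seconds' line per entry, two decimals like the original print."""
    return "\n".join(f"{stage}: {seconds:.2f}s" for stage, seconds in report.items())
```

These durations are plain differences, so they are only meaningful if every timestamp comes from the same clock. The diff uses `time.time()` for `now`; `time.perf_counter()` is the usual choice when only intervals matter.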

src/VoiceDialogue/services/core/base.py CHANGED

@@ -6,9 +6,21 @@ class BaseThread(threading.Thread):
     def __init__(self, group=None, target=None, name=None, args=(), kwargs=None, *, daemon=None):
         super().__init__(group, target, name, args, kwargs, daemon=daemon)
         self._stop_event = threading.Event()
+        self._is_ready_event = threading.Event()
     def stop(self):
         self._stop_event.set()
     def stopped(self):
         return self._stop_event.is_set()
+    @property
+    def is_ready(self):
+        return self._is_ready_event.is_set()
+    @is_ready.setter
+    def is_ready(self, value: bool):
+        if value:
+            self._is_ready_event.set()
+        else:
+            self._is_ready_event.clear()
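Because readiness is backed by a `threading.Event`, callers do not have to poll `is_ready` at all. `Event.wait(timeout)` blocks until the flag is set and returns `False` if the timeout expires first. The sketch below reproduces `BaseThread` as changed in this commit and adds one hypothetical method, `wait_until_ready`, along with a toy `DemoWorker`. Neither of these additions exists in the repository.

```python
import threading
import time


class BaseThread(threading.Thread):
    """BaseThread as changed in this commit, plus one hypothetical helper."""

    def __init__(self, group=None, target=None, name=None, args=(), kwargs=None, *, daemon=None):
        super().__init__(group, target, name, args, kwargs, daemon=daemon)
        self._stop_event = threading.Event()
        self._is_ready_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

    @property
    def is_ready(self):
        return self._is_ready_event.is_set()

    @is_ready.setter
    def is_ready(self, value: bool):
        if value:
            self._is_ready_event.set()
        else:
            self._is_ready_event.clear()

    # Hypothetical addition (not in the commit): block on the Event instead of polling.
    def wait_until_ready(self, timeout=None) -> bool:
        """Return True once is_ready is set, or False if `timeout` seconds pass first."""
        return self._is_ready_event.wait(timeout)


class DemoWorker(BaseThread):
    """Toy worker: 'loads' briefly, marks itself ready, then idles until stopped."""

    def run(self):
        time.sleep(0.05)  # stands in for model loading and warmup
        self.is_ready = True
        while not self.stopped():
            time.sleep(0.01)
```

With this helper, the polling loop in `main.py` could become `for thread in threads: thread.wait_until_ready(timeout=...)`. That loop would wake as soon as each worker signals readiness, instead of on the next 0.1 s tick.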

src/VoiceDialogue/services/speech/asr_service.py CHANGED

@@ -182,6 +182,8 @@ class ASRWorker(BaseThread):
         self.client = UnifiedASRClient(self.language)
         self.client.warmup()
+        self.is_ready = True
         while not self.stopped():
             voice_task: VoiceTask = self.user_voice_queue.get()
             voice_task.language = self.language

src/VoiceDialogue/services/speech/speech_monitor.py CHANGED

@@ -210,6 +210,8 @@ class SpeechStateMonitor(BaseThread):
         Main run loop - monitor speech state and process audio frames
         """
+        self.is_ready = True
         # Initialize state variables
         audio_frames = np.array([])
         is_audio_sent_for_processing = False

src/VoiceDialogue/services/text/text_generator.py CHANGED

@@ -182,6 +182,8 @@ class LLMResponseGenerator(BaseThread):
         pipeline = create_langchain_pipeline(self.model_instance, CHINESE_SYSTEM_PROMPT, self.get_session_history)
         warmup_langchain_pipeline(pipeline)
+        self.is_ready = True
         """Main run loop"""
         while not self.stopped():
             try: