• Uses the XTTS streaming backend now. It can stream on its own, but for a better user experience (backend lag causes gaps) we need to combine chunks before serving.
  • There is an AUDIO_WAIT_MODIFIER environment variable (default 0.9, tuned for a T4 GPU); it can be set to 1.0 for a faster GPU. Until Gradio supports real byte streaming (with a calculated length), this is the only workaround: if we do not wait for the audio to finish playing, the client switches to the next chunk too fast.
  • The final merged audio is also provided, so mobile users can listen (autoplay does not work on mobile due to browser security restrictions).
  • The system message is changeable now, so Mistral can act however we like.
  • There is an example of direct voice streaming (DIRECT_STREAM=1), but it produces lag because Mistral is also streaming on the backend.
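A minimal sketch of how the demo could read these two tuning knobs; the variable names and defaults (0.9 for a T4, direct streaming off) come from this PR, but the exact parsing code here is an assumption:

```python
import os

# Hypothetical sketch: read the tuning knobs described in this PR.
# AUDIO_WAIT_MODIFIER scales how long we wait for a chunk to play out;
# DIRECT_STREAM toggles continuous per-yield streaming.
AUDIO_WAIT_MODIFIER = float(os.environ.get("AUDIO_WAIT_MODIFIER", "0.9"))
DIRECT_STREAM = os.environ.get("DIRECT_STREAM", "0") == "1"
```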
gorkemgoknar changed pull request status to open

For an A10 or faster GPU, AUDIO_WAIT_MODIFIER can be set to 1, but use 0.9 for a T4 (or other Turing-based GPUs).
DIRECT_STREAM=1 uses continuous streaming as in xtts-streaming, but the audio is choppy: Mistral is streaming too, and once the first XTTS yield is done it goes back into the Mistral loop to complete the sentence.
I opted for DIRECT_STREAM=0, as the results are fairly good. AUDIO_WAIT_MODIFIER=1 works fine but adds too much delay between sentences, so 0.9 seems like the sweet spot for a T4.
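The waiting behaviour described above could look roughly like this; this is a hedged sketch, not the actual demo code, and the 24 kHz sample rate and function name `serve_chunks` are assumptions for illustration:

```python
import time

def serve_chunks(chunks, sample_rate=24000, wait_modifier=0.9):
    """Hypothetical sketch: yield each audio chunk, then sleep roughly
    its playback length scaled by AUDIO_WAIT_MODIFIER, so the client is
    not handed the next chunk before the current one finishes playing."""
    for samples in chunks:
        yield samples
        playback_seconds = len(samples) / sample_rate
        time.sleep(playback_seconds * wait_modifier)
```

With wait_modifier below 1.0 (e.g. 0.9 on a T4), the sleep is slightly shorter than the chunk's playback time, compensating for the extra latency the slower GPU adds between chunks.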

Coqui.ai org

Thanks for this!

ylacombe changed pull request status to merged
