Marcos Remar
Claude
commited on
Commit
·
51563dd
1
Parent(s):
471903a
feat: Implementação completa do sistema Speech-to-Speech com WebRTC
Browse filesPrincipais melhorias:
- ✅ Descoberta e documentação do formato correto de áudio para Ultravox (tuple)
- ✅ Interface Push-to-Talk com histórico completo de mensagens
- ✅ Remoção de todas referências ao Orchestrator (arquitetura simplificada)
- ✅ Scripts de túnel SSH para acesso remoto via MacBook
- ✅ Testes automatizados funcionando com 100% de sucesso
- ✅ Suporte a múltiplas interfaces (iOS, Material, Tailwind)
- ✅ Documentação completa de troubleshooting no README
Arquitetura atual:
- WebRTC Gateway (porta 8082) → Ultravox (50051) + TTS (50054)
- Latência end-to-end: ~286ms
- Audio: PCM 16-bit, 16kHz, Float32 normalizado
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
This view is limited to 50 files because it contains too many changes.
See raw diff
- README.md +95 -16
- manage_services.sh +282 -0
- services/webrtc_gateway/conversation-memory.js +290 -0
- services/webrtc_gateway/favicon.ico +1 -0
- services/webrtc_gateway/opus-decoder.js +150 -0
- services/webrtc_gateway/package-lock.json +470 -5
- services/webrtc_gateway/package.json +4 -3
- services/webrtc_gateway/response_1757390722112.pcm +1 -0
- services/webrtc_gateway/response_1757391966860.pcm +1 -0
- services/webrtc_gateway/start.sh +0 -2
- services/webrtc_gateway/test-audio-cli.js +178 -0
- services/webrtc_gateway/test-memory.js +108 -0
- services/webrtc_gateway/test-portuguese-audio.js +410 -0
- services/webrtc_gateway/test-websocket-speech.js +184 -0
- services/webrtc_gateway/test-websocket.js +317 -0
- services/webrtc_gateway/ultravox-chat-backup.html +964 -0
- services/webrtc_gateway/ultravox-chat-ios.html +1843 -0
- services/webrtc_gateway/ultravox-chat-material.html +1116 -0
- services/webrtc_gateway/ultravox-chat-opus.html +581 -0
- services/webrtc_gateway/ultravox-chat-original.html +964 -0
- services/webrtc_gateway/ultravox-chat-server.js +158 -10
- services/webrtc_gateway/ultravox-chat-tailwind.html +393 -0
- services/webrtc_gateway/ultravox-chat.html +964 -0
- services/webrtc_gateway/webrtc.pid +1 -0
- test-24khz-support.html +243 -0
- test-audio-cli.js +178 -0
- test-grpc-updated.py +161 -0
- test-opus-support.html +337 -0
- test-simple.py +70 -0
- test-tts-button.html +65 -0
- test-ultravox-auto.py +172 -0
- test-ultravox-librosa.py +166 -0
- test-ultravox-simple-prompt.py +206 -0
- test-ultravox-tts.py +121 -0
- test-ultravox-tuple.py +202 -0
- test-ultravox-vllm.py +113 -0
- test-vllm-openai.py +90 -0
- tts_server_kokoro.py +255 -0
- tunnel-macbook.sh +70 -0
- tunnel.sh +95 -0
- ultravox/restart_ultravox.sh +39 -0
- ultravox/server.py +38 -20
- ultravox/server_backup.py +446 -0
- ultravox/server_vllm_090_broken.py +447 -0
- ultravox/server_working_original.py +440 -0
- ultravox/speech.proto +94 -0
- ultravox/start_ultravox.sh +67 -0
- ultravox/stop_ultravox.sh +60 -0
- ultravox/test-tts.py +121 -0
- ultravox/test_audio_coherence.py +193 -0
README.md
CHANGED
|
@@ -10,9 +10,9 @@ cd /workspace/ultravox-pipeline
|
|
| 10 |
./scripts/setup_background.sh
|
| 11 |
|
| 12 |
# Or start individual services:
|
| 13 |
-
cd
|
| 14 |
-
|
| 15 |
-
cd services/webrtc_gateway &&
|
| 16 |
```
|
| 17 |
|
| 18 |
## 📊 Current Status (September 2025)
|
|
@@ -21,7 +21,7 @@ cd services/webrtc_gateway && ./start.sh # WebRTC on port 8081
|
|
| 21 |
|---------|---------|----------|--------|
|
| 22 |
| **TTS Service** | ~91ms | Kokoro v1.0, streaming | ✅ Production |
|
| 23 |
| **Ultravox STT+LLM** | ~180ms | vLLM, custom prompts | ✅ Production |
|
| 24 |
-
| **
|
| 25 |
| **End-to-End** | ~286ms | Full pipeline | ✅ Achieved |
|
| 26 |
|
| 27 |
### ✨ New Features (September 2025)
|
|
@@ -35,17 +35,15 @@ cd services/webrtc_gateway && ./start.sh # WebRTC on port 8081
|
|
| 35 |
|
| 36 |
```mermaid
|
| 37 |
graph TB
|
| 38 |
-
Browser[🌐 Browser] -->|WebSocket| WRG[WebRTC Gateway :
|
| 39 |
-
WRG -->|gRPC|
|
| 40 |
-
|
| 41 |
-
ORCH -->|gRPC| TTS[TTS Service :50054<br/>Kokoro Engine]
|
| 42 |
```
|
| 43 |
|
| 44 |
### Service Responsibilities
|
| 45 |
-
- **WebRTC Gateway**: Browser interface, WebSocket signaling
|
| 46 |
-
- **Orchestrator**: Pipeline coordination, session management, buffering
|
| 47 |
- **Ultravox**: Multimodal Speech-to-Text + LLM (fixie-ai/ultravox-v0_5-llama-3_2-1b)
|
| 48 |
-
- **TTS Service**: Text-to-speech with
|
| 49 |
|
| 50 |
## 🔧 Configuration
|
| 51 |
|
|
@@ -66,9 +64,8 @@ services:
|
|
| 66 |
port: 50051
|
| 67 |
model: "fixie-ai/ultravox-v0_5"
|
| 68 |
|
| 69 |
-
|
| 70 |
-
port:
|
| 71 |
-
buffer_size_ms: 100
|
| 72 |
```
|
| 73 |
|
| 74 |
## 📄 Technical References
|
|
@@ -103,7 +100,6 @@ ultravox-pipeline/
|
|
| 103 |
├── ultravox/ # Speech-to-Text + LLM (submodule ready)
|
| 104 |
├── tts-service/ # Unified TTS Service with Kokoro
|
| 105 |
├── services/
|
| 106 |
-
│ ├── orchestrator/ # Central pipeline coordinator
|
| 107 |
│ └── webrtc_gateway/ # Browser WebRTC interface
|
| 108 |
├── config/ # Centralized YAML configuration
|
| 109 |
├── protos/ # gRPC protocol definitions
|
|
@@ -117,7 +113,7 @@ ultravox-pipeline/
|
|
| 117 |
|
| 118 |
```bash
|
| 119 |
# Test gRPC connections
|
| 120 |
-
grpcurl -plaintext localhost:
|
| 121 |
|
| 122 |
# Run integration tests
|
| 123 |
cd tests/integration
|
|
@@ -133,6 +129,89 @@ python benchmark_latency.py
|
|
| 133 |
- **[gRPC Integration Guide](docs/GRPC_INTEGRATION_GUIDE.md)** - Complete service integration details
|
| 134 |
- **[Context Window Analysis](docs/CONTEXT_WINDOW_ANALYSIS.md)** - Streaming TTS research
|
| 135 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 136 |
## 🤝 Contributing
|
| 137 |
|
| 138 |
Focus areas:
|
|
|
|
| 10 |
./scripts/setup_background.sh
|
| 11 |
|
| 12 |
# Or start individual services:
|
| 13 |
+
cd ultravox && python server.py # Ultravox on port 50051
|
| 14 |
+
python3 tts_server_gtts.py # TTS on port 50054
|
| 15 |
+
cd services/webrtc_gateway && npm start # WebRTC on port 8082
|
| 16 |
```
|
| 17 |
|
| 18 |
## 📊 Current Status (September 2025)
|
|
|
|
| 21 |
|---------|---------|----------|--------|
|
| 22 |
| **TTS Service** | ~91ms | Kokoro v1.0, streaming | ✅ Production |
|
| 23 |
| **Ultravox STT+LLM** | ~180ms | vLLM, custom prompts | ✅ Production |
|
| 24 |
+
| **WebRTC Gateway** | ~20ms | Browser interface | ✅ Production |
|
| 25 |
| **End-to-End** | ~286ms | Full pipeline | ✅ Achieved |
|
| 26 |
|
| 27 |
### ✨ New Features (September 2025)
|
|
|
|
| 35 |
|
| 36 |
```mermaid
|
| 37 |
graph TB
|
| 38 |
+
Browser[🌐 Browser] -->|WebSocket| WRG[WebRTC Gateway :8082]
|
| 39 |
+
WRG -->|gRPC| UV[Ultravox :50051<br/>STT + LLM]
|
| 40 |
+
WRG -->|gRPC| TTS[TTS Service :50054<br/>gTTS Engine]
|
|
|
|
| 41 |
```
|
| 42 |
|
| 43 |
### Service Responsibilities
|
| 44 |
+
- **WebRTC Gateway**: Browser interface, WebSocket signaling, pipeline coordination
|
|
|
|
| 45 |
- **Ultravox**: Multimodal Speech-to-Text + LLM (fixie-ai/ultravox-v0_5-llama-3_2-1b)
|
| 46 |
+
- **TTS Service**: Text-to-speech with gTTS engine
|
| 47 |
|
| 48 |
## 🔧 Configuration
|
| 49 |
|
|
|
|
| 64 |
port: 50051
|
| 65 |
model: "fixie-ai/ultravox-v0_5"
|
| 66 |
|
| 67 |
+
webrtc_gateway:
|
| 68 |
+
port: 8082
|
|
|
|
| 69 |
```
|
| 70 |
|
| 71 |
## 📄 Technical References
|
|
|
|
| 100 |
├── ultravox/ # Speech-to-Text + LLM (submodule ready)
|
| 101 |
├── tts-service/ # Unified TTS Service with Kokoro
|
| 102 |
├── services/
|
|
|
|
| 103 |
│ └── webrtc_gateway/ # Browser WebRTC interface
|
| 104 |
├── config/ # Centralized YAML configuration
|
| 105 |
├── protos/ # gRPC protocol definitions
|
|
|
|
| 113 |
|
| 114 |
```bash
|
| 115 |
# Test gRPC connections
|
| 116 |
+
grpcurl -plaintext localhost:50051 speech.SpeechService/HealthCheck
|
| 117 |
|
| 118 |
# Run integration tests
|
| 119 |
cd tests/integration
|
|
|
|
| 129 |
- **[gRPC Integration Guide](docs/GRPC_INTEGRATION_GUIDE.md)** - Complete service integration details
|
| 130 |
- **[Context Window Analysis](docs/CONTEXT_WINDOW_ANALYSIS.md)** - Streaming TTS research
|
| 131 |
|
| 132 |
+
## 🐛 Troubleshooting & Solutions
|
| 133 |
+
|
| 134 |
+
### Ultravox Audio Processing Issues
|
| 135 |
+
|
| 136 |
+
#### Problem: Model returning garbage responses ("???", "!!!", random characters)
|
| 137 |
+
**Root Cause**: Incorrect audio format being sent to vLLM
|
| 138 |
+
|
| 139 |
+
**Solution**:
|
| 140 |
+
```python
|
| 141 |
+
# ❌ WRONG - Sending raw array
|
| 142 |
+
vllm_input = {
|
| 143 |
+
"prompt": prompt,
|
| 144 |
+
"multi_modal_data": {
|
| 145 |
+
"audio": audio_array # This doesn't work!
|
| 146 |
+
}
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
# ✅ CORRECT - Send as tuple (audio, sample_rate)
|
| 150 |
+
audio_tuple = (audio_array, 16000) # Must be 16kHz
|
| 151 |
+
vllm_input = {
|
| 152 |
+
"prompt": formatted_prompt,
|
| 153 |
+
"multi_modal_data": {
|
| 154 |
+
"audio": [audio_tuple] # List of tuples!
|
| 155 |
+
}
|
| 156 |
+
}
|
| 157 |
+
```
|
| 158 |
+
|
| 159 |
+
#### Problem: Model not understanding audio content
|
| 160 |
+
**Root Cause**: Missing chat template and tokenizer formatting
|
| 161 |
+
|
| 162 |
+
**Solution**:
|
| 163 |
+
```python
|
| 164 |
+
# Import tokenizer for proper formatting
|
| 165 |
+
from transformers import AutoTokenizer
|
| 166 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 167 |
+
|
| 168 |
+
# Format messages with audio token
|
| 169 |
+
messages = [{"role": "user", "content": f"<|audio|>\n{prompt}"}]
|
| 170 |
+
|
| 171 |
+
# Apply chat template
|
| 172 |
+
formatted_prompt = tokenizer.apply_chat_template(
|
| 173 |
+
messages,
|
| 174 |
+
tokenize=False,
|
| 175 |
+
add_generation_prompt=True
|
| 176 |
+
)
|
| 177 |
+
```
|
| 178 |
+
|
| 179 |
+
#### Optimal vLLM Configuration
|
| 180 |
+
```python
|
| 181 |
+
# Best parameters for Ultravox v0.5
|
| 182 |
+
sampling_params = SamplingParams(
|
| 183 |
+
temperature=0.2, # Low temperature for accurate responses
|
| 184 |
+
max_tokens=64 # Sufficient for complete answers
|
| 185 |
+
)
|
| 186 |
+
|
| 187 |
+
# vLLM initialization
|
| 188 |
+
llm = LLM(
|
| 189 |
+
model="fixie-ai/ultravox-v0_5-llama-3_2-1b",
|
| 190 |
+
trust_remote_code=True,
|
| 191 |
+
enforce_eager=True,
|
| 192 |
+
max_model_len=4096,
|
| 193 |
+
gpu_memory_utilization=0.3
|
| 194 |
+
)
|
| 195 |
+
```
|
| 196 |
+
|
| 197 |
+
#### Audio Format Requirements
|
| 198 |
+
- **Sample Rate**: Must be 16kHz
|
| 199 |
+
- **Format**: Float32 normalized between -1 and 1
|
| 200 |
+
- **Recommended Library**: Use `librosa` for loading audio
|
| 201 |
+
```python
|
| 202 |
+
import librosa
|
| 203 |
+
# Librosa automatically normalizes to [-1, 1]
|
| 204 |
+
audio, sr = librosa.load(audio_file, sr=16000)
|
| 205 |
+
```
|
| 206 |
+
|
| 207 |
+
### GPU Memory Issues
|
| 208 |
+
|
| 209 |
+
#### Problem: "No available memory for the cache blocks"
|
| 210 |
+
**Solution**: Run the cleanup script
|
| 211 |
+
```bash
|
| 212 |
+
bash /workspace/ultravox-pipeline/scripts/cleanup_gpu.sh
|
| 213 |
+
```
|
| 214 |
+
|
| 215 |
## 🤝 Contributing
|
| 216 |
|
| 217 |
Focus areas:
|
manage_services.sh
ADDED
|
@@ -0,0 +1,282 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
|
| 3 |
+
# Script mestre para gerenciar todos os serviços do Ultravox Pipeline
|
| 4 |
+
# Inclui limpeza de processos órfãos e verificação de recursos
|
| 5 |
+
|
| 6 |
+
# Cores para output
|
| 7 |
+
RED='\033[0;31m'
|
| 8 |
+
GREEN='\033[0;32m'
|
| 9 |
+
YELLOW='\033[1;33m'
|
| 10 |
+
BLUE='\033[0;34m'
|
| 11 |
+
NC='\033[0m' # No Color
|
| 12 |
+
|
| 13 |
+
# Diretórios
|
| 14 |
+
ULTRAVOX_DIR="/workspace/ultravox-pipeline/ultravox"
|
| 15 |
+
WEBRTC_DIR="/workspace/ultravox-pipeline/services/webrtc_gateway"
|
| 16 |
+
TTS_DIR="/workspace/tts-service-kokoro"
|
| 17 |
+
|
| 18 |
+
# Função para imprimir com cor
|
| 19 |
+
print_colored() {
|
| 20 |
+
echo -e "${2}${1}${NC}"
|
| 21 |
+
}
|
| 22 |
+
|
| 23 |
+
# Função para verificar status da GPU
|
| 24 |
+
check_gpu() {
|
| 25 |
+
print_colored "📊 Status da GPU:" "$BLUE"
|
| 26 |
+
GPU_INFO=$(nvidia-smi --query-gpu=memory.used,memory.free,memory.total --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 27 |
+
if [ -n "$GPU_INFO" ]; then
|
| 28 |
+
IFS=',' read -r USED FREE TOTAL <<< "$GPU_INFO"
|
| 29 |
+
echo " Usado: ${USED}MB | Livre: ${FREE}MB | Total: ${TOTAL}MB"
|
| 30 |
+
|
| 31 |
+
if [ "$FREE" -lt "20000" ]; then
|
| 32 |
+
print_colored " ⚠️ AVISO: Menos de 20GB livres!" "$YELLOW"
|
| 33 |
+
return 1
|
| 34 |
+
fi
|
| 35 |
+
else
|
| 36 |
+
print_colored " ❌ Não foi possível verificar GPU" "$RED"
|
| 37 |
+
return 1
|
| 38 |
+
fi
|
| 39 |
+
return 0
|
| 40 |
+
}
|
| 41 |
+
|
| 42 |
+
# Função para limpar processos órfãos
|
| 43 |
+
cleanup_orphans() {
|
| 44 |
+
print_colored "🧹 Limpando processos órfãos..." "$YELLOW"
|
| 45 |
+
|
| 46 |
+
# Limpar vLLM e EngineCore
|
| 47 |
+
pkill -f "VLLM::EngineCore" 2>/dev/null
|
| 48 |
+
pkill -f "vllm.*engine" 2>/dev/null
|
| 49 |
+
pkill -f "multiprocessing.resource_tracker" 2>/dev/null
|
| 50 |
+
|
| 51 |
+
sleep 2
|
| 52 |
+
|
| 53 |
+
# Verificar se limpou
|
| 54 |
+
REMAINING=$(ps aux | grep -E "vllm|EngineCore" | grep -v grep | wc -l)
|
| 55 |
+
if [ "$REMAINING" -eq "0" ]; then
|
| 56 |
+
print_colored " ✅ Processos órfãos limpos" "$GREEN"
|
| 57 |
+
else
|
| 58 |
+
print_colored " ⚠️ Ainda existem $REMAINING processos órfãos" "$YELLOW"
|
| 59 |
+
pkill -9 -f "vllm" 2>/dev/null
|
| 60 |
+
pkill -9 -f "EngineCore" 2>/dev/null
|
| 61 |
+
fi
|
| 62 |
+
}
|
| 63 |
+
|
| 64 |
+
# Função para iniciar Ultravox
|
| 65 |
+
start_ultravox() {
|
| 66 |
+
print_colored "\n🚀 Iniciando Ultravox..." "$BLUE"
|
| 67 |
+
|
| 68 |
+
# Limpar antes de iniciar
|
| 69 |
+
cleanup_orphans
|
| 70 |
+
|
| 71 |
+
# Verificar GPU
|
| 72 |
+
if ! check_gpu; then
|
| 73 |
+
print_colored " Tentando liberar GPU..." "$YELLOW"
|
| 74 |
+
cleanup_orphans
|
| 75 |
+
sleep 3
|
| 76 |
+
check_gpu
|
| 77 |
+
fi
|
| 78 |
+
|
| 79 |
+
# Iniciar servidor
|
| 80 |
+
cd "$ULTRAVOX_DIR"
|
| 81 |
+
if [ -f "start_ultravox.sh" ]; then
|
| 82 |
+
nohup bash start_ultravox.sh > ultravox.log 2>&1 &
|
| 83 |
+
print_colored " ✅ Ultravox iniciado (PID: $!)" "$GREEN"
|
| 84 |
+
echo $! > ultravox.pid
|
| 85 |
+
else
|
| 86 |
+
print_colored " ❌ Script start_ultravox.sh não encontrado" "$RED"
|
| 87 |
+
fi
|
| 88 |
+
}
|
| 89 |
+
|
| 90 |
+
# Função para parar Ultravox
|
| 91 |
+
stop_ultravox() {
|
| 92 |
+
print_colored "\n🛑 Parando Ultravox..." "$YELLOW"
|
| 93 |
+
|
| 94 |
+
cd "$ULTRAVOX_DIR"
|
| 95 |
+
if [ -f "stop_ultravox.sh" ]; then
|
| 96 |
+
bash stop_ultravox.sh
|
| 97 |
+
else
|
| 98 |
+
pkill -f "python.*server.py" 2>/dev/null
|
| 99 |
+
cleanup_orphans
|
| 100 |
+
fi
|
| 101 |
+
|
| 102 |
+
if [ -f "ultravox.pid" ]; then
|
| 103 |
+
kill -9 $(cat ultravox.pid) 2>/dev/null
|
| 104 |
+
rm ultravox.pid
|
| 105 |
+
fi
|
| 106 |
+
}
|
| 107 |
+
|
| 108 |
+
# Função para iniciar WebRTC Gateway
|
| 109 |
+
start_webrtc() {
|
| 110 |
+
print_colored "\n🌐 Iniciando WebRTC Gateway..." "$BLUE"
|
| 111 |
+
|
| 112 |
+
cd "$WEBRTC_DIR"
|
| 113 |
+
nohup npm start > webrtc.log 2>&1 &
|
| 114 |
+
print_colored " ✅ WebRTC Gateway iniciado (PID: $!)" "$GREEN"
|
| 115 |
+
echo $! > webrtc.pid
|
| 116 |
+
}
|
| 117 |
+
|
| 118 |
+
# Função para parar WebRTC Gateway
|
| 119 |
+
stop_webrtc() {
|
| 120 |
+
print_colored "\n🛑 Parando WebRTC Gateway..." "$YELLOW"
|
| 121 |
+
|
| 122 |
+
pkill -f "node.*ultravox-chat-server" 2>/dev/null
|
| 123 |
+
|
| 124 |
+
cd "$WEBRTC_DIR"
|
| 125 |
+
if [ -f "webrtc.pid" ]; then
|
| 126 |
+
kill -9 $(cat webrtc.pid) 2>/dev/null
|
| 127 |
+
rm webrtc.pid
|
| 128 |
+
fi
|
| 129 |
+
}
|
| 130 |
+
|
| 131 |
+
# Função para iniciar TTS
|
| 132 |
+
start_tts() {
|
| 133 |
+
print_colored "\n🔊 Iniciando TTS Service..." "$BLUE"
|
| 134 |
+
|
| 135 |
+
cd "$TTS_DIR"
|
| 136 |
+
|
| 137 |
+
# Verificar se venv existe, senão criar
|
| 138 |
+
if [ ! -d "venv" ]; then
|
| 139 |
+
print_colored " Criando ambiente virtual..." "$YELLOW"
|
| 140 |
+
python3 -m venv venv
|
| 141 |
+
fi
|
| 142 |
+
|
| 143 |
+
source venv/bin/activate 2>/dev/null
|
| 144 |
+
nohup python3 server.py > tts.log 2>&1 &
|
| 145 |
+
print_colored " ✅ TTS Service iniciado (PID: $!)" "$GREEN"
|
| 146 |
+
echo $! > tts.pid
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
# Função para parar TTS
|
| 150 |
+
stop_tts() {
|
| 151 |
+
print_colored "\n🛑 Parando TTS Service..." "$YELLOW"
|
| 152 |
+
|
| 153 |
+
pkill -f "tts.*server.py" 2>/dev/null
|
| 154 |
+
|
| 155 |
+
cd "$TTS_DIR"
|
| 156 |
+
if [ -f "tts.pid" ]; then
|
| 157 |
+
kill -9 $(cat tts.pid) 2>/dev/null
|
| 158 |
+
rm tts.pid
|
| 159 |
+
fi
|
| 160 |
+
}
|
| 161 |
+
|
| 162 |
+
# Função para verificar status dos serviços
|
| 163 |
+
check_status() {
|
| 164 |
+
print_colored "\n📊 Status dos Serviços:" "$BLUE"
|
| 165 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 166 |
+
|
| 167 |
+
# Ultravox
|
| 168 |
+
if lsof -i :50051 >/dev/null 2>&1; then
|
| 169 |
+
print_colored "✅ Ultravox: RODANDO (porta 50051)" "$GREEN"
|
| 170 |
+
else
|
| 171 |
+
print_colored "❌ Ultravox: PARADO" "$RED"
|
| 172 |
+
fi
|
| 173 |
+
|
| 174 |
+
# WebRTC
|
| 175 |
+
if lsof -i :8082 >/dev/null 2>&1; then
|
| 176 |
+
print_colored "✅ WebRTC Gateway: RODANDO (porta 8082)" "$GREEN"
|
| 177 |
+
else
|
| 178 |
+
print_colored "❌ WebRTC Gateway: PARADO" "$RED"
|
| 179 |
+
fi
|
| 180 |
+
|
| 181 |
+
# TTS
|
| 182 |
+
if lsof -i :50054 >/dev/null 2>&1; then
|
| 183 |
+
print_colored "✅ TTS Service: RODANDO (porta 50054)" "$GREEN"
|
| 184 |
+
else
|
| 185 |
+
print_colored "❌ TTS Service: PARADO" "$RED"
|
| 186 |
+
fi
|
| 187 |
+
|
| 188 |
+
echo ""
|
| 189 |
+
check_gpu
|
| 190 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 191 |
+
}
|
| 192 |
+
|
| 193 |
+
# Menu principal
|
| 194 |
+
case "$1" in
|
| 195 |
+
start)
|
| 196 |
+
print_colored "🚀 Iniciando todos os serviços..." "$BLUE"
|
| 197 |
+
cleanup_orphans
|
| 198 |
+
start_ultravox
|
| 199 |
+
sleep 10 # Aguardar Ultravox carregar
|
| 200 |
+
start_webrtc
|
| 201 |
+
start_tts
|
| 202 |
+
check_status
|
| 203 |
+
;;
|
| 204 |
+
|
| 205 |
+
stop)
|
| 206 |
+
print_colored "🛑 Parando todos os serviços..." "$YELLOW"
|
| 207 |
+
stop_webrtc
|
| 208 |
+
stop_tts
|
| 209 |
+
stop_ultravox
|
| 210 |
+
cleanup_orphans
|
| 211 |
+
check_status
|
| 212 |
+
;;
|
| 213 |
+
|
| 214 |
+
restart)
|
| 215 |
+
print_colored "🔄 Reiniciando todos os serviços..." "$BLUE"
|
| 216 |
+
$0 stop
|
| 217 |
+
sleep 5
|
| 218 |
+
$0 start
|
| 219 |
+
;;
|
| 220 |
+
|
| 221 |
+
status)
|
| 222 |
+
check_status
|
| 223 |
+
;;
|
| 224 |
+
|
| 225 |
+
cleanup)
|
| 226 |
+
cleanup_orphans
|
| 227 |
+
check_gpu
|
| 228 |
+
;;
|
| 229 |
+
|
| 230 |
+
ultravox-start)
|
| 231 |
+
start_ultravox
|
| 232 |
+
;;
|
| 233 |
+
|
| 234 |
+
ultravox-stop)
|
| 235 |
+
stop_ultravox
|
| 236 |
+
;;
|
| 237 |
+
|
| 238 |
+
ultravox-restart)
|
| 239 |
+
stop_ultravox
|
| 240 |
+
sleep 3
|
| 241 |
+
start_ultravox
|
| 242 |
+
;;
|
| 243 |
+
|
| 244 |
+
webrtc-start)
|
| 245 |
+
start_webrtc
|
| 246 |
+
;;
|
| 247 |
+
|
| 248 |
+
webrtc-stop)
|
| 249 |
+
stop_webrtc
|
| 250 |
+
;;
|
| 251 |
+
|
| 252 |
+
tts-start)
|
| 253 |
+
start_tts
|
| 254 |
+
;;
|
| 255 |
+
|
| 256 |
+
tts-stop)
|
| 257 |
+
stop_tts
|
| 258 |
+
;;
|
| 259 |
+
|
| 260 |
+
*)
|
| 261 |
+
echo "Uso: $0 {start|stop|restart|status|cleanup}"
|
| 262 |
+
echo ""
|
| 263 |
+
echo "Comandos disponíveis:"
|
| 264 |
+
echo " start - Inicia todos os serviços"
|
| 265 |
+
echo " stop - Para todos os serviços"
|
| 266 |
+
echo " restart - Reinicia todos os serviços"
|
| 267 |
+
echo " status - Verifica status dos serviços"
|
| 268 |
+
echo " cleanup - Limpa processos órfãos"
|
| 269 |
+
echo ""
|
| 270 |
+
echo "Comandos específicos:"
|
| 271 |
+
echo " ultravox-start - Inicia apenas Ultravox"
|
| 272 |
+
echo " ultravox-stop - Para apenas Ultravox"
|
| 273 |
+
echo " ultravox-restart- Reinicia apenas Ultravox"
|
| 274 |
+
echo " webrtc-start - Inicia apenas WebRTC Gateway"
|
| 275 |
+
echo " webrtc-stop - Para apenas WebRTC Gateway"
|
| 276 |
+
echo " tts-start - Inicia apenas TTS Service"
|
| 277 |
+
echo " tts-stop - Para apenas TTS Service"
|
| 278 |
+
exit 1
|
| 279 |
+
;;
|
| 280 |
+
esac
|
| 281 |
+
|
| 282 |
+
exit 0
|
services/webrtc_gateway/conversation-memory.js
ADDED
|
@@ -0,0 +1,290 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
/**
|
| 2 |
+
* Sistema de Memória em Processo para Conversações
|
| 3 |
+
* Mantém contexto de conversas com limite de mensagens
|
| 4 |
+
*/
|
| 5 |
+
|
| 6 |
+
const crypto = require('crypto');
|
| 7 |
+
|
| 8 |
+
class ConversationMemory {
|
| 9 |
+
constructor() {
|
| 10 |
+
// Armazena conversações por ID
|
| 11 |
+
this.conversations = new Map();
|
| 12 |
+
|
| 13 |
+
// Configurações
|
| 14 |
+
this.config = {
|
| 15 |
+
maxMessagesPerConversation: 10, // Máximo de mensagens por conversa
|
| 16 |
+
maxConversations: 100, // Máximo de conversas em memória
|
| 17 |
+
ttlMinutes: 60, // Tempo de vida em minutos
|
| 18 |
+
cleanupIntervalMinutes: 10 // Intervalo de limpeza
|
| 19 |
+
};
|
| 20 |
+
|
| 21 |
+
// Estatísticas
|
| 22 |
+
this.stats = {
|
| 23 |
+
totalMessages: 0,
|
| 24 |
+
totalConversations: 0,
|
| 25 |
+
activeConversations: 0
|
| 26 |
+
};
|
| 27 |
+
|
| 28 |
+
// Iniciar limpeza automática
|
| 29 |
+
this.startCleanup();
|
| 30 |
+
}
|
| 31 |
+
|
| 32 |
+
/**
|
| 33 |
+
* Gera ID único para conversação
|
| 34 |
+
*/
|
| 35 |
+
generateConversationId() {
|
| 36 |
+
return `conv_${Date.now()}_${crypto.randomBytes(8).toString('hex')}`;
|
| 37 |
+
}
|
| 38 |
+
|
| 39 |
+
/**
|
| 40 |
+
* Cria nova conversação
|
| 41 |
+
*/
|
| 42 |
+
createConversation(conversationId = null, metadata = {}) {
|
| 43 |
+
const id = conversationId || this.generateConversationId();
|
| 44 |
+
|
| 45 |
+
// Verificar limite de conversações
|
| 46 |
+
if (this.conversations.size >= this.config.maxConversations) {
|
| 47 |
+
this.removeOldestConversation();
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
const conversation = {
|
| 51 |
+
id,
|
| 52 |
+
createdAt: Date.now(),
|
| 53 |
+
lastActivity: Date.now(),
|
| 54 |
+
messages: [],
|
| 55 |
+
metadata: {
|
| 56 |
+
...metadata,
|
| 57 |
+
messageCount: 0,
|
| 58 |
+
userAgent: metadata.userAgent || 'unknown'
|
| 59 |
+
}
|
| 60 |
+
};
|
| 61 |
+
|
| 62 |
+
this.conversations.set(id, conversation);
|
| 63 |
+
this.stats.totalConversations++;
|
| 64 |
+
this.stats.activeConversations = this.conversations.size;
|
| 65 |
+
|
| 66 |
+
console.log(`📝 Nova conversação criada: ${id}`);
|
| 67 |
+
return conversation;
|
| 68 |
+
}
|
| 69 |
+
|
| 70 |
+
/**
|
| 71 |
+
* Recupera conversação existente
|
| 72 |
+
*/
|
| 73 |
+
getConversation(conversationId) {
|
| 74 |
+
const conversation = this.conversations.get(conversationId);
|
| 75 |
+
if (conversation) {
|
| 76 |
+
conversation.lastActivity = Date.now();
|
| 77 |
+
}
|
| 78 |
+
return conversation;
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
+
/**
|
| 82 |
+
* Adiciona mensagem à conversação
|
| 83 |
+
*/
|
| 84 |
+
addMessage(conversationId, message) {
|
| 85 |
+
let conversation = this.getConversation(conversationId);
|
| 86 |
+
|
| 87 |
+
// Criar conversação se não existir
|
| 88 |
+
if (!conversation) {
|
| 89 |
+
conversation = this.createConversation(conversationId);
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
// Estrutura da mensagem
|
| 93 |
+
const msg = {
|
| 94 |
+
id: `msg_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
|
| 95 |
+
timestamp: Date.now(),
|
| 96 |
+
role: message.role || 'user',
|
| 97 |
+
content: message.content || '',
|
| 98 |
+
metadata: {
|
| 99 |
+
audioSize: message.audioSize || 0,
|
| 100 |
+
latency: message.latency || 0,
|
| 101 |
+
...message.metadata
|
| 102 |
+
}
|
| 103 |
+
};
|
| 104 |
+
|
| 105 |
+
// Adicionar mensagem
|
| 106 |
+
conversation.messages.push(msg);
|
| 107 |
+
conversation.metadata.messageCount++;
|
| 108 |
+
conversation.lastActivity = Date.now();
|
| 109 |
+
|
| 110 |
+
// Limitar número de mensagens
|
| 111 |
+
if (conversation.messages.length > this.config.maxMessagesPerConversation) {
|
| 112 |
+
conversation.messages.shift(); // Remove a mais antiga
|
| 113 |
+
}
|
| 114 |
+
|
| 115 |
+
this.stats.totalMessages++;
|
| 116 |
+
|
| 117 |
+
console.log(`💬 Mensagem adicionada: ${conversationId} - ${msg.role}: ${msg.content.substring(0, 50)}...`);
|
| 118 |
+
return msg;
|
| 119 |
+
}
|
| 120 |
+
|
| 121 |
+
/**
|
| 122 |
+
* Constrói contexto para Ultravox
|
| 123 |
+
*/
|
| 124 |
+
buildContext(conversationId, maxMessages = 5) {
|
| 125 |
+
const conversation = this.getConversation(conversationId);
|
| 126 |
+
if (!conversation || conversation.messages.length === 0) {
|
| 127 |
+
return '';
|
| 128 |
+
}
|
| 129 |
+
|
| 130 |
+
// Pegar últimas N mensagens
|
| 131 |
+
const recentMessages = conversation.messages.slice(-maxMessages);
|
| 132 |
+
|
| 133 |
+
// Formatar contexto de forma simples
|
| 134 |
+
const context = recentMessages
|
| 135 |
+
.map(msg => `${msg.role === 'user' ? 'Usuário' : 'Assistente'}: ${msg.content}`)
|
| 136 |
+
.join('\n');
|
| 137 |
+
|
| 138 |
+
return context;
|
| 139 |
+
}
|
| 140 |
+
|
| 141 |
+
/**
|
| 142 |
+
* Recupera histórico de mensagens
|
| 143 |
+
*/
|
| 144 |
+
getMessages(conversationId, limit = 10, offset = 0) {
|
| 145 |
+
const conversation = this.getConversation(conversationId);
|
| 146 |
+
if (!conversation) {
|
| 147 |
+
return [];
|
| 148 |
+
}
|
| 149 |
+
|
| 150 |
+
const start = Math.max(0, conversation.messages.length - offset - limit);
|
| 151 |
+
const end = conversation.messages.length - offset;
|
| 152 |
+
|
| 153 |
+
return conversation.messages.slice(start, end);
|
| 154 |
+
}
|
| 155 |
+
|
| 156 |
+
/**
|
| 157 |
+
* Remove conversação mais antiga
|
| 158 |
+
*/
|
| 159 |
+
removeOldestConversation() {
|
| 160 |
+
let oldest = null;
|
| 161 |
+
let oldestTime = Date.now();
|
| 162 |
+
|
| 163 |
+
for (const [id, conv] of this.conversations) {
|
| 164 |
+
if (conv.lastActivity < oldestTime) {
|
| 165 |
+
oldest = id;
|
| 166 |
+
oldestTime = conv.lastActivity;
|
| 167 |
+
}
|
| 168 |
+
}
|
| 169 |
+
|
| 170 |
+
if (oldest) {
|
| 171 |
+
this.conversations.delete(oldest);
|
| 172 |
+
console.log(`🗑️ Conversação removida (limite atingido): ${oldest}`);
|
| 173 |
+
}
|
| 174 |
+
}
|
| 175 |
+
|
| 176 |
+
/**
|
| 177 |
+
* Limpa conversações expiradas
|
| 178 |
+
*/
|
| 179 |
+
cleanupExpired() {
|
| 180 |
+
const now = Date.now();
|
| 181 |
+
const ttlMs = this.config.ttlMinutes * 60 * 1000;
|
| 182 |
+
let removed = 0;
|
| 183 |
+
|
| 184 |
+
for (const [id, conv] of this.conversations) {
|
| 185 |
+
if (now - conv.lastActivity > ttlMs) {
|
| 186 |
+
this.conversations.delete(id);
|
| 187 |
+
removed++;
|
| 188 |
+
}
|
| 189 |
+
}
|
| 190 |
+
|
| 191 |
+
if (removed > 0) {
|
| 192 |
+
console.log(`🧹 ${removed} conversações expiradas removidas`);
|
| 193 |
+
this.stats.activeConversations = this.conversations.size;
|
| 194 |
+
}
|
| 195 |
+
}
|
| 196 |
+
|
| 197 |
+
/**
|
| 198 |
+
* Inicia limpeza automática
|
| 199 |
+
*/
|
| 200 |
+
startCleanup() {
|
| 201 |
+
setInterval(() => {
|
| 202 |
+
this.cleanupExpired();
|
| 203 |
+
}, this.config.cleanupIntervalMinutes * 60 * 1000);
|
| 204 |
+
}
|
| 205 |
+
|
| 206 |
+
/**
|
| 207 |
+
* Retorna estatísticas
|
| 208 |
+
*/
|
| 209 |
+
getStats() {
|
| 210 |
+
return {
|
| 211 |
+
...this.stats,
|
| 212 |
+
conversationsInMemory: this.conversations.size,
|
| 213 |
+
memoryUsage: this.getMemoryUsage()
|
| 214 |
+
};
|
| 215 |
+
}
|
| 216 |
+
|
| 217 |
+
/**
|
| 218 |
+
* Estima uso de memória
|
| 219 |
+
*/
|
| 220 |
+
getMemoryUsage() {
|
| 221 |
+
let totalSize = 0;
|
| 222 |
+
|
| 223 |
+
for (const conv of this.conversations.values()) {
|
| 224 |
+
// Estimar tamanho aproximado
|
| 225 |
+
totalSize += JSON.stringify(conv).length;
|
| 226 |
+
}
|
| 227 |
+
|
| 228 |
+
return {
|
| 229 |
+
bytes: totalSize,
|
| 230 |
+
kb: (totalSize / 1024).toFixed(2),
|
| 231 |
+
mb: (totalSize / 1024 / 1024).toFixed(2)
|
| 232 |
+
};
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
/**
|
| 236 |
+
* Lista conversações ativas
|
| 237 |
+
*/
|
| 238 |
+
listConversations() {
|
| 239 |
+
const list = [];
|
| 240 |
+
|
| 241 |
+
for (const [id, conv] of this.conversations) {
|
| 242 |
+
list.push({
|
| 243 |
+
id: conv.id,
|
| 244 |
+
createdAt: new Date(conv.createdAt).toISOString(),
|
| 245 |
+
lastActivity: new Date(conv.lastActivity).toISOString(),
|
| 246 |
+
messageCount: conv.metadata.messageCount,
|
| 247 |
+
metadata: conv.metadata
|
| 248 |
+
});
|
| 249 |
+
}
|
| 250 |
+
|
| 251 |
+
return list.sort((a, b) => b.lastActivity - a.lastActivity);
|
| 252 |
+
}
|
| 253 |
+
|
| 254 |
+
/**
|
| 255 |
+
* Exporta conversação (para backup futuro)
|
| 256 |
+
*/
|
| 257 |
+
exportConversation(conversationId) {
|
| 258 |
+
const conversation = this.getConversation(conversationId);
|
| 259 |
+
if (!conversation) {
|
| 260 |
+
return null;
|
| 261 |
+
}
|
| 262 |
+
|
| 263 |
+
return {
|
| 264 |
+
...conversation,
|
| 265 |
+
exported: new Date().toISOString(),
|
| 266 |
+
version: '1.0'
|
| 267 |
+
};
|
| 268 |
+
}
|
| 269 |
+
|
| 270 |
+
/**
|
| 271 |
+
* Importa conversação (de backup)
|
| 272 |
+
*/
|
| 273 |
+
importConversation(data) {
|
| 274 |
+
if (!data || !data.id) {
|
| 275 |
+
throw new Error('Dados de conversação inválidos');
|
| 276 |
+
}
|
| 277 |
+
|
| 278 |
+
this.conversations.set(data.id, {
|
| 279 |
+
...data,
|
| 280 |
+
lastActivity: Date.now() // Atualizar última atividade
|
| 281 |
+
});
|
| 282 |
+
|
| 283 |
+
this.stats.activeConversations = this.conversations.size;
|
| 284 |
+
console.log(`📥 Conversação importada: ${data.id}`);
|
| 285 |
+
|
| 286 |
+
return data.id;
|
| 287 |
+
}
|
| 288 |
+
}
|
| 289 |
+
|
| 290 |
+
module.exports = ConversationMemory;
|
services/webrtc_gateway/favicon.ico
ADDED
|
|
services/webrtc_gateway/opus-decoder.js
ADDED
|
@@ -0,0 +1,150 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
/**
|
| 2 |
+
* Opus Decoder para navegador
|
| 3 |
+
* Usa Web Audio API para decodificar Opus
|
| 4 |
+
*/
|
| 5 |
+
|
| 6 |
+
class OpusDecoder {
|
| 7 |
+
constructor() {
|
| 8 |
+
this.audioContext = null;
|
| 9 |
+
this.isInitialized = false;
|
| 10 |
+
}
|
| 11 |
+
|
| 12 |
+
async init(sampleRate = 24000) {
|
| 13 |
+
if (this.isInitialized) return;
|
| 14 |
+
|
| 15 |
+
try {
|
| 16 |
+
// Criar AudioContext com taxa específica
|
| 17 |
+
this.audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 18 |
+
sampleRate: sampleRate
|
| 19 |
+
});
|
| 20 |
+
|
| 21 |
+
this.isInitialized = true;
|
| 22 |
+
console.log(`✅ OpusDecoder inicializado @ ${sampleRate}Hz`);
|
| 23 |
+
} catch (error) {
|
| 24 |
+
console.error('❌ Erro ao inicializar OpusDecoder:', error);
|
| 25 |
+
throw error;
|
| 26 |
+
}
|
| 27 |
+
}
|
| 28 |
+
|
| 29 |
+
/**
|
| 30 |
+
* Decodifica Opus para PCM usando Web Audio API
|
| 31 |
+
* @param {ArrayBuffer} opusData - Dados Opus comprimidos
|
| 32 |
+
* @returns {Promise<ArrayBuffer>} - PCM decodificado
|
| 33 |
+
*/
|
| 34 |
+
async decode(opusData) {
|
| 35 |
+
if (!this.isInitialized) {
|
| 36 |
+
await this.init();
|
| 37 |
+
}
|
| 38 |
+
|
| 39 |
+
try {
|
| 40 |
+
// Web Audio API pode decodificar Opus nativamente se embrulhado em container
|
| 41 |
+
// Para Opus puro, precisamos criar um container WebM mínimo
|
| 42 |
+
const webmContainer = this.wrapOpusInWebM(opusData);
|
| 43 |
+
|
| 44 |
+
// Decodificar usando Web Audio API
|
| 45 |
+
const audioBuffer = await this.audioContext.decodeAudioData(webmContainer);
|
| 46 |
+
|
| 47 |
+
// Converter AudioBuffer para PCM Int16
|
| 48 |
+
const pcmData = this.audioBufferToPCM(audioBuffer);
|
| 49 |
+
|
| 50 |
+
console.log(`🔊 Opus decodificado: ${opusData.byteLength} bytes → ${pcmData.byteLength} bytes PCM`);
|
| 51 |
+
|
| 52 |
+
return pcmData;
|
| 53 |
+
} catch (error) {
|
| 54 |
+
console.error('❌ Erro ao decodificar Opus:', error);
|
| 55 |
+
// Fallback: retornar dados originais se não conseguir decodificar
|
| 56 |
+
return opusData;
|
| 57 |
+
}
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
/**
|
| 61 |
+
* Envolve dados Opus em container WebM mínimo
|
| 62 |
+
* @param {ArrayBuffer} opusData - Dados Opus puros
|
| 63 |
+
* @returns {ArrayBuffer} - WebM container com Opus
|
| 64 |
+
*/
|
| 65 |
+
wrapOpusInWebM(opusData) {
|
| 66 |
+
// Implementação simplificada - na prática, usaria uma biblioteca
|
| 67 |
+
// Por enquanto, assumimos que o navegador pode processar Opus diretamente
|
| 68 |
+
// se fornecido com headers apropriados
|
| 69 |
+
|
| 70 |
+
// Para implementação real, considerar usar:
|
| 71 |
+
// - libopus.js (porta WASM do libopus)
|
| 72 |
+
// - opus-recorder (biblioteca JS para Opus)
|
| 73 |
+
|
| 74 |
+
return opusData; // Placeholder
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
/**
|
| 78 |
+
* Converte AudioBuffer para PCM Int16
|
| 79 |
+
* @param {AudioBuffer} audioBuffer
|
| 80 |
+
* @returns {ArrayBuffer} PCM Int16 data
|
| 81 |
+
*/
|
| 82 |
+
audioBufferToPCM(audioBuffer) {
|
| 83 |
+
const length = audioBuffer.length;
|
| 84 |
+
const pcmData = new Int16Array(length);
|
| 85 |
+
const channelData = audioBuffer.getChannelData(0); // Mono
|
| 86 |
+
|
| 87 |
+
// Converter Float32 para Int16
|
| 88 |
+
for (let i = 0; i < length; i++) {
|
| 89 |
+
const sample = Math.max(-1, Math.min(1, channelData[i]));
|
| 90 |
+
pcmData[i] = sample * 0x7FFF;
|
| 91 |
+
}
|
| 92 |
+
|
| 93 |
+
return pcmData.buffer;
|
| 94 |
+
}
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
/**
|
| 98 |
+
* Alternativa: Usar biblioteca opus-decoder (mais robusta)
|
| 99 |
+
* npm install opus-decoder
|
| 100 |
+
*/
|
| 101 |
+
class OpusDecoderWASM {
|
| 102 |
+
constructor() {
|
| 103 |
+
this.decoder = null;
|
| 104 |
+
this.ready = false;
|
| 105 |
+
}
|
| 106 |
+
|
| 107 |
+
async init(sampleRate = 24000, channels = 1) {
|
| 108 |
+
if (this.ready) return;
|
| 109 |
+
|
| 110 |
+
try {
|
| 111 |
+
// Carregar opus-decoder WASM se disponível
|
| 112 |
+
if (typeof OpusDecoderWebAssembly !== 'undefined') {
|
| 113 |
+
const { OpusDecoderWebAssembly } = await import('opus-decoder');
|
| 114 |
+
this.decoder = new OpusDecoderWebAssembly({
|
| 115 |
+
channels: channels,
|
| 116 |
+
sampleRate: sampleRate
|
| 117 |
+
});
|
| 118 |
+
await this.decoder.ready;
|
| 119 |
+
this.ready = true;
|
| 120 |
+
console.log('✅ OpusDecoderWASM pronto');
|
| 121 |
+
} else {
|
| 122 |
+
throw new Error('opus-decoder não disponível');
|
| 123 |
+
}
|
| 124 |
+
} catch (error) {
|
| 125 |
+
console.warn('⚠️ WASM decoder não disponível, usando fallback');
|
| 126 |
+
// Fallback para decoder básico
|
| 127 |
+
this.decoder = new OpusDecoder();
|
| 128 |
+
await this.decoder.init(sampleRate);
|
| 129 |
+
this.ready = true;
|
| 130 |
+
}
|
| 131 |
+
}
|
| 132 |
+
|
| 133 |
+
async decode(opusData) {
|
| 134 |
+
if (!this.ready) {
|
| 135 |
+
await this.init();
|
| 136 |
+
}
|
| 137 |
+
|
| 138 |
+
if (this.decoder.decode) {
|
| 139 |
+
// Usar WASM decoder se disponível
|
| 140 |
+
return await this.decoder.decode(opusData);
|
| 141 |
+
} else {
|
| 142 |
+
// Fallback
|
| 143 |
+
return opusData;
|
| 144 |
+
}
|
| 145 |
+
}
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
// Exportar para uso global
|
| 149 |
+
window.OpusDecoder = OpusDecoder;
|
| 150 |
+
window.OpusDecoderWASM = OpusDecoderWASM;
|
services/webrtc_gateway/package-lock.json
CHANGED
|
@@ -9,6 +9,7 @@
|
|
| 9 |
"version": "1.0.0",
|
| 10 |
"license": "ISC",
|
| 11 |
"dependencies": {
|
|
|
|
| 12 |
"@grpc/grpc-js": "^1.9.11",
|
| 13 |
"@grpc/proto-loader": "^0.7.10",
|
| 14 |
"express": "^5.1.0",
|
|
@@ -18,6 +19,40 @@
|
|
| 18 |
"nodemon": "^3.0.1"
|
| 19 |
}
|
| 20 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
"node_modules/@grpc/grpc-js": {
|
| 22 |
"version": "1.13.4",
|
| 23 |
"resolved": "https://registry.npmjs.org/@grpc/grpc-js/-/grpc-js-1.13.4.tgz",
|
|
@@ -132,6 +167,12 @@
|
|
| 132 |
"undici-types": "~7.10.0"
|
| 133 |
}
|
| 134 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 135 |
"node_modules/accepts": {
|
| 136 |
"version": "2.0.0",
|
| 137 |
"resolved": "https://registry.npmjs.org/accepts/-/accepts-2.0.0.tgz",
|
|
@@ -145,6 +186,18 @@
|
|
| 145 |
"node": ">= 0.6"
|
| 146 |
}
|
| 147 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 148 |
"node_modules/ansi-regex": {
|
| 149 |
"version": "5.0.1",
|
| 150 |
"resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz",
|
|
@@ -183,11 +236,30 @@
|
|
| 183 |
"node": ">= 8"
|
| 184 |
}
|
| 185 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 186 |
"node_modules/balanced-match": {
|
| 187 |
"version": "1.0.2",
|
| 188 |
"resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
|
| 189 |
"integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==",
|
| 190 |
-
"dev": true,
|
| 191 |
"license": "MIT"
|
| 192 |
},
|
| 193 |
"node_modules/binary-extensions": {
|
|
@@ -227,7 +299,6 @@
|
|
| 227 |
"version": "1.1.12",
|
| 228 |
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.12.tgz",
|
| 229 |
"integrity": "sha512-9T9UjW3r0UW5c1Q7GTwllptXwhvYmEzFhzMfZ9H7FQWt+uZePjZPjBP/W1ZEyZ1twGWom5/56TF4lPcqjnDHcg==",
|
| 230 |
-
"dev": true,
|
| 231 |
"license": "MIT",
|
| 232 |
"dependencies": {
|
| 233 |
"balanced-match": "^1.0.0",
|
|
@@ -310,6 +381,15 @@
|
|
| 310 |
"fsevents": "~2.3.2"
|
| 311 |
}
|
| 312 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 313 |
"node_modules/cliui": {
|
| 314 |
"version": "8.0.1",
|
| 315 |
"resolved": "https://registry.npmjs.org/cliui/-/cliui-8.0.1.tgz",
|
|
@@ -342,13 +422,27 @@
|
|
| 342 |
"integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==",
|
| 343 |
"license": "MIT"
|
| 344 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 345 |
"node_modules/concat-map": {
|
| 346 |
"version": "0.0.1",
|
| 347 |
"resolved": "https://registry.npmjs.org/concat-map/-/concat-map-0.0.1.tgz",
|
| 348 |
"integrity": "sha512-/Srv4dswyQNBfohGpz9o6Yb3Gz3SrUDqBH5rTuhGR7ahtlbYKnVxw2bCFMRljaA7EXHaXZ8wsHdodFvbkhKmqg==",
|
| 349 |
-
"dev": true,
|
| 350 |
"license": "MIT"
|
| 351 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 352 |
"node_modules/content-disposition": {
|
| 353 |
"version": "1.0.0",
|
| 354 |
"resolved": "https://registry.npmjs.org/content-disposition/-/content-disposition-1.0.0.tgz",
|
|
@@ -405,6 +499,12 @@
|
|
| 405 |
}
|
| 406 |
}
|
| 407 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 408 |
"node_modules/depd": {
|
| 409 |
"version": "2.0.0",
|
| 410 |
"resolved": "https://registry.npmjs.org/depd/-/depd-2.0.0.tgz",
|
|
@@ -414,6 +514,15 @@
|
|
| 414 |
"node": ">= 0.8"
|
| 415 |
}
|
| 416 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 417 |
"node_modules/dunder-proto": {
|
| 418 |
"version": "1.0.1",
|
| 419 |
"resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
|
|
@@ -593,6 +702,36 @@
|
|
| 593 |
"node": ">= 0.8"
|
| 594 |
}
|
| 595 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 596 |
"node_modules/fsevents": {
|
| 597 |
"version": "2.3.3",
|
| 598 |
"resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz",
|
|
@@ -617,6 +756,27 @@
|
|
| 617 |
"url": "https://github.com/sponsors/ljharb"
|
| 618 |
}
|
| 619 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 620 |
"node_modules/get-caller-file": {
|
| 621 |
"version": "2.0.5",
|
| 622 |
"resolved": "https://registry.npmjs.org/get-caller-file/-/get-caller-file-2.0.5.tgz",
|
|
@@ -663,6 +823,27 @@
|
|
| 663 |
"node": ">= 0.4"
|
| 664 |
}
|
| 665 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 666 |
"node_modules/glob-parent": {
|
| 667 |
"version": "5.1.2",
|
| 668 |
"resolved": "https://registry.npmjs.org/glob-parent/-/glob-parent-5.1.2.tgz",
|
|
@@ -710,6 +891,12 @@
|
|
| 710 |
"url": "https://github.com/sponsors/ljharb"
|
| 711 |
}
|
| 712 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 713 |
"node_modules/hasown": {
|
| 714 |
"version": "2.0.2",
|
| 715 |
"resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
|
|
@@ -747,6 +934,19 @@
|
|
| 747 |
"node": ">= 0.8"
|
| 748 |
}
|
| 749 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 750 |
"node_modules/iconv-lite": {
|
| 751 |
"version": "0.6.3",
|
| 752 |
"resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz",
|
|
@@ -766,6 +966,17 @@
|
|
| 766 |
"dev": true,
|
| 767 |
"license": "ISC"
|
| 768 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 769 |
"node_modules/inherits": {
|
| 770 |
"version": "2.0.4",
|
| 771 |
"resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz",
|
|
@@ -854,6 +1065,30 @@
|
|
| 854 |
"integrity": "sha512-mNAgZ1GmyNhD7AuqnTG3/VQ26o760+ZYBPKjPvugO8+nLbYfX6TVpJPseBvopbdY+qpZ/lKUnmEc1LeZYS3QAA==",
|
| 855 |
"license": "Apache-2.0"
|
| 856 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 857 |
"node_modules/math-intrinsics": {
|
| 858 |
"version": "1.1.0",
|
| 859 |
"resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz",
|
|
@@ -909,7 +1144,6 @@
|
|
| 909 |
"version": "3.1.2",
|
| 910 |
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.2.tgz",
|
| 911 |
"integrity": "sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw==",
|
| 912 |
-
"dev": true,
|
| 913 |
"license": "ISC",
|
| 914 |
"dependencies": {
|
| 915 |
"brace-expansion": "^1.1.7"
|
|
@@ -918,6 +1152,52 @@
|
|
| 918 |
"node": "*"
|
| 919 |
}
|
| 920 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 921 |
"node_modules/ms": {
|
| 922 |
"version": "2.1.3",
|
| 923 |
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
|
|
@@ -933,6 +1213,35 @@
|
|
| 933 |
"node": ">= 0.6"
|
| 934 |
}
|
| 935 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 936 |
"node_modules/nodemon": {
|
| 937 |
"version": "3.1.10",
|
| 938 |
"resolved": "https://registry.npmjs.org/nodemon/-/nodemon-3.1.10.tgz",
|
|
@@ -962,6 +1271,21 @@
|
|
| 962 |
"url": "https://opencollective.com/nodemon"
|
| 963 |
}
|
| 964 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 965 |
"node_modules/normalize-path": {
|
| 966 |
"version": "3.0.0",
|
| 967 |
"resolved": "https://registry.npmjs.org/normalize-path/-/normalize-path-3.0.0.tgz",
|
|
@@ -972,6 +1296,28 @@
|
|
| 972 |
"node": ">=0.10.0"
|
| 973 |
}
|
| 974 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 975 |
"node_modules/object-inspect": {
|
| 976 |
"version": "1.13.4",
|
| 977 |
"resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz",
|
|
@@ -1014,6 +1360,15 @@
|
|
| 1014 |
"node": ">= 0.8"
|
| 1015 |
}
|
| 1016 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1017 |
"node_modules/path-to-regexp": {
|
| 1018 |
"version": "8.3.0",
|
| 1019 |
"resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.3.0.tgz",
|
|
@@ -1136,6 +1491,20 @@
|
|
| 1136 |
"url": "https://opencollective.com/express"
|
| 1137 |
}
|
| 1138 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1139 |
"node_modules/readdirp": {
|
| 1140 |
"version": "3.6.0",
|
| 1141 |
"resolved": "https://registry.npmjs.org/readdirp/-/readdirp-3.6.0.tgz",
|
|
@@ -1158,6 +1527,22 @@
|
|
| 1158 |
"node": ">=0.10.0"
|
| 1159 |
}
|
| 1160 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1161 |
"node_modules/router": {
|
| 1162 |
"version": "2.2.0",
|
| 1163 |
"resolved": "https://registry.npmjs.org/router/-/router-2.2.0.tgz",
|
|
@@ -1204,7 +1589,6 @@
|
|
| 1204 |
"version": "7.7.2",
|
| 1205 |
"resolved": "https://registry.npmjs.org/semver/-/semver-7.7.2.tgz",
|
| 1206 |
"integrity": "sha512-RF0Fw+rO5AMf9MAyaRXI4AV0Ulj5lMHqVxxdSgiVbixSCXoEmmX/jk0CuJw4+3SqroYO9VoUh+HcuJivvtJemA==",
|
| 1207 |
-
"dev": true,
|
| 1208 |
"license": "ISC",
|
| 1209 |
"bin": {
|
| 1210 |
"semver": "bin/semver.js"
|
|
@@ -1250,6 +1634,12 @@
|
|
| 1250 |
"node": ">= 18"
|
| 1251 |
}
|
| 1252 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1253 |
"node_modules/setprototypeof": {
|
| 1254 |
"version": "1.2.0",
|
| 1255 |
"resolved": "https://registry.npmjs.org/setprototypeof/-/setprototypeof-1.2.0.tgz",
|
|
@@ -1328,6 +1718,12 @@
|
|
| 1328 |
"url": "https://github.com/sponsors/ljharb"
|
| 1329 |
}
|
| 1330 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1331 |
"node_modules/simple-update-notifier": {
|
| 1332 |
"version": "2.0.0",
|
| 1333 |
"resolved": "https://registry.npmjs.org/simple-update-notifier/-/simple-update-notifier-2.0.0.tgz",
|
|
@@ -1350,6 +1746,15 @@
|
|
| 1350 |
"node": ">= 0.8"
|
| 1351 |
}
|
| 1352 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1353 |
"node_modules/string-width": {
|
| 1354 |
"version": "4.2.3",
|
| 1355 |
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
|
|
@@ -1389,6 +1794,23 @@
|
|
| 1389 |
"node": ">=4"
|
| 1390 |
}
|
| 1391 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1392 |
"node_modules/to-regex-range": {
|
| 1393 |
"version": "5.0.1",
|
| 1394 |
"resolved": "https://registry.npmjs.org/to-regex-range/-/to-regex-range-5.0.1.tgz",
|
|
@@ -1421,6 +1843,12 @@
|
|
| 1421 |
"nodetouch": "bin/nodetouch.js"
|
| 1422 |
}
|
| 1423 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1424 |
"node_modules/type-is": {
|
| 1425 |
"version": "2.0.1",
|
| 1426 |
"resolved": "https://registry.npmjs.org/type-is/-/type-is-2.0.1.tgz",
|
|
@@ -1457,6 +1885,12 @@
|
|
| 1457 |
"node": ">= 0.8"
|
| 1458 |
}
|
| 1459 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1460 |
"node_modules/vary": {
|
| 1461 |
"version": "1.1.2",
|
| 1462 |
"resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz",
|
|
@@ -1466,6 +1900,31 @@
|
|
| 1466 |
"node": ">= 0.8"
|
| 1467 |
}
|
| 1468 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1469 |
"node_modules/wrap-ansi": {
|
| 1470 |
"version": "7.0.0",
|
| 1471 |
"resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz",
|
|
@@ -1519,6 +1978,12 @@
|
|
| 1519 |
"node": ">=10"
|
| 1520 |
}
|
| 1521 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1522 |
"node_modules/yargs": {
|
| 1523 |
"version": "17.7.2",
|
| 1524 |
"resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz",
|
|
|
|
| 9 |
"version": "1.0.0",
|
| 10 |
"license": "ISC",
|
| 11 |
"dependencies": {
|
| 12 |
+
"@discordjs/opus": "^0.10.0",
|
| 13 |
"@grpc/grpc-js": "^1.9.11",
|
| 14 |
"@grpc/proto-loader": "^0.7.10",
|
| 15 |
"express": "^5.1.0",
|
|
|
|
| 19 |
"nodemon": "^3.0.1"
|
| 20 |
}
|
| 21 |
},
|
| 22 |
+
"node_modules/@discordjs/node-pre-gyp": {
|
| 23 |
+
"version": "0.4.5",
|
| 24 |
+
"resolved": "https://registry.npmjs.org/@discordjs/node-pre-gyp/-/node-pre-gyp-0.4.5.tgz",
|
| 25 |
+
"integrity": "sha512-YJOVVZ545x24mHzANfYoy0BJX5PDyeZlpiJjDkUBM/V/Ao7TFX9lcUvCN4nr0tbr5ubeaXxtEBILUrHtTphVeQ==",
|
| 26 |
+
"license": "BSD-3-Clause",
|
| 27 |
+
"dependencies": {
|
| 28 |
+
"detect-libc": "^2.0.0",
|
| 29 |
+
"https-proxy-agent": "^5.0.0",
|
| 30 |
+
"make-dir": "^3.1.0",
|
| 31 |
+
"node-fetch": "^2.6.7",
|
| 32 |
+
"nopt": "^5.0.0",
|
| 33 |
+
"npmlog": "^5.0.1",
|
| 34 |
+
"rimraf": "^3.0.2",
|
| 35 |
+
"semver": "^7.3.5",
|
| 36 |
+
"tar": "^6.1.11"
|
| 37 |
+
},
|
| 38 |
+
"bin": {
|
| 39 |
+
"node-pre-gyp": "bin/node-pre-gyp"
|
| 40 |
+
}
|
| 41 |
+
},
|
| 42 |
+
"node_modules/@discordjs/opus": {
|
| 43 |
+
"version": "0.10.0",
|
| 44 |
+
"resolved": "https://registry.npmjs.org/@discordjs/opus/-/opus-0.10.0.tgz",
|
| 45 |
+
"integrity": "sha512-HHEnSNrSPmFEyndRdQBJN2YE6egyXS9JUnJWyP6jficK0Y+qKMEZXyYTgmzpjrxXP1exM/hKaNP7BRBUEWkU5w==",
|
| 46 |
+
"hasInstallScript": true,
|
| 47 |
+
"license": "MIT",
|
| 48 |
+
"dependencies": {
|
| 49 |
+
"@discordjs/node-pre-gyp": "^0.4.5",
|
| 50 |
+
"node-addon-api": "^8.1.0"
|
| 51 |
+
},
|
| 52 |
+
"engines": {
|
| 53 |
+
"node": ">=12.0.0"
|
| 54 |
+
}
|
| 55 |
+
},
|
| 56 |
"node_modules/@grpc/grpc-js": {
|
| 57 |
"version": "1.13.4",
|
| 58 |
"resolved": "https://registry.npmjs.org/@grpc/grpc-js/-/grpc-js-1.13.4.tgz",
|
|
|
|
| 167 |
"undici-types": "~7.10.0"
|
| 168 |
}
|
| 169 |
},
|
| 170 |
+
"node_modules/abbrev": {
|
| 171 |
+
"version": "1.1.1",
|
| 172 |
+
"resolved": "https://registry.npmjs.org/abbrev/-/abbrev-1.1.1.tgz",
|
| 173 |
+
"integrity": "sha512-nne9/IiQ/hzIhY6pdDnbBtz7DjPTKrY00P/zvPSm5pOFkl6xuGrGnXn/VtTNNfNtAfZ9/1RtehkszU9qcTii0Q==",
|
| 174 |
+
"license": "ISC"
|
| 175 |
+
},
|
| 176 |
"node_modules/accepts": {
|
| 177 |
"version": "2.0.0",
|
| 178 |
"resolved": "https://registry.npmjs.org/accepts/-/accepts-2.0.0.tgz",
|
|
|
|
| 186 |
"node": ">= 0.6"
|
| 187 |
}
|
| 188 |
},
|
| 189 |
+
"node_modules/agent-base": {
|
| 190 |
+
"version": "6.0.2",
|
| 191 |
+
"resolved": "https://registry.npmjs.org/agent-base/-/agent-base-6.0.2.tgz",
|
| 192 |
+
"integrity": "sha512-RZNwNclF7+MS/8bDg70amg32dyeZGZxiDuQmZxKLAlQjr3jGyLx+4Kkk58UO7D2QdgFIQCovuSuZESne6RG6XQ==",
|
| 193 |
+
"license": "MIT",
|
| 194 |
+
"dependencies": {
|
| 195 |
+
"debug": "4"
|
| 196 |
+
},
|
| 197 |
+
"engines": {
|
| 198 |
+
"node": ">= 6.0.0"
|
| 199 |
+
}
|
| 200 |
+
},
|
| 201 |
"node_modules/ansi-regex": {
|
| 202 |
"version": "5.0.1",
|
| 203 |
"resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz",
|
|
|
|
| 236 |
"node": ">= 8"
|
| 237 |
}
|
| 238 |
},
|
| 239 |
+
"node_modules/aproba": {
|
| 240 |
+
"version": "2.1.0",
|
| 241 |
+
"resolved": "https://registry.npmjs.org/aproba/-/aproba-2.1.0.tgz",
|
| 242 |
+
"integrity": "sha512-tLIEcj5GuR2RSTnxNKdkK0dJ/GrC7P38sUkiDmDuHfsHmbagTFAxDVIBltoklXEVIQ/f14IL8IMJ5pn9Hez1Ew==",
|
| 243 |
+
"license": "ISC"
|
| 244 |
+
},
|
| 245 |
+
"node_modules/are-we-there-yet": {
|
| 246 |
+
"version": "2.0.0",
|
| 247 |
+
"resolved": "https://registry.npmjs.org/are-we-there-yet/-/are-we-there-yet-2.0.0.tgz",
|
| 248 |
+
"integrity": "sha512-Ci/qENmwHnsYo9xKIcUJN5LeDKdJ6R1Z1j9V/J5wyq8nh/mYPEpIKJbBZXtZjG04HiK7zV/p6Vs9952MrMeUIw==",
|
| 249 |
+
"deprecated": "This package is no longer supported.",
|
| 250 |
+
"license": "ISC",
|
| 251 |
+
"dependencies": {
|
| 252 |
+
"delegates": "^1.0.0",
|
| 253 |
+
"readable-stream": "^3.6.0"
|
| 254 |
+
},
|
| 255 |
+
"engines": {
|
| 256 |
+
"node": ">=10"
|
| 257 |
+
}
|
| 258 |
+
},
|
| 259 |
"node_modules/balanced-match": {
|
| 260 |
"version": "1.0.2",
|
| 261 |
"resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
|
| 262 |
"integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==",
|
|
|
|
| 263 |
"license": "MIT"
|
| 264 |
},
|
| 265 |
"node_modules/binary-extensions": {
|
|
|
|
| 299 |
"version": "1.1.12",
|
| 300 |
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.12.tgz",
|
| 301 |
"integrity": "sha512-9T9UjW3r0UW5c1Q7GTwllptXwhvYmEzFhzMfZ9H7FQWt+uZePjZPjBP/W1ZEyZ1twGWom5/56TF4lPcqjnDHcg==",
|
|
|
|
| 302 |
"license": "MIT",
|
| 303 |
"dependencies": {
|
| 304 |
"balanced-match": "^1.0.0",
|
|
|
|
| 381 |
"fsevents": "~2.3.2"
|
| 382 |
}
|
| 383 |
},
|
| 384 |
+
"node_modules/chownr": {
|
| 385 |
+
"version": "2.0.0",
|
| 386 |
+
"resolved": "https://registry.npmjs.org/chownr/-/chownr-2.0.0.tgz",
|
| 387 |
+
"integrity": "sha512-bIomtDF5KGpdogkLd9VspvFzk9KfpyyGlS8YFVZl7TGPBHL5snIOnxeshwVgPteQ9b4Eydl+pVbIyE1DcvCWgQ==",
|
| 388 |
+
"license": "ISC",
|
| 389 |
+
"engines": {
|
| 390 |
+
"node": ">=10"
|
| 391 |
+
}
|
| 392 |
+
},
|
| 393 |
"node_modules/cliui": {
|
| 394 |
"version": "8.0.1",
|
| 395 |
"resolved": "https://registry.npmjs.org/cliui/-/cliui-8.0.1.tgz",
|
|
|
|
| 422 |
"integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==",
|
| 423 |
"license": "MIT"
|
| 424 |
},
|
| 425 |
+
"node_modules/color-support": {
|
| 426 |
+
"version": "1.1.3",
|
| 427 |
+
"resolved": "https://registry.npmjs.org/color-support/-/color-support-1.1.3.tgz",
|
| 428 |
+
"integrity": "sha512-qiBjkpbMLO/HL68y+lh4q0/O1MZFj2RX6X/KmMa3+gJD3z+WwI1ZzDHysvqHGS3mP6mznPckpXmw1nI9cJjyRg==",
|
| 429 |
+
"license": "ISC",
|
| 430 |
+
"bin": {
|
| 431 |
+
"color-support": "bin.js"
|
| 432 |
+
}
|
| 433 |
+
},
|
| 434 |
"node_modules/concat-map": {
|
| 435 |
"version": "0.0.1",
|
| 436 |
"resolved": "https://registry.npmjs.org/concat-map/-/concat-map-0.0.1.tgz",
|
| 437 |
"integrity": "sha512-/Srv4dswyQNBfohGpz9o6Yb3Gz3SrUDqBH5rTuhGR7ahtlbYKnVxw2bCFMRljaA7EXHaXZ8wsHdodFvbkhKmqg==",
|
|
|
|
| 438 |
"license": "MIT"
|
| 439 |
},
|
| 440 |
+
"node_modules/console-control-strings": {
|
| 441 |
+
"version": "1.1.0",
|
| 442 |
+
"resolved": "https://registry.npmjs.org/console-control-strings/-/console-control-strings-1.1.0.tgz",
|
| 443 |
+
"integrity": "sha512-ty/fTekppD2fIwRvnZAVdeOiGd1c7YXEixbgJTNzqcxJWKQnjJ/V1bNEEE6hygpM3WjwHFUVK6HTjWSzV4a8sQ==",
|
| 444 |
+
"license": "ISC"
|
| 445 |
+
},
|
| 446 |
"node_modules/content-disposition": {
|
| 447 |
"version": "1.0.0",
|
| 448 |
"resolved": "https://registry.npmjs.org/content-disposition/-/content-disposition-1.0.0.tgz",
|
|
|
|
| 499 |
}
|
| 500 |
}
|
| 501 |
},
|
| 502 |
+
"node_modules/delegates": {
|
| 503 |
+
"version": "1.0.0",
|
| 504 |
+
"resolved": "https://registry.npmjs.org/delegates/-/delegates-1.0.0.tgz",
|
| 505 |
+
"integrity": "sha512-bd2L678uiWATM6m5Z1VzNCErI3jiGzt6HGY8OVICs40JQq/HALfbyNJmp0UDakEY4pMMaN0Ly5om/B1VI/+xfQ==",
|
| 506 |
+
"license": "MIT"
|
| 507 |
+
},
|
| 508 |
"node_modules/depd": {
|
| 509 |
"version": "2.0.0",
|
| 510 |
"resolved": "https://registry.npmjs.org/depd/-/depd-2.0.0.tgz",
|
|
|
|
| 514 |
"node": ">= 0.8"
|
| 515 |
}
|
| 516 |
},
|
| 517 |
+
"node_modules/detect-libc": {
|
| 518 |
+
"version": "2.0.4",
|
| 519 |
+
"resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.0.4.tgz",
|
| 520 |
+
"integrity": "sha512-3UDv+G9CsCKO1WKMGw9fwq/SWJYbI0c5Y7LU1AXYoDdbhE2AHQ6N6Nb34sG8Fj7T5APy8qXDCKuuIHd1BR0tVA==",
|
| 521 |
+
"license": "Apache-2.0",
|
| 522 |
+
"engines": {
|
| 523 |
+
"node": ">=8"
|
| 524 |
+
}
|
| 525 |
+
},
|
| 526 |
"node_modules/dunder-proto": {
|
| 527 |
"version": "1.0.1",
|
| 528 |
"resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
|
|
|
|
| 702 |
"node": ">= 0.8"
|
| 703 |
}
|
| 704 |
},
|
| 705 |
+
"node_modules/fs-minipass": {
|
| 706 |
+
"version": "2.1.0",
|
| 707 |
+
"resolved": "https://registry.npmjs.org/fs-minipass/-/fs-minipass-2.1.0.tgz",
|
| 708 |
+
"integrity": "sha512-V/JgOLFCS+R6Vcq0slCuaeWEdNC3ouDlJMNIsacH2VtALiu9mV4LPrHc5cDl8k5aw6J8jwgWWpiTo5RYhmIzvg==",
|
| 709 |
+
"license": "ISC",
|
| 710 |
+
"dependencies": {
|
| 711 |
+
"minipass": "^3.0.0"
|
| 712 |
+
},
|
| 713 |
+
"engines": {
|
| 714 |
+
"node": ">= 8"
|
| 715 |
+
}
|
| 716 |
+
},
|
| 717 |
+
"node_modules/fs-minipass/node_modules/minipass": {
|
| 718 |
+
"version": "3.3.6",
|
| 719 |
+
"resolved": "https://registry.npmjs.org/minipass/-/minipass-3.3.6.tgz",
|
| 720 |
+
"integrity": "sha512-DxiNidxSEK+tHG6zOIklvNOwm3hvCrbUrdtzY74U6HKTJxvIDfOUL5W5P2Ghd3DTkhhKPYGqeNUIh5qcM4YBfw==",
|
| 721 |
+
"license": "ISC",
|
| 722 |
+
"dependencies": {
|
| 723 |
+
"yallist": "^4.0.0"
|
| 724 |
+
},
|
| 725 |
+
"engines": {
|
| 726 |
+
"node": ">=8"
|
| 727 |
+
}
|
| 728 |
+
},
|
| 729 |
+
"node_modules/fs.realpath": {
|
| 730 |
+
"version": "1.0.0",
|
| 731 |
+
"resolved": "https://registry.npmjs.org/fs.realpath/-/fs.realpath-1.0.0.tgz",
|
| 732 |
+
"integrity": "sha512-OO0pH2lK6a0hZnAdau5ItzHPI6pUlvI7jMVnxUQRtw4owF2wk8lOSabtGDCTP4Ggrg2MbGnWO9X8K1t4+fGMDw==",
|
| 733 |
+
"license": "ISC"
|
| 734 |
+
},
|
| 735 |
"node_modules/fsevents": {
|
| 736 |
"version": "2.3.3",
|
| 737 |
"resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz",
|
|
|
|
| 756 |
"url": "https://github.com/sponsors/ljharb"
|
| 757 |
}
|
| 758 |
},
|
| 759 |
+
"node_modules/gauge": {
|
| 760 |
+
"version": "3.0.2",
|
| 761 |
+
"resolved": "https://registry.npmjs.org/gauge/-/gauge-3.0.2.tgz",
|
| 762 |
+
"integrity": "sha512-+5J6MS/5XksCuXq++uFRsnUd7Ovu1XenbeuIuNRJxYWjgQbPuFhT14lAvsWfqfAmnwluf1OwMjz39HjfLPci0Q==",
|
| 763 |
+
"deprecated": "This package is no longer supported.",
|
| 764 |
+
"license": "ISC",
|
| 765 |
+
"dependencies": {
|
| 766 |
+
"aproba": "^1.0.3 || ^2.0.0",
|
| 767 |
+
"color-support": "^1.1.2",
|
| 768 |
+
"console-control-strings": "^1.0.0",
|
| 769 |
+
"has-unicode": "^2.0.1",
|
| 770 |
+
"object-assign": "^4.1.1",
|
| 771 |
+
"signal-exit": "^3.0.0",
|
| 772 |
+
"string-width": "^4.2.3",
|
| 773 |
+
"strip-ansi": "^6.0.1",
|
| 774 |
+
"wide-align": "^1.1.2"
|
| 775 |
+
},
|
| 776 |
+
"engines": {
|
| 777 |
+
"node": ">=10"
|
| 778 |
+
}
|
| 779 |
+
},
|
| 780 |
"node_modules/get-caller-file": {
|
| 781 |
"version": "2.0.5",
|
| 782 |
"resolved": "https://registry.npmjs.org/get-caller-file/-/get-caller-file-2.0.5.tgz",
|
|
|
|
| 823 |
"node": ">= 0.4"
|
| 824 |
}
|
| 825 |
},
|
| 826 |
+
"node_modules/glob": {
|
| 827 |
+
"version": "7.2.3",
|
| 828 |
+
"resolved": "https://registry.npmjs.org/glob/-/glob-7.2.3.tgz",
|
| 829 |
+
"integrity": "sha512-nFR0zLpU2YCaRxwoCJvL6UvCH2JFyFVIvwTLsIf21AuHlMskA1hhTdk+LlYJtOlYt9v6dvszD2BGRqBL+iQK9Q==",
|
| 830 |
+
"deprecated": "Glob versions prior to v9 are no longer supported",
|
| 831 |
+
"license": "ISC",
|
| 832 |
+
"dependencies": {
|
| 833 |
+
"fs.realpath": "^1.0.0",
|
| 834 |
+
"inflight": "^1.0.4",
|
| 835 |
+
"inherits": "2",
|
| 836 |
+
"minimatch": "^3.1.1",
|
| 837 |
+
"once": "^1.3.0",
|
| 838 |
+
"path-is-absolute": "^1.0.0"
|
| 839 |
+
},
|
| 840 |
+
"engines": {
|
| 841 |
+
"node": "*"
|
| 842 |
+
},
|
| 843 |
+
"funding": {
|
| 844 |
+
"url": "https://github.com/sponsors/isaacs"
|
| 845 |
+
}
|
| 846 |
+
},
|
| 847 |
"node_modules/glob-parent": {
|
| 848 |
"version": "5.1.2",
|
| 849 |
"resolved": "https://registry.npmjs.org/glob-parent/-/glob-parent-5.1.2.tgz",
|
|
|
|
| 891 |
"url": "https://github.com/sponsors/ljharb"
|
| 892 |
}
|
| 893 |
},
|
| 894 |
+
"node_modules/has-unicode": {
|
| 895 |
+
"version": "2.0.1",
|
| 896 |
+
"resolved": "https://registry.npmjs.org/has-unicode/-/has-unicode-2.0.1.tgz",
|
| 897 |
+
"integrity": "sha512-8Rf9Y83NBReMnx0gFzA8JImQACstCYWUplepDa9xprwwtmgEZUF0h/i5xSA625zB/I37EtrswSST6OXxwaaIJQ==",
|
| 898 |
+
"license": "ISC"
|
| 899 |
+
},
|
| 900 |
"node_modules/hasown": {
|
| 901 |
"version": "2.0.2",
|
| 902 |
"resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
|
|
|
|
| 934 |
"node": ">= 0.8"
|
| 935 |
}
|
| 936 |
},
|
| 937 |
+
"node_modules/https-proxy-agent": {
|
| 938 |
+
"version": "5.0.1",
|
| 939 |
+
"resolved": "https://registry.npmjs.org/https-proxy-agent/-/https-proxy-agent-5.0.1.tgz",
|
| 940 |
+
"integrity": "sha512-dFcAjpTQFgoLMzC2VwU+C/CbS7uRL0lWmxDITmqm7C+7F0Odmj6s9l6alZc6AELXhrnggM2CeWSXHGOdX2YtwA==",
|
| 941 |
+
"license": "MIT",
|
| 942 |
+
"dependencies": {
|
| 943 |
+
"agent-base": "6",
|
| 944 |
+
"debug": "4"
|
| 945 |
+
},
|
| 946 |
+
"engines": {
|
| 947 |
+
"node": ">= 6"
|
| 948 |
+
}
|
| 949 |
+
},
|
| 950 |
"node_modules/iconv-lite": {
|
| 951 |
"version": "0.6.3",
|
| 952 |
"resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz",
|
|
|
|
| 966 |
"dev": true,
|
| 967 |
"license": "ISC"
|
| 968 |
},
|
| 969 |
+
"node_modules/inflight": {
|
| 970 |
+
"version": "1.0.6",
|
| 971 |
+
"resolved": "https://registry.npmjs.org/inflight/-/inflight-1.0.6.tgz",
|
| 972 |
+
"integrity": "sha512-k92I/b08q4wvFscXCLvqfsHCrjrF7yiXsQuIVvVE7N82W3+aqpzuUdBbfhWcy/FZR3/4IgflMgKLOsvPDrGCJA==",
|
| 973 |
+
"deprecated": "This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.",
|
| 974 |
+
"license": "ISC",
|
| 975 |
+
"dependencies": {
|
| 976 |
+
"once": "^1.3.0",
|
| 977 |
+
"wrappy": "1"
|
| 978 |
+
}
|
| 979 |
+
},
|
| 980 |
"node_modules/inherits": {
|
| 981 |
"version": "2.0.4",
|
| 982 |
"resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz",
|
|
|
|
| 1065 |
"integrity": "sha512-mNAgZ1GmyNhD7AuqnTG3/VQ26o760+ZYBPKjPvugO8+nLbYfX6TVpJPseBvopbdY+qpZ/lKUnmEc1LeZYS3QAA==",
|
| 1066 |
"license": "Apache-2.0"
|
| 1067 |
},
|
| 1068 |
+
"node_modules/make-dir": {
|
| 1069 |
+
"version": "3.1.0",
|
| 1070 |
+
"resolved": "https://registry.npmjs.org/make-dir/-/make-dir-3.1.0.tgz",
|
| 1071 |
+
"integrity": "sha512-g3FeP20LNwhALb/6Cz6Dd4F2ngze0jz7tbzrD2wAV+o9FeNHe4rL+yK2md0J/fiSf1sa1ADhXqi5+oVwOM/eGw==",
|
| 1072 |
+
"license": "MIT",
|
| 1073 |
+
"dependencies": {
|
| 1074 |
+
"semver": "^6.0.0"
|
| 1075 |
+
},
|
| 1076 |
+
"engines": {
|
| 1077 |
+
"node": ">=8"
|
| 1078 |
+
},
|
| 1079 |
+
"funding": {
|
| 1080 |
+
"url": "https://github.com/sponsors/sindresorhus"
|
| 1081 |
+
}
|
| 1082 |
+
},
|
| 1083 |
+
"node_modules/make-dir/node_modules/semver": {
|
| 1084 |
+
"version": "6.3.1",
|
| 1085 |
+
"resolved": "https://registry.npmjs.org/semver/-/semver-6.3.1.tgz",
|
| 1086 |
+
"integrity": "sha512-BR7VvDCVHO+q2xBEWskxS6DJE1qRnb7DxzUrogb71CWoSficBxYsiAGd+Kl0mmq/MprG9yArRkyrQxTO6XjMzA==",
|
| 1087 |
+
"license": "ISC",
|
| 1088 |
+
"bin": {
|
| 1089 |
+
"semver": "bin/semver.js"
|
| 1090 |
+
}
|
| 1091 |
+
},
|
| 1092 |
"node_modules/math-intrinsics": {
|
| 1093 |
"version": "1.1.0",
|
| 1094 |
"resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz",
|
|
|
|
| 1144 |
"version": "3.1.2",
|
| 1145 |
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.2.tgz",
|
| 1146 |
"integrity": "sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw==",
|
|
|
|
| 1147 |
"license": "ISC",
|
| 1148 |
"dependencies": {
|
| 1149 |
"brace-expansion": "^1.1.7"
|
|
|
|
| 1152 |
"node": "*"
|
| 1153 |
}
|
| 1154 |
},
|
| 1155 |
+
"node_modules/minipass": {
|
| 1156 |
+
"version": "5.0.0",
|
| 1157 |
+
"resolved": "https://registry.npmjs.org/minipass/-/minipass-5.0.0.tgz",
|
| 1158 |
+
"integrity": "sha512-3FnjYuehv9k6ovOEbyOswadCDPX1piCfhV8ncmYtHOjuPwylVWsghTLo7rabjC3Rx5xD4HDx8Wm1xnMF7S5qFQ==",
|
| 1159 |
+
"license": "ISC",
|
| 1160 |
+
"engines": {
|
| 1161 |
+
"node": ">=8"
|
| 1162 |
+
}
|
| 1163 |
+
},
|
| 1164 |
+
"node_modules/minizlib": {
|
| 1165 |
+
"version": "2.1.2",
|
| 1166 |
+
"resolved": "https://registry.npmjs.org/minizlib/-/minizlib-2.1.2.tgz",
|
| 1167 |
+
"integrity": "sha512-bAxsR8BVfj60DWXHE3u30oHzfl4G7khkSuPW+qvpd7jFRHm7dLxOjUk1EHACJ/hxLY8phGJ0YhYHZo7jil7Qdg==",
|
| 1168 |
+
"license": "MIT",
|
| 1169 |
+
"dependencies": {
|
| 1170 |
+
"minipass": "^3.0.0",
|
| 1171 |
+
"yallist": "^4.0.0"
|
| 1172 |
+
},
|
| 1173 |
+
"engines": {
|
| 1174 |
+
"node": ">= 8"
|
| 1175 |
+
}
|
| 1176 |
+
},
|
| 1177 |
+
"node_modules/minizlib/node_modules/minipass": {
|
| 1178 |
+
"version": "3.3.6",
|
| 1179 |
+
"resolved": "https://registry.npmjs.org/minipass/-/minipass-3.3.6.tgz",
|
| 1180 |
+
"integrity": "sha512-DxiNidxSEK+tHG6zOIklvNOwm3hvCrbUrdtzY74U6HKTJxvIDfOUL5W5P2Ghd3DTkhhKPYGqeNUIh5qcM4YBfw==",
|
| 1181 |
+
"license": "ISC",
|
| 1182 |
+
"dependencies": {
|
| 1183 |
+
"yallist": "^4.0.0"
|
| 1184 |
+
},
|
| 1185 |
+
"engines": {
|
| 1186 |
+
"node": ">=8"
|
| 1187 |
+
}
|
| 1188 |
+
},
|
| 1189 |
+
"node_modules/mkdirp": {
|
| 1190 |
+
"version": "1.0.4",
|
| 1191 |
+
"resolved": "https://registry.npmjs.org/mkdirp/-/mkdirp-1.0.4.tgz",
|
| 1192 |
+
"integrity": "sha512-vVqVZQyf3WLx2Shd0qJ9xuvqgAyKPLAiqITEtqW0oIUjzo3PePDd6fW9iFz30ef7Ysp/oiWqbhszeGWW2T6Gzw==",
|
| 1193 |
+
"license": "MIT",
|
| 1194 |
+
"bin": {
|
| 1195 |
+
"mkdirp": "bin/cmd.js"
|
| 1196 |
+
},
|
| 1197 |
+
"engines": {
|
| 1198 |
+
"node": ">=10"
|
| 1199 |
+
}
|
| 1200 |
+
},
|
| 1201 |
"node_modules/ms": {
|
| 1202 |
"version": "2.1.3",
|
| 1203 |
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
|
|
|
|
| 1213 |
"node": ">= 0.6"
|
| 1214 |
}
|
| 1215 |
},
|
| 1216 |
+
"node_modules/node-addon-api": {
|
| 1217 |
+
"version": "8.5.0",
|
| 1218 |
+
"resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-8.5.0.tgz",
|
| 1219 |
+
"integrity": "sha512-/bRZty2mXUIFY/xU5HLvveNHlswNJej+RnxBjOMkidWfwZzgTbPG1E3K5TOxRLOR+5hX7bSofy8yf1hZevMS8A==",
|
| 1220 |
+
"license": "MIT",
|
| 1221 |
+
"engines": {
|
| 1222 |
+
"node": "^18 || ^20 || >= 21"
|
| 1223 |
+
}
|
| 1224 |
+
},
|
| 1225 |
+
"node_modules/node-fetch": {
|
| 1226 |
+
"version": "2.7.0",
|
| 1227 |
+
"resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
|
| 1228 |
+
"integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
|
| 1229 |
+
"license": "MIT",
|
| 1230 |
+
"dependencies": {
|
| 1231 |
+
"whatwg-url": "^5.0.0"
|
| 1232 |
+
},
|
| 1233 |
+
"engines": {
|
| 1234 |
+
"node": "4.x || >=6.0.0"
|
| 1235 |
+
},
|
| 1236 |
+
"peerDependencies": {
|
| 1237 |
+
"encoding": "^0.1.0"
|
| 1238 |
+
},
|
| 1239 |
+
"peerDependenciesMeta": {
|
| 1240 |
+
"encoding": {
|
| 1241 |
+
"optional": true
|
| 1242 |
+
}
|
| 1243 |
+
}
|
| 1244 |
+
},
|
| 1245 |
"node_modules/nodemon": {
|
| 1246 |
"version": "3.1.10",
|
| 1247 |
"resolved": "https://registry.npmjs.org/nodemon/-/nodemon-3.1.10.tgz",
|
|
|
|
| 1271 |
"url": "https://opencollective.com/nodemon"
|
| 1272 |
}
|
| 1273 |
},
|
| 1274 |
+
"node_modules/nopt": {
|
| 1275 |
+
"version": "5.0.0",
|
| 1276 |
+
"resolved": "https://registry.npmjs.org/nopt/-/nopt-5.0.0.tgz",
|
| 1277 |
+
"integrity": "sha512-Tbj67rffqceeLpcRXrT7vKAN8CwfPeIBgM7E6iBkmKLV7bEMwpGgYLGv0jACUsECaa/vuxP0IjEont6umdMgtQ==",
|
| 1278 |
+
"license": "ISC",
|
| 1279 |
+
"dependencies": {
|
| 1280 |
+
"abbrev": "1"
|
| 1281 |
+
},
|
| 1282 |
+
"bin": {
|
| 1283 |
+
"nopt": "bin/nopt.js"
|
| 1284 |
+
},
|
| 1285 |
+
"engines": {
|
| 1286 |
+
"node": ">=6"
|
| 1287 |
+
}
|
| 1288 |
+
},
|
| 1289 |
"node_modules/normalize-path": {
|
| 1290 |
"version": "3.0.0",
|
| 1291 |
"resolved": "https://registry.npmjs.org/normalize-path/-/normalize-path-3.0.0.tgz",
|
|
|
|
| 1296 |
"node": ">=0.10.0"
|
| 1297 |
}
|
| 1298 |
},
|
| 1299 |
+
"node_modules/npmlog": {
|
| 1300 |
+
"version": "5.0.1",
|
| 1301 |
+
"resolved": "https://registry.npmjs.org/npmlog/-/npmlog-5.0.1.tgz",
|
| 1302 |
+
"integrity": "sha512-AqZtDUWOMKs1G/8lwylVjrdYgqA4d9nu8hc+0gzRxlDb1I10+FHBGMXs6aiQHFdCUUlqH99MUMuLfzWDNDtfxw==",
|
| 1303 |
+
"deprecated": "This package is no longer supported.",
|
| 1304 |
+
"license": "ISC",
|
| 1305 |
+
"dependencies": {
|
| 1306 |
+
"are-we-there-yet": "^2.0.0",
|
| 1307 |
+
"console-control-strings": "^1.1.0",
|
| 1308 |
+
"gauge": "^3.0.0",
|
| 1309 |
+
"set-blocking": "^2.0.0"
|
| 1310 |
+
}
|
| 1311 |
+
},
|
| 1312 |
+
"node_modules/object-assign": {
|
| 1313 |
+
"version": "4.1.1",
|
| 1314 |
+
"resolved": "https://registry.npmjs.org/object-assign/-/object-assign-4.1.1.tgz",
|
| 1315 |
+
"integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==",
|
| 1316 |
+
"license": "MIT",
|
| 1317 |
+
"engines": {
|
| 1318 |
+
"node": ">=0.10.0"
|
| 1319 |
+
}
|
| 1320 |
+
},
|
| 1321 |
"node_modules/object-inspect": {
|
| 1322 |
"version": "1.13.4",
|
| 1323 |
"resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz",
|
|
|
|
| 1360 |
"node": ">= 0.8"
|
| 1361 |
}
|
| 1362 |
},
|
| 1363 |
+
"node_modules/path-is-absolute": {
|
| 1364 |
+
"version": "1.0.1",
|
| 1365 |
+
"resolved": "https://registry.npmjs.org/path-is-absolute/-/path-is-absolute-1.0.1.tgz",
|
| 1366 |
+
"integrity": "sha512-AVbw3UJ2e9bq64vSaS9Am0fje1Pa8pbGqTTsmXfaIiMpnr5DlDhfJOuLj9Sf95ZPVDAUerDfEk88MPmPe7UCQg==",
|
| 1367 |
+
"license": "MIT",
|
| 1368 |
+
"engines": {
|
| 1369 |
+
"node": ">=0.10.0"
|
| 1370 |
+
}
|
| 1371 |
+
},
|
| 1372 |
"node_modules/path-to-regexp": {
|
| 1373 |
"version": "8.3.0",
|
| 1374 |
"resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-8.3.0.tgz",
|
|
|
|
| 1491 |
"url": "https://opencollective.com/express"
|
| 1492 |
}
|
| 1493 |
},
|
| 1494 |
+
"node_modules/readable-stream": {
|
| 1495 |
+
"version": "3.6.2",
|
| 1496 |
+
"resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-3.6.2.tgz",
|
| 1497 |
+
"integrity": "sha512-9u/sniCrY3D5WdsERHzHE4G2YCXqoG5FTHUiCC4SIbr6XcLZBY05ya9EKjYek9O5xOAwjGq+1JdGBAS7Q9ScoA==",
|
| 1498 |
+
"license": "MIT",
|
| 1499 |
+
"dependencies": {
|
| 1500 |
+
"inherits": "^2.0.3",
|
| 1501 |
+
"string_decoder": "^1.1.1",
|
| 1502 |
+
"util-deprecate": "^1.0.1"
|
| 1503 |
+
},
|
| 1504 |
+
"engines": {
|
| 1505 |
+
"node": ">= 6"
|
| 1506 |
+
}
|
| 1507 |
+
},
|
| 1508 |
"node_modules/readdirp": {
|
| 1509 |
"version": "3.6.0",
|
| 1510 |
"resolved": "https://registry.npmjs.org/readdirp/-/readdirp-3.6.0.tgz",
|
|
|
|
| 1527 |
"node": ">=0.10.0"
|
| 1528 |
}
|
| 1529 |
},
|
| 1530 |
+
"node_modules/rimraf": {
|
| 1531 |
+
"version": "3.0.2",
|
| 1532 |
+
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-3.0.2.tgz",
|
| 1533 |
+
"integrity": "sha512-JZkJMZkAGFFPP2YqXZXPbMlMBgsxzE8ILs4lMIX/2o0L9UBw9O/Y3o6wFw/i9YLapcUJWwqbi3kdxIPdC62TIA==",
|
| 1534 |
+
"deprecated": "Rimraf versions prior to v4 are no longer supported",
|
| 1535 |
+
"license": "ISC",
|
| 1536 |
+
"dependencies": {
|
| 1537 |
+
"glob": "^7.1.3"
|
| 1538 |
+
},
|
| 1539 |
+
"bin": {
|
| 1540 |
+
"rimraf": "bin.js"
|
| 1541 |
+
},
|
| 1542 |
+
"funding": {
|
| 1543 |
+
"url": "https://github.com/sponsors/isaacs"
|
| 1544 |
+
}
|
| 1545 |
+
},
|
| 1546 |
"node_modules/router": {
|
| 1547 |
"version": "2.2.0",
|
| 1548 |
"resolved": "https://registry.npmjs.org/router/-/router-2.2.0.tgz",
|
|
|
|
| 1589 |
"version": "7.7.2",
|
| 1590 |
"resolved": "https://registry.npmjs.org/semver/-/semver-7.7.2.tgz",
|
| 1591 |
"integrity": "sha512-RF0Fw+rO5AMf9MAyaRXI4AV0Ulj5lMHqVxxdSgiVbixSCXoEmmX/jk0CuJw4+3SqroYO9VoUh+HcuJivvtJemA==",
|
|
|
|
| 1592 |
"license": "ISC",
|
| 1593 |
"bin": {
|
| 1594 |
"semver": "bin/semver.js"
|
|
|
|
| 1634 |
"node": ">= 18"
|
| 1635 |
}
|
| 1636 |
},
|
| 1637 |
+
"node_modules/set-blocking": {
|
| 1638 |
+
"version": "2.0.0",
|
| 1639 |
+
"resolved": "https://registry.npmjs.org/set-blocking/-/set-blocking-2.0.0.tgz",
|
| 1640 |
+
"integrity": "sha512-KiKBS8AnWGEyLzofFfmvKwpdPzqiy16LvQfK3yv/fVH7Bj13/wl3JSR1J+rfgRE9q7xUJK4qvgS8raSOeLUehw==",
|
| 1641 |
+
"license": "ISC"
|
| 1642 |
+
},
|
| 1643 |
"node_modules/setprototypeof": {
|
| 1644 |
"version": "1.2.0",
|
| 1645 |
"resolved": "https://registry.npmjs.org/setprototypeof/-/setprototypeof-1.2.0.tgz",
|
|
|
|
| 1718 |
"url": "https://github.com/sponsors/ljharb"
|
| 1719 |
}
|
| 1720 |
},
|
| 1721 |
+
"node_modules/signal-exit": {
|
| 1722 |
+
"version": "3.0.7",
|
| 1723 |
+
"resolved": "https://registry.npmjs.org/signal-exit/-/signal-exit-3.0.7.tgz",
|
| 1724 |
+
"integrity": "sha512-wnD2ZE+l+SPC/uoS0vXeE9L1+0wuaMqKlfz9AMUo38JsyLSBWSFcHR1Rri62LZc12vLr1gb3jl7iwQhgwpAbGQ==",
|
| 1725 |
+
"license": "ISC"
|
| 1726 |
+
},
|
| 1727 |
"node_modules/simple-update-notifier": {
|
| 1728 |
"version": "2.0.0",
|
| 1729 |
"resolved": "https://registry.npmjs.org/simple-update-notifier/-/simple-update-notifier-2.0.0.tgz",
|
|
|
|
| 1746 |
"node": ">= 0.8"
|
| 1747 |
}
|
| 1748 |
},
|
| 1749 |
+
"node_modules/string_decoder": {
|
| 1750 |
+
"version": "1.3.0",
|
| 1751 |
+
"resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.3.0.tgz",
|
| 1752 |
+
"integrity": "sha512-hkRX8U1WjJFd8LsDJ2yQ/wWWxaopEsABU1XfkM8A+j0+85JAGppt16cr1Whg6KIbb4okU6Mql6BOj+uup/wKeA==",
|
| 1753 |
+
"license": "MIT",
|
| 1754 |
+
"dependencies": {
|
| 1755 |
+
"safe-buffer": "~5.2.0"
|
| 1756 |
+
}
|
| 1757 |
+
},
|
| 1758 |
"node_modules/string-width": {
|
| 1759 |
"version": "4.2.3",
|
| 1760 |
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
|
|
|
|
| 1794 |
"node": ">=4"
|
| 1795 |
}
|
| 1796 |
},
|
| 1797 |
+
"node_modules/tar": {
|
| 1798 |
+
"version": "6.2.1",
|
| 1799 |
+
"resolved": "https://registry.npmjs.org/tar/-/tar-6.2.1.tgz",
|
| 1800 |
+
"integrity": "sha512-DZ4yORTwrbTj/7MZYq2w+/ZFdI6OZ/f9SFHR+71gIVUZhOQPHzVCLpvRnPgyaMpfWxxk/4ONva3GQSyNIKRv6A==",
|
| 1801 |
+
"license": "ISC",
|
| 1802 |
+
"dependencies": {
|
| 1803 |
+
"chownr": "^2.0.0",
|
| 1804 |
+
"fs-minipass": "^2.0.0",
|
| 1805 |
+
"minipass": "^5.0.0",
|
| 1806 |
+
"minizlib": "^2.1.1",
|
| 1807 |
+
"mkdirp": "^1.0.3",
|
| 1808 |
+
"yallist": "^4.0.0"
|
| 1809 |
+
},
|
| 1810 |
+
"engines": {
|
| 1811 |
+
"node": ">=10"
|
| 1812 |
+
}
|
| 1813 |
+
},
|
| 1814 |
"node_modules/to-regex-range": {
|
| 1815 |
"version": "5.0.1",
|
| 1816 |
"resolved": "https://registry.npmjs.org/to-regex-range/-/to-regex-range-5.0.1.tgz",
|
|
|
|
| 1843 |
"nodetouch": "bin/nodetouch.js"
|
| 1844 |
}
|
| 1845 |
},
|
| 1846 |
+
"node_modules/tr46": {
|
| 1847 |
+
"version": "0.0.3",
|
| 1848 |
+
"resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
|
| 1849 |
+
"integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==",
|
| 1850 |
+
"license": "MIT"
|
| 1851 |
+
},
|
| 1852 |
"node_modules/type-is": {
|
| 1853 |
"version": "2.0.1",
|
| 1854 |
"resolved": "https://registry.npmjs.org/type-is/-/type-is-2.0.1.tgz",
|
|
|
|
| 1885 |
"node": ">= 0.8"
|
| 1886 |
}
|
| 1887 |
},
|
| 1888 |
+
"node_modules/util-deprecate": {
|
| 1889 |
+
"version": "1.0.2",
|
| 1890 |
+
"resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz",
|
| 1891 |
+
"integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==",
|
| 1892 |
+
"license": "MIT"
|
| 1893 |
+
},
|
| 1894 |
"node_modules/vary": {
|
| 1895 |
"version": "1.1.2",
|
| 1896 |
"resolved": "https://registry.npmjs.org/vary/-/vary-1.1.2.tgz",
|
|
|
|
| 1900 |
"node": ">= 0.8"
|
| 1901 |
}
|
| 1902 |
},
|
| 1903 |
+
"node_modules/webidl-conversions": {
|
| 1904 |
+
"version": "3.0.1",
|
| 1905 |
+
"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
|
| 1906 |
+
"integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==",
|
| 1907 |
+
"license": "BSD-2-Clause"
|
| 1908 |
+
},
|
| 1909 |
+
"node_modules/whatwg-url": {
|
| 1910 |
+
"version": "5.0.0",
|
| 1911 |
+
"resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
|
| 1912 |
+
"integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
|
| 1913 |
+
"license": "MIT",
|
| 1914 |
+
"dependencies": {
|
| 1915 |
+
"tr46": "~0.0.3",
|
| 1916 |
+
"webidl-conversions": "^3.0.0"
|
| 1917 |
+
}
|
| 1918 |
+
},
|
| 1919 |
+
"node_modules/wide-align": {
|
| 1920 |
+
"version": "1.1.5",
|
| 1921 |
+
"resolved": "https://registry.npmjs.org/wide-align/-/wide-align-1.1.5.tgz",
|
| 1922 |
+
"integrity": "sha512-eDMORYaPNZ4sQIuuYPDHdQvf4gyCF9rEEV/yPxGfwPkRodwEgiMUUXTx/dex+Me0wxx53S+NgUHaP7y3MGlDmg==",
|
| 1923 |
+
"license": "ISC",
|
| 1924 |
+
"dependencies": {
|
| 1925 |
+
"string-width": "^1.0.2 || 2 || 3 || 4"
|
| 1926 |
+
}
|
| 1927 |
+
},
|
| 1928 |
"node_modules/wrap-ansi": {
|
| 1929 |
"version": "7.0.0",
|
| 1930 |
"resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz",
|
|
|
|
| 1978 |
"node": ">=10"
|
| 1979 |
}
|
| 1980 |
},
|
| 1981 |
+
"node_modules/yallist": {
|
| 1982 |
+
"version": "4.0.0",
|
| 1983 |
+
"resolved": "https://registry.npmjs.org/yallist/-/yallist-4.0.0.tgz",
|
| 1984 |
+
"integrity": "sha512-3wdGidZyq5PB084XLES5TpOSRA3wjXAlIWMhum2kRcv/41Sn2emQ0dycQW4uZXLejwKvg6EsvbdlVL+FYEct7A==",
|
| 1985 |
+
"license": "ISC"
|
| 1986 |
+
},
|
| 1987 |
"node_modules/yargs": {
|
| 1988 |
"version": "17.7.2",
|
| 1989 |
"resolved": "https://registry.npmjs.org/yargs/-/yargs-17.7.2.tgz",
|
services/webrtc_gateway/package.json
CHANGED
|
@@ -12,10 +12,11 @@
|
|
| 12 |
"license": "ISC",
|
| 13 |
"description": "Servidor WebRTC unificado com Simple Peer conectando ao Ultravox/TTS",
|
| 14 |
"dependencies": {
|
| 15 |
-
"
|
| 16 |
-
"ws": "^8.18.3",
|
| 17 |
"@grpc/grpc-js": "^1.9.11",
|
| 18 |
-
"@grpc/proto-loader": "^0.7.10"
|
|
|
|
|
|
|
| 19 |
},
|
| 20 |
"devDependencies": {
|
| 21 |
"nodemon": "^3.0.1"
|
|
|
|
| 12 |
"license": "ISC",
|
| 13 |
"description": "Servidor WebRTC unificado com Simple Peer conectando ao Ultravox/TTS",
|
| 14 |
"dependencies": {
|
| 15 |
+
"@discordjs/opus": "^0.10.0",
|
|
|
|
| 16 |
"@grpc/grpc-js": "^1.9.11",
|
| 17 |
+
"@grpc/proto-loader": "^0.7.10",
|
| 18 |
+
"express": "^5.1.0",
|
| 19 |
+
"ws": "^8.18.3"
|
| 20 |
},
|
| 21 |
"devDependencies": {
|
| 22 |
"nodemon": "^3.0.1"
|
services/webrtc_gateway/response_1757390722112.pcm
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
{"type":"init","clientId":"yi5gt94jz1c5n6ky7t844","conversationId":"conv_1757390722110_31ca908303358733"}
|
services/webrtc_gateway/response_1757391966860.pcm
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
{"type":"init","clientId":"knc8cmsgwqddnn3diqw3do","conversationId":"conv_1757391966858_5d93e75e246743a2"}
|
services/webrtc_gateway/start.sh
CHANGED
|
@@ -60,8 +60,6 @@ source venv/bin/activate
|
|
| 60 |
# Configurar variáveis de ambiente
|
| 61 |
export PYTHONPATH=/workspace/ultravox-pipeline:/workspace/ultravox-pipeline/protos/generated
|
| 62 |
export WEBRTC_PORT=$PORT
|
| 63 |
-
export ORCHESTRATOR_HOST=localhost
|
| 64 |
-
export ORCHESTRATOR_PORT=50053
|
| 65 |
|
| 66 |
echo -e "${YELLOW}Porta: $PORT${NC}"
|
| 67 |
echo -e "${YELLOW}Log: $LOG_FILE${NC}"
|
|
|
|
| 60 |
# Configurar variáveis de ambiente
|
| 61 |
export PYTHONPATH=/workspace/ultravox-pipeline:/workspace/ultravox-pipeline/protos/generated
|
| 62 |
export WEBRTC_PORT=$PORT
|
|
|
|
|
|
|
| 63 |
|
| 64 |
echo -e "${YELLOW}Porta: $PORT${NC}"
|
| 65 |
echo -e "${YELLOW}Log: $LOG_FILE${NC}"
|
services/webrtc_gateway/test-audio-cli.js
ADDED
|
@@ -0,0 +1,178 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env node
|
| 2 |
+
|
| 3 |
+
/**
|
| 4 |
+
* Teste CLI para simular envio de áudio PCM ao servidor
|
| 5 |
+
* Similar ao que o navegador faz, mas via linha de comando
|
| 6 |
+
*/
|
| 7 |
+
|
| 8 |
+
const WebSocket = require('ws');
|
| 9 |
+
const fs = require('fs');
|
| 10 |
+
const path = require('path');
|
| 11 |
+
|
| 12 |
+
const WS_URL = 'ws://localhost:8082/ws';
|
| 13 |
+
|
| 14 |
+
class AudioTester {
|
| 15 |
+
constructor() {
|
| 16 |
+
this.ws = null;
|
| 17 |
+
this.conversationId = null;
|
| 18 |
+
this.clientId = null;
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
connect() {
|
| 22 |
+
return new Promise((resolve, reject) => {
|
| 23 |
+
console.log('🔌 Conectando ao WebSocket...');
|
| 24 |
+
|
| 25 |
+
this.ws = new WebSocket(WS_URL);
|
| 26 |
+
|
| 27 |
+
this.ws.on('open', () => {
|
| 28 |
+
console.log('✅ Conectado ao servidor');
|
| 29 |
+
resolve();
|
| 30 |
+
});
|
| 31 |
+
|
| 32 |
+
this.ws.on('error', (error) => {
|
| 33 |
+
console.error('❌ Erro:', error.message);
|
| 34 |
+
reject(error);
|
| 35 |
+
});
|
| 36 |
+
|
| 37 |
+
this.ws.on('message', (data) => {
|
| 38 |
+
// Verificar se é binário (áudio) ou JSON (mensagem)
|
| 39 |
+
if (data instanceof Buffer) {
|
| 40 |
+
console.log(`🔊 Áudio recebido: ${(data.length / 1024).toFixed(1)}KB`);
|
| 41 |
+
// Salvar áudio para análise
|
| 42 |
+
const filename = `response_${Date.now()}.pcm`;
|
| 43 |
+
fs.writeFileSync(filename, data);
|
| 44 |
+
console.log(` Salvo como: ${filename}`);
|
| 45 |
+
} else {
|
| 46 |
+
try {
|
| 47 |
+
const msg = JSON.parse(data);
|
| 48 |
+
console.log('📨 Mensagem recebida:', msg);
|
| 49 |
+
|
| 50 |
+
if (msg.type === 'init') {
|
| 51 |
+
this.clientId = msg.clientId;
|
| 52 |
+
this.conversationId = msg.conversationId;
|
| 53 |
+
console.log(`🔑 Client ID: ${this.clientId}`);
|
| 54 |
+
console.log(`🔑 Conversation ID: ${this.conversationId}`);
|
| 55 |
+
} else if (msg.type === 'metrics') {
|
| 56 |
+
console.log(`📊 Resposta: "${msg.response}" (${msg.latency}ms)`);
|
| 57 |
+
}
|
| 58 |
+
} catch (e) {
|
| 59 |
+
console.log('📨 Dados recebidos:', data.toString());
|
| 60 |
+
}
|
| 61 |
+
}
|
| 62 |
+
});
|
| 63 |
+
});
|
| 64 |
+
}
|
| 65 |
+
|
| 66 |
+
/**
|
| 67 |
+
* Gera áudio PCM sintético com tom de 440Hz (nota Lá)
|
| 68 |
+
* @param {number} durationMs - Duração em milissegundos
|
| 69 |
+
* @returns {Buffer} - Buffer PCM 16-bit @ 16kHz
|
| 70 |
+
*/
|
| 71 |
+
generateTestAudio(durationMs = 2000) {
|
| 72 |
+
const sampleRate = 16000;
|
| 73 |
+
const frequency = 440; // Hz (nota Lá)
|
| 74 |
+
const samples = Math.floor(sampleRate * durationMs / 1000);
|
| 75 |
+
const buffer = Buffer.alloc(samples * 2); // 16-bit = 2 bytes por sample
|
| 76 |
+
|
| 77 |
+
for (let i = 0; i < samples; i++) {
|
| 78 |
+
// Gerar onda senoidal
|
| 79 |
+
const t = i / sampleRate;
|
| 80 |
+
const value = Math.sin(2 * Math.PI * frequency * t);
|
| 81 |
+
|
| 82 |
+
// Converter para int16
|
| 83 |
+
const int16Value = Math.floor(value * 32767);
|
| 84 |
+
|
| 85 |
+
// Escrever no buffer (little-endian)
|
| 86 |
+
buffer.writeInt16LE(int16Value, i * 2);
|
| 87 |
+
}
|
| 88 |
+
|
| 89 |
+
return buffer;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
/**
|
| 93 |
+
* Gera áudio de fala real usando espeak (se disponível)
|
| 94 |
+
*/
|
| 95 |
+
async generateSpeechAudio(text = "Olá, este é um teste de áudio") {
|
| 96 |
+
const { execSync } = require('child_process');
|
| 97 |
+
const tempFile = `/tmp/test_audio_${Date.now()}.raw`;
|
| 98 |
+
|
| 99 |
+
try {
|
| 100 |
+
// Usar espeak para gerar áudio
|
| 101 |
+
console.log(`🎤 Gerando áudio de fala: "${text}"`);
|
| 102 |
+
execSync(`espeak -s 150 -v pt-br "${text}" --stdout | sox - -r 16000 -b 16 -e signed-integer ${tempFile}`);
|
| 103 |
+
|
| 104 |
+
const audioBuffer = fs.readFileSync(tempFile);
|
| 105 |
+
fs.unlinkSync(tempFile); // Limpar arquivo temporário
|
| 106 |
+
|
| 107 |
+
return audioBuffer;
|
| 108 |
+
} catch (error) {
|
| 109 |
+
console.warn('⚠️ espeak/sox não disponível, usando áudio sintético');
|
| 110 |
+
return this.generateTestAudio(2000);
|
| 111 |
+
}
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
async sendAudio(audioBuffer) {
|
| 115 |
+
console.log(`\n📤 Enviando áudio PCM: ${(audioBuffer.length / 1024).toFixed(1)}KB`);
|
| 116 |
+
|
| 117 |
+
// Enviar como dados binários diretos (como o navegador faz)
|
| 118 |
+
this.ws.send(audioBuffer);
|
| 119 |
+
|
| 120 |
+
console.log('✅ Áudio enviado');
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
async testConversation() {
|
| 124 |
+
console.log('\n=== Iniciando teste de conversação ===\n');
|
| 125 |
+
|
| 126 |
+
// Teste 1: Enviar tom sintético
|
| 127 |
+
console.log('1️⃣ Teste com tom sintético (440Hz por 2s)');
|
| 128 |
+
const syntheticAudio = this.generateTestAudio(2000);
|
| 129 |
+
await this.sendAudio(syntheticAudio);
|
| 130 |
+
await this.wait(5000); // Aguardar resposta
|
| 131 |
+
|
| 132 |
+
// Teste 2: Enviar áudio de fala (se possível)
|
| 133 |
+
console.log('\n2️⃣ Teste com fala sintetizada');
|
| 134 |
+
const speechAudio = await this.generateSpeechAudio("Qual é o seu nome?");
|
| 135 |
+
await this.sendAudio(speechAudio);
|
| 136 |
+
await this.wait(5000); // Aguardar resposta
|
| 137 |
+
|
| 138 |
+
// Teste 3: Enviar silêncio
|
| 139 |
+
console.log('\n3️⃣ Teste com silêncio');
|
| 140 |
+
const silentAudio = Buffer.alloc(32000); // 1 segundo de silêncio
|
| 141 |
+
await this.sendAudio(silentAudio);
|
| 142 |
+
await this.wait(5000); // Aguardar resposta
|
| 143 |
+
}
|
| 144 |
+
|
| 145 |
+
wait(ms) {
|
| 146 |
+
return new Promise(resolve => setTimeout(resolve, ms));
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
disconnect() {
|
| 150 |
+
if (this.ws) {
|
| 151 |
+
console.log('\n👋 Desconectando...');
|
| 152 |
+
this.ws.close();
|
| 153 |
+
}
|
| 154 |
+
}
|
| 155 |
+
}
|
| 156 |
+
|
| 157 |
+
async function main() {
|
| 158 |
+
const tester = new AudioTester();
|
| 159 |
+
|
| 160 |
+
try {
|
| 161 |
+
await tester.connect();
|
| 162 |
+
await tester.wait(500);
|
| 163 |
+
await tester.testConversation();
|
| 164 |
+
await tester.wait(2000); // Aguardar últimas respostas
|
| 165 |
+
} catch (error) {
|
| 166 |
+
console.error('Erro fatal:', error);
|
| 167 |
+
} finally {
|
| 168 |
+
tester.disconnect();
|
| 169 |
+
}
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
console.log('╔═══════════════════════════════════════╗');
|
| 173 |
+
console.log('║ Teste CLI de Áudio PCM ║');
|
| 174 |
+
console.log('╚═══════════════════════════════════════╝\n');
|
| 175 |
+
console.log('Este teste simula o envio de áudio PCM');
|
| 176 |
+
console.log('como o navegador faz, mas via CLI.\n');
|
| 177 |
+
|
| 178 |
+
main().catch(console.error);
|
services/webrtc_gateway/test-memory.js
ADDED
|
@@ -0,0 +1,108 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env node
|
| 2 |
+
|
| 3 |
+
/**
|
| 4 |
+
* Teste do sistema de memória de conversações
|
| 5 |
+
*/
|
| 6 |
+
|
| 7 |
+
const WebSocket = require('ws');
|
| 8 |
+
|
| 9 |
+
const WS_URL = 'ws://localhost:8082/ws';
|
| 10 |
+
|
| 11 |
+
class MemoryTester {
|
| 12 |
+
constructor() {
|
| 13 |
+
this.ws = null;
|
| 14 |
+
this.conversationId = null;
|
| 15 |
+
}
|
| 16 |
+
|
| 17 |
+
connect() {
|
| 18 |
+
return new Promise((resolve, reject) => {
|
| 19 |
+
console.log('🔌 Conectando ao WebSocket...');
|
| 20 |
+
|
| 21 |
+
this.ws = new WebSocket(WS_URL);
|
| 22 |
+
|
| 23 |
+
this.ws.on('open', () => {
|
| 24 |
+
console.log('✅ Conectado');
|
| 25 |
+
resolve();
|
| 26 |
+
});
|
| 27 |
+
|
| 28 |
+
this.ws.on('error', (error) => {
|
| 29 |
+
console.error('❌ Erro:', error.message);
|
| 30 |
+
reject(error);
|
| 31 |
+
});
|
| 32 |
+
|
| 33 |
+
this.ws.on('message', (data) => {
|
| 34 |
+
const msg = JSON.parse(data);
|
| 35 |
+
console.log('📨 Mensagem recebida:', msg);
|
| 36 |
+
|
| 37 |
+
if (msg.type === 'init' && msg.conversationId) {
|
| 38 |
+
this.conversationId = msg.conversationId;
|
| 39 |
+
console.log(`🔑 Conversation ID: ${this.conversationId}`);
|
| 40 |
+
}
|
| 41 |
+
});
|
| 42 |
+
});
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
async testMemoryOperations() {
|
| 46 |
+
console.log('\n=== Testando Operações de Memória ===\n');
|
| 47 |
+
|
| 48 |
+
// 1. Obter conversação atual
|
| 49 |
+
console.log('1. Obtendo conversação atual...');
|
| 50 |
+
this.ws.send(JSON.stringify({ type: 'get-conversation' }));
|
| 51 |
+
await this.wait(1000);
|
| 52 |
+
|
| 53 |
+
// 2. Listar conversações
|
| 54 |
+
console.log('\n2. Listando conversações...');
|
| 55 |
+
this.ws.send(JSON.stringify({ type: 'list-conversations' }));
|
| 56 |
+
await this.wait(1000);
|
| 57 |
+
|
| 58 |
+
// 3. Obter estatísticas
|
| 59 |
+
console.log('\n3. Obtendo estatísticas de memória...');
|
| 60 |
+
this.ws.send(JSON.stringify({ type: 'get-stats' }));
|
| 61 |
+
await this.wait(1000);
|
| 62 |
+
|
| 63 |
+
// 4. Simular mensagem de áudio
|
| 64 |
+
console.log('\n4. Simulando processamento de áudio...');
|
| 65 |
+
const audioData = Buffer.alloc(1000); // Buffer vazio para teste
|
| 66 |
+
this.ws.send(JSON.stringify({
|
| 67 |
+
type: 'audio',
|
| 68 |
+
data: audioData.toString('base64')
|
| 69 |
+
}));
|
| 70 |
+
await this.wait(2000);
|
| 71 |
+
|
| 72 |
+
// 5. Verificar se mensagens foram armazenadas
|
| 73 |
+
console.log('\n5. Verificando mensagens armazenadas...');
|
| 74 |
+
this.ws.send(JSON.stringify({ type: 'get-conversation' }));
|
| 75 |
+
await this.wait(1000);
|
| 76 |
+
}
|
| 77 |
+
|
| 78 |
+
wait(ms) {
|
| 79 |
+
return new Promise(resolve => setTimeout(resolve, ms));
|
| 80 |
+
}
|
| 81 |
+
|
| 82 |
+
disconnect() {
|
| 83 |
+
if (this.ws) {
|
| 84 |
+
console.log('\n👋 Desconectando...');
|
| 85 |
+
this.ws.close();
|
| 86 |
+
}
|
| 87 |
+
}
|
| 88 |
+
}
|
| 89 |
+
|
| 90 |
+
async function main() {
|
| 91 |
+
const tester = new MemoryTester();
|
| 92 |
+
|
| 93 |
+
try {
|
| 94 |
+
await tester.connect();
|
| 95 |
+
await tester.wait(500);
|
| 96 |
+
await tester.testMemoryOperations();
|
| 97 |
+
} catch (error) {
|
| 98 |
+
console.error('Erro fatal:', error);
|
| 99 |
+
} finally {
|
| 100 |
+
tester.disconnect();
|
| 101 |
+
}
|
| 102 |
+
}
|
| 103 |
+
|
| 104 |
+
console.log('╔═══════════════════════════════════════╗');
|
| 105 |
+
console.log('║ Teste do Sistema de Memória ║');
|
| 106 |
+
console.log('╚═══════════════════════════════════════╝\n');
|
| 107 |
+
|
| 108 |
+
main().catch(console.error);
|
services/webrtc_gateway/test-portuguese-audio.js
ADDED
|
@@ -0,0 +1,410 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env node
|
| 2 |
+
|
| 3 |
+
/**
|
| 4 |
+
* Teste com áudio real em português usando gTTS
|
| 5 |
+
* Gera perguntas faladas e verifica coerência das respostas
|
| 6 |
+
*/
|
| 7 |
+
|
| 8 |
+
const WebSocket = require('ws');
|
| 9 |
+
const fs = require('fs');
|
| 10 |
+
const { exec, execSync } = require('child_process');
|
| 11 |
+
const path = require('path');
|
| 12 |
+
const util = require('util');
|
| 13 |
+
const execPromise = util.promisify(exec);
|
| 14 |
+
|
| 15 |
+
const WS_URL = 'ws://localhost:8082/ws';
|
| 16 |
+
|
| 17 |
+
// Cores para output
|
| 18 |
+
const colors = {
|
| 19 |
+
reset: '\x1b[0m',
|
| 20 |
+
bright: '\x1b[1m',
|
| 21 |
+
green: '\x1b[32m',
|
| 22 |
+
red: '\x1b[31m',
|
| 23 |
+
yellow: '\x1b[33m',
|
| 24 |
+
blue: '\x1b[34m',
|
| 25 |
+
cyan: '\x1b[36m',
|
| 26 |
+
magenta: '\x1b[35m'
|
| 27 |
+
};
|
| 28 |
+
|
| 29 |
+
class PortugueseAudioTester {
|
| 30 |
+
constructor() {
|
| 31 |
+
this.ws = null;
|
| 32 |
+
this.testResults = [];
|
| 33 |
+
this.currentTest = null;
|
| 34 |
+
this.responseBuffer = '';
|
| 35 |
+
}
|
| 36 |
+
|
| 37 |
+
async connect() {
|
| 38 |
+
return new Promise((resolve, reject) => {
|
| 39 |
+
console.log(`${colors.cyan}🔌 Conectando ao WebSocket...${colors.reset}`);
|
| 40 |
+
|
| 41 |
+
this.ws = new WebSocket(WS_URL);
|
| 42 |
+
|
| 43 |
+
this.ws.on('open', () => {
|
| 44 |
+
console.log(`${colors.green}✅ Conectado ao servidor${colors.reset}`);
|
| 45 |
+
resolve();
|
| 46 |
+
});
|
| 47 |
+
|
| 48 |
+
this.ws.on('error', (error) => {
|
| 49 |
+
console.error(`${colors.red}❌ Erro:${colors.reset}`, error.message);
|
| 50 |
+
reject(error);
|
| 51 |
+
});
|
| 52 |
+
|
| 53 |
+
this.ws.on('message', (data) => {
|
| 54 |
+
this.handleMessage(data);
|
| 55 |
+
});
|
| 56 |
+
});
|
| 57 |
+
}
|
| 58 |
+
|
| 59 |
+
handleMessage(data) {
|
| 60 |
+
// Verificar se é binário (áudio) ou JSON (mensagem)
|
| 61 |
+
if (Buffer.isBuffer(data)) {
|
| 62 |
+
console.log(`${colors.green}🔊 Áudio de resposta recebido: ${data.length} bytes${colors.reset}`);
|
| 63 |
+
if (this.currentTest) {
|
| 64 |
+
this.currentTest.audioReceived = true;
|
| 65 |
+
this.currentTest.audioSize = data.length;
|
| 66 |
+
}
|
| 67 |
+
return;
|
| 68 |
+
}
|
| 69 |
+
|
| 70 |
+
try {
|
| 71 |
+
const msg = JSON.parse(data);
|
| 72 |
+
|
| 73 |
+
switch (msg.type) {
|
| 74 |
+
case 'init':
|
| 75 |
+
case 'welcome':
|
| 76 |
+
console.log(`${colors.blue}🔑 Sessão iniciada: ${msg.clientId}${colors.reset}`);
|
| 77 |
+
break;
|
| 78 |
+
|
| 79 |
+
case 'metrics':
|
| 80 |
+
console.log(`${colors.yellow}📝 Resposta do sistema: "${msg.response}"${colors.reset}`);
|
| 81 |
+
if (this.currentTest) {
|
| 82 |
+
this.currentTest.response = msg.response;
|
| 83 |
+
this.currentTest.latency = msg.latency;
|
| 84 |
+
this.responseBuffer = msg.response;
|
| 85 |
+
}
|
| 86 |
+
break;
|
| 87 |
+
|
| 88 |
+
case 'response':
|
| 89 |
+
case 'transcription':
|
| 90 |
+
// Adicionar suporte para outros formatos de resposta
|
| 91 |
+
const text = msg.text || msg.response || msg.message;
|
| 92 |
+
if (text) {
|
| 93 |
+
console.log(`${colors.yellow}📝 Resposta: "${text}"${colors.reset}`);
|
| 94 |
+
if (this.currentTest) {
|
| 95 |
+
this.currentTest.response = text;
|
| 96 |
+
this.currentTest.latency = msg.latency || 0;
|
| 97 |
+
}
|
| 98 |
+
}
|
| 99 |
+
break;
|
| 100 |
+
|
| 101 |
+
case 'error':
|
| 102 |
+
console.error(`${colors.red}❌ Erro: ${msg.message}${colors.reset}`);
|
| 103 |
+
break;
|
| 104 |
+
}
|
| 105 |
+
} catch (error) {
|
| 106 |
+
// Dados de texto simples
|
| 107 |
+
const text = data.toString();
|
| 108 |
+
if (text.length > 0 && text.length < 200) {
|
| 109 |
+
console.log(`${colors.cyan}📨 Mensagem: ${text}${colors.reset}`);
|
| 110 |
+
}
|
| 111 |
+
}
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
/**
|
| 115 |
+
* Gera áudio MP3 usando gTTS e converte para PCM
|
| 116 |
+
* @param {string} text - Texto em português para converter
|
| 117 |
+
* @param {string} outputFile - Nome do arquivo de saída
|
| 118 |
+
*/
|
| 119 |
+
async generatePortugueseAudio(text, outputFile) {
|
| 120 |
+
console.log(`${colors.magenta}🎤 Gerando áudio: "${text}"${colors.reset}`);
|
| 121 |
+
|
| 122 |
+
const mp3File = outputFile.replace('.pcm', '.mp3');
|
| 123 |
+
|
| 124 |
+
try {
|
| 125 |
+
// Gerar MP3 com gTTS em português brasileiro
|
| 126 |
+
const gttsCommand = `gtts-cli "${text}" -l pt-br -o ${mp3File}`;
|
| 127 |
+
await execPromise(gttsCommand);
|
| 128 |
+
console.log(` ✅ MP3 gerado: ${mp3File}`);
|
| 129 |
+
|
| 130 |
+
// Converter MP3 para PCM 16-bit @ 16kHz
|
| 131 |
+
const ffmpegCommand = `ffmpeg -i ${mp3File} -f s16le -acodec pcm_s16le -ar 16000 -ac 1 ${outputFile} -y`;
|
| 132 |
+
await execPromise(ffmpegCommand);
|
| 133 |
+
console.log(` ✅ PCM gerado: ${outputFile}`);
|
| 134 |
+
|
| 135 |
+
// Limpar arquivo MP3 temporário
|
| 136 |
+
fs.unlinkSync(mp3File);
|
| 137 |
+
|
| 138 |
+
// Ler arquivo PCM
|
| 139 |
+
const pcmBuffer = fs.readFileSync(outputFile);
|
| 140 |
+
console.log(` 📊 Tamanho PCM: ${pcmBuffer.length} bytes`);
|
| 141 |
+
|
| 142 |
+
return pcmBuffer;
|
| 143 |
+
} catch (error) {
|
| 144 |
+
console.error(`${colors.red}❌ Erro gerando áudio: ${error.message}${colors.reset}`);
|
| 145 |
+
throw error;
|
| 146 |
+
}
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
async sendPortugueseQuestion(question, expectedContext) {
|
| 150 |
+
console.log(`\n${colors.bright}=== Teste: ${question} ===${colors.reset}`);
|
| 151 |
+
|
| 152 |
+
this.currentTest = {
|
| 153 |
+
question: question,
|
| 154 |
+
expectedContext: expectedContext,
|
| 155 |
+
startTime: Date.now(),
|
| 156 |
+
response: null,
|
| 157 |
+
audioReceived: false
|
| 158 |
+
};
|
| 159 |
+
|
| 160 |
+
try {
|
| 161 |
+
// Gerar áudio da pergunta
|
| 162 |
+
const audioFile = `/tmp/question_${Date.now()}.pcm`;
|
| 163 |
+
const pcmAudio = await this.generatePortugueseAudio(question, audioFile);
|
| 164 |
+
|
| 165 |
+
// Enviar áudio PCM diretamente
|
| 166 |
+
console.log(`${colors.cyan}📤 Enviando áudio PCM: ${pcmAudio.length} bytes${colors.reset}`);
|
| 167 |
+
this.ws.send(pcmAudio);
|
| 168 |
+
|
| 169 |
+
// Aguardar resposta
|
| 170 |
+
await this.waitForResponse(8000);
|
| 171 |
+
|
| 172 |
+
// Limpar arquivo temporário
|
| 173 |
+
if (fs.existsSync(audioFile)) {
|
| 174 |
+
fs.unlinkSync(audioFile);
|
| 175 |
+
}
|
| 176 |
+
|
| 177 |
+
// Avaliar resultado
|
| 178 |
+
this.evaluateTest();
|
| 179 |
+
|
| 180 |
+
} catch (error) {
|
| 181 |
+
console.error(`${colors.red}❌ Erro no teste: ${error.message}${colors.reset}`);
|
| 182 |
+
this.currentTest.error = error.message;
|
| 183 |
+
}
|
| 184 |
+
}
|
| 185 |
+
|
| 186 |
+
waitForResponse(timeoutMs) {
|
| 187 |
+
return new Promise((resolve) => {
|
| 188 |
+
const startTime = Date.now();
|
| 189 |
+
|
| 190 |
+
const checkInterval = setInterval(() => {
|
| 191 |
+
const elapsed = Date.now() - startTime;
|
| 192 |
+
|
| 193 |
+
// Verificar se recebemos resposta
|
| 194 |
+
if (this.currentTest.response || this.currentTest.audioReceived) {
|
| 195 |
+
clearInterval(checkInterval);
|
| 196 |
+
resolve();
|
| 197 |
+
} else if (elapsed > timeoutMs) {
|
| 198 |
+
clearInterval(checkInterval);
|
| 199 |
+
console.log(`${colors.yellow}⏱️ Timeout aguardando resposta${colors.reset}`);
|
| 200 |
+
resolve();
|
| 201 |
+
}
|
| 202 |
+
}, 100);
|
| 203 |
+
});
|
| 204 |
+
}
|
| 205 |
+
|
| 206 |
+
evaluateTest() {
|
| 207 |
+
const test = this.currentTest;
|
| 208 |
+
const responseTime = Date.now() - test.startTime;
|
| 209 |
+
|
| 210 |
+
console.log(`\n${colors.bright}📊 Resultado do Teste:${colors.reset}`);
|
| 211 |
+
console.log(` Pergunta: "${test.question}"`);
|
| 212 |
+
console.log(` Tempo de resposta: ${responseTime}ms`);
|
| 213 |
+
console.log(` Resposta recebida: ${test.response ? '✅' : '❌'}`);
|
| 214 |
+
console.log(` Áudio recebido: ${test.audioReceived ? '✅' : '❌'}`);
|
| 215 |
+
|
| 216 |
+
if (test.response) {
|
| 217 |
+
console.log(` Resposta: "${test.response}"`);
|
| 218 |
+
|
| 219 |
+
// Verificar coerência
|
| 220 |
+
const response = test.response.toLowerCase();
|
| 221 |
+
let isCoherent = false;
|
| 222 |
+
let coherenceReason = '';
|
| 223 |
+
|
| 224 |
+
// Verificar se a resposta contém palavras-chave esperadas
|
| 225 |
+
test.expectedContext.forEach(keyword => {
|
| 226 |
+
if (response.includes(keyword.toLowerCase())) {
|
| 227 |
+
isCoherent = true;
|
| 228 |
+
coherenceReason = `contém "${keyword}"`;
|
| 229 |
+
}
|
| 230 |
+
});
|
| 231 |
+
|
| 232 |
+
// Verificar se é uma resposta genérica válida
|
| 233 |
+
const validGenericResponses = [
|
| 234 |
+
'olá', 'oi', 'bom dia', 'boa tarde', 'boa noite',
|
| 235 |
+
'ajudar', 'assistente', 'posso', 'como',
|
| 236 |
+
'brasil', 'brasileiro', 'portuguesa',
|
| 237 |
+
'você', 'seu', 'sua', 'nome', 'chamar'
|
| 238 |
+
];
|
| 239 |
+
|
| 240 |
+
if (!isCoherent) {
|
| 241 |
+
validGenericResponses.forEach(word => {
|
| 242 |
+
if (response.includes(word)) {
|
| 243 |
+
isCoherent = true;
|
| 244 |
+
coherenceReason = `resposta válida com "${word}"`;
|
| 245 |
+
}
|
| 246 |
+
});
|
| 247 |
+
}
|
| 248 |
+
|
| 249 |
+
// Verificar se é uma resposta muito curta ou sem sentido
|
| 250 |
+
if (response.length < 5 || response.match(/^[0-9\s]+$/)) {
|
| 251 |
+
isCoherent = false;
|
| 252 |
+
coherenceReason = 'resposta muito curta ou inválida';
|
| 253 |
+
}
|
| 254 |
+
|
| 255 |
+
if (isCoherent) {
|
| 256 |
+
console.log(` ${colors.green}✅ Resposta COERENTE (${coherenceReason})${colors.reset}`);
|
| 257 |
+
} else {
|
| 258 |
+
console.log(` ${colors.red}❌ Resposta INCOERENTE (${coherenceReason})${colors.reset}`);
|
| 259 |
+
}
|
| 260 |
+
|
| 261 |
+
test.isCoherent = isCoherent;
|
| 262 |
+
} else {
|
| 263 |
+
test.isCoherent = false;
|
| 264 |
+
}
|
| 265 |
+
|
| 266 |
+
test.responseTime = responseTime;
|
| 267 |
+
test.passed = test.response && test.isCoherent;
|
| 268 |
+
|
| 269 |
+
this.testResults.push(test);
|
| 270 |
+
}
|
| 271 |
+
|
| 272 |
+
async runAllTests() {
|
| 273 |
+
console.log(`\n${colors.bright}${colors.cyan}🚀 Iniciando testes com áudio em português${colors.reset}\n`);
|
| 274 |
+
|
| 275 |
+
// Teste 1: Saudação
|
| 276 |
+
await this.sendPortugueseQuestion(
|
| 277 |
+
"Olá, bom dia",
|
| 278 |
+
['olá', 'oi', 'bom dia', 'prazer', 'ajudar']
|
| 279 |
+
);
|
| 280 |
+
await this.wait(2000);
|
| 281 |
+
|
| 282 |
+
// Teste 2: Pergunta sobre nome
|
| 283 |
+
await this.sendPortugueseQuestion(
|
| 284 |
+
"Qual é o seu nome?",
|
| 285 |
+
['nome', 'chamo', 'sou', 'assistente', 'ultravox']
|
| 286 |
+
);
|
| 287 |
+
await this.wait(2000);
|
| 288 |
+
|
| 289 |
+
// Teste 3: Pergunta sobre Brasil
|
| 290 |
+
await this.sendPortugueseQuestion(
|
| 291 |
+
"Qual é a capital do Brasil?",
|
| 292 |
+
['brasília', 'capital', 'brasil', 'distrito federal']
|
| 293 |
+
);
|
| 294 |
+
await this.wait(2000);
|
| 295 |
+
|
| 296 |
+
// Teste 4: Pergunta sobre ajuda
|
| 297 |
+
await this.sendPortugueseQuestion(
|
| 298 |
+
"Você pode me ajudar?",
|
| 299 |
+
['sim', 'posso', 'ajudar', 'claro', 'certamente', 'como']
|
| 300 |
+
);
|
| 301 |
+
await this.wait(2000);
|
| 302 |
+
|
| 303 |
+
// Teste 5: Pergunta sobre o dia
|
| 304 |
+
await this.sendPortugueseQuestion(
|
| 305 |
+
"Como está o dia hoje?",
|
| 306 |
+
['dia', 'hoje', 'tempo', 'clima', 'está']
|
| 307 |
+
);
|
| 308 |
+
|
| 309 |
+
// Mostrar resumo
|
| 310 |
+
this.showSummary();
|
| 311 |
+
}
|
| 312 |
+
|
| 313 |
+
showSummary() {
|
| 314 |
+
console.log(`\n${colors.bright}${colors.cyan}📈 RESUMO DOS TESTES${colors.reset}`);
|
| 315 |
+
console.log('═'.repeat(70));
|
| 316 |
+
|
| 317 |
+
let passed = 0;
|
| 318 |
+
let failed = 0;
|
| 319 |
+
|
| 320 |
+
this.testResults.forEach((test, index) => {
|
| 321 |
+
const status = test.passed ?
|
| 322 |
+
`${colors.green}✅ PASSOU${colors.reset}` :
|
| 323 |
+
`${colors.red}❌ FALHOU${colors.reset}`;
|
| 324 |
+
|
| 325 |
+
console.log(`\n${index + 1}. "${test.question}": ${status}`);
|
| 326 |
+
console.log(` Tempo: ${test.responseTime}ms`);
|
| 327 |
+
console.log(` Coerente: ${test.isCoherent ? 'Sim' : 'Não'}`);
|
| 328 |
+
|
| 329 |
+
if (test.response) {
|
| 330 |
+
const preview = test.response.substring(0, 100);
|
| 331 |
+
console.log(` Resposta: "${preview}${test.response.length > 100 ? '...' : ''}"`);
|
| 332 |
+
}
|
| 333 |
+
|
| 334 |
+
if (test.passed) passed++;
|
| 335 |
+
else failed++;
|
| 336 |
+
});
|
| 337 |
+
|
| 338 |
+
console.log('\n' + '═'.repeat(70));
|
| 339 |
+
console.log(`${colors.bright}Total: ${passed} passou, ${failed} falhou${colors.reset}`);
|
| 340 |
+
|
| 341 |
+
const successRate = (passed / this.testResults.length * 100).toFixed(1);
|
| 342 |
+
const rateColor = successRate >= 80 ? colors.green :
|
| 343 |
+
successRate >= 50 ? colors.yellow :
|
| 344 |
+
colors.red;
|
| 345 |
+
|
| 346 |
+
console.log(`${rateColor}Taxa de sucesso: ${successRate}%${colors.reset}\n`);
|
| 347 |
+
}
|
| 348 |
+
|
| 349 |
+
wait(ms) {
|
| 350 |
+
return new Promise(resolve => setTimeout(resolve, ms));
|
| 351 |
+
}
|
| 352 |
+
|
| 353 |
+
disconnect() {
|
| 354 |
+
if (this.ws) {
|
| 355 |
+
console.log(`${colors.cyan}👋 Desconectando...${colors.reset}`);
|
| 356 |
+
this.ws.close();
|
| 357 |
+
}
|
| 358 |
+
}
|
| 359 |
+
}
|
| 360 |
+
|
| 361 |
+
// Verificar dependências
|
| 362 |
+
function checkDependencies() {
|
| 363 |
+
try {
|
| 364 |
+
// Verificar gTTS
|
| 365 |
+
execSync('which gtts-cli', { stdio: 'ignore' });
|
| 366 |
+
console.log(`${colors.green}✅ gTTS instalado${colors.reset}`);
|
| 367 |
+
} catch {
|
| 368 |
+
console.error(`${colors.red}❌ gTTS não instalado!${colors.reset}`);
|
| 369 |
+
console.log(`${colors.yellow}Instale com: pip install gtts${colors.reset}`);
|
| 370 |
+
process.exit(1);
|
| 371 |
+
}
|
| 372 |
+
|
| 373 |
+
try {
|
| 374 |
+
// Verificar ffmpeg
|
| 375 |
+
execSync('which ffmpeg', { stdio: 'ignore' });
|
| 376 |
+
console.log(`${colors.green}✅ ffmpeg instalado${colors.reset}`);
|
| 377 |
+
} catch {
|
| 378 |
+
console.error(`${colors.red}❌ ffmpeg não instalado!${colors.reset}`);
|
| 379 |
+
console.log(`${colors.yellow}Instale com: sudo apt install ffmpeg${colors.reset}`);
|
| 380 |
+
process.exit(1);
|
| 381 |
+
}
|
| 382 |
+
}
|
| 383 |
+
|
| 384 |
+
// Executar testes
|
| 385 |
+
async function main() {
|
| 386 |
+
console.log(`${colors.bright}${colors.blue}╔═══════════════════════════════════════════════╗${colors.reset}`);
|
| 387 |
+
console.log(`${colors.bright}${colors.blue}║ Teste Ultravox - Áudio Português (gTTS) ║${colors.reset}`);
|
| 388 |
+
console.log(`${colors.bright}${colors.blue}╚═══════════════════════════════════════════════╝${colors.reset}\n`);
|
| 389 |
+
|
| 390 |
+
// Verificar dependências
|
| 391 |
+
checkDependencies();
|
| 392 |
+
console.log('');
|
| 393 |
+
|
| 394 |
+
const tester = new PortugueseAudioTester();
|
| 395 |
+
|
| 396 |
+
try {
|
| 397 |
+
await tester.connect();
|
| 398 |
+
await tester.wait(500);
|
| 399 |
+
await tester.runAllTests();
|
| 400 |
+
await tester.wait(2000); // Aguardar últimas respostas
|
| 401 |
+
} catch (error) {
|
| 402 |
+
console.error(`${colors.red}Erro fatal:${colors.reset}`, error);
|
| 403 |
+
} finally {
|
| 404 |
+
tester.disconnect();
|
| 405 |
+
process.exit(0);
|
| 406 |
+
}
|
| 407 |
+
}
|
| 408 |
+
|
| 409 |
+
// Iniciar
|
| 410 |
+
main().catch(console.error);
|
services/webrtc_gateway/test-websocket-speech.js
ADDED
|
@@ -0,0 +1,184 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env node
|
| 2 |
+
/**
|
| 3 |
+
* Teste automatizado de Speech-to-Speech via WebSocket
|
| 4 |
+
* Simula exatamente o que a página web deveria fazer
|
| 5 |
+
*/
|
| 6 |
+
|
| 7 |
+
const WebSocket = require('ws');
|
| 8 |
+
const fs = require('fs');
|
| 9 |
+
const path = require('path');
|
| 10 |
+
const { spawn } = require('child_process');
|
| 11 |
+
|
| 12 |
+
// Configuração
|
| 13 |
+
const WS_URL = 'ws://localhost:8082/ws';
|
| 14 |
+
const TEST_AUDIO_TEXT = "Quanto é dois mais dois?";
|
| 15 |
+
|
| 16 |
+
// Função para gerar áudio de teste usando gtts-cli
|
| 17 |
+
async function generateTestAudio(text) {
|
| 18 |
+
return new Promise((resolve, reject) => {
|
| 19 |
+
const tempFile = `/tmp/test_audio_${Date.now()}.mp3`;
|
| 20 |
+
const wavFile = `/tmp/test_audio_${Date.now()}.wav`;
|
| 21 |
+
|
| 22 |
+
console.log(`🎤 Gerando áudio de teste: "${text}"`);
|
| 23 |
+
|
| 24 |
+
// Gerar MP3 com gTTS
|
| 25 |
+
const gtts = spawn('gtts-cli', [text, '--lang', 'pt-br', '--output', tempFile]);
|
| 26 |
+
|
| 27 |
+
gtts.on('close', (code) => {
|
| 28 |
+
if (code !== 0) {
|
| 29 |
+
reject(new Error(`gTTS falhou com código ${code}`));
|
| 30 |
+
return;
|
| 31 |
+
}
|
| 32 |
+
|
| 33 |
+
// Converter MP3 para WAV PCM 16-bit @ 16kHz
|
| 34 |
+
const ffmpeg = spawn('ffmpeg', [
|
| 35 |
+
'-i', tempFile,
|
| 36 |
+
'-ar', '16000', // 16kHz
|
| 37 |
+
'-ac', '1', // Mono
|
| 38 |
+
'-c:a', 'pcm_s16le', // PCM 16-bit
|
| 39 |
+
wavFile,
|
| 40 |
+
'-y'
|
| 41 |
+
]);
|
| 42 |
+
|
| 43 |
+
ffmpeg.on('close', (code) => {
|
| 44 |
+
if (code !== 0) {
|
| 45 |
+
reject(new Error(`ffmpeg falhou com código ${code}`));
|
| 46 |
+
return;
|
| 47 |
+
}
|
| 48 |
+
|
| 49 |
+
// Ler o arquivo WAV
|
| 50 |
+
const audioBuffer = fs.readFileSync(wavFile);
|
| 51 |
+
|
| 52 |
+
// Remover header WAV (44 bytes)
|
| 53 |
+
const pcmData = audioBuffer.slice(44);
|
| 54 |
+
|
| 55 |
+
// Converter PCM int16 para Float32
|
| 56 |
+
const pcmInt16 = new Int16Array(pcmData.buffer, pcmData.byteOffset, pcmData.length / 2);
|
| 57 |
+
const pcmFloat32 = new Float32Array(pcmInt16.length);
|
| 58 |
+
|
| 59 |
+
for (let i = 0; i < pcmInt16.length; i++) {
|
| 60 |
+
pcmFloat32[i] = pcmInt16[i] / 32768.0; // Normalizar para -1.0 a 1.0
|
| 61 |
+
}
|
| 62 |
+
|
| 63 |
+
// Limpar arquivos temporários
|
| 64 |
+
fs.unlinkSync(tempFile);
|
| 65 |
+
fs.unlinkSync(wavFile);
|
| 66 |
+
|
| 67 |
+
console.log(`✅ Áudio gerado: ${pcmFloat32.length} amostras Float32`);
|
| 68 |
+
resolve(Buffer.from(pcmFloat32.buffer));
|
| 69 |
+
});
|
| 70 |
+
});
|
| 71 |
+
});
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
+
// Função principal do teste
|
| 75 |
+
async function testSpeechToSpeech() {
|
| 76 |
+
console.log('='.repeat(60));
|
| 77 |
+
console.log('🚀 TESTE AUTOMATIZADO SPEECH-TO-SPEECH VIA WEBSOCKET');
|
| 78 |
+
console.log('='.repeat(60));
|
| 79 |
+
|
| 80 |
+
try {
|
| 81 |
+
// Gerar áudio de teste
|
| 82 |
+
const audioBuffer = await generateTestAudio(TEST_AUDIO_TEXT);
|
| 83 |
+
|
| 84 |
+
// Conectar ao WebSocket
|
| 85 |
+
console.log(`\n📡 Conectando ao servidor: ${WS_URL}`);
|
| 86 |
+
const ws = new WebSocket(WS_URL);
|
| 87 |
+
|
| 88 |
+
return new Promise((resolve, reject) => {
|
| 89 |
+
let responseReceived = false;
|
| 90 |
+
let audioChunks = [];
|
| 91 |
+
|
| 92 |
+
ws.on('open', () => {
|
| 93 |
+
console.log('✅ Conectado ao servidor WebSocket');
|
| 94 |
+
|
| 95 |
+
// Enviar mensagem de tipo 'audio'
|
| 96 |
+
const message = {
|
| 97 |
+
type: 'audio',
|
| 98 |
+
data: audioBuffer.toString('base64'),
|
| 99 |
+
format: 'float32',
|
| 100 |
+
sampleRate: 16000,
|
| 101 |
+
sessionId: `test_${Date.now()}`
|
| 102 |
+
};
|
| 103 |
+
|
| 104 |
+
console.log(`📤 Enviando áudio: ${audioBuffer.length} bytes`);
|
| 105 |
+
ws.send(JSON.stringify(message));
|
| 106 |
+
});
|
| 107 |
+
|
| 108 |
+
ws.on('message', (data) => {
|
| 109 |
+
try {
|
| 110 |
+
const message = JSON.parse(data);
|
| 111 |
+
|
| 112 |
+
if (message.type === 'transcription') {
|
| 113 |
+
console.log(`📝 Transcrição recebida: "${message.text}"`);
|
| 114 |
+
responseReceived = true;
|
| 115 |
+
} else if (message.type === 'audio') {
|
| 116 |
+
// Áudio de resposta do TTS
|
| 117 |
+
const audioData = Buffer.from(message.data, 'base64');
|
| 118 |
+
audioChunks.push(audioData);
|
| 119 |
+
console.log(`🔊 Chunk de áudio recebido: ${audioData.length} bytes`);
|
| 120 |
+
|
| 121 |
+
if (message.isFinal) {
|
| 122 |
+
console.log('✅ Áudio completo recebido');
|
| 123 |
+
|
| 124 |
+
// Salvar áudio para verificação (opcional)
|
| 125 |
+
const outputFile = '/tmp/response_audio.pcm';
|
| 126 |
+
const fullAudio = Buffer.concat(audioChunks);
|
| 127 |
+
fs.writeFileSync(outputFile, fullAudio);
|
| 128 |
+
console.log(`💾 Áudio salvo em: ${outputFile}`);
|
| 129 |
+
|
| 130 |
+
ws.close();
|
| 131 |
+
resolve();
|
| 132 |
+
}
|
| 133 |
+
} else if (message.type === 'error') {
|
| 134 |
+
console.error(`❌ Erro do servidor: ${message.message}`);
|
| 135 |
+
ws.close();
|
| 136 |
+
reject(new Error(message.message));
|
| 137 |
+
}
|
| 138 |
+
} catch (error) {
|
| 139 |
+
console.error('❌ Erro ao processar mensagem:', error);
|
| 140 |
+
}
|
| 141 |
+
});
|
| 142 |
+
|
| 143 |
+
ws.on('error', (error) => {
|
| 144 |
+
console.error('❌ Erro WebSocket:', error);
|
| 145 |
+
reject(error);
|
| 146 |
+
});
|
| 147 |
+
|
| 148 |
+
ws.on('close', () => {
|
| 149 |
+
console.log('🔌 Conexão fechada');
|
| 150 |
+
if (!responseReceived) {
|
| 151 |
+
reject(new Error('Conexão fechada sem receber resposta'));
|
| 152 |
+
}
|
| 153 |
+
});
|
| 154 |
+
|
| 155 |
+
// Timeout
|
| 156 |
+
setTimeout(() => {
|
| 157 |
+
if (ws.readyState === WebSocket.OPEN) {
|
| 158 |
+
console.log('⏱️ Timeout - fechando conexão');
|
| 159 |
+
ws.close();
|
| 160 |
+
reject(new Error('Timeout na resposta'));
|
| 161 |
+
}
|
| 162 |
+
}, 30000);
|
| 163 |
+
});
|
| 164 |
+
|
| 165 |
+
} catch (error) {
|
| 166 |
+
console.error('❌ Erro no teste:', error);
|
| 167 |
+
throw error;
|
| 168 |
+
}
|
| 169 |
+
}
|
| 170 |
+
|
| 171 |
+
// Executar teste
|
| 172 |
+
testSpeechToSpeech()
|
| 173 |
+
.then(() => {
|
| 174 |
+
console.log('\n' + '='.repeat(60));
|
| 175 |
+
console.log('✅ TESTE CONCLUÍDO COM SUCESSO!');
|
| 176 |
+
console.log('='.repeat(60));
|
| 177 |
+
process.exit(0);
|
| 178 |
+
})
|
| 179 |
+
.catch((error) => {
|
| 180 |
+
console.error('\n' + '='.repeat(60));
|
| 181 |
+
console.error('❌ TESTE FALHOU:', error.message);
|
| 182 |
+
console.error('='.repeat(60));
|
| 183 |
+
process.exit(1);
|
| 184 |
+
});
|
services/webrtc_gateway/test-websocket.js
ADDED
|
@@ -0,0 +1,317 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env node
|
| 2 |
+
|
| 3 |
+
/**
|
| 4 |
+
* Teste automatizado de WebSocket para validar respostas do Ultravox
|
| 5 |
+
* Simula conexões WebRTC e envia áudio de teste
|
| 6 |
+
*/
|
| 7 |
+
|
| 8 |
+
const WebSocket = require('ws');
|
| 9 |
+
const fs = require('fs');
|
| 10 |
+
const path = require('path');
|
| 11 |
+
|
| 12 |
+
// Configuração
|
| 13 |
+
const WS_URL = 'ws://localhost:8082/ws';
|
| 14 |
+
const SAMPLE_RATE = 16000;
|
| 15 |
+
const BITS_PER_SAMPLE = 16;
|
| 16 |
+
const CHANNELS = 1;
|
| 17 |
+
|
| 18 |
+
// Cores para output
|
| 19 |
+
const colors = {
|
| 20 |
+
reset: '\x1b[0m',
|
| 21 |
+
bright: '\x1b[1m',
|
| 22 |
+
green: '\x1b[32m',
|
| 23 |
+
red: '\x1b[31m',
|
| 24 |
+
yellow: '\x1b[33m',
|
| 25 |
+
blue: '\x1b[34m',
|
| 26 |
+
cyan: '\x1b[36m'
|
| 27 |
+
};
|
| 28 |
+
|
| 29 |
+
// Função para gerar áudio de teste (silêncio com alguns pulsos)
|
| 30 |
+
function generateTestAudio(durationMs = 1000) {
|
| 31 |
+
const samples = Math.floor((SAMPLE_RATE * durationMs) / 1000);
|
| 32 |
+
const buffer = Buffer.alloc(samples * 2); // 16-bit = 2 bytes per sample
|
| 33 |
+
|
| 34 |
+
// Adicionar alguns pulsos para simular fala
|
| 35 |
+
for (let i = 0; i < samples; i++) {
|
| 36 |
+
let value = 0;
|
| 37 |
+
|
| 38 |
+
// Criar padrão de "fala" simulada
|
| 39 |
+
if (i % 100 < 50) {
|
| 40 |
+
value = Math.sin(2 * Math.PI * 440 * i / SAMPLE_RATE) * 1000;
|
| 41 |
+
value += Math.sin(2 * Math.PI * 880 * i / SAMPLE_RATE) * 500;
|
| 42 |
+
value += (Math.random() - 0.5) * 200; // Adicionar ruído
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
// Converter para int16
|
| 46 |
+
const int16Value = Math.max(-32768, Math.min(32767, Math.floor(value)));
|
| 47 |
+
buffer.writeInt16LE(int16Value, i * 2);
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
return buffer;
|
| 51 |
+
}
|
| 52 |
+
|
| 53 |
+
// Classe de teste
|
| 54 |
+
class WebSocketTester {
|
| 55 |
+
constructor() {
|
| 56 |
+
this.ws = null;
|
| 57 |
+
this.testResults = [];
|
| 58 |
+
this.currentTest = null;
|
| 59 |
+
}
|
| 60 |
+
|
| 61 |
+
connect() {
|
| 62 |
+
return new Promise((resolve, reject) => {
|
| 63 |
+
console.log(`${colors.cyan}🔌 Conectando ao WebSocket...${colors.reset}`);
|
| 64 |
+
|
| 65 |
+
this.ws = new WebSocket(WS_URL);
|
| 66 |
+
|
| 67 |
+
this.ws.on('open', () => {
|
| 68 |
+
console.log(`${colors.green}✅ Conectado ao servidor${colors.reset}`);
|
| 69 |
+
resolve();
|
| 70 |
+
});
|
| 71 |
+
|
| 72 |
+
this.ws.on('error', (error) => {
|
| 73 |
+
console.error(`${colors.red}❌ Erro de conexão:${colors.reset}`, error.message);
|
| 74 |
+
reject(error);
|
| 75 |
+
});
|
| 76 |
+
|
| 77 |
+
this.ws.on('message', (data) => {
|
| 78 |
+
this.handleMessage(data);
|
| 79 |
+
});
|
| 80 |
+
});
|
| 81 |
+
}
|
| 82 |
+
|
| 83 |
+
handleMessage(data) {
|
| 84 |
+
// Verificar se é binário (áudio) ou JSON (mensagem)
|
| 85 |
+
if (Buffer.isBuffer(data)) {
|
| 86 |
+
console.log(`${colors.green}🔊 Áudio binário recebido: ${data.length} bytes${colors.reset}`);
|
| 87 |
+
if (this.currentTest) {
|
| 88 |
+
this.currentTest.audioReceived = true;
|
| 89 |
+
this.currentTest.audioSize = data.length;
|
| 90 |
+
// Assumir que o áudio contém a resposta
|
| 91 |
+
this.currentTest.transcription = '[Resposta de áudio recebida]';
|
| 92 |
+
}
|
| 93 |
+
return;
|
| 94 |
+
}
|
| 95 |
+
|
| 96 |
+
try {
|
| 97 |
+
const msg = JSON.parse(data);
|
| 98 |
+
|
| 99 |
+
switch (msg.type) {
|
| 100 |
+
case 'init':
|
| 101 |
+
case 'welcome':
|
| 102 |
+
console.log(`${colors.blue}👋 Cliente ID: ${msg.clientId}${colors.reset}`);
|
| 103 |
+
break;
|
| 104 |
+
|
| 105 |
+
case 'metrics':
|
| 106 |
+
console.log(`${colors.yellow}📝 Resposta: "${msg.response}"${colors.reset}`);
|
| 107 |
+
if (this.currentTest) {
|
| 108 |
+
this.currentTest.transcription = msg.response;
|
| 109 |
+
this.currentTest.latency = msg.latency;
|
| 110 |
+
}
|
| 111 |
+
break;
|
| 112 |
+
|
| 113 |
+
case 'transcription':
|
| 114 |
+
console.log(`${colors.yellow}📝 Transcrição: "${msg.text}"${colors.reset}`);
|
| 115 |
+
if (this.currentTest) {
|
| 116 |
+
this.currentTest.transcription = msg.text;
|
| 117 |
+
this.currentTest.latency = msg.latency;
|
| 118 |
+
}
|
| 119 |
+
break;
|
| 120 |
+
|
| 121 |
+
case 'audio':
|
| 122 |
+
console.log(`${colors.green}🔊 Áudio recebido: ${msg.size} bytes${colors.reset}`);
|
| 123 |
+
if (this.currentTest) {
|
| 124 |
+
this.currentTest.audioReceived = true;
|
| 125 |
+
this.currentTest.audioSize = msg.size;
|
| 126 |
+
}
|
| 127 |
+
break;
|
| 128 |
+
|
| 129 |
+
case 'error':
|
| 130 |
+
console.error(`${colors.red}❌ Erro do servidor: ${msg.message}${colors.reset}`);
|
| 131 |
+
if (this.currentTest) {
|
| 132 |
+
this.currentTest.error = msg.message;
|
| 133 |
+
}
|
| 134 |
+
break;
|
| 135 |
+
}
|
| 136 |
+
} catch (error) {
|
| 137 |
+
console.log(`${colors.cyan}📨 Dados recebidos: ${data.toString().substring(0, 100)}...${colors.reset}`);
|
| 138 |
+
}
|
| 139 |
+
}
|
| 140 |
+
|
| 141 |
+
async sendAudioTest(testName, systemPrompt = '') {
|
| 142 |
+
console.log(`\n${colors.bright}=== Teste: ${testName} ===${colors.reset}`);
|
| 143 |
+
|
| 144 |
+
this.currentTest = {
|
| 145 |
+
name: testName,
|
| 146 |
+
systemPrompt: systemPrompt,
|
| 147 |
+
startTime: Date.now(),
|
| 148 |
+
transcription: null,
|
| 149 |
+
audioReceived: false
|
| 150 |
+
};
|
| 151 |
+
|
| 152 |
+
// Enviar áudio de teste
|
| 153 |
+
const audioData = generateTestAudio(1500); // 1.5 segundos
|
| 154 |
+
|
| 155 |
+
console.log(`${colors.cyan}📤 Enviando áudio PCM direto: ${audioData.length} bytes${colors.reset}`);
|
| 156 |
+
|
| 157 |
+
// Enviar dados binários PCM diretamente (como o navegador faz)
|
| 158 |
+
this.ws.send(audioData);
|
| 159 |
+
|
| 160 |
+
// Aguardar resposta
|
| 161 |
+
await this.waitForResponse(5000);
|
| 162 |
+
|
| 163 |
+
// Avaliar resultado
|
| 164 |
+
this.evaluateTest();
|
| 165 |
+
}
|
| 166 |
+
|
| 167 |
+
waitForResponse(timeoutMs) {
|
| 168 |
+
return new Promise((resolve) => {
|
| 169 |
+
const startTime = Date.now();
|
| 170 |
+
|
| 171 |
+
const checkInterval = setInterval(() => {
|
| 172 |
+
const elapsed = Date.now() - startTime;
|
| 173 |
+
|
| 174 |
+
// Verificar se recebemos resposta completa
|
| 175 |
+
if (this.currentTest.transcription && this.currentTest.audioReceived) {
|
| 176 |
+
clearInterval(checkInterval);
|
| 177 |
+
resolve();
|
| 178 |
+
} else if (elapsed > timeoutMs) {
|
| 179 |
+
clearInterval(checkInterval);
|
| 180 |
+
console.log(`${colors.yellow}⏱️ Timeout aguardando resposta${colors.reset}`);
|
| 181 |
+
resolve();
|
| 182 |
+
}
|
| 183 |
+
}, 100);
|
| 184 |
+
});
|
| 185 |
+
}
|
| 186 |
+
|
| 187 |
+
evaluateTest() {
|
| 188 |
+
const test = this.currentTest;
|
| 189 |
+
const responseTime = Date.now() - test.startTime;
|
| 190 |
+
|
| 191 |
+
console.log(`\n${colors.bright}📊 Resultado do Teste:${colors.reset}`);
|
| 192 |
+
console.log(` Tempo de resposta: ${responseTime}ms`);
|
| 193 |
+
console.log(` Transcrição recebida: ${test.transcription ? '✅' : '❌'}`);
|
| 194 |
+
console.log(` Áudio recebido: ${test.audioReceived ? '✅' : '❌'}`);
|
| 195 |
+
|
| 196 |
+
// Verificar coerência da resposta
|
| 197 |
+
let isCoherent = false;
|
| 198 |
+
if (test.transcription) {
|
| 199 |
+
// Verificar se não contém "Brasília" ou respostas aleatórias
|
| 200 |
+
const problematicPhrases = [
|
| 201 |
+
'capital do brasil',
|
| 202 |
+
'brasília',
|
| 203 |
+
'cidade mais populosa',
|
| 204 |
+
'região centro-oeste',
|
| 205 |
+
'rio de janeiro',
|
| 206 |
+
'são paulo'
|
| 207 |
+
];
|
| 208 |
+
|
| 209 |
+
const lowerTranscription = test.transcription.toLowerCase();
|
| 210 |
+
const hasProblematicContent = problematicPhrases.some(phrase =>
|
| 211 |
+
lowerTranscription.includes(phrase)
|
| 212 |
+
);
|
| 213 |
+
|
| 214 |
+
if (hasProblematicContent) {
|
| 215 |
+
console.log(` ${colors.red}⚠️ Resposta contém conteúdo problemático${colors.reset}`);
|
| 216 |
+
isCoherent = false;
|
| 217 |
+
} else {
|
| 218 |
+
console.log(` ${colors.green}✅ Resposta parece coerente${colors.reset}`);
|
| 219 |
+
isCoherent = true;
|
| 220 |
+
}
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
test.responseTime = responseTime;
|
| 224 |
+
test.isCoherent = isCoherent;
|
| 225 |
+
test.passed = test.transcription && test.audioReceived && isCoherent;
|
| 226 |
+
|
| 227 |
+
this.testResults.push(test);
|
| 228 |
+
}
|
| 229 |
+
|
| 230 |
+
async runAllTests() {
|
| 231 |
+
console.log(`\n${colors.bright}${colors.cyan}🚀 Iniciando bateria de testes${colors.reset}\n`);
|
| 232 |
+
|
| 233 |
+
// Teste 1: Sem prompt de sistema
|
| 234 |
+
await this.sendAudioTest('Sem prompt de sistema', '');
|
| 235 |
+
await this.wait(1000);
|
| 236 |
+
|
| 237 |
+
// Teste 2: Com prompt simples
|
| 238 |
+
await this.sendAudioTest('Prompt simples', 'Você é um assistente útil');
|
| 239 |
+
await this.wait(1000);
|
| 240 |
+
|
| 241 |
+
// Teste 3: Prompt vazio explícito
|
| 242 |
+
await this.sendAudioTest('Prompt vazio explícito', '');
|
| 243 |
+
await this.wait(1000);
|
| 244 |
+
|
| 245 |
+
// Mostrar resumo
|
| 246 |
+
this.showSummary();
|
| 247 |
+
}
|
| 248 |
+
|
| 249 |
+
showSummary() {
|
| 250 |
+
console.log(`\n${colors.bright}${colors.cyan}📈 RESUMO DOS TESTES${colors.reset}`);
|
| 251 |
+
console.log('═'.repeat(60));
|
| 252 |
+
|
| 253 |
+
let passed = 0;
|
| 254 |
+
let failed = 0;
|
| 255 |
+
|
| 256 |
+
this.testResults.forEach((test, index) => {
|
| 257 |
+
const status = test.passed ?
|
| 258 |
+
`${colors.green}✅ PASSOU${colors.reset}` :
|
| 259 |
+
`${colors.red}❌ FALHOU${colors.reset}`;
|
| 260 |
+
|
| 261 |
+
console.log(`\n${index + 1}. ${test.name}: ${status}`);
|
| 262 |
+
console.log(` Tempo: ${test.responseTime}ms`);
|
| 263 |
+
console.log(` Coerente: ${test.isCoherent ? 'Sim' : 'Não'}`);
|
| 264 |
+
|
| 265 |
+
if (test.transcription) {
|
| 266 |
+
console.log(` Resposta: "${test.transcription.substring(0, 80)}..."`);
|
| 267 |
+
}
|
| 268 |
+
|
| 269 |
+
if (test.passed) passed++;
|
| 270 |
+
else failed++;
|
| 271 |
+
});
|
| 272 |
+
|
| 273 |
+
console.log('\n' + '═'.repeat(60));
|
| 274 |
+
console.log(`${colors.bright}Total: ${passed} passou, ${failed} falhou${colors.reset}`);
|
| 275 |
+
|
| 276 |
+
const successRate = (passed / this.testResults.length * 100).toFixed(1);
|
| 277 |
+
const rateColor = successRate >= 80 ? colors.green :
|
| 278 |
+
successRate >= 50 ? colors.yellow :
|
| 279 |
+
colors.red;
|
| 280 |
+
|
| 281 |
+
console.log(`${rateColor}Taxa de sucesso: ${successRate}%${colors.reset}\n`);
|
| 282 |
+
}
|
| 283 |
+
|
| 284 |
+
wait(ms) {
|
| 285 |
+
return new Promise(resolve => setTimeout(resolve, ms));
|
| 286 |
+
}
|
| 287 |
+
|
| 288 |
+
disconnect() {
|
| 289 |
+
if (this.ws) {
|
| 290 |
+
console.log(`${colors.cyan}👋 Desconectando...${colors.reset}`);
|
| 291 |
+
this.ws.close();
|
| 292 |
+
}
|
| 293 |
+
}
|
| 294 |
+
}
|
| 295 |
+
|
| 296 |
+
// Executar testes
|
| 297 |
+
async function main() {
|
| 298 |
+
const tester = new WebSocketTester();
|
| 299 |
+
|
| 300 |
+
try {
|
| 301 |
+
await tester.connect();
|
| 302 |
+
await tester.wait(500); // Dar tempo para estabilizar
|
| 303 |
+
await tester.runAllTests();
|
| 304 |
+
} catch (error) {
|
| 305 |
+
console.error(`${colors.red}Erro fatal:${colors.reset}`, error);
|
| 306 |
+
} finally {
|
| 307 |
+
tester.disconnect();
|
| 308 |
+
process.exit(0);
|
| 309 |
+
}
|
| 310 |
+
}
|
| 311 |
+
|
| 312 |
+
// Iniciar
|
| 313 |
+
console.log(`${colors.bright}${colors.blue}╔═══════════════════════════════════════╗${colors.reset}`);
|
| 314 |
+
console.log(`${colors.bright}${colors.blue}║ Teste WebSocket - Ultravox Chat ║${colors.reset}`);
|
| 315 |
+
console.log(`${colors.bright}${colors.blue}╚═══════════════════════════════════════╝${colors.reset}\n`);
|
| 316 |
+
|
| 317 |
+
main().catch(console.error);
|
services/webrtc_gateway/ultravox-chat-backup.html
ADDED
|
@@ -0,0 +1,964 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Ultravox Chat PCM - Otimizado</title>
|
| 7 |
+
<script src="opus-decoder.js"></script>
|
| 8 |
+
<style>
|
| 9 |
+
* {
|
| 10 |
+
margin: 0;
|
| 11 |
+
padding: 0;
|
| 12 |
+
box-sizing: border-box;
|
| 13 |
+
}
|
| 14 |
+
|
| 15 |
+
body {
|
| 16 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, sans-serif;
|
| 17 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 18 |
+
min-height: 100vh;
|
| 19 |
+
display: flex;
|
| 20 |
+
justify-content: center;
|
| 21 |
+
align-items: center;
|
| 22 |
+
padding: 20px;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
.container {
|
| 26 |
+
background: white;
|
| 27 |
+
border-radius: 20px;
|
| 28 |
+
box-shadow: 0 20px 60px rgba(0,0,0,0.3);
|
| 29 |
+
padding: 40px;
|
| 30 |
+
max-width: 600px;
|
| 31 |
+
width: 100%;
|
| 32 |
+
}
|
| 33 |
+
|
| 34 |
+
h1 {
|
| 35 |
+
text-align: center;
|
| 36 |
+
color: #333;
|
| 37 |
+
margin-bottom: 30px;
|
| 38 |
+
font-size: 28px;
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
.status {
|
| 42 |
+
background: #f8f9fa;
|
| 43 |
+
border-radius: 10px;
|
| 44 |
+
padding: 15px;
|
| 45 |
+
margin-bottom: 20px;
|
| 46 |
+
display: flex;
|
| 47 |
+
align-items: center;
|
| 48 |
+
justify-content: space-between;
|
| 49 |
+
}
|
| 50 |
+
|
| 51 |
+
.status-dot {
|
| 52 |
+
width: 12px;
|
| 53 |
+
height: 12px;
|
| 54 |
+
border-radius: 50%;
|
| 55 |
+
background: #dc3545;
|
| 56 |
+
margin-right: 10px;
|
| 57 |
+
display: inline-block;
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
.status-dot.connected {
|
| 61 |
+
background: #28a745;
|
| 62 |
+
animation: pulse 2s infinite;
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
@keyframes pulse {
|
| 66 |
+
0% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0.7); }
|
| 67 |
+
70% { box-shadow: 0 0 0 10px rgba(40, 167, 69, 0); }
|
| 68 |
+
100% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0); }
|
| 69 |
+
}
|
| 70 |
+
|
| 71 |
+
.controls {
|
| 72 |
+
display: flex;
|
| 73 |
+
gap: 10px;
|
| 74 |
+
margin-bottom: 20px;
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
.voice-selector {
|
| 78 |
+
display: flex;
|
| 79 |
+
align-items: center;
|
| 80 |
+
gap: 10px;
|
| 81 |
+
margin-bottom: 20px;
|
| 82 |
+
padding: 10px;
|
| 83 |
+
background: #f8f9fa;
|
| 84 |
+
border-radius: 10px;
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
.voice-selector label {
|
| 88 |
+
font-weight: 600;
|
| 89 |
+
color: #555;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
.voice-selector select {
|
| 93 |
+
flex: 1;
|
| 94 |
+
padding: 8px;
|
| 95 |
+
border: 2px solid #ddd;
|
| 96 |
+
border-radius: 5px;
|
| 97 |
+
font-size: 14px;
|
| 98 |
+
background: white;
|
| 99 |
+
cursor: pointer;
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
+
.voice-selector select:focus {
|
| 103 |
+
outline: none;
|
| 104 |
+
border-color: #667eea;
|
| 105 |
+
}
|
| 106 |
+
|
| 107 |
+
button {
|
| 108 |
+
flex: 1;
|
| 109 |
+
padding: 15px;
|
| 110 |
+
border: none;
|
| 111 |
+
border-radius: 10px;
|
| 112 |
+
font-size: 16px;
|
| 113 |
+
font-weight: 600;
|
| 114 |
+
cursor: pointer;
|
| 115 |
+
transition: all 0.3s ease;
|
| 116 |
+
}
|
| 117 |
+
|
| 118 |
+
button:disabled {
|
| 119 |
+
opacity: 0.5;
|
| 120 |
+
cursor: not-allowed;
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
.btn-primary {
|
| 124 |
+
background: #007bff;
|
| 125 |
+
color: white;
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
.btn-primary:hover:not(:disabled) {
|
| 129 |
+
background: #0056b3;
|
| 130 |
+
transform: translateY(-2px);
|
| 131 |
+
box-shadow: 0 5px 15px rgba(0,123,255,0.3);
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
.btn-danger {
|
| 135 |
+
background: #dc3545;
|
| 136 |
+
color: white;
|
| 137 |
+
}
|
| 138 |
+
|
| 139 |
+
.btn-danger:hover:not(:disabled) {
|
| 140 |
+
background: #c82333;
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
+
.btn-success {
|
| 144 |
+
background: #28a745;
|
| 145 |
+
color: white;
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
.btn-success.recording {
|
| 149 |
+
background: #dc3545;
|
| 150 |
+
animation: recordPulse 1s infinite;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
@keyframes recordPulse {
|
| 154 |
+
0%, 100% { opacity: 1; }
|
| 155 |
+
50% { opacity: 0.7; }
|
| 156 |
+
}
|
| 157 |
+
|
| 158 |
+
.metrics {
|
| 159 |
+
display: grid;
|
| 160 |
+
grid-template-columns: repeat(3, 1fr);
|
| 161 |
+
gap: 15px;
|
| 162 |
+
margin-bottom: 20px;
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
.metric {
|
| 166 |
+
background: #f8f9fa;
|
| 167 |
+
padding: 15px;
|
| 168 |
+
border-radius: 10px;
|
| 169 |
+
text-align: center;
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
.metric-label {
|
| 173 |
+
font-size: 12px;
|
| 174 |
+
color: #6c757d;
|
| 175 |
+
margin-bottom: 5px;
|
| 176 |
+
}
|
| 177 |
+
|
| 178 |
+
.metric-value {
|
| 179 |
+
font-size: 24px;
|
| 180 |
+
font-weight: bold;
|
| 181 |
+
color: #333;
|
| 182 |
+
}
|
| 183 |
+
|
| 184 |
+
.log {
|
| 185 |
+
background: #f8f9fa;
|
| 186 |
+
border-radius: 10px;
|
| 187 |
+
padding: 20px;
|
| 188 |
+
height: 300px;
|
| 189 |
+
overflow-y: auto;
|
| 190 |
+
font-family: 'Monaco', 'Menlo', monospace;
|
| 191 |
+
font-size: 12px;
|
| 192 |
+
}
|
| 193 |
+
|
| 194 |
+
.log-entry {
|
| 195 |
+
padding: 5px 0;
|
| 196 |
+
border-bottom: 1px solid #e9ecef;
|
| 197 |
+
display: flex;
|
| 198 |
+
align-items: flex-start;
|
| 199 |
+
}
|
| 200 |
+
|
| 201 |
+
.log-time {
|
| 202 |
+
color: #6c757d;
|
| 203 |
+
margin-right: 10px;
|
| 204 |
+
flex-shrink: 0;
|
| 205 |
+
}
|
| 206 |
+
|
| 207 |
+
.log-message {
|
| 208 |
+
flex: 1;
|
| 209 |
+
}
|
| 210 |
+
|
| 211 |
+
.log-entry.error { color: #dc3545; }
|
| 212 |
+
.log-entry.success { color: #28a745; }
|
| 213 |
+
.log-entry.info { color: #007bff; }
|
| 214 |
+
.log-entry.warning { color: #ffc107; }
|
| 215 |
+
|
| 216 |
+
.audio-player {
|
| 217 |
+
display: inline-flex;
|
| 218 |
+
align-items: center;
|
| 219 |
+
gap: 10px;
|
| 220 |
+
margin-left: 10px;
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
.play-btn {
|
| 224 |
+
background: #007bff;
|
| 225 |
+
color: white;
|
| 226 |
+
border: none;
|
| 227 |
+
border-radius: 5px;
|
| 228 |
+
padding: 5px 10px;
|
| 229 |
+
cursor: pointer;
|
| 230 |
+
font-size: 12px;
|
| 231 |
+
}
|
| 232 |
+
|
| 233 |
+
.play-btn:hover {
|
| 234 |
+
background: #0056b3;
|
| 235 |
+
}
|
| 236 |
+
</style>
|
| 237 |
+
</head>
|
| 238 |
+
<body>
|
| 239 |
+
<div class="container">
|
| 240 |
+
<h1>🚀 Ultravox PCM - Otimizado</h1>
|
| 241 |
+
|
| 242 |
+
<div class="status">
|
| 243 |
+
<div>
|
| 244 |
+
<span class="status-dot" id="statusDot"></span>
|
| 245 |
+
<span id="statusText">Desconectado</span>
|
| 246 |
+
</div>
|
| 247 |
+
<span id="latencyText">Latência: --ms</span>
|
| 248 |
+
</div>
|
| 249 |
+
|
| 250 |
+
<div class="voice-selector">
|
| 251 |
+
<label for="voiceSelect">🔊 Voz TTS:</label>
|
| 252 |
+
<select id="voiceSelect">
|
| 253 |
+
<option value="pf_dora" selected>🇧🇷 [pf_dora] Português Feminino (Dora)</option>
|
| 254 |
+
<option value="pm_alex">🇧🇷 [pm_alex] Português Masculino (Alex)</option>
|
| 255 |
+
<option value="af_heart">🌍 [af_heart] Alternativa Feminina (Heart)</option>
|
| 256 |
+
<option value="af_bella">🌍 [af_bella] Alternativa Feminina (Bella)</option>
|
| 257 |
+
</select>
|
| 258 |
+
</div>
|
| 259 |
+
|
| 260 |
+
<div class="controls">
|
| 261 |
+
<button id="connectBtn" class="btn-primary">Conectar</button>
|
| 262 |
+
<button id="talkBtn" class="btn-success" disabled>Push to Talk</button>
|
| 263 |
+
</div>
|
| 264 |
+
|
| 265 |
+
<div class="metrics">
|
| 266 |
+
<div class="metric">
|
| 267 |
+
<div class="metric-label">Enviado</div>
|
| 268 |
+
<div class="metric-value" id="sentBytes">0 KB</div>
|
| 269 |
+
</div>
|
| 270 |
+
<div class="metric">
|
| 271 |
+
<div class="metric-label">Recebido</div>
|
| 272 |
+
<div class="metric-value" id="receivedBytes">0 KB</div>
|
| 273 |
+
</div>
|
| 274 |
+
<div class="metric">
|
| 275 |
+
<div class="metric-label">Formato</div>
|
| 276 |
+
<div class="metric-value" id="format">PCM</div>
|
| 277 |
+
</div>
|
| 278 |
+
<div class="metric">
|
| 279 |
+
<div class="metric-label">🎤 Voz</div>
|
| 280 |
+
<div class="metric-value" id="currentVoice" style="font-family: monospace; color: #4CAF50; font-weight: bold;">pf_dora</div>
|
| 281 |
+
</div>
|
| 282 |
+
</div>
|
| 283 |
+
|
| 284 |
+
<div class="log" id="log"></div>
|
| 285 |
+
</div>
|
| 286 |
+
|
| 287 |
+
<!-- Seção TTS Direto -->
|
| 288 |
+
<div class="container" style="margin-top: 20px;">
|
| 289 |
+
<h2>🎵 Text-to-Speech Direto</h2>
|
| 290 |
+
<p>Digite ou edite o texto abaixo e escolha uma voz para converter em áudio</p>
|
| 291 |
+
|
| 292 |
+
<div class="section">
|
| 293 |
+
<textarea id="ttsText" style="width: 100%; height: 120px; padding: 10px; border: 1px solid #333; border-radius: 8px; background: #1e1e1e; color: #e0e0e0; font-family: 'Segoe UI', system-ui, sans-serif; font-size: 14px; resize: vertical;">Olá! Teste de voz.</textarea>
|
| 294 |
+
</div>
|
| 295 |
+
|
| 296 |
+
<div class="section" style="display: flex; gap: 10px; align-items: center; margin-top: 15px;">
|
| 297 |
+
<label for="ttsVoiceSelect" style="font-weight: 600;">🔊 Voz:</label>
|
| 298 |
+
<select id="ttsVoiceSelect" style="flex: 1; padding: 8px; border: 1px solid #333; border-radius: 5px; background: #2a2a2a; color: #e0e0e0;">
|
| 299 |
+
<optgroup label="🇧🇷 Português">
|
| 300 |
+
<option value="pf_dora" selected>[pf_dora] Feminino - Dora</option>
|
| 301 |
+
<option value="pm_alex">[pm_alex] Masculino - Alex</option>
|
| 302 |
+
<option value="pm_santa">[pm_santa] Masculino - Santa (Festivo)</option>
|
| 303 |
+
</optgroup>
|
| 304 |
+
<optgroup label="🇫🇷 Francês">
|
| 305 |
+
<option value="ff_siwis">[ff_siwis] Feminino - Siwis (Nativa)</option>
|
| 306 |
+
</optgroup>
|
| 307 |
+
<optgroup label="🇺🇸 Inglês Americano">
|
| 308 |
+
<option value="af_alloy">Feminino - Alloy</option>
|
| 309 |
+
<option value="af_aoede">Feminino - Aoede</option>
|
| 310 |
+
<option value="af_bella">Feminino - Bella</option>
|
| 311 |
+
<option value="af_heart">Feminino - Heart</option>
|
| 312 |
+
<option value="af_jessica">Feminino - Jessica</option>
|
| 313 |
+
<option value="af_kore">Feminino - Kore</option>
|
| 314 |
+
<option value="af_nicole">Feminino - Nicole</option>
|
| 315 |
+
<option value="af_nova">Feminino - Nova</option>
|
| 316 |
+
<option value="af_river">Feminino - River</option>
|
| 317 |
+
<option value="af_sarah">Feminino - Sarah</option>
|
| 318 |
+
<option value="af_sky">Feminino - Sky</option>
|
| 319 |
+
<option value="am_adam">Masculino - Adam</option>
|
| 320 |
+
<option value="am_echo">Masculino - Echo</option>
|
| 321 |
+
<option value="am_eric">Masculino - Eric</option>
|
| 322 |
+
<option value="am_fenrir">Masculino - Fenrir</option>
|
| 323 |
+
<option value="am_liam">Masculino - Liam</option>
|
| 324 |
+
<option value="am_michael">Masculino - Michael</option>
|
| 325 |
+
<option value="am_onyx">Masculino - Onyx</option>
|
| 326 |
+
<option value="am_puck">Masculino - Puck</option>
|
| 327 |
+
<option value="am_santa">Masculino - Santa</option>
|
| 328 |
+
</optgroup>
|
| 329 |
+
<optgroup label="🇬🇧 Inglês Britânico">
|
| 330 |
+
<option value="bf_alice">Feminino - Alice</option>
|
| 331 |
+
<option value="bf_emma">Feminino - Emma</option>
|
| 332 |
+
<option value="bf_isabella">Feminino - Isabella</option>
|
| 333 |
+
<option value="bf_lily">Feminino - Lily</option>
|
| 334 |
+
<option value="bm_daniel">Masculino - Daniel</option>
|
| 335 |
+
<option value="bm_fable">Masculino - Fable</option>
|
| 336 |
+
<option value="bm_george">Masculino - George</option>
|
| 337 |
+
<option value="bm_lewis">Masculino - Lewis</option>
|
| 338 |
+
</optgroup>
|
| 339 |
+
<optgroup label="🇪🇸 Espanhol">
|
| 340 |
+
<option value="ef_dora">Feminino - Dora</option>
|
| 341 |
+
<option value="em_alex">Masculino - Alex</option>
|
| 342 |
+
<option value="em_santa">Masculino - Santa</option>
|
| 343 |
+
</optgroup>
|
| 344 |
+
<optgroup label="🇮🇹 Italiano">
|
| 345 |
+
<option value="if_sara">Feminino - Sara</option>
|
| 346 |
+
<option value="im_nicola">Masculino - Nicola</option>
|
| 347 |
+
</optgroup>
|
| 348 |
+
<optgroup label="🇯🇵 Japonês">
|
| 349 |
+
<option value="jf_alpha">Feminino - Alpha</option>
|
| 350 |
+
<option value="jf_gongitsune">Feminino - Gongitsune</option>
|
| 351 |
+
<option value="jf_nezumi">Feminino - Nezumi</option>
|
| 352 |
+
<option value="jf_tebukuro">Feminino - Tebukuro</option>
|
| 353 |
+
<option value="jm_kumo">Masculino - Kumo</option>
|
| 354 |
+
</optgroup>
|
| 355 |
+
<optgroup label="🇨🇳 Chinês">
|
| 356 |
+
<option value="zf_xiaobei">Feminino - Xiaobei</option>
|
| 357 |
+
<option value="zf_xiaoni">Feminino - Xiaoni</option>
|
| 358 |
+
<option value="zf_xiaoxiao">Feminino - Xiaoxiao</option>
|
| 359 |
+
<option value="zf_xiaoyi">Feminino - Xiaoyi</option>
|
| 360 |
+
<option value="zm_yunjian">Masculino - Yunjian</option>
|
| 361 |
+
<option value="zm_yunxi">Masculino - Yunxi</option>
|
| 362 |
+
<option value="zm_yunxia">Masculino - Yunxia</option>
|
| 363 |
+
<option value="zm_yunyang">Masculino - Yunyang</option>
|
| 364 |
+
</optgroup>
|
| 365 |
+
<optgroup label="🇮🇳 Hindi">
|
| 366 |
+
<option value="hf_alpha">Feminino - Alpha</option>
|
| 367 |
+
<option value="hf_beta">Feminino - Beta</option>
|
| 368 |
+
<option value="hm_omega">Masculino - Omega</option>
|
| 369 |
+
<option value="hm_psi">Masculino - Psi</option>
|
| 370 |
+
</optgroup>
|
| 371 |
+
</select>
|
| 372 |
+
|
| 373 |
+
<button id="ttsPlayBtn" class="btn-success" disabled style="padding: 10px 20px;">
|
| 374 |
+
▶️ Gerar Áudio
|
| 375 |
+
</button>
|
| 376 |
+
</div>
|
| 377 |
+
|
| 378 |
+
<div id="ttsStatus" style="display: none; margin-top: 15px; padding: 15px; background: #2a2a2a; border-radius: 8px;">
|
| 379 |
+
<span id="ttsStatusText">⏳ Processando...</span>
|
| 380 |
+
</div>
|
| 381 |
+
|
| 382 |
+
<div id="ttsPlayer" style="display: none; margin-top: 15px;">
|
| 383 |
+
<audio id="ttsAudio" controls style="width: 100%;"></audio>
|
| 384 |
+
</div>
|
| 385 |
+
</div>
|
| 386 |
+
|
| 387 |
+
<script>
|
| 388 |
+
// Estado da aplicação
|
| 389 |
+
let ws = null;
|
| 390 |
+
let isConnected = false;
|
| 391 |
+
let isRecording = false;
|
| 392 |
+
let audioContext = null;
|
| 393 |
+
let stream = null;
|
| 394 |
+
let audioSource = null;
|
| 395 |
+
let audioProcessor = null;
|
| 396 |
+
let pcmBuffer = [];
|
| 397 |
+
|
| 398 |
+
// Métricas
|
| 399 |
+
const metrics = {
|
| 400 |
+
sentBytes: 0,
|
| 401 |
+
receivedBytes: 0,
|
| 402 |
+
latency: 0,
|
| 403 |
+
recordingStartTime: 0
|
| 404 |
+
};
|
| 405 |
+
|
| 406 |
+
// Elementos DOM
|
| 407 |
+
const elements = {
|
| 408 |
+
statusDot: document.getElementById('statusDot'),
|
| 409 |
+
statusText: document.getElementById('statusText'),
|
| 410 |
+
latencyText: document.getElementById('latencyText'),
|
| 411 |
+
connectBtn: document.getElementById('connectBtn'),
|
| 412 |
+
talkBtn: document.getElementById('talkBtn'),
|
| 413 |
+
voiceSelect: document.getElementById('voiceSelect'),
|
| 414 |
+
sentBytes: document.getElementById('sentBytes'),
|
| 415 |
+
receivedBytes: document.getElementById('receivedBytes'),
|
| 416 |
+
format: document.getElementById('format'),
|
| 417 |
+
log: document.getElementById('log'),
|
| 418 |
+
// TTS elements
|
| 419 |
+
ttsText: document.getElementById('ttsText'),
|
| 420 |
+
ttsVoiceSelect: document.getElementById('ttsVoiceSelect'),
|
| 421 |
+
ttsPlayBtn: document.getElementById('ttsPlayBtn'),
|
| 422 |
+
ttsStatus: document.getElementById('ttsStatus'),
|
| 423 |
+
ttsStatusText: document.getElementById('ttsStatusText'),
|
| 424 |
+
ttsPlayer: document.getElementById('ttsPlayer'),
|
| 425 |
+
ttsAudio: document.getElementById('ttsAudio')
|
| 426 |
+
};
|
| 427 |
+
|
| 428 |
+
// Log no console visual
|
| 429 |
+
function log(message, type = 'info') {
|
| 430 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 431 |
+
const entry = document.createElement('div');
|
| 432 |
+
entry.className = `log-entry ${type}`;
|
| 433 |
+
entry.innerHTML = `
|
| 434 |
+
<span class="log-time">[${time}]</span>
|
| 435 |
+
<span class="log-message">${message}</span>
|
| 436 |
+
`;
|
| 437 |
+
elements.log.appendChild(entry);
|
| 438 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 439 |
+
console.log(`[${type}] ${message}`);
|
| 440 |
+
}
|
| 441 |
+
|
| 442 |
+
// Atualizar métricas
|
| 443 |
+
function updateMetrics() {
|
| 444 |
+
elements.sentBytes.textContent = `${(metrics.sentBytes / 1024).toFixed(1)} KB`;
|
| 445 |
+
elements.receivedBytes.textContent = `${(metrics.receivedBytes / 1024).toFixed(1)} KB`;
|
| 446 |
+
elements.latencyText.textContent = `Latência: ${metrics.latency}ms`;
|
| 447 |
+
}
|
| 448 |
+
|
| 449 |
+
// Conectar ao WebSocket
|
| 450 |
+
async function connect() {
|
| 451 |
+
try {
|
| 452 |
+
// Solicitar acesso ao microfone
|
| 453 |
+
stream = await navigator.mediaDevices.getUserMedia({
|
| 454 |
+
audio: {
|
| 455 |
+
echoCancellation: true,
|
| 456 |
+
noiseSuppression: true,
|
| 457 |
+
sampleRate: 24000 // High quality 24kHz
|
| 458 |
+
}
|
| 459 |
+
});
|
| 460 |
+
|
| 461 |
+
log('✅ Microfone acessado', 'success');
|
| 462 |
+
|
| 463 |
+
// Conectar WebSocket com suporte binário
|
| 464 |
+
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 465 |
+
const wsUrl = `${protocol}//${window.location.host}/ws`;
|
| 466 |
+
ws = new WebSocket(wsUrl);
|
| 467 |
+
ws.binaryType = 'arraybuffer';
|
| 468 |
+
|
| 469 |
+
ws.onopen = () => {
|
| 470 |
+
isConnected = true;
|
| 471 |
+
elements.statusDot.classList.add('connected');
|
| 472 |
+
elements.statusText.textContent = 'Conectado';
|
| 473 |
+
elements.connectBtn.textContent = 'Desconectar';
|
| 474 |
+
elements.connectBtn.classList.remove('btn-primary');
|
| 475 |
+
elements.connectBtn.classList.add('btn-danger');
|
| 476 |
+
elements.talkBtn.disabled = false;
|
| 477 |
+
|
| 478 |
+
// Enviar voz selecionada ao conectar
|
| 479 |
+
const currentVoice = elements.voiceSelect.value || elements.ttsVoiceSelect.value || 'pf_dora';
|
| 480 |
+
ws.send(JSON.stringify({
|
| 481 |
+
type: 'set-voice',
|
| 482 |
+
voice_id: currentVoice
|
| 483 |
+
}));
|
| 484 |
+
log(`🔊 Voz configurada: ${currentVoice}`, 'info');
|
| 485 |
+
elements.ttsPlayBtn.disabled = false; // Habilitar TTS button
|
| 486 |
+
log('✅ Conectado ao servidor', 'success');
|
| 487 |
+
};
|
| 488 |
+
|
| 489 |
+
ws.onmessage = (event) => {
|
| 490 |
+
if (event.data instanceof ArrayBuffer) {
|
| 491 |
+
// Áudio PCM binário recebido
|
| 492 |
+
handlePCMAudio(event.data);
|
| 493 |
+
} else {
|
| 494 |
+
// Mensagem JSON
|
| 495 |
+
const data = JSON.parse(event.data);
|
| 496 |
+
handleMessage(data);
|
| 497 |
+
}
|
| 498 |
+
};
|
| 499 |
+
|
| 500 |
+
ws.onerror = (error) => {
|
| 501 |
+
log(`❌ Erro WebSocket: ${error}`, 'error');
|
| 502 |
+
};
|
| 503 |
+
|
| 504 |
+
ws.onclose = () => {
|
| 505 |
+
disconnect();
|
| 506 |
+
};
|
| 507 |
+
|
| 508 |
+
} catch (error) {
|
| 509 |
+
log(`❌ Erro ao conectar: ${error.message}`, 'error');
|
| 510 |
+
}
|
| 511 |
+
}
|
| 512 |
+
|
| 513 |
+
// Desconectar
|
| 514 |
+
function disconnect() {
|
| 515 |
+
isConnected = false;
|
| 516 |
+
|
| 517 |
+
if (ws) {
|
| 518 |
+
ws.close();
|
| 519 |
+
ws = null;
|
| 520 |
+
}
|
| 521 |
+
|
| 522 |
+
if (stream) {
|
| 523 |
+
stream.getTracks().forEach(track => track.stop());
|
| 524 |
+
stream = null;
|
| 525 |
+
}
|
| 526 |
+
|
| 527 |
+
if (audioContext) {
|
| 528 |
+
audioContext.close();
|
| 529 |
+
audioContext = null;
|
| 530 |
+
}
|
| 531 |
+
|
| 532 |
+
elements.statusDot.classList.remove('connected');
|
| 533 |
+
elements.statusText.textContent = 'Desconectado';
|
| 534 |
+
elements.connectBtn.textContent = 'Conectar';
|
| 535 |
+
elements.connectBtn.classList.remove('btn-danger');
|
| 536 |
+
elements.connectBtn.classList.add('btn-primary');
|
| 537 |
+
elements.talkBtn.disabled = true;
|
| 538 |
+
|
| 539 |
+
log('👋 Desconectado', 'warning');
|
| 540 |
+
}
|
| 541 |
+
|
| 542 |
+
// Iniciar gravação PCM
|
| 543 |
+
function startRecording() {
|
| 544 |
+
if (isRecording) return;
|
| 545 |
+
|
| 546 |
+
isRecording = true;
|
| 547 |
+
metrics.recordingStartTime = Date.now();
|
| 548 |
+
elements.talkBtn.classList.add('recording');
|
| 549 |
+
elements.talkBtn.textContent = 'Gravando...';
|
| 550 |
+
pcmBuffer = [];
|
| 551 |
+
|
| 552 |
+
const sampleRate = 24000; // Sempre usar melhor qualidade
|
| 553 |
+
log(`🎤 Gravando PCM 16-bit @ ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 554 |
+
|
| 555 |
+
// Criar AudioContext se necessário
|
| 556 |
+
if (!audioContext) {
|
| 557 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 558 |
+
const sampleRate = 24000;
|
| 559 |
+
|
| 560 |
+
audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 561 |
+
sampleRate: sampleRate
|
| 562 |
+
});
|
| 563 |
+
|
| 564 |
+
log(`🎧 AudioContext criado: ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 565 |
+
}
|
| 566 |
+
|
| 567 |
+
// Criar processador de áudio
|
| 568 |
+
audioSource = audioContext.createMediaStreamSource(stream);
|
| 569 |
+
audioProcessor = audioContext.createScriptProcessor(4096, 1, 1);
|
| 570 |
+
|
| 571 |
+
audioProcessor.onaudioprocess = (e) => {
|
| 572 |
+
if (!isRecording) return;
|
| 573 |
+
|
| 574 |
+
const inputData = e.inputBuffer.getChannelData(0);
|
| 575 |
+
|
| 576 |
+
// Calcular RMS (Root Mean Square) para melhor detecção de volume
|
| 577 |
+
let sumSquares = 0;
|
| 578 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 579 |
+
sumSquares += inputData[i] * inputData[i];
|
| 580 |
+
}
|
| 581 |
+
const rms = Math.sqrt(sumSquares / inputData.length);
|
| 582 |
+
|
| 583 |
+
// Calcular amplitude máxima também
|
| 584 |
+
let maxAmplitude = 0;
|
| 585 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 586 |
+
maxAmplitude = Math.max(maxAmplitude, Math.abs(inputData[i]));
|
| 587 |
+
}
|
| 588 |
+
|
| 589 |
+
// Detecção de voz baseada em RMS (mais confiável que amplitude máxima)
|
| 590 |
+
const voiceThreshold = 0.01; // Threshold para detectar voz
|
| 591 |
+
const hasVoice = rms > voiceThreshold;
|
| 592 |
+
|
| 593 |
+
// Aplicar ganho suave apenas se necessário
|
| 594 |
+
let gain = 1.0;
|
| 595 |
+
if (hasVoice && rms < 0.05) {
|
| 596 |
+
// Ganho suave baseado em RMS, máximo 5x
|
| 597 |
+
gain = Math.min(5.0, 0.05 / rms);
|
| 598 |
+
if (gain > 1.2) {
|
| 599 |
+
log(`🎤 Volume baixo detectado, aplicando ganho: ${gain.toFixed(1)}x`, 'info');
|
| 600 |
+
}
|
| 601 |
+
}
|
| 602 |
+
|
| 603 |
+
// Converter Float32 para Int16 com processamento melhorado
|
| 604 |
+
const pcmData = new Int16Array(inputData.length);
|
| 605 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 606 |
+
// Aplicar ganho suave
|
| 607 |
+
let sample = inputData[i] * gain;
|
| 608 |
+
|
| 609 |
+
// Soft clipping para evitar distorção
|
| 610 |
+
if (Math.abs(sample) > 0.95) {
|
| 611 |
+
sample = Math.sign(sample) * (0.95 + 0.05 * Math.tanh((Math.abs(sample) - 0.95) * 10));
|
| 612 |
+
}
|
| 613 |
+
|
| 614 |
+
// Converter para Int16
|
| 615 |
+
sample = Math.max(-1, Math.min(1, sample));
|
| 616 |
+
pcmData[i] = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
|
| 617 |
+
}
|
| 618 |
+
|
| 619 |
+
// Adicionar ao buffer apenas se detectar voz
|
| 620 |
+
if (hasVoice) {
|
| 621 |
+
pcmBuffer.push(pcmData);
|
| 622 |
+
}
|
| 623 |
+
};
|
| 624 |
+
|
| 625 |
+
audioSource.connect(audioProcessor);
|
| 626 |
+
audioProcessor.connect(audioContext.destination);
|
| 627 |
+
}
|
| 628 |
+
|
| 629 |
+
// Parar gravação e enviar
|
| 630 |
+
function stopRecording() {
|
| 631 |
+
if (!isRecording) return;
|
| 632 |
+
|
| 633 |
+
isRecording = false;
|
| 634 |
+
const duration = Date.now() - metrics.recordingStartTime;
|
| 635 |
+
elements.talkBtn.classList.remove('recording');
|
| 636 |
+
elements.talkBtn.textContent = 'Push to Talk';
|
| 637 |
+
|
| 638 |
+
// Desconectar processador
|
| 639 |
+
if (audioProcessor) {
|
| 640 |
+
audioProcessor.disconnect();
|
| 641 |
+
audioProcessor = null;
|
| 642 |
+
}
|
| 643 |
+
if (audioSource) {
|
| 644 |
+
audioSource.disconnect();
|
| 645 |
+
audioSource = null;
|
| 646 |
+
}
|
| 647 |
+
|
| 648 |
+
// Verificar se há áudio para enviar
|
| 649 |
+
if (pcmBuffer.length === 0) {
|
| 650 |
+
log(`⚠️ Nenhum áudio capturado (silêncio ou volume muito baixo)`, 'warning');
|
| 651 |
+
pcmBuffer = [];
|
| 652 |
+
return;
|
| 653 |
+
}
|
| 654 |
+
|
| 655 |
+
// Combinar todos os chunks PCM
|
| 656 |
+
const totalLength = pcmBuffer.reduce((acc, chunk) => acc + chunk.length, 0);
|
| 657 |
+
|
| 658 |
+
// Verificar tamanho mínimo (0.5 segundos)
|
| 659 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 660 |
+
const minSamples = sampleRate * 0.5;
|
| 661 |
+
|
| 662 |
+
if (totalLength < minSamples) {
|
| 663 |
+
log(`⚠️ Áudio muito curto: ${(totalLength/sampleRate).toFixed(2)}s (mínimo 0.5s)`, 'warning');
|
| 664 |
+
pcmBuffer = [];
|
| 665 |
+
return;
|
| 666 |
+
}
|
| 667 |
+
|
| 668 |
+
const fullPCM = new Int16Array(totalLength);
|
| 669 |
+
let offset = 0;
|
| 670 |
+
for (const chunk of pcmBuffer) {
|
| 671 |
+
fullPCM.set(chunk, offset);
|
| 672 |
+
offset += chunk.length;
|
| 673 |
+
}
|
| 674 |
+
|
| 675 |
+
// Calcular amplitude final para debug
|
| 676 |
+
let maxAmp = 0;
|
| 677 |
+
for (let i = 0; i < Math.min(fullPCM.length, 1000); i++) {
|
| 678 |
+
maxAmp = Math.max(maxAmp, Math.abs(fullPCM[i] / 32768));
|
| 679 |
+
}
|
| 680 |
+
|
| 681 |
+
// Enviar PCM binário direto (sem Base64!)
|
| 682 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 683 |
+
// Enviar um header simples antes do áudio
|
| 684 |
+
const header = new ArrayBuffer(8);
|
| 685 |
+
const view = new DataView(header);
|
| 686 |
+
view.setUint32(0, 0x50434D16); // Magic: "PCM16"
|
| 687 |
+
view.setUint32(4, fullPCM.length * 2); // Tamanho em bytes
|
| 688 |
+
|
| 689 |
+
ws.send(header);
|
| 690 |
+
ws.send(fullPCM.buffer);
|
| 691 |
+
|
| 692 |
+
metrics.sentBytes += fullPCM.length * 2;
|
| 693 |
+
updateMetrics();
|
| 694 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 695 |
+
log(`📤 PCM enviado: ${(fullPCM.length * 2 / 1024).toFixed(1)}KB, ${(totalLength/sampleRate).toFixed(1)}s @ ${sampleRate}Hz, amp:${maxAmp.toFixed(3)}`, 'success');
|
| 696 |
+
}
|
| 697 |
+
|
| 698 |
+
// Limpar buffer após enviar
|
| 699 |
+
pcmBuffer = [];
|
| 700 |
+
}
|
| 701 |
+
|
| 702 |
+
// Processar mensagem JSON
|
| 703 |
+
function handleMessage(data) {
|
| 704 |
+
switch (data.type) {
|
| 705 |
+
case 'metrics':
|
| 706 |
+
metrics.latency = data.latency;
|
| 707 |
+
updateMetrics();
|
| 708 |
+
log(`📊 Resposta: "${data.response}" (${data.latency}ms)`, 'success');
|
| 709 |
+
break;
|
| 710 |
+
|
| 711 |
+
case 'error':
|
| 712 |
+
log(`❌ Erro: ${data.message}`, 'error');
|
| 713 |
+
break;
|
| 714 |
+
|
| 715 |
+
case 'tts-response':
|
| 716 |
+
// Resposta do TTS direto (Opus 24kHz ou PCM)
|
| 717 |
+
if (data.audio) {
|
| 718 |
+
// Decodificar base64 para arraybuffer
|
| 719 |
+
const binaryString = atob(data.audio);
|
| 720 |
+
const bytes = new Uint8Array(binaryString.length);
|
| 721 |
+
for (let i = 0; i < binaryString.length; i++) {
|
| 722 |
+
bytes[i] = binaryString.charCodeAt(i);
|
| 723 |
+
}
|
| 724 |
+
|
| 725 |
+
let audioData = bytes.buffer;
|
| 726 |
+
// IMPORTANTE: Usar a taxa enviada pelo servidor
|
| 727 |
+
const sampleRate = data.sampleRate || 24000;
|
| 728 |
+
|
| 729 |
+
console.log(`🎯 TTS Response - Taxa recebida: ${sampleRate}Hz, Formato: ${data.format}, Tamanho: ${bytes.length} bytes`);
|
| 730 |
+
|
| 731 |
+
// Se for Opus, usar WebAudio API para decodificar nativamente
|
| 732 |
+
let wavBuffer;
|
| 733 |
+
if (data.format === 'opus') {
|
| 734 |
+
console.log(`🗜️ Opus 24kHz recebido: ${(bytes.length/1024).toFixed(1)}KB`);
|
| 735 |
+
|
| 736 |
+
// Log de economia de banda
|
| 737 |
+
if (data.originalSize) {
|
| 738 |
+
const compression = Math.round(100 - (bytes.length / data.originalSize) * 100);
|
| 739 |
+
console.log(`📊 Economia de banda: ${compression}% (${(data.originalSize/1024).toFixed(1)}KB → ${(bytes.length/1024).toFixed(1)}KB)`);
|
| 740 |
+
}
|
| 741 |
+
|
| 742 |
+
// WebAudio API pode decodificar Opus nativamente
|
| 743 |
+
// Por agora, tratar como PCM até implementar decoder completo
|
| 744 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 745 |
+
} else {
|
| 746 |
+
// PCM - adicionar WAV header com a taxa correta
|
| 747 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 748 |
+
}
|
| 749 |
+
|
| 750 |
+
// Log da qualidade recebida
|
| 751 |
+
console.log(`🎵 TTS pronto: ${(audioData.byteLength/1024).toFixed(1)}KB @ ${sampleRate}Hz (${data.quality || 'high'} quality, ${data.format || 'pcm'})`);
|
| 752 |
+
|
| 753 |
+
// Criar blob e URL
|
| 754 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 755 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 756 |
+
|
| 757 |
+
// Atualizar player
|
| 758 |
+
elements.ttsAudio.src = audioUrl;
|
| 759 |
+
elements.ttsPlayer.style.display = 'block';
|
| 760 |
+
elements.ttsStatus.style.display = 'none';
|
| 761 |
+
elements.ttsPlayBtn.disabled = false;
|
| 762 |
+
elements.ttsPlayBtn.textContent = '▶️ Gerar Áudio';
|
| 763 |
+
|
| 764 |
+
log('🎵 Áudio TTS gerado com sucesso!', 'success');
|
| 765 |
+
}
|
| 766 |
+
break;
|
| 767 |
+
}
|
| 768 |
+
}
|
| 769 |
+
|
| 770 |
+
// Processar áudio PCM recebido
|
| 771 |
+
function handlePCMAudio(arrayBuffer) {
|
| 772 |
+
metrics.receivedBytes += arrayBuffer.byteLength;
|
| 773 |
+
updateMetrics();
|
| 774 |
+
|
| 775 |
+
// Criar WAV header para reproduzir
|
| 776 |
+
const wavBuffer = addWavHeader(arrayBuffer);
|
| 777 |
+
|
| 778 |
+
// Criar blob e URL para o áudio
|
| 779 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 780 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 781 |
+
|
| 782 |
+
// Criar log com botão de play
|
| 783 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 784 |
+
const entry = document.createElement('div');
|
| 785 |
+
entry.className = 'log-entry success';
|
| 786 |
+
entry.innerHTML = `
|
| 787 |
+
<span class="log-time">[${time}]</span>
|
| 788 |
+
<span class="log-message">🔊 Áudio recebido: ${(arrayBuffer.byteLength / 1024).toFixed(1)}KB</span>
|
| 789 |
+
<div class="audio-player">
|
| 790 |
+
<button class="play-btn" onclick="playAudio('${audioUrl}')">▶️ Play</button>
|
| 791 |
+
<audio id="audio-${Date.now()}" src="${audioUrl}" style="display: none;"></audio>
|
| 792 |
+
</div>
|
| 793 |
+
`;
|
| 794 |
+
elements.log.appendChild(entry);
|
| 795 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 796 |
+
|
| 797 |
+
// Auto-play o áudio
|
| 798 |
+
const audio = new Audio(audioUrl);
|
| 799 |
+
audio.play().catch(err => {
|
| 800 |
+
console.log('Auto-play bloqueado, use o botão para reproduzir');
|
| 801 |
+
});
|
| 802 |
+
}
|
| 803 |
+
|
| 804 |
+
// Função para tocar áudio manualmente
|
| 805 |
+
function playAudio(url) {
|
| 806 |
+
const audio = new Audio(url);
|
| 807 |
+
audio.play();
|
| 808 |
+
}
|
| 809 |
+
|
| 810 |
+
// Adicionar header WAV ao PCM
|
| 811 |
+
function addWavHeader(pcmBuffer, customSampleRate) {
|
| 812 |
+
const pcmData = new Uint8Array(pcmBuffer);
|
| 813 |
+
const wavBuffer = new ArrayBuffer(44 + pcmData.length);
|
| 814 |
+
const view = new DataView(wavBuffer);
|
| 815 |
+
|
| 816 |
+
// WAV header
|
| 817 |
+
const writeString = (offset, string) => {
|
| 818 |
+
for (let i = 0; i < string.length; i++) {
|
| 819 |
+
view.setUint8(offset + i, string.charCodeAt(i));
|
| 820 |
+
}
|
| 821 |
+
};
|
| 822 |
+
|
| 823 |
+
writeString(0, 'RIFF');
|
| 824 |
+
view.setUint32(4, 36 + pcmData.length, true);
|
| 825 |
+
writeString(8, 'WAVE');
|
| 826 |
+
writeString(12, 'fmt ');
|
| 827 |
+
view.setUint32(16, 16, true); // fmt chunk size
|
| 828 |
+
view.setUint16(20, 1, true); // PCM format
|
| 829 |
+
view.setUint16(22, 1, true); // Mono
|
| 830 |
+
|
| 831 |
+
// Usar taxa customizada se fornecida, senão usar 24kHz
|
| 832 |
+
let sampleRate = customSampleRate || 24000;
|
| 833 |
+
|
| 834 |
+
console.log(`📝 WAV Header - Configurando taxa: ${sampleRate}Hz`);
|
| 835 |
+
|
| 836 |
+
view.setUint32(24, sampleRate, true); // Sample rate
|
| 837 |
+
view.setUint32(28, sampleRate * 2, true); // Byte rate: sampleRate * 1 * 2
|
| 838 |
+
view.setUint16(32, 2, true); // Block align: 1 * 2
|
| 839 |
+
view.setUint16(34, 16, true); // Bits per sample: 16-bit
|
| 840 |
+
writeString(36, 'data');
|
| 841 |
+
view.setUint32(40, pcmData.length, true);
|
| 842 |
+
|
| 843 |
+
// Copiar dados PCM
|
| 844 |
+
new Uint8Array(wavBuffer, 44).set(pcmData);
|
| 845 |
+
|
| 846 |
+
return wavBuffer;
|
| 847 |
+
}
|
| 848 |
+
|
| 849 |
+
// Event Listeners
|
| 850 |
+
elements.connectBtn.addEventListener('click', () => {
|
| 851 |
+
if (isConnected) {
|
| 852 |
+
disconnect();
|
| 853 |
+
} else {
|
| 854 |
+
connect();
|
| 855 |
+
}
|
| 856 |
+
});
|
| 857 |
+
|
| 858 |
+
elements.talkBtn.addEventListener('mousedown', startRecording);
|
| 859 |
+
elements.talkBtn.addEventListener('mouseup', stopRecording);
|
| 860 |
+
elements.talkBtn.addEventListener('mouseleave', stopRecording);
|
| 861 |
+
|
| 862 |
+
// Voice selector listener
|
| 863 |
+
elements.voiceSelect.addEventListener('change', (e) => {
|
| 864 |
+
const voice_id = e.target.value;
|
| 865 |
+
console.log('Voice select changed to:', voice_id);
|
| 866 |
+
|
| 867 |
+
// Update current voice display
|
| 868 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 869 |
+
if (currentVoiceElement) {
|
| 870 |
+
currentVoiceElement.textContent = voice_id;
|
| 871 |
+
}
|
| 872 |
+
|
| 873 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 874 |
+
console.log('Sending set-voice command:', voice_id);
|
| 875 |
+
ws.send(JSON.stringify({
|
| 876 |
+
type: 'set-voice',
|
| 877 |
+
voice_id: voice_id
|
| 878 |
+
}));
|
| 879 |
+
log(`🔊 Voz alterada para: ${voice_id} - ${e.target.options[e.target.selectedIndex].text}`, 'info');
|
| 880 |
+
} else {
|
| 881 |
+
console.log('WebSocket not connected, cannot send voice change');
|
| 882 |
+
log(`⚠️ Conecte-se primeiro para mudar a voz`, 'warning');
|
| 883 |
+
}
|
| 884 |
+
});
|
| 885 |
+
elements.talkBtn.addEventListener('touchstart', startRecording);
|
| 886 |
+
elements.talkBtn.addEventListener('touchend', stopRecording);
|
| 887 |
+
|
| 888 |
+
// TTS Voice selector listener
|
| 889 |
+
elements.ttsVoiceSelect.addEventListener('change', (e) => {
|
| 890 |
+
const voice_id = e.target.value;
|
| 891 |
+
|
| 892 |
+
// Update main voice selector
|
| 893 |
+
elements.voiceSelect.value = voice_id;
|
| 894 |
+
|
| 895 |
+
// Update current voice display
|
| 896 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 897 |
+
if (currentVoiceElement) {
|
| 898 |
+
currentVoiceElement.textContent = voice_id;
|
| 899 |
+
}
|
| 900 |
+
|
| 901 |
+
// Send voice change to server
|
| 902 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 903 |
+
ws.send(JSON.stringify({
|
| 904 |
+
type: 'set-voice',
|
| 905 |
+
voice_id: voice_id
|
| 906 |
+
}));
|
| 907 |
+
log(`🎤 Voz TTS alterada para: ${voice_id}`, 'info');
|
| 908 |
+
}
|
| 909 |
+
});
|
| 910 |
+
|
| 911 |
+
// TTS Button Event Listener
|
| 912 |
+
elements.ttsPlayBtn.addEventListener('click', (e) => {
|
| 913 |
+
e.preventDefault();
|
| 914 |
+
e.stopPropagation();
|
| 915 |
+
|
| 916 |
+
console.log('TTS Button clicked!');
|
| 917 |
+
const text = elements.ttsText.value.trim();
|
| 918 |
+
const voice = elements.ttsVoiceSelect.value;
|
| 919 |
+
|
| 920 |
+
console.log('TTS Text:', text);
|
| 921 |
+
console.log('TTS Voice:', voice);
|
| 922 |
+
|
| 923 |
+
if (!text) {
|
| 924 |
+
alert('Por favor, digite algum texto para converter em áudio');
|
| 925 |
+
return;
|
| 926 |
+
}
|
| 927 |
+
|
| 928 |
+
if (!ws || ws.readyState !== WebSocket.OPEN) {
|
| 929 |
+
alert('Por favor, conecte-se primeiro clicando em "Conectar"');
|
| 930 |
+
return;
|
| 931 |
+
}
|
| 932 |
+
|
| 933 |
+
// Mostrar status
|
| 934 |
+
elements.ttsStatus.style.display = 'block';
|
| 935 |
+
elements.ttsStatusText.textContent = '⏳ Gerando áudio...';
|
| 936 |
+
elements.ttsPlayBtn.disabled = true;
|
| 937 |
+
elements.ttsPlayBtn.textContent = '⏳ Processando...';
|
| 938 |
+
elements.ttsPlayer.style.display = 'none';
|
| 939 |
+
|
| 940 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 941 |
+
const quality = 'high';
|
| 942 |
+
|
| 943 |
+
// Enviar request para TTS com qualidade máxima
|
| 944 |
+
const ttsRequest = {
|
| 945 |
+
type: 'text-to-speech',
|
| 946 |
+
text: text,
|
| 947 |
+
voice_id: voice,
|
| 948 |
+
quality: quality,
|
| 949 |
+
format: 'opus' // Opus 24kHz @ 32kbps - máxima qualidade, mínima banda
|
| 950 |
+
};
|
| 951 |
+
|
| 952 |
+
console.log('Sending TTS request:', ttsRequest);
|
| 953 |
+
ws.send(JSON.stringify(ttsRequest));
|
| 954 |
+
|
| 955 |
+
log(`🎤 Solicitando TTS: voz=${voice}, texto="${text.substring(0, 50)}..."`, 'info');
|
| 956 |
+
});
|
| 957 |
+
|
| 958 |
+
// Inicialização
|
| 959 |
+
log('🚀 Ultravox Chat PCM Otimizado', 'info');
|
| 960 |
+
log('📊 Formato: PCM 16-bit @ 16kHz', 'info');
|
| 961 |
+
log('⚡ Sem FFmpeg, sem Base64!', 'success');
|
| 962 |
+
</script>
|
| 963 |
+
</body>
|
| 964 |
+
</html>
|
services/webrtc_gateway/ultravox-chat-ios.html
ADDED
|
@@ -0,0 +1,1843 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
|
| 6 |
+
<meta name="apple-mobile-web-app-capable" content="yes">
|
| 7 |
+
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
|
| 8 |
+
<title>Ultravox AI Assistant</title>
|
| 9 |
+
|
| 10 |
+
<!-- Material Icons -->
|
| 11 |
+
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
|
| 12 |
+
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
| 13 |
+
|
| 14 |
+
<!-- Opus Decoder -->
|
| 15 |
+
<script src="opus-decoder.js"></script>
|
| 16 |
+
|
| 17 |
+
<style>
|
| 18 |
+
* {
|
| 19 |
+
margin: 0;
|
| 20 |
+
padding: 0;
|
| 21 |
+
box-sizing: border-box;
|
| 22 |
+
-webkit-tap-highlight-color: transparent;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
:root {
|
| 26 |
+
--ios-blue: #007AFF;
|
| 27 |
+
--ios-gray: #8E8E93;
|
| 28 |
+
--ios-gray-2: #C7C7CC;
|
| 29 |
+
--ios-gray-3: #D1D1D6;
|
| 30 |
+
--ios-gray-4: #E5E5EA;
|
| 31 |
+
--ios-gray-5: #F2F2F7;
|
| 32 |
+
--ios-gray-6: #FFFFFF;
|
| 33 |
+
--ios-red: #FF3B30;
|
| 34 |
+
--ios-green: #34C759;
|
| 35 |
+
--ios-orange: #FF9500;
|
| 36 |
+
--ios-purple: #AF52DE;
|
| 37 |
+
--sidebar-width: 280px;
|
| 38 |
+
--header-height: 60px;
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
/* Pull to Refresh */
|
| 42 |
+
.pull-to-refresh {
|
| 43 |
+
position: fixed;
|
| 44 |
+
top: -60px;
|
| 45 |
+
left: 0;
|
| 46 |
+
right: 0;
|
| 47 |
+
height: 60px;
|
| 48 |
+
background: rgba(255, 255, 255, 0.95);
|
| 49 |
+
backdrop-filter: blur(20px);
|
| 50 |
+
-webkit-backdrop-filter: blur(20px);
|
| 51 |
+
display: flex;
|
| 52 |
+
align-items: center;
|
| 53 |
+
justify-content: center;
|
| 54 |
+
z-index: 2000;
|
| 55 |
+
transition: transform 0.3s ease;
|
| 56 |
+
border-bottom: 1px solid var(--ios-gray-4);
|
| 57 |
+
}
|
| 58 |
+
|
| 59 |
+
.pull-to-refresh.show {
|
| 60 |
+
transform: translateY(60px);
|
| 61 |
+
}
|
| 62 |
+
|
| 63 |
+
.pull-to-refresh-spinner {
|
| 64 |
+
width: 20px;
|
| 65 |
+
height: 20px;
|
| 66 |
+
border: 2px solid var(--ios-gray-3);
|
| 67 |
+
border-top-color: var(--ios-blue);
|
| 68 |
+
border-radius: 50%;
|
| 69 |
+
animation: none;
|
| 70 |
+
margin-right: 10px;
|
| 71 |
+
}
|
| 72 |
+
|
| 73 |
+
.pull-to-refresh.refreshing .pull-to-refresh-spinner {
|
| 74 |
+
animation: spin 1s linear infinite;
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
.pull-to-refresh-text {
|
| 78 |
+
font-size: 14px;
|
| 79 |
+
color: var(--ios-gray);
|
| 80 |
+
}
|
| 81 |
+
|
| 82 |
+
body {
|
| 83 |
+
font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
|
| 84 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 85 |
+
color: #000;
|
| 86 |
+
overflow: hidden;
|
| 87 |
+
height: 100vh;
|
| 88 |
+
position: fixed;
|
| 89 |
+
width: 100%;
|
| 90 |
+
user-select: none;
|
| 91 |
+
-webkit-user-select: none;
|
| 92 |
+
}
|
| 93 |
+
|
| 94 |
+
/* App Container */
|
| 95 |
+
.app-container {
|
| 96 |
+
display: flex;
|
| 97 |
+
height: 100vh;
|
| 98 |
+
position: relative;
|
| 99 |
+
}
|
| 100 |
+
|
| 101 |
+
/* Sidebar */
|
| 102 |
+
.sidebar {
|
| 103 |
+
width: var(--sidebar-width);
|
| 104 |
+
background: rgba(255, 255, 255, 0.95);
|
| 105 |
+
backdrop-filter: blur(20px);
|
| 106 |
+
-webkit-backdrop-filter: blur(20px);
|
| 107 |
+
border-right: 1px solid var(--ios-gray-4);
|
| 108 |
+
display: flex;
|
| 109 |
+
flex-direction: column;
|
| 110 |
+
transition: transform 0.3s cubic-bezier(0.4, 0, 0.2, 1);
|
| 111 |
+
position: relative;
|
| 112 |
+
z-index: 100;
|
| 113 |
+
}
|
| 114 |
+
|
| 115 |
+
.sidebar-header {
|
| 116 |
+
padding: 20px;
|
| 117 |
+
border-bottom: 1px solid var(--ios-gray-4);
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
+
.app-title {
|
| 121 |
+
font-size: 24px;
|
| 122 |
+
font-weight: 700;
|
| 123 |
+
color: #000;
|
| 124 |
+
display: flex;
|
| 125 |
+
align-items: center;
|
| 126 |
+
gap: 10px;
|
| 127 |
+
}
|
| 128 |
+
|
| 129 |
+
.app-subtitle {
|
| 130 |
+
font-size: 12px;
|
| 131 |
+
color: var(--ios-gray);
|
| 132 |
+
margin-top: 4px;
|
| 133 |
+
}
|
| 134 |
+
|
| 135 |
+
.nav-menu {
|
| 136 |
+
flex: 1;
|
| 137 |
+
padding: 12px 0;
|
| 138 |
+
}
|
| 139 |
+
|
| 140 |
+
.nav-item {
|
| 141 |
+
display: flex;
|
| 142 |
+
align-items: center;
|
| 143 |
+
padding: 14px 20px;
|
| 144 |
+
color: #000;
|
| 145 |
+
text-decoration: none;
|
| 146 |
+
transition: all 0.2s ease;
|
| 147 |
+
position: relative;
|
| 148 |
+
cursor: pointer;
|
| 149 |
+
font-size: 15px;
|
| 150 |
+
font-weight: 500;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
.nav-item:hover {
|
| 154 |
+
background: var(--ios-gray-5);
|
| 155 |
+
}
|
| 156 |
+
|
| 157 |
+
.nav-item.active {
|
| 158 |
+
background: var(--ios-blue);
|
| 159 |
+
color: white;
|
| 160 |
+
}
|
| 161 |
+
|
| 162 |
+
.nav-item .material-icons {
|
| 163 |
+
margin-right: 16px;
|
| 164 |
+
font-size: 22px;
|
| 165 |
+
}
|
| 166 |
+
|
| 167 |
+
.nav-badge {
|
| 168 |
+
margin-left: auto;
|
| 169 |
+
background: var(--ios-red);
|
| 170 |
+
color: white;
|
| 171 |
+
font-size: 11px;
|
| 172 |
+
padding: 2px 8px;
|
| 173 |
+
border-radius: 12px;
|
| 174 |
+
font-weight: 600;
|
| 175 |
+
}
|
| 176 |
+
|
| 177 |
+
/* Main Content */
|
| 178 |
+
.main-content {
|
| 179 |
+
flex: 1;
|
| 180 |
+
display: flex;
|
| 181 |
+
flex-direction: column;
|
| 182 |
+
overflow: hidden;
|
| 183 |
+
background: transparent;
|
| 184 |
+
}
|
| 185 |
+
|
| 186 |
+
/* Header */
|
| 187 |
+
.header {
|
| 188 |
+
height: var(--header-height);
|
| 189 |
+
background: rgba(255, 255, 255, 0.95);
|
| 190 |
+
backdrop-filter: blur(20px);
|
| 191 |
+
-webkit-backdrop-filter: blur(20px);
|
| 192 |
+
border-bottom: 1px solid var(--ios-gray-4);
|
| 193 |
+
display: flex;
|
| 194 |
+
align-items: center;
|
| 195 |
+
padding: 0 20px;
|
| 196 |
+
justify-content: space-between;
|
| 197 |
+
}
|
| 198 |
+
|
| 199 |
+
.menu-toggle {
|
| 200 |
+
display: none;
|
| 201 |
+
background: none;
|
| 202 |
+
border: none;
|
| 203 |
+
color: var(--ios-blue);
|
| 204 |
+
cursor: pointer;
|
| 205 |
+
padding: 8px;
|
| 206 |
+
}
|
| 207 |
+
|
| 208 |
+
.header-title {
|
| 209 |
+
font-size: 17px;
|
| 210 |
+
font-weight: 600;
|
| 211 |
+
color: #000;
|
| 212 |
+
}
|
| 213 |
+
|
| 214 |
+
.connection-status {
|
| 215 |
+
display: flex;
|
| 216 |
+
align-items: center;
|
| 217 |
+
gap: 8px;
|
| 218 |
+
padding: 6px 12px;
|
| 219 |
+
background: var(--ios-gray-5);
|
| 220 |
+
border-radius: 20px;
|
| 221 |
+
font-size: 13px;
|
| 222 |
+
}
|
| 223 |
+
|
| 224 |
+
.status-dot {
|
| 225 |
+
width: 8px;
|
| 226 |
+
height: 8px;
|
| 227 |
+
border-radius: 50%;
|
| 228 |
+
background: var(--ios-red);
|
| 229 |
+
}
|
| 230 |
+
|
| 231 |
+
.status-dot.connected {
|
| 232 |
+
background: var(--ios-green);
|
| 233 |
+
animation: pulse 2s infinite;
|
| 234 |
+
}
|
| 235 |
+
|
| 236 |
+
@keyframes pulse {
|
| 237 |
+
0%, 100% {
|
| 238 |
+
opacity: 1;
|
| 239 |
+
transform: scale(1);
|
| 240 |
+
}
|
| 241 |
+
50% {
|
| 242 |
+
opacity: 0.8;
|
| 243 |
+
transform: scale(1.05);
|
| 244 |
+
}
|
| 245 |
+
}
|
| 246 |
+
|
| 247 |
+
/* View Container */
|
| 248 |
+
.view-container {
|
| 249 |
+
flex: 1;
|
| 250 |
+
overflow-y: auto;
|
| 251 |
+
padding: 20px;
|
| 252 |
+
display: none;
|
| 253 |
+
}
|
| 254 |
+
|
| 255 |
+
.view-container.active {
|
| 256 |
+
display: block;
|
| 257 |
+
}
|
| 258 |
+
|
| 259 |
+
/* iOS Card Style - Minimal */
|
| 260 |
+
.ios-card {
|
| 261 |
+
background: rgba(255, 255, 255, 0.95);
|
| 262 |
+
backdrop-filter: blur(20px);
|
| 263 |
+
-webkit-backdrop-filter: blur(20px);
|
| 264 |
+
border-radius: 16px;
|
| 265 |
+
padding: 20px;
|
| 266 |
+
margin-bottom: 16px;
|
| 267 |
+
border: 1px solid rgba(255, 255, 255, 0.3);
|
| 268 |
+
}
|
| 269 |
+
|
| 270 |
+
.card-title {
|
| 271 |
+
font-size: 20px;
|
| 272 |
+
font-weight: 600;
|
| 273 |
+
margin-bottom: 16px;
|
| 274 |
+
color: #000;
|
| 275 |
+
}
|
| 276 |
+
|
| 277 |
+
/* Voice Selector */
|
| 278 |
+
.voice-selector {
|
| 279 |
+
width: 100%;
|
| 280 |
+
padding: 12px 16px;
|
| 281 |
+
background: var(--ios-gray-5);
|
| 282 |
+
border: 1px solid var(--ios-gray-4);
|
| 283 |
+
border-radius: 10px;
|
| 284 |
+
font-size: 15px;
|
| 285 |
+
font-family: inherit;
|
| 286 |
+
appearance: none;
|
| 287 |
+
background-image: url("data:image/svg+xml;charset=UTF-8,%3csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24' fill='none' stroke='%23007AFF' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3e%3cpolyline points='6 9 12 15 18 9'%3e%3c/polyline%3e%3c/svg%3e");
|
| 288 |
+
background-repeat: no-repeat;
|
| 289 |
+
background-position: right 12px center;
|
| 290 |
+
background-size: 20px;
|
| 291 |
+
padding-right: 40px;
|
| 292 |
+
}
|
| 293 |
+
|
| 294 |
+
/* iOS Button */
|
| 295 |
+
.ios-button {
|
| 296 |
+
width: 100%;
|
| 297 |
+
padding: 16px;
|
| 298 |
+
background: var(--ios-blue);
|
| 299 |
+
color: white;
|
| 300 |
+
border: none;
|
| 301 |
+
border-radius: 12px;
|
| 302 |
+
font-size: 17px;
|
| 303 |
+
font-weight: 600;
|
| 304 |
+
cursor: pointer;
|
| 305 |
+
transition: all 0.2s ease;
|
| 306 |
+
display: flex;
|
| 307 |
+
align-items: center;
|
| 308 |
+
justify-content: center;
|
| 309 |
+
gap: 8px;
|
| 310 |
+
font-family: inherit;
|
| 311 |
+
}
|
| 312 |
+
|
| 313 |
+
.ios-button:hover {
|
| 314 |
+
opacity: 0.9;
|
| 315 |
+
}
|
| 316 |
+
|
| 317 |
+
.ios-button:active {
|
| 318 |
+
transform: scale(0.98);
|
| 319 |
+
}
|
| 320 |
+
|
| 321 |
+
.ios-button:disabled {
|
| 322 |
+
background: var(--ios-gray-3);
|
| 323 |
+
cursor: not-allowed;
|
| 324 |
+
}
|
| 325 |
+
|
| 326 |
+
.ios-button.secondary {
|
| 327 |
+
background: var(--ios-gray-5);
|
| 328 |
+
color: var(--ios-blue);
|
| 329 |
+
}
|
| 330 |
+
|
| 331 |
+
.ios-button.danger {
|
| 332 |
+
background: var(--ios-red);
|
| 333 |
+
}
|
| 334 |
+
|
| 335 |
+
.ios-button.success {
|
| 336 |
+
background: var(--ios-green);
|
| 337 |
+
}
|
| 338 |
+
|
| 339 |
+
.ios-button.recording {
|
| 340 |
+
background: var(--ios-red);
|
| 341 |
+
animation: recordPulse 1s infinite;
|
| 342 |
+
}
|
| 343 |
+
|
| 344 |
+
@keyframes recordPulse {
|
| 345 |
+
0%, 100% { opacity: 1; }
|
| 346 |
+
50% { opacity: 0.8; }
|
| 347 |
+
}
|
| 348 |
+
|
| 349 |
+
/* Push to Talk View - Compact Professional */
|
| 350 |
+
.ptt-container {
|
| 351 |
+
display: grid;
|
| 352 |
+
grid-template-columns: 1fr;
|
| 353 |
+
gap: 20px;
|
| 354 |
+
max-width: 500px;
|
| 355 |
+
margin: 0 auto;
|
| 356 |
+
}
|
| 357 |
+
|
| 358 |
+
.ptt-main-section {
|
| 359 |
+
display: flex;
|
| 360 |
+
flex-direction: column;
|
| 361 |
+
align-items: center;
|
| 362 |
+
gap: 20px;
|
| 363 |
+
}
|
| 364 |
+
|
| 365 |
+
.ptt-button {
|
| 366 |
+
width: 140px;
|
| 367 |
+
height: 140px;
|
| 368 |
+
border-radius: 50%;
|
| 369 |
+
background: linear-gradient(145deg, #ffffff, #f0f0f5);
|
| 370 |
+
color: var(--ios-blue);
|
| 371 |
+
border: none;
|
| 372 |
+
font-size: 14px;
|
| 373 |
+
font-weight: 600;
|
| 374 |
+
cursor: pointer;
|
| 375 |
+
transition: all 0.2s ease;
|
| 376 |
+
display: flex;
|
| 377 |
+
flex-direction: column;
|
| 378 |
+
align-items: center;
|
| 379 |
+
justify-content: center;
|
| 380 |
+
gap: 8px;
|
| 381 |
+
box-shadow: 0 8px 24px rgba(0, 0, 0, 0.1);
|
| 382 |
+
position: relative;
|
| 383 |
+
user-select: none;
|
| 384 |
+
-webkit-user-select: none;
|
| 385 |
+
-webkit-tap-highlight-color: transparent;
|
| 386 |
+
}
|
| 387 |
+
|
| 388 |
+
.ptt-button::before {
|
| 389 |
+
content: '';
|
| 390 |
+
position: absolute;
|
| 391 |
+
width: 100%;
|
| 392 |
+
height: 100%;
|
| 393 |
+
border-radius: 50%;
|
| 394 |
+
border: 2px solid var(--ios-blue);
|
| 395 |
+
animation: ripple 2s linear infinite;
|
| 396 |
+
opacity: 0;
|
| 397 |
+
}
|
| 398 |
+
|
| 399 |
+
.ptt-button:active {
|
| 400 |
+
transform: scale(0.92);
|
| 401 |
+
box-shadow: 0 2px 8px rgba(0, 122, 255, 0.3);
|
| 402 |
+
}
|
| 403 |
+
|
| 404 |
+
.ptt-button.recording {
|
| 405 |
+
background: linear-gradient(145deg, #ff453a, #ff6b6b);
|
| 406 |
+
color: white;
|
| 407 |
+
transform: scale(1.05);
|
| 408 |
+
box-shadow: 0 12px 32px rgba(255, 59, 48, 0.3);
|
| 409 |
+
}
|
| 410 |
+
|
| 411 |
+
.ptt-button.recording::before {
|
| 412 |
+
border-color: var(--ios-red);
|
| 413 |
+
animation: ripple 1s linear infinite;
|
| 414 |
+
}
|
| 415 |
+
|
| 416 |
+
@keyframes ripple {
|
| 417 |
+
0% {
|
| 418 |
+
transform: scale(1);
|
| 419 |
+
opacity: 1;
|
| 420 |
+
}
|
| 421 |
+
100% {
|
| 422 |
+
transform: scale(1.5);
|
| 423 |
+
opacity: 0;
|
| 424 |
+
}
|
| 425 |
+
}
|
| 426 |
+
|
| 427 |
+
.ptt-button .material-icons {
|
| 428 |
+
font-size: 40px;
|
| 429 |
+
user-select: none;
|
| 430 |
+
-webkit-user-select: none;
|
| 431 |
+
}
|
| 432 |
+
|
| 433 |
+
.ptt-button span:not(.material-icons) {
|
| 434 |
+
font-size: 12px;
|
| 435 |
+
opacity: 0.9;
|
| 436 |
+
}
|
| 437 |
+
|
| 438 |
+
/* Metrics Grid */
|
| 439 |
+
.metrics-grid {
|
| 440 |
+
display: grid;
|
| 441 |
+
grid-template-columns: repeat(auto-fit, minmax(140px, 1fr));
|
| 442 |
+
gap: 12px;
|
| 443 |
+
margin-top: 20px;
|
| 444 |
+
}
|
| 445 |
+
|
| 446 |
+
.metric-card {
|
| 447 |
+
background: var(--ios-gray-5);
|
| 448 |
+
padding: 16px;
|
| 449 |
+
border-radius: 10px;
|
| 450 |
+
text-align: center;
|
| 451 |
+
}
|
| 452 |
+
|
| 453 |
+
.metric-label {
|
| 454 |
+
font-size: 11px;
|
| 455 |
+
color: var(--ios-gray);
|
| 456 |
+
text-transform: uppercase;
|
| 457 |
+
letter-spacing: 0.5px;
|
| 458 |
+
margin-bottom: 4px;
|
| 459 |
+
}
|
| 460 |
+
|
| 461 |
+
.metric-value {
|
| 462 |
+
font-size: 24px;
|
| 463 |
+
font-weight: 600;
|
| 464 |
+
color: var(--ios-blue);
|
| 465 |
+
}
|
| 466 |
+
|
| 467 |
+
/* TTS Textarea */
|
| 468 |
+
.tts-textarea {
|
| 469 |
+
width: 100%;
|
| 470 |
+
min-height: 120px;
|
| 471 |
+
padding: 16px;
|
| 472 |
+
background: var(--ios-gray-5);
|
| 473 |
+
border: 1px solid var(--ios-gray-4);
|
| 474 |
+
border-radius: 10px;
|
| 475 |
+
font-size: 15px;
|
| 476 |
+
font-family: inherit;
|
| 477 |
+
resize: vertical;
|
| 478 |
+
}
|
| 479 |
+
|
| 480 |
+
.tts-textarea:focus {
|
| 481 |
+
outline: none;
|
| 482 |
+
border-color: var(--ios-blue);
|
| 483 |
+
}
|
| 484 |
+
|
| 485 |
+
/* Log Console */
|
| 486 |
+
.log-container {
|
| 487 |
+
background: #1c1c1e;
|
| 488 |
+
border-radius: 10px;
|
| 489 |
+
padding: 16px;
|
| 490 |
+
height: 300px;
|
| 491 |
+
overflow-y: auto;
|
| 492 |
+
font-family: 'SF Mono', Monaco, monospace;
|
| 493 |
+
font-size: 12px;
|
| 494 |
+
}
|
| 495 |
+
|
| 496 |
+
.log-entry {
|
| 497 |
+
padding: 4px 0;
|
| 498 |
+
display: flex;
|
| 499 |
+
align-items: flex-start;
|
| 500 |
+
color: #e0e0e0;
|
| 501 |
+
}
|
| 502 |
+
|
| 503 |
+
.log-time {
|
| 504 |
+
color: #8e8e93;
|
| 505 |
+
margin-right: 10px;
|
| 506 |
+
flex-shrink: 0;
|
| 507 |
+
}
|
| 508 |
+
|
| 509 |
+
.log-entry.error { color: #ff453a; }
|
| 510 |
+
.log-entry.success { color: #30d158; }
|
| 511 |
+
.log-entry.info { color: #0a84ff; }
|
| 512 |
+
.log-entry.warning { color: #ffd60a; }
|
| 513 |
+
|
| 514 |
+
/* Audio Player */
|
| 515 |
+
.audio-player {
|
| 516 |
+
display: inline-flex;
|
| 517 |
+
align-items: center;
|
| 518 |
+
gap: 8px;
|
| 519 |
+
margin-left: 8px;
|
| 520 |
+
}
|
| 521 |
+
|
| 522 |
+
.play-btn {
|
| 523 |
+
background: var(--ios-blue);
|
| 524 |
+
color: white;
|
| 525 |
+
border: none;
|
| 526 |
+
border-radius: 4px;
|
| 527 |
+
padding: 4px 8px;
|
| 528 |
+
cursor: pointer;
|
| 529 |
+
font-size: 11px;
|
| 530 |
+
}
|
| 531 |
+
|
| 532 |
+
/* Loading Spinner */
|
| 533 |
+
.loading-spinner {
|
| 534 |
+
display: none;
|
| 535 |
+
width: 40px;
|
| 536 |
+
height: 40px;
|
| 537 |
+
border: 3px solid var(--ios-gray-4);
|
| 538 |
+
border-top-color: var(--ios-blue);
|
| 539 |
+
border-radius: 50%;
|
| 540 |
+
animation: spin 1s linear infinite;
|
| 541 |
+
margin: 20px auto;
|
| 542 |
+
}
|
| 543 |
+
|
| 544 |
+
.loading-spinner.active {
|
| 545 |
+
display: block;
|
| 546 |
+
}
|
| 547 |
+
|
| 548 |
+
@keyframes spin {
|
| 549 |
+
to { transform: rotate(360deg); }
|
| 550 |
+
}
|
| 551 |
+
|
| 552 |
+
/* Mobile Styles */
|
| 553 |
+
@media (max-width: 768px) {
|
| 554 |
+
.sidebar {
|
| 555 |
+
position: fixed;
|
| 556 |
+
left: 0;
|
| 557 |
+
top: 0;
|
| 558 |
+
height: 100%;
|
| 559 |
+
transform: translateX(-100%);
|
| 560 |
+
z-index: 1000;
|
| 561 |
+
}
|
| 562 |
+
|
| 563 |
+
.sidebar.open {
|
| 564 |
+
transform: translateX(0);
|
| 565 |
+
}
|
| 566 |
+
|
| 567 |
+
.menu-toggle {
|
| 568 |
+
display: block;
|
| 569 |
+
}
|
| 570 |
+
|
| 571 |
+
.overlay {
|
| 572 |
+
display: none;
|
| 573 |
+
position: fixed;
|
| 574 |
+
top: 0;
|
| 575 |
+
left: 0;
|
| 576 |
+
right: 0;
|
| 577 |
+
bottom: 0;
|
| 578 |
+
background: rgba(0, 0, 0, 0.5);
|
| 579 |
+
z-index: 999;
|
| 580 |
+
}
|
| 581 |
+
|
| 582 |
+
.overlay.active {
|
| 583 |
+
display: block;
|
| 584 |
+
}
|
| 585 |
+
}
|
| 586 |
+
|
| 587 |
+
/* Settings View */
|
| 588 |
+
.settings-group {
|
| 589 |
+
margin-bottom: 24px;
|
| 590 |
+
}
|
| 591 |
+
|
| 592 |
+
.settings-label {
|
| 593 |
+
font-size: 13px;
|
| 594 |
+
color: var(--ios-gray);
|
| 595 |
+
text-transform: uppercase;
|
| 596 |
+
letter-spacing: 0.5px;
|
| 597 |
+
margin-bottom: 12px;
|
| 598 |
+
}
|
| 599 |
+
|
| 600 |
+
.toggle-switch {
|
| 601 |
+
display: flex;
|
| 602 |
+
align-items: center;
|
| 603 |
+
justify-content: space-between;
|
| 604 |
+
padding: 12px 0;
|
| 605 |
+
}
|
| 606 |
+
|
| 607 |
+
.toggle-label {
|
| 608 |
+
font-size: 15px;
|
| 609 |
+
color: #000;
|
| 610 |
+
}
|
| 611 |
+
|
| 612 |
+
.toggle-input {
|
| 613 |
+
position: relative;
|
| 614 |
+
width: 51px;
|
| 615 |
+
height: 31px;
|
| 616 |
+
background: var(--ios-gray-3);
|
| 617 |
+
border-radius: 31px;
|
| 618 |
+
cursor: pointer;
|
| 619 |
+
transition: background 0.3s;
|
| 620 |
+
}
|
| 621 |
+
|
| 622 |
+
.toggle-input.checked {
|
| 623 |
+
background: var(--ios-green);
|
| 624 |
+
}
|
| 625 |
+
|
| 626 |
+
.toggle-input::after {
|
| 627 |
+
content: '';
|
| 628 |
+
position: absolute;
|
| 629 |
+
width: 27px;
|
| 630 |
+
height: 27px;
|
| 631 |
+
border-radius: 50%;
|
| 632 |
+
background: white;
|
| 633 |
+
top: 2px;
|
| 634 |
+
left: 2px;
|
| 635 |
+
transition: transform 0.3s;
|
| 636 |
+
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.2);
|
| 637 |
+
}
|
| 638 |
+
|
| 639 |
+
.toggle-input.checked::after {
|
| 640 |
+
transform: translateX(20px);
|
| 641 |
+
}
|
| 642 |
+
</style>
|
| 643 |
+
</head>
|
| 644 |
+
<body>
|
| 645 |
+
<!-- Pull to Refresh Indicator -->
|
| 646 |
+
<div class="pull-to-refresh" id="pullToRefresh">
|
| 647 |
+
<div class="pull-to-refresh-spinner"></div>
|
| 648 |
+
<span class="pull-to-refresh-text">Refreshing...</span>
|
| 649 |
+
</div>
|
| 650 |
+
|
| 651 |
+
<div class="app-container">
|
| 652 |
+
<!-- Sidebar -->
|
| 653 |
+
<nav class="sidebar" id="sidebar">
|
| 654 |
+
<div class="sidebar-header">
|
| 655 |
+
<div class="app-title">
|
| 656 |
+
<span class="material-icons">smart_toy</span>
|
| 657 |
+
Ultravox AI
|
| 658 |
+
</div>
|
| 659 |
+
<div class="app-subtitle">Voice Assistant</div>
|
| 660 |
+
</div>
|
| 661 |
+
|
| 662 |
+
<div class="nav-menu">
|
| 663 |
+
<a class="nav-item active" data-view="push-to-talk">
|
| 664 |
+
<span class="material-icons">mic</span>
|
| 665 |
+
Push to Talk
|
| 666 |
+
<span class="nav-badge" id="pttBadge" style="display: none;">Live</span>
|
| 667 |
+
</a>
|
| 668 |
+
|
| 669 |
+
<a class="nav-item" data-view="text-to-speech">
|
| 670 |
+
<span class="material-icons">record_voice_over</span>
|
| 671 |
+
Text to Speech
|
| 672 |
+
</a>
|
| 673 |
+
|
| 674 |
+
<a class="nav-item" data-view="logs">
|
| 675 |
+
<span class="material-icons">terminal</span>
|
| 676 |
+
Console Logs
|
| 677 |
+
</a>
|
| 678 |
+
|
| 679 |
+
<a class="nav-item" data-view="settings">
|
| 680 |
+
<span class="material-icons">settings</span>
|
| 681 |
+
Settings
|
| 682 |
+
</a>
|
| 683 |
+
</div>
|
| 684 |
+
</nav>
|
| 685 |
+
|
| 686 |
+
<!-- Overlay for mobile -->
|
| 687 |
+
<div class="overlay" id="overlay"></div>
|
| 688 |
+
|
| 689 |
+
<!-- Main Content -->
|
| 690 |
+
<main class="main-content">
|
| 691 |
+
<!-- Header -->
|
| 692 |
+
<header class="header">
|
| 693 |
+
<button class="menu-toggle" id="menuToggle">
|
| 694 |
+
<span class="material-icons">menu</span>
|
| 695 |
+
</button>
|
| 696 |
+
|
| 697 |
+
<h1 class="header-title" id="headerTitle">Push to Talk</h1>
|
| 698 |
+
|
| 699 |
+
<div class="connection-status">
|
| 700 |
+
<span class="status-dot" id="statusDot"></span>
|
| 701 |
+
<span id="statusText">Disconnected</span>
|
| 702 |
+
</div>
|
| 703 |
+
</header>
|
| 704 |
+
|
| 705 |
+
<!-- Push to Talk View -->
|
| 706 |
+
<div class="view-container active" id="push-to-talk">
|
| 707 |
+
<div class="ptt-container">
|
| 708 |
+
<!-- Single Clean Card -->
|
| 709 |
+
<div style="background: rgba(255, 255, 255, 0.98); backdrop-filter: blur(20px); border-radius: 24px; padding: 32px; box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);">
|
| 710 |
+
<!-- Connection Message -->
|
| 711 |
+
<div id="connectionMessage" style="background: linear-gradient(145deg, #FFF4E6, #FFF9F0); border: 1px solid #FFD700; border-radius: 12px; padding: 16px; margin-bottom: 24px; text-align: center; display: block;">
|
| 712 |
+
<span class="material-icons" style="color: var(--ios-orange); font-size: 24px; margin-bottom: 8px; display: block;">info</span>
|
| 713 |
+
<p style="margin: 0; color: #333; font-size: 14px; font-weight: 500;">Connect to start using voice assistant</p>
|
| 714 |
+
<p style="margin: 4px 0 0 0; color: var(--ios-gray); font-size: 12px;">Click the connect button below to begin</p>
|
| 715 |
+
</div>
|
| 716 |
+
|
| 717 |
+
<!-- Voice Selector at Top -->
|
| 718 |
+
<div style="text-align: center; margin-bottom: 32px;">
|
| 719 |
+
<select class="voice-selector" id="quickVoiceSelect" disabled style="background: linear-gradient(145deg, #f0f0f5, #ffffff); border: none; padding: 12px 24px; font-size: 14px; font-weight: 500; border-radius: 12px; width: auto; min-width: 180px; box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05); opacity: 0.5; cursor: not-allowed;">
|
| 720 |
+
<option value="pf_dora" selected>🇧🇷 Portuguese Female</option>
|
| 721 |
+
<option value="pm_alex">🇧🇷 Portuguese Male</option>
|
| 722 |
+
<option value="af_bella">🇺🇸 English Female</option>
|
| 723 |
+
<option value="am_adam">🇺🇸 English Male</option>
|
| 724 |
+
</select>
|
| 725 |
+
</div>
|
| 726 |
+
|
| 727 |
+
<!-- Main Button Area -->
|
| 728 |
+
<div class="ptt-main-section" style="margin-bottom: 32px;">
|
| 729 |
+
<button class="ptt-button" id="talkBtn" disabled style="opacity: 0.3; cursor: not-allowed;">
|
| 730 |
+
<span class="material-icons">mic_off</span>
|
| 731 |
+
<span style="font-size: 11px; text-transform: uppercase; letter-spacing: 1px;">Offline</span>
|
| 732 |
+
</button>
|
| 733 |
+
|
| 734 |
+
<button class="ios-button" id="connectBtn" style="background: linear-gradient(145deg, #34C759, #30D158); width: 200px; padding: 16px; font-size: 16px; border-radius: 16px; margin-top: 24px; box-shadow: 0 6px 20px rgba(52, 199, 89, 0.3); font-weight: 600;">
|
| 735 |
+
<span class="material-icons" style="font-size: 22px;">wifi</span>
|
| 736 |
+
Connect Now
|
| 737 |
+
</button>
|
| 738 |
+
</div>
|
| 739 |
+
|
| 740 |
+
<!-- Inline Metrics -->
|
| 741 |
+
<div style="background: linear-gradient(145deg, #f8f9fa, #ffffff); border-radius: 16px; padding: 20px; margin-bottom: 24px;">
|
| 742 |
+
<div style="display: flex; justify-content: space-around; text-align: center;">
|
| 743 |
+
<div>
|
| 744 |
+
<div style="font-size: 20px; font-weight: 700; color: var(--ios-blue);" id="sentBytes">0</div>
|
| 745 |
+
<div style="font-size: 10px; color: var(--ios-gray); text-transform: uppercase; margin-top: 4px;">KB Sent</div>
|
| 746 |
+
</div>
|
| 747 |
+
<div style="width: 1px; background: var(--ios-gray-4);"></div>
|
| 748 |
+
<div>
|
| 749 |
+
<div style="font-size: 20px; font-weight: 700; color: var(--ios-green);" id="receivedBytes">0</div>
|
| 750 |
+
<div style="font-size: 10px; color: var(--ios-gray); text-transform: uppercase; margin-top: 4px;">KB Received</div>
|
| 751 |
+
</div>
|
| 752 |
+
<div style="width: 1px; background: var(--ios-gray-4);"></div>
|
| 753 |
+
<div>
|
| 754 |
+
<div style="font-size: 20px; font-weight: 700; color: var(--ios-orange);" id="latency">--</div>
|
| 755 |
+
<div style="font-size: 10px; color: var(--ios-gray); text-transform: uppercase; margin-top: 4px;">MS Latency</div>
|
| 756 |
+
</div>
|
| 757 |
+
</div>
|
| 758 |
+
</div>
|
| 759 |
+
|
| 760 |
+
<!-- Messages Area -->
|
| 761 |
+
<div style="margin-top: 20px;">
|
| 762 |
+
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
|
| 763 |
+
<h4 style="color: var(--ios-gray); font-size: 14px; margin: 0;">📝 Conversation History</h4>
|
| 764 |
+
<button onclick="clearMessages()" style="background: linear-gradient(145deg, #ff453a, #ff6b6b); color: white; border: none; padding: 6px 12px; border-radius: 8px; font-size: 12px; cursor: pointer;">Clear</button>
|
| 765 |
+
</div>
|
| 766 |
+
<div id="messagesContainer" style="background: white; border-radius: 12px; padding: 12px; height: 200px; overflow-y: auto; border: 1px solid var(--ios-gray-4); box-shadow: inset 0 2px 5px rgba(0, 0, 0, 0.05);">
|
| 767 |
+
<div id="messagesList" style="font-size: 13px; color: var(--ios-gray);">
|
| 768 |
+
<p style="margin: 0; text-align: center; color: var(--ios-gray-3);">No messages yet. Connect and start talking!</p>
|
| 769 |
+
</div>
|
| 770 |
+
</div>
|
| 771 |
+
</div>
|
| 772 |
+
|
| 773 |
+
<!-- Status Line -->
|
| 774 |
+
<div style="text-align: center; margin-top: 15px;">
|
| 775 |
+
<div id="recentActivity" style="font-size: 12px; color: var(--ios-gray); padding: 8px; background: rgba(0, 0, 0, 0.02); border-radius: 8px; min-height: 30px; display: flex; align-items: center; justify-content: center;">
|
| 776 |
+
<p style="margin: 0;">Ready to connect</p>
|
| 777 |
+
</div>
|
| 778 |
+
</div>
|
| 779 |
+
|
| 780 |
+
<!-- Processing Indicator -->
|
| 781 |
+
<div id="processingIndicator" style="display: none; margin-top: 20px; text-align: center;">
|
| 782 |
+
<div style="background: linear-gradient(145deg, #E8F4FD, #F0F8FF); border-radius: 12px; padding: 16px; display: inline-flex; align-items: center; gap: 12px;">
|
| 783 |
+
<div class="processing-spinner" style="width: 24px; height: 24px; border: 3px solid var(--ios-gray-3); border-top-color: var(--ios-blue); border-radius: 50%; animation: spin 1s linear infinite;"></div>
|
| 784 |
+
<span style="color: var(--ios-blue); font-size: 14px; font-weight: 500;">Processing audio...</span>
|
| 785 |
+
</div>
|
| 786 |
+
</div>
|
| 787 |
+
|
| 788 |
+
<!-- Audio Replay Button (Hidden by default) -->
|
| 789 |
+
<div id="audioReplayContainer" style="display: none; margin-top: 20px; text-align: center;">
|
| 790 |
+
<button id="replayAudioBtn" class="ios-button" style="background: linear-gradient(145deg, #34C759, #30D158); padding: 12px 24px; font-size: 14px; border-radius: 12px; box-shadow: 0 4px 12px rgba(52, 199, 89, 0.2);">
|
| 791 |
+
<span class="material-icons" style="font-size: 18px;">replay</span>
|
| 792 |
+
Replay Last Audio
|
| 793 |
+
</button>
|
| 794 |
+
</div>
|
| 795 |
+
</div>
|
| 796 |
+
</div>
|
| 797 |
+
</div>
|
| 798 |
+
|
| 799 |
+
<!-- Text to Speech View -->
|
| 800 |
+
<div class="view-container" id="text-to-speech">
|
| 801 |
+
<div class="ios-card">
|
| 802 |
+
<h2 class="card-title">Text to Speech</h2>
|
| 803 |
+
|
| 804 |
+
<div style="margin-bottom: 16px;">
|
| 805 |
+
<textarea class="tts-textarea" id="ttsText" placeholder="Enter text to convert to speech...">Olá! Este é um teste de voz.</textarea>
|
| 806 |
+
</div>
|
| 807 |
+
|
| 808 |
+
<div style="margin-bottom: 16px;">
|
| 809 |
+
<select class="voice-selector" id="voiceSelect">
|
| 810 |
+
<optgroup label="Portuguese">
|
| 811 |
+
<option value="pf_dora" selected>Female - Dora</option>
|
| 812 |
+
<option value="pm_alex">Male - Alex</option>
|
| 813 |
+
<option value="pm_santa">Male - Santa</option>
|
| 814 |
+
</optgroup>
|
| 815 |
+
<optgroup label="English">
|
| 816 |
+
<option value="af_bella">Female - Bella</option>
|
| 817 |
+
<option value="af_heart">Female - Heart</option>
|
| 818 |
+
<option value="am_adam">Male - Adam</option>
|
| 819 |
+
</optgroup>
|
| 820 |
+
</select>
|
| 821 |
+
</div>
|
| 822 |
+
|
| 823 |
+
<button class="ios-button success" id="ttsPlayBtn" disabled>
|
| 824 |
+
<span class="material-icons">play_arrow</span>
|
| 825 |
+
Generate Audio
|
| 826 |
+
</button>
|
| 827 |
+
|
| 828 |
+
<div class="loading-spinner" id="ttsLoader"></div>
|
| 829 |
+
|
| 830 |
+
<div id="ttsPlayer" style="display: none; margin-top: 16px;">
|
| 831 |
+
<audio id="ttsAudio" controls style="width: 100%;"></audio>
|
| 832 |
+
</div>
|
| 833 |
+
</div>
|
| 834 |
+
</div>
|
| 835 |
+
|
| 836 |
+
<!-- Logs View -->
|
| 837 |
+
<div class="view-container" id="logs">
|
| 838 |
+
<div class="ios-card">
|
| 839 |
+
<h2 class="card-title">Console Output</h2>
|
| 840 |
+
<div class="log-container" id="log"></div>
|
| 841 |
+
|
| 842 |
+
<div style="margin-top: 16px; display: flex; gap: 12px;">
|
| 843 |
+
<button class="ios-button" style="background: linear-gradient(145deg, #007AFF, #0051D5);" onclick="copyAllLogs()">
|
| 844 |
+
<span class="material-icons">content_copy</span>
|
| 845 |
+
Copy All Logs
|
| 846 |
+
</button>
|
| 847 |
+
<button class="ios-button secondary" onclick="document.getElementById('log').innerHTML = ''; log('Console cleared', 'info');">
|
| 848 |
+
<span class="material-icons">clear_all</span>
|
| 849 |
+
Clear Logs
|
| 850 |
+
</button>
|
| 851 |
+
</div>
|
| 852 |
+
</div>
|
| 853 |
+
</div>
|
| 854 |
+
|
| 855 |
+
<!-- Settings View -->
|
| 856 |
+
<div class="view-container" id="settings">
|
| 857 |
+
<div class="ios-card">
|
| 858 |
+
<h2 class="card-title">Voice Settings</h2>
|
| 859 |
+
|
| 860 |
+
<div class="settings-group">
|
| 861 |
+
<div class="settings-label">Default Voice</div>
|
| 862 |
+
<select class="voice-selector" id="settingsVoiceSelect">
|
| 863 |
+
<optgroup label="Portuguese">
|
| 864 |
+
<option value="pf_dora" selected>Female - Dora</option>
|
| 865 |
+
<option value="pm_alex">Male - Alex</option>
|
| 866 |
+
<option value="pm_santa">Male - Santa</option>
|
| 867 |
+
</optgroup>
|
| 868 |
+
</select>
|
| 869 |
+
</div>
|
| 870 |
+
|
| 871 |
+
<div class="settings-group">
|
| 872 |
+
<div class="settings-label">Audio Settings</div>
|
| 873 |
+
<div class="toggle-switch">
|
| 874 |
+
<span class="toggle-label">Auto-play responses</span>
|
| 875 |
+
<div class="toggle-input checked" id="autoplayToggle"></div>
|
| 876 |
+
</div>
|
| 877 |
+
<div class="toggle-switch">
|
| 878 |
+
<span class="toggle-label">Echo cancellation</span>
|
| 879 |
+
<div class="toggle-input checked" id="echoCancelToggle"></div>
|
| 880 |
+
</div>
|
| 881 |
+
<div class="toggle-switch">
|
| 882 |
+
<span class="toggle-label">Noise suppression</span>
|
| 883 |
+
<div class="toggle-input checked" id="noiseToggle"></div>
|
| 884 |
+
</div>
|
| 885 |
+
</div>
|
| 886 |
+
</div>
|
| 887 |
+
|
| 888 |
+
<div class="ios-card">
|
| 889 |
+
<h2 class="card-title">About</h2>
|
| 890 |
+
<p style="color: var(--ios-gray); font-size: 14px; line-height: 1.6;">
|
| 891 |
+
Ultravox AI Assistant v1.0<br>
|
| 892 |
+
Powered by advanced speech recognition and synthesis.<br>
|
| 893 |
+
<br>
|
| 894 |
+
Format: PCM 16-bit @ 24kHz<br>
|
| 895 |
+
Protocol: WebSocket + gRPC
|
| 896 |
+
</p>
|
| 897 |
+
</div>
|
| 898 |
+
</div>
|
| 899 |
+
</main>
|
| 900 |
+
</div>
|
| 901 |
+
|
| 902 |
+
<!-- Hidden selects for compatibility -->
|
| 903 |
+
<select id="ttsVoiceSelect" style="display: none;">
|
| 904 |
+
<option value="pf_dora" selected>pf_dora</option>
|
| 905 |
+
<option value="pm_alex">pm_alex</option>
|
| 906 |
+
<option value="pm_santa">pm_santa</option>
|
| 907 |
+
<option value="af_bella">af_bella</option>
|
| 908 |
+
<option value="af_heart">af_heart</option>
|
| 909 |
+
<option value="am_adam">am_adam</option>
|
| 910 |
+
</select>
|
| 911 |
+
|
| 912 |
+
<script>
|
| 913 |
+
// Navigation
|
| 914 |
+
const navItems = document.querySelectorAll('.nav-item');
|
| 915 |
+
const viewContainers = document.querySelectorAll('.view-container');
|
| 916 |
+
const headerTitle = document.getElementById('headerTitle');
|
| 917 |
+
const sidebar = document.getElementById('sidebar');
|
| 918 |
+
const overlay = document.getElementById('overlay');
|
| 919 |
+
const menuToggle = document.getElementById('menuToggle');
|
| 920 |
+
|
| 921 |
+
// Handle navigation
|
| 922 |
+
navItems.forEach(item => {
|
| 923 |
+
item.addEventListener('click', (e) => {
|
| 924 |
+
e.preventDefault();
|
| 925 |
+
const viewId = item.dataset.view;
|
| 926 |
+
|
| 927 |
+
// Update active nav
|
| 928 |
+
navItems.forEach(nav => nav.classList.remove('active'));
|
| 929 |
+
item.classList.add('active');
|
| 930 |
+
|
| 931 |
+
// Update active view
|
| 932 |
+
viewContainers.forEach(view => view.classList.remove('active'));
|
| 933 |
+
document.getElementById(viewId).classList.add('active');
|
| 934 |
+
|
| 935 |
+
// Update header title
|
| 936 |
+
headerTitle.textContent = item.textContent.trim();
|
| 937 |
+
|
| 938 |
+
// Close mobile menu
|
| 939 |
+
if (window.innerWidth <= 768) {
|
| 940 |
+
sidebar.classList.remove('open');
|
| 941 |
+
overlay.classList.remove('active');
|
| 942 |
+
}
|
| 943 |
+
});
|
| 944 |
+
});
|
| 945 |
+
|
| 946 |
+
// Mobile menu toggle
|
| 947 |
+
menuToggle.addEventListener('click', () => {
|
| 948 |
+
sidebar.classList.toggle('open');
|
| 949 |
+
overlay.classList.toggle('active');
|
| 950 |
+
});
|
| 951 |
+
|
| 952 |
+
overlay.addEventListener('click', () => {
|
| 953 |
+
sidebar.classList.remove('open');
|
| 954 |
+
overlay.classList.remove('active');
|
| 955 |
+
});
|
| 956 |
+
|
| 957 |
+
// Toggle switches
|
| 958 |
+
document.querySelectorAll('.toggle-input').forEach(toggle => {
|
| 959 |
+
toggle.addEventListener('click', () => {
|
| 960 |
+
toggle.classList.toggle('checked');
|
| 961 |
+
});
|
| 962 |
+
});
|
| 963 |
+
|
| 964 |
+
// Sync voice selectors
|
| 965 |
+
const voiceSelects = [
|
| 966 |
+
document.getElementById('voiceSelect'),
|
| 967 |
+
document.getElementById('settingsVoiceSelect'),
|
| 968 |
+
document.getElementById('quickVoiceSelect')
|
| 969 |
+
];
|
| 970 |
+
|
| 971 |
+
voiceSelects.forEach(select => {
|
| 972 |
+
if (select) {
|
| 973 |
+
select.addEventListener('change', () => {
|
| 974 |
+
const value = select.value;
|
| 975 |
+
voiceSelects.forEach(s => {
|
| 976 |
+
if (s) s.value = value;
|
| 977 |
+
});
|
| 978 |
+
document.getElementById('ttsVoiceSelect').value = value;
|
| 979 |
+
document.getElementById('currentVoice').textContent = value.split('_')[1] || value;
|
| 980 |
+
|
| 981 |
+
// Update recent activity
|
| 982 |
+
const recentActivity = document.getElementById('recentActivity');
|
| 983 |
+
if (recentActivity) {
|
| 984 |
+
const time = new Date().toLocaleTimeString('pt-BR', { hour: '2-digit', minute: '2-digit' });
|
| 985 |
+
recentActivity.innerHTML = `<p style="margin: 0; color: var(--ios-blue);">${time} - Voice changed to ${value}</p>` + recentActivity.innerHTML;
|
| 986 |
+
}
|
| 987 |
+
|
| 988 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 989 |
+
ws.send(JSON.stringify({
|
| 990 |
+
type: 'set-voice',
|
| 991 |
+
voice_id: value
|
| 992 |
+
}));
|
| 993 |
+
log(`Voice changed to: ${value}`, 'info');
|
| 994 |
+
}
|
| 995 |
+
});
|
| 996 |
+
}
|
| 997 |
+
});
|
| 998 |
+
|
| 999 |
+
// ========= ORIGINAL WEBSOCKET AND AUDIO CODE =========
|
| 1000 |
+
|
| 1001 |
+
// Estado da aplicação
|
| 1002 |
+
let ws = null;
|
| 1003 |
+
let isConnected = false;
|
| 1004 |
+
let isRecording = false;
|
| 1005 |
+
let audioContext = null;
|
| 1006 |
+
let stream = null;
|
| 1007 |
+
let audioSource = null;
|
| 1008 |
+
let audioProcessor = null;
|
| 1009 |
+
let pcmBuffer = [];
|
| 1010 |
+
|
| 1011 |
+
// Métricas
|
| 1012 |
+
const metrics = {
|
| 1013 |
+
sentBytes: 0,
|
| 1014 |
+
receivedBytes: 0,
|
| 1015 |
+
latency: 0,
|
| 1016 |
+
recordingStartTime: 0
|
| 1017 |
+
};
|
| 1018 |
+
|
| 1019 |
+
// Elementos DOM
|
| 1020 |
+
const elements = {
|
| 1021 |
+
statusDot: document.getElementById('statusDot'),
|
| 1022 |
+
statusText: document.getElementById('statusText'),
|
| 1023 |
+
connectBtn: document.getElementById('connectBtn'),
|
| 1024 |
+
talkBtn: document.getElementById('talkBtn'),
|
| 1025 |
+
voiceSelect: document.getElementById('voiceSelect'),
|
| 1026 |
+
sentBytes: document.getElementById('sentBytes'),
|
| 1027 |
+
receivedBytes: document.getElementById('receivedBytes'),
|
| 1028 |
+
latency: document.getElementById('latency'),
|
| 1029 |
+
log: document.getElementById('log'),
|
| 1030 |
+
// TTS elements
|
| 1031 |
+
ttsText: document.getElementById('ttsText'),
|
| 1032 |
+
ttsVoiceSelect: document.getElementById('ttsVoiceSelect'),
|
| 1033 |
+
ttsPlayBtn: document.getElementById('ttsPlayBtn'),
|
| 1034 |
+
ttsLoader: document.getElementById('ttsLoader'),
|
| 1035 |
+
ttsPlayer: document.getElementById('ttsPlayer'),
|
| 1036 |
+
ttsAudio: document.getElementById('ttsAudio')
|
| 1037 |
+
};
|
| 1038 |
+
|
| 1039 |
+
// Log no console visual
|
| 1040 |
+
function log(message, type = 'info') {
|
| 1041 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 1042 |
+
const entry = document.createElement('div');
|
| 1043 |
+
entry.className = `log-entry ${type}`;
|
| 1044 |
+
entry.innerHTML = `
|
| 1045 |
+
<span class="log-time">[${time}]</span>
|
| 1046 |
+
<span class="log-message">${message}</span>
|
| 1047 |
+
`;
|
| 1048 |
+
elements.log.appendChild(entry);
|
| 1049 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 1050 |
+
console.log(`[${type}] ${message}`);
|
| 1051 |
+
|
| 1052 |
+
// Update recent activity in Push to Talk view
|
| 1053 |
+
const recentActivity = document.getElementById('recentActivity');
|
| 1054 |
+
if (recentActivity && (type === 'success' || type === 'info')) {
|
| 1055 |
+
const shortTime = new Date().toLocaleTimeString('pt-BR', { hour: '2-digit', minute: '2-digit' });
|
| 1056 |
+
const color = type === 'success' ? 'var(--ios-green)' : 'var(--ios-gray)';
|
| 1057 |
+
const shortMessage = message.length > 50 ? message.substring(0, 50) + '...' : message;
|
| 1058 |
+
recentActivity.innerHTML = `<p style="margin: 0; color: ${color};">${shortTime} - ${shortMessage}</p>`;
|
| 1059 |
+
}
|
| 1060 |
+
}
|
| 1061 |
+
|
| 1062 |
+
// Atualizar métricas
|
| 1063 |
+
function updateMetrics() {
|
| 1064 |
+
elements.sentBytes.textContent = `${(metrics.sentBytes / 1024).toFixed(1)}`;
|
| 1065 |
+
elements.receivedBytes.textContent = `${(metrics.receivedBytes / 1024).toFixed(1)}`;
|
| 1066 |
+
elements.latency.textContent = `${metrics.latency}`;
|
| 1067 |
+
}
|
| 1068 |
+
|
| 1069 |
+
// Conectar ao WebSocket
|
| 1070 |
+
async function connect() {
|
| 1071 |
+
try {
|
| 1072 |
+
// Solicitar acesso ao microfone
|
| 1073 |
+
stream = await navigator.mediaDevices.getUserMedia({
|
| 1074 |
+
audio: {
|
| 1075 |
+
echoCancellation: true,
|
| 1076 |
+
noiseSuppression: true,
|
| 1077 |
+
sampleRate: 24000
|
| 1078 |
+
}
|
| 1079 |
+
});
|
| 1080 |
+
|
| 1081 |
+
log('Microphone accessed', 'success');
|
| 1082 |
+
|
| 1083 |
+
// Conectar WebSocket
|
| 1084 |
+
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 1085 |
+
const wsUrl = `${protocol}//${window.location.host}/ws`;
|
| 1086 |
+
ws = new WebSocket(wsUrl);
|
| 1087 |
+
ws.binaryType = 'arraybuffer';
|
| 1088 |
+
|
| 1089 |
+
ws.onopen = () => {
|
| 1090 |
+
isConnected = true;
|
| 1091 |
+
elements.statusDot.classList.add('connected');
|
| 1092 |
+
elements.statusText.textContent = 'Connected';
|
| 1093 |
+
elements.connectBtn.innerHTML = '<span class="material-icons">power_settings_new</span>Disconnect';
|
| 1094 |
+
elements.connectBtn.style.background = 'linear-gradient(145deg, #FF3B30, #FF453A)';
|
| 1095 |
+
elements.talkBtn.disabled = false;
|
| 1096 |
+
elements.talkBtn.style.opacity = '1';
|
| 1097 |
+
elements.talkBtn.style.cursor = 'pointer';
|
| 1098 |
+
elements.talkBtn.innerHTML = '<span class="material-icons">mic</span><span style="font-size: 11px; text-transform: uppercase; letter-spacing: 1px;">Hold</span>';
|
| 1099 |
+
document.getElementById('pttBadge').style.display = 'block';
|
| 1100 |
+
|
| 1101 |
+
// Enable voice selector
|
| 1102 |
+
const quickVoiceSelect = document.getElementById('quickVoiceSelect');
|
| 1103 |
+
if (quickVoiceSelect) {
|
| 1104 |
+
quickVoiceSelect.disabled = false;
|
| 1105 |
+
quickVoiceSelect.style.opacity = '1';
|
| 1106 |
+
quickVoiceSelect.style.cursor = 'pointer';
|
| 1107 |
+
}
|
| 1108 |
+
|
| 1109 |
+
// Hide connection message
|
| 1110 |
+
const connectionMessage = document.getElementById('connectionMessage');
|
| 1111 |
+
if (connectionMessage) {
|
| 1112 |
+
connectionMessage.style.display = 'none';
|
| 1113 |
+
}
|
| 1114 |
+
|
| 1115 |
+
// Enviar voz selecionada
|
| 1116 |
+
const currentVoice = elements.voiceSelect.value || 'pf_dora';
|
| 1117 |
+
ws.send(JSON.stringify({
|
| 1118 |
+
type: 'set-voice',
|
| 1119 |
+
voice_id: currentVoice
|
| 1120 |
+
}));
|
| 1121 |
+
|
| 1122 |
+
elements.ttsPlayBtn.disabled = false;
|
| 1123 |
+
log('Connected to server', 'success');
|
| 1124 |
+
};
|
| 1125 |
+
|
| 1126 |
+
ws.onmessage = (event) => {
|
| 1127 |
+
if (event.data instanceof ArrayBuffer) {
|
| 1128 |
+
handlePCMAudio(event.data);
|
| 1129 |
+
} else {
|
| 1130 |
+
const data = JSON.parse(event.data);
|
| 1131 |
+
handleMessage(data);
|
| 1132 |
+
}
|
| 1133 |
+
};
|
| 1134 |
+
|
| 1135 |
+
ws.onerror = (error) => {
|
| 1136 |
+
log(`WebSocket error: ${error}`, 'error');
|
| 1137 |
+
};
|
| 1138 |
+
|
| 1139 |
+
ws.onclose = () => {
|
| 1140 |
+
disconnect();
|
| 1141 |
+
};
|
| 1142 |
+
|
| 1143 |
+
} catch (error) {
|
| 1144 |
+
log(`Connection error: ${error.message}`, 'error');
|
| 1145 |
+
}
|
| 1146 |
+
}
|
| 1147 |
+
|
| 1148 |
+
// Desconectar
|
| 1149 |
+
function disconnect() {
|
| 1150 |
+
isConnected = false;
|
| 1151 |
+
|
| 1152 |
+
if (ws) {
|
| 1153 |
+
ws.close();
|
| 1154 |
+
ws = null;
|
| 1155 |
+
}
|
| 1156 |
+
|
| 1157 |
+
if (stream) {
|
| 1158 |
+
stream.getTracks().forEach(track => track.stop());
|
| 1159 |
+
stream = null;
|
| 1160 |
+
}
|
| 1161 |
+
|
| 1162 |
+
if (audioContext) {
|
| 1163 |
+
audioContext.close();
|
| 1164 |
+
audioContext = null;
|
| 1165 |
+
}
|
| 1166 |
+
|
| 1167 |
+
elements.statusDot.classList.remove('connected');
|
| 1168 |
+
elements.statusText.textContent = 'Disconnected';
|
| 1169 |
+
elements.connectBtn.innerHTML = '<span class="material-icons">wifi</span>Connect Now';
|
| 1170 |
+
elements.connectBtn.style.background = 'linear-gradient(145deg, #34C759, #30D158)';
|
| 1171 |
+
elements.talkBtn.disabled = true;
|
| 1172 |
+
elements.talkBtn.style.opacity = '0.3';
|
| 1173 |
+
elements.talkBtn.style.cursor = 'not-allowed';
|
| 1174 |
+
elements.talkBtn.innerHTML = '<span class="material-icons">mic_off</span><span style="font-size: 11px; text-transform: uppercase; letter-spacing: 1px;">Offline</span>';
|
| 1175 |
+
document.getElementById('pttBadge').style.display = 'none';
|
| 1176 |
+
|
| 1177 |
+
// Disable voice selector
|
| 1178 |
+
const quickVoiceSelect = document.getElementById('quickVoiceSelect');
|
| 1179 |
+
if (quickVoiceSelect) {
|
| 1180 |
+
quickVoiceSelect.disabled = true;
|
| 1181 |
+
quickVoiceSelect.style.opacity = '0.5';
|
| 1182 |
+
quickVoiceSelect.style.cursor = 'not-allowed';
|
| 1183 |
+
}
|
| 1184 |
+
|
| 1185 |
+
// Show connection message
|
| 1186 |
+
const connectionMessage = document.getElementById('connectionMessage');
|
| 1187 |
+
if (connectionMessage) {
|
| 1188 |
+
connectionMessage.style.display = 'block';
|
| 1189 |
+
}
|
| 1190 |
+
|
| 1191 |
+
// Hide replay button
|
| 1192 |
+
const audioReplayContainer = document.getElementById('audioReplayContainer');
|
| 1193 |
+
if (audioReplayContainer) {
|
| 1194 |
+
audioReplayContainer.style.display = 'none';
|
| 1195 |
+
}
|
| 1196 |
+
|
| 1197 |
+
log('Disconnected', 'warning');
|
| 1198 |
+
}
|
| 1199 |
+
|
| 1200 |
+
// Variáveis para MediaRecorder
|
| 1201 |
+
let mediaRecorder = null;
|
| 1202 |
+
let audioChunks = [];
|
| 1203 |
+
|
| 1204 |
+
// Iniciar gravação com PCM (Opus desabilitado temporariamente)
|
| 1205 |
+
function startRecording() {
|
| 1206 |
+
if (isRecording) return;
|
| 1207 |
+
|
| 1208 |
+
isRecording = true;
|
| 1209 |
+
audioChunks = [];
|
| 1210 |
+
pcmBuffer = [];
|
| 1211 |
+
metrics.recordingStartTime = Date.now();
|
| 1212 |
+
elements.talkBtn.classList.add('recording');
|
| 1213 |
+
elements.talkBtn.innerHTML = '<span class="material-icons">stop</span><span>Recording</span>';
|
| 1214 |
+
|
| 1215 |
+
// FORÇAR USO DE PCM - Opus está com problemas no servidor
|
| 1216 |
+
const usingOpus = false;
|
| 1217 |
+
|
| 1218 |
+
// Usar apenas PCM
|
| 1219 |
+
if (!usingOpus) {
|
| 1220 |
+
if (!audioContext) {
|
| 1221 |
+
audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 1222 |
+
sampleRate: 24000
|
| 1223 |
+
});
|
| 1224 |
+
}
|
| 1225 |
+
|
| 1226 |
+
audioSource = audioContext.createMediaStreamSource(stream);
|
| 1227 |
+
audioProcessor = audioContext.createScriptProcessor(4096, 1, 1);
|
| 1228 |
+
|
| 1229 |
+
audioProcessor.onaudioprocess = (e) => {
|
| 1230 |
+
if (!isRecording) return;
|
| 1231 |
+
|
| 1232 |
+
const inputData = e.inputBuffer.getChannelData(0);
|
| 1233 |
+
|
| 1234 |
+
// Calculate RMS
|
| 1235 |
+
let sumSquares = 0;
|
| 1236 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 1237 |
+
sumSquares += inputData[i] * inputData[i];
|
| 1238 |
+
}
|
| 1239 |
+
const rms = Math.sqrt(sumSquares / inputData.length);
|
| 1240 |
+
|
| 1241 |
+
const voiceThreshold = 0.01;
|
| 1242 |
+
const hasVoice = rms > voiceThreshold;
|
| 1243 |
+
|
| 1244 |
+
let gain = 1.0;
|
| 1245 |
+
if (hasVoice && rms < 0.05) {
|
| 1246 |
+
gain = Math.min(5.0, 0.05 / rms);
|
| 1247 |
+
}
|
| 1248 |
+
|
| 1249 |
+
// Convert to PCM
|
| 1250 |
+
const pcmData = new Int16Array(inputData.length);
|
| 1251 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 1252 |
+
let sample = inputData[i] * gain;
|
| 1253 |
+
|
| 1254 |
+
if (Math.abs(sample) > 0.95) {
|
| 1255 |
+
sample = Math.sign(sample) * (0.95 + 0.05 * Math.tanh((Math.abs(sample) - 0.95) * 10));
|
| 1256 |
+
}
|
| 1257 |
+
|
| 1258 |
+
sample = Math.max(-1, Math.min(1, sample));
|
| 1259 |
+
pcmData[i] = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
|
| 1260 |
+
}
|
| 1261 |
+
|
| 1262 |
+
if (hasVoice) {
|
| 1263 |
+
pcmBuffer.push(pcmData);
|
| 1264 |
+
}
|
| 1265 |
+
};
|
| 1266 |
+
|
| 1267 |
+
audioSource.connect(audioProcessor);
|
| 1268 |
+
audioProcessor.connect(audioContext.destination);
|
| 1269 |
+
|
| 1270 |
+
log('Recording with PCM 16-bit @ 24kHz', 'info');
|
| 1271 |
+
}
|
| 1272 |
+
}
|
| 1273 |
+
|
| 1274 |
+
// Enviar áudio Opus para o servidor
|
| 1275 |
+
async function sendOpusAudioToServer(audioBlob) {
|
| 1276 |
+
if (!ws || ws.readyState !== WebSocket.OPEN) {
|
| 1277 |
+
log('WebSocket not connected', 'error');
|
| 1278 |
+
return;
|
| 1279 |
+
}
|
| 1280 |
+
|
| 1281 |
+
// Show processing indicator
|
| 1282 |
+
const processingIndicator = document.getElementById('processingIndicator');
|
| 1283 |
+
if (processingIndicator) {
|
| 1284 |
+
processingIndicator.style.display = 'block';
|
| 1285 |
+
}
|
| 1286 |
+
|
| 1287 |
+
// Update recent activity
|
| 1288 |
+
const recentActivity = document.getElementById('recentActivity');
|
| 1289 |
+
if (recentActivity) {
|
| 1290 |
+
recentActivity.innerHTML = '<p style="margin: 0; color: var(--ios-blue);">⏳ Sending Opus audio to server...</p>';
|
| 1291 |
+
}
|
| 1292 |
+
|
| 1293 |
+
try {
|
| 1294 |
+
// Converter Blob para ArrayBuffer
|
| 1295 |
+
const arrayBuffer = await audioBlob.arrayBuffer();
|
| 1296 |
+
const uint8Array = new Uint8Array(arrayBuffer);
|
| 1297 |
+
|
| 1298 |
+
// Criar header para Opus (similar ao PCM mas com tipo diferente)
|
| 1299 |
+
const header = new ArrayBuffer(8);
|
| 1300 |
+
const view = new DataView(header);
|
| 1301 |
+
view.setUint32(0, 0x4F505553); // 'OPUS' em hex
|
| 1302 |
+
view.setUint32(4, uint8Array.length);
|
| 1303 |
+
|
| 1304 |
+
// Enviar header e dados
|
| 1305 |
+
ws.send(header);
|
| 1306 |
+
ws.send(uint8Array);
|
| 1307 |
+
|
| 1308 |
+
// Update metrics
|
| 1309 |
+
metrics.totalBytesSent += uint8Array.length;
|
| 1310 |
+
updateMetrics();
|
| 1311 |
+
|
| 1312 |
+
log(`Sent Opus audio: ${(uint8Array.length / 1024).toFixed(2)} KB`, 'info');
|
| 1313 |
+
|
| 1314 |
+
} catch (error) {
|
| 1315 |
+
log('Error sending Opus audio: ' + error.message, 'error');
|
| 1316 |
+
console.error('Opus send error:', error);
|
| 1317 |
+
}
|
| 1318 |
+
}
|
| 1319 |
+
|
| 1320 |
+
// Parar gravação
|
| 1321 |
+
function stopRecording() {
|
| 1322 |
+
if (!isRecording) return;
|
| 1323 |
+
|
| 1324 |
+
isRecording = false;
|
| 1325 |
+
elements.talkBtn.classList.remove('recording');
|
| 1326 |
+
elements.talkBtn.innerHTML = '<span class="material-icons">mic</span><span>Hold</span>';
|
| 1327 |
+
|
| 1328 |
+
// Sempre usar PCM
|
| 1329 |
+
if (audioProcessor) {
|
| 1330 |
+
audioProcessor.disconnect();
|
| 1331 |
+
audioProcessor = null;
|
| 1332 |
+
}
|
| 1333 |
+
if (audioSource) {
|
| 1334 |
+
audioSource.disconnect();
|
| 1335 |
+
audioSource = null;
|
| 1336 |
+
}
|
| 1337 |
+
|
| 1338 |
+
if (pcmBuffer.length === 0) {
|
| 1339 |
+
log('No audio captured', 'warning');
|
| 1340 |
+
return;
|
| 1341 |
+
}
|
| 1342 |
+
|
| 1343 |
+
// Combine PCM chunks
|
| 1344 |
+
const totalLength = pcmBuffer.reduce((acc, chunk) => acc + chunk.length, 0);
|
| 1345 |
+
const fullPCM = new Int16Array(totalLength);
|
| 1346 |
+
let offset = 0;
|
| 1347 |
+
for (const chunk of pcmBuffer) {
|
| 1348 |
+
fullPCM.set(chunk, offset);
|
| 1349 |
+
offset += chunk.length;
|
| 1350 |
+
}
|
| 1351 |
+
|
| 1352 |
+
// Send PCM
|
| 1353 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 1354 |
+
// Show processing indicator
|
| 1355 |
+
const processingIndicator = document.getElementById('processingIndicator');
|
| 1356 |
+
if (processingIndicator) {
|
| 1357 |
+
processingIndicator.style.display = 'block';
|
| 1358 |
+
}
|
| 1359 |
+
|
| 1360 |
+
// Update recent activity
|
| 1361 |
+
const recentActivity = document.getElementById('recentActivity');
|
| 1362 |
+
if (recentActivity) {
|
| 1363 |
+
recentActivity.innerHTML = '<p style="margin: 0; color: var(--ios-blue);">⏳ Sending audio to server...</p>';
|
| 1364 |
+
}
|
| 1365 |
+
|
| 1366 |
+
const header = new ArrayBuffer(8);
|
| 1367 |
+
const view = new DataView(header);
|
| 1368 |
+
view.setUint32(0, 0x50434D16);
|
| 1369 |
+
view.setUint32(4, fullPCM.length * 2);
|
| 1370 |
+
|
| 1371 |
+
ws.send(header);
|
| 1372 |
+
ws.send(fullPCM.buffer);
|
| 1373 |
+
|
| 1374 |
+
metrics.sentBytes += fullPCM.length * 2;
|
| 1375 |
+
updateMetrics();
|
| 1376 |
+
log(`PCM sent: ${(fullPCM.length * 2 / 1024).toFixed(1)}KB`, 'success');
|
| 1377 |
+
}
|
| 1378 |
+
|
| 1379 |
+
pcmBuffer = [];
|
| 1380 |
+
}
|
| 1381 |
+
|
| 1382 |
+
// Process messages
|
| 1383 |
+
function handleMessage(data) {
|
| 1384 |
+
switch (data.type) {
|
| 1385 |
+
case 'metrics':
|
| 1386 |
+
metrics.latency = data.latency;
|
| 1387 |
+
updateMetrics();
|
| 1388 |
+
|
| 1389 |
+
// Hide processing indicator when we get metrics (means processing is done)
|
| 1390 |
+
const processingIndicator = document.getElementById('processingIndicator');
|
| 1391 |
+
if (processingIndicator) {
|
| 1392 |
+
processingIndicator.style.display = 'none';
|
| 1393 |
+
}
|
| 1394 |
+
|
| 1395 |
+
// Update recent activity with response
|
| 1396 |
+
const recentActivity = document.getElementById('recentActivity');
|
| 1397 |
+
if (recentActivity) {
|
| 1398 |
+
recentActivity.innerHTML = `<p style="margin: 0; color: var(--ios-green);">✅ Response received (${data.latency}ms)</p>`;
|
| 1399 |
+
}
|
| 1400 |
+
|
| 1401 |
+
// Add to messages container
|
| 1402 |
+
const messagesList = document.getElementById('messagesList');
|
| 1403 |
+
if (messagesList) {
|
| 1404 |
+
// Clear initial message if it's the first message
|
| 1405 |
+
if (messagesList.innerHTML.includes('No messages yet')) {
|
| 1406 |
+
messagesList.innerHTML = '';
|
| 1407 |
+
}
|
| 1408 |
+
|
| 1409 |
+
// Add user message (audio)
|
| 1410 |
+
const userMsg = document.createElement('div');
|
| 1411 |
+
userMsg.style.cssText = 'margin-bottom: 10px; padding: 8px; background: linear-gradient(145deg, #007AFF, #0051D5); border-radius: 12px; color: white; word-wrap: break-word;';
|
| 1412 |
+
userMsg.innerHTML = `<strong>🎵 You:</strong> [Audio message sent]`;
|
| 1413 |
+
messagesList.appendChild(userMsg);
|
| 1414 |
+
|
| 1415 |
+
// Add assistant response (full message)
|
| 1416 |
+
const assistantMsg = document.createElement('div');
|
| 1417 |
+
assistantMsg.style.cssText = 'margin-bottom: 10px; padding: 8px; background: rgba(52, 199, 89, 0.1); border-radius: 12px; color: #333; word-wrap: break-word;';
|
| 1418 |
+
assistantMsg.innerHTML = `<strong>🤖 Assistant:</strong> ${data.response}`;
|
| 1419 |
+
messagesList.appendChild(assistantMsg);
|
| 1420 |
+
|
| 1421 |
+
// Add timestamp
|
| 1422 |
+
const timestamp = document.createElement('div');
|
| 1423 |
+
timestamp.style.cssText = 'font-size: 11px; color: var(--ios-gray-3); text-align: right; margin-bottom: 5px;';
|
| 1424 |
+
timestamp.innerHTML = new Date().toLocaleTimeString('pt-BR', { hour: '2-digit', minute: '2-digit', second: '2-digit' });
|
| 1425 |
+
messagesList.appendChild(timestamp);
|
| 1426 |
+
|
| 1427 |
+
// Scroll to bottom
|
| 1428 |
+
const container = document.getElementById('messagesContainer');
|
| 1429 |
+
if (container) {
|
| 1430 |
+
container.scrollTop = container.scrollHeight;
|
| 1431 |
+
}
|
| 1432 |
+
}
|
| 1433 |
+
|
| 1434 |
+
log(`Response: "${data.response}" (${data.latency}ms)`, 'success');
|
| 1435 |
+
break;
|
| 1436 |
+
|
| 1437 |
+
case 'error':
|
| 1438 |
+
log(`Error: ${data.message}`, 'error');
|
| 1439 |
+
break;
|
| 1440 |
+
|
| 1441 |
+
case 'tts-response':
|
| 1442 |
+
if (data.audio) {
|
| 1443 |
+
const binaryString = atob(data.audio);
|
| 1444 |
+
const bytes = new Uint8Array(binaryString.length);
|
| 1445 |
+
for (let i = 0; i < binaryString.length; i++) {
|
| 1446 |
+
bytes[i] = binaryString.charCodeAt(i);
|
| 1447 |
+
}
|
| 1448 |
+
|
| 1449 |
+
const sampleRate = data.sampleRate || 24000;
|
| 1450 |
+
const wavBuffer = addWavHeader(bytes.buffer, sampleRate);
|
| 1451 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 1452 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 1453 |
+
|
| 1454 |
+
elements.ttsAudio.src = audioUrl;
|
| 1455 |
+
elements.ttsPlayer.style.display = 'block';
|
| 1456 |
+
elements.ttsLoader.classList.remove('active');
|
| 1457 |
+
elements.ttsPlayBtn.disabled = false;
|
| 1458 |
+
elements.ttsPlayBtn.innerHTML = '<span class="material-icons">play_arrow</span>Generate Audio';
|
| 1459 |
+
|
| 1460 |
+
log('TTS audio generated', 'success');
|
| 1461 |
+
}
|
| 1462 |
+
break;
|
| 1463 |
+
}
|
| 1464 |
+
}
|
| 1465 |
+
|
| 1466 |
+
// Global variable to store last audio URL
|
| 1467 |
+
let lastAudioUrl = null;
|
| 1468 |
+
|
| 1469 |
+
// Handle PCM audio
|
| 1470 |
+
function handlePCMAudio(arrayBuffer) {
|
| 1471 |
+
metrics.receivedBytes += arrayBuffer.byteLength;
|
| 1472 |
+
updateMetrics();
|
| 1473 |
+
|
| 1474 |
+
// Hide processing indicator
|
| 1475 |
+
const processingIndicator = document.getElementById('processingIndicator');
|
| 1476 |
+
if (processingIndicator) {
|
| 1477 |
+
processingIndicator.style.display = 'none';
|
| 1478 |
+
}
|
| 1479 |
+
|
| 1480 |
+
const wavBuffer = addWavHeader(arrayBuffer);
|
| 1481 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 1482 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 1483 |
+
|
| 1484 |
+
// Store last audio URL
|
| 1485 |
+
lastAudioUrl = audioUrl;
|
| 1486 |
+
|
| 1487 |
+
// Show replay button
|
| 1488 |
+
const replayContainer = document.getElementById('audioReplayContainer');
|
| 1489 |
+
if (replayContainer) {
|
| 1490 |
+
replayContainer.style.display = 'block';
|
| 1491 |
+
}
|
| 1492 |
+
|
| 1493 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 1494 |
+
const entry = document.createElement('div');
|
| 1495 |
+
entry.className = 'log-entry success';
|
| 1496 |
+
entry.innerHTML = `
|
| 1497 |
+
<span class="log-time">[${time}]</span>
|
| 1498 |
+
<span class="log-message">🔊 Audio received: ${(arrayBuffer.byteLength / 1024).toFixed(1)}KB</span>
|
| 1499 |
+
<div class="audio-player">
|
| 1500 |
+
<button class="play-btn" onclick="playAudio('${audioUrl}')">▶️ Play</button>
|
| 1501 |
+
</div>
|
| 1502 |
+
`;
|
| 1503 |
+
elements.log.appendChild(entry);
|
| 1504 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 1505 |
+
|
| 1506 |
+
// Update recent activity
|
| 1507 |
+
const recentActivity = document.getElementById('recentActivity');
|
| 1508 |
+
if (recentActivity) {
|
| 1509 |
+
recentActivity.innerHTML = `<p style="margin: 0; color: var(--ios-green);">✅ Response received - ${(arrayBuffer.byteLength / 1024).toFixed(1)}KB</p>`;
|
| 1510 |
+
}
|
| 1511 |
+
|
| 1512 |
+
// Always try to auto-play
|
| 1513 |
+
const audio = new Audio(audioUrl);
|
| 1514 |
+
audio.play().then(() => {
|
| 1515 |
+
console.log('Audio playing automatically');
|
| 1516 |
+
}).catch(err => {
|
| 1517 |
+
console.log('Auto-play blocked, use replay button');
|
| 1518 |
+
// Flash the replay button to draw attention
|
| 1519 |
+
const replayBtn = document.getElementById('replayAudioBtn');
|
| 1520 |
+
if (replayBtn) {
|
| 1521 |
+
replayBtn.style.animation = 'pulse 1s ease-in-out 2';
|
| 1522 |
+
setTimeout(() => {
|
| 1523 |
+
replayBtn.style.animation = '';
|
| 1524 |
+
}, 2000);
|
| 1525 |
+
}
|
| 1526 |
+
});
|
| 1527 |
+
}
|
| 1528 |
+
|
| 1529 |
+
// Play audio
|
| 1530 |
+
function playAudio(url) {
|
| 1531 |
+
const audio = new Audio(url);
|
| 1532 |
+
audio.play();
|
| 1533 |
+
}
|
| 1534 |
+
|
| 1535 |
+
// Add WAV header
|
| 1536 |
+
function addWavHeader(pcmBuffer, customSampleRate) {
|
| 1537 |
+
const pcmData = new Uint8Array(pcmBuffer);
|
| 1538 |
+
const wavBuffer = new ArrayBuffer(44 + pcmData.length);
|
| 1539 |
+
const view = new DataView(wavBuffer);
|
| 1540 |
+
|
| 1541 |
+
const writeString = (offset, string) => {
|
| 1542 |
+
for (let i = 0; i < string.length; i++) {
|
| 1543 |
+
view.setUint8(offset + i, string.charCodeAt(i));
|
| 1544 |
+
}
|
| 1545 |
+
};
|
| 1546 |
+
|
| 1547 |
+
writeString(0, 'RIFF');
|
| 1548 |
+
view.setUint32(4, 36 + pcmData.length, true);
|
| 1549 |
+
writeString(8, 'WAVE');
|
| 1550 |
+
writeString(12, 'fmt ');
|
| 1551 |
+
view.setUint32(16, 16, true);
|
| 1552 |
+
view.setUint16(20, 1, true);
|
| 1553 |
+
view.setUint16(22, 1, true);
|
| 1554 |
+
|
| 1555 |
+
const sampleRate = customSampleRate || 24000;
|
| 1556 |
+
view.setUint32(24, sampleRate, true);
|
| 1557 |
+
view.setUint32(28, sampleRate * 2, true);
|
| 1558 |
+
view.setUint16(32, 2, true);
|
| 1559 |
+
view.setUint16(34, 16, true);
|
| 1560 |
+
writeString(36, 'data');
|
| 1561 |
+
view.setUint32(40, pcmData.length, true);
|
| 1562 |
+
|
| 1563 |
+
new Uint8Array(wavBuffer, 44).set(pcmData);
|
| 1564 |
+
return wavBuffer;
|
| 1565 |
+
}
|
| 1566 |
+
|
| 1567 |
+
// Event Listeners
|
| 1568 |
+
elements.connectBtn.addEventListener('click', () => {
|
| 1569 |
+
if (isConnected) {
|
| 1570 |
+
disconnect();
|
| 1571 |
+
} else {
|
| 1572 |
+
connect();
|
| 1573 |
+
}
|
| 1574 |
+
});
|
| 1575 |
+
|
| 1576 |
+
elements.talkBtn.addEventListener('mousedown', startRecording);
|
| 1577 |
+
elements.talkBtn.addEventListener('mouseup', stopRecording);
|
| 1578 |
+
elements.talkBtn.addEventListener('mouseleave', stopRecording);
|
| 1579 |
+
elements.talkBtn.addEventListener('touchstart', startRecording);
|
| 1580 |
+
elements.talkBtn.addEventListener('touchend', stopRecording);
|
| 1581 |
+
|
| 1582 |
+
// TTS Button
|
| 1583 |
+
elements.ttsPlayBtn.addEventListener('click', (e) => {
|
| 1584 |
+
e.preventDefault();
|
| 1585 |
+
|
| 1586 |
+
const text = elements.ttsText.value.trim();
|
| 1587 |
+
const voice = elements.ttsVoiceSelect.value;
|
| 1588 |
+
|
| 1589 |
+
if (!text) {
|
| 1590 |
+
alert('Please enter some text');
|
| 1591 |
+
return;
|
| 1592 |
+
}
|
| 1593 |
+
|
| 1594 |
+
if (!ws || ws.readyState !== WebSocket.OPEN) {
|
| 1595 |
+
alert('Please connect first');
|
| 1596 |
+
return;
|
| 1597 |
+
}
|
| 1598 |
+
|
| 1599 |
+
elements.ttsLoader.classList.add('active');
|
| 1600 |
+
elements.ttsPlayBtn.disabled = true;
|
| 1601 |
+
elements.ttsPlayBtn.innerHTML = '<span class="material-icons">hourglass_empty</span>Processing...';
|
| 1602 |
+
elements.ttsPlayer.style.display = 'none';
|
| 1603 |
+
|
| 1604 |
+
ws.send(JSON.stringify({
|
| 1605 |
+
type: 'text-to-speech',
|
| 1606 |
+
text: text,
|
| 1607 |
+
voice_id: voice,
|
| 1608 |
+
quality: 'high',
|
| 1609 |
+
format: 'opus'
|
| 1610 |
+
}));
|
| 1611 |
+
|
| 1612 |
+
log(`TTS requested: voice=${voice}`, 'info');
|
| 1613 |
+
});
|
| 1614 |
+
|
| 1615 |
+
// Replay button event listener
|
| 1616 |
+
const replayBtn = document.getElementById('replayAudioBtn');
|
| 1617 |
+
if (replayBtn) {
|
| 1618 |
+
replayBtn.addEventListener('click', () => {
|
| 1619 |
+
if (lastAudioUrl) {
|
| 1620 |
+
const audio = new Audio(lastAudioUrl);
|
| 1621 |
+
audio.play().then(() => {
|
| 1622 |
+
log('Replaying last audio', 'info');
|
| 1623 |
+
}).catch(err => {
|
| 1624 |
+
log('Error playing audio', 'error');
|
| 1625 |
+
});
|
| 1626 |
+
} else {
|
| 1627 |
+
log('No audio to replay', 'warning');
|
| 1628 |
+
}
|
| 1629 |
+
});
|
| 1630 |
+
}
|
| 1631 |
+
|
| 1632 |
+
// Pull to Refresh Implementation
|
| 1633 |
+
let startY = 0;
|
| 1634 |
+
let pullDistance = 0;
|
| 1635 |
+
let isPulling = false;
|
| 1636 |
+
const pullThreshold = 100;
|
| 1637 |
+
const pullToRefreshEl = document.getElementById('pullToRefresh');
|
| 1638 |
+
|
| 1639 |
+
// Touch events for pull to refresh
|
| 1640 |
+
document.addEventListener('touchstart', (e) => {
|
| 1641 |
+
if (window.scrollY === 0) {
|
| 1642 |
+
startY = e.touches[0].pageY;
|
| 1643 |
+
isPulling = true;
|
| 1644 |
+
}
|
| 1645 |
+
}, { passive: true });
|
| 1646 |
+
|
| 1647 |
+
document.addEventListener('touchmove', (e) => {
|
| 1648 |
+
if (!isPulling) return;
|
| 1649 |
+
|
| 1650 |
+
const currentY = e.touches[0].pageY;
|
| 1651 |
+
pullDistance = currentY - startY;
|
| 1652 |
+
|
| 1653 |
+
if (pullDistance > 0 && window.scrollY === 0) {
|
| 1654 |
+
e.preventDefault();
|
| 1655 |
+
|
| 1656 |
+
// Show pull to refresh indicator
|
| 1657 |
+
if (pullDistance > 20) {
|
| 1658 |
+
pullToRefreshEl.classList.add('show');
|
| 1659 |
+
|
| 1660 |
+
// Update text based on pull distance
|
| 1661 |
+
const pullText = pullToRefreshEl.querySelector('.pull-to-refresh-text');
|
| 1662 |
+
if (pullDistance > pullThreshold) {
|
| 1663 |
+
pullText.textContent = 'Release to refresh';
|
| 1664 |
+
} else {
|
| 1665 |
+
pullText.textContent = 'Pull to refresh';
|
| 1666 |
+
}
|
| 1667 |
+
|
| 1668 |
+
// Apply transform based on pull distance (with resistance)
|
| 1669 |
+
const resistance = Math.min(pullDistance / 3, 60);
|
| 1670 |
+
pullToRefreshEl.style.transform = `translateY(${60 + resistance}px)`;
|
| 1671 |
+
}
|
| 1672 |
+
}
|
| 1673 |
+
}, { passive: false });
|
| 1674 |
+
|
| 1675 |
+
document.addEventListener('touchend', () => {
|
| 1676 |
+
if (!isPulling) return;
|
| 1677 |
+
|
| 1678 |
+
if (pullDistance > pullThreshold) {
|
| 1679 |
+
// Trigger refresh
|
| 1680 |
+
pullToRefreshEl.classList.add('refreshing');
|
| 1681 |
+
pullToRefreshEl.querySelector('.pull-to-refresh-text').textContent = 'Refreshing...';
|
| 1682 |
+
|
| 1683 |
+
// Reload page after animation
|
| 1684 |
+
setTimeout(() => {
|
| 1685 |
+
window.location.reload();
|
| 1686 |
+
}, 1000);
|
| 1687 |
+
} else {
|
| 1688 |
+
// Hide pull to refresh
|
| 1689 |
+
pullToRefreshEl.classList.remove('show');
|
| 1690 |
+
pullToRefreshEl.style.transform = '';
|
| 1691 |
+
}
|
| 1692 |
+
|
| 1693 |
+
isPulling = false;
|
| 1694 |
+
pullDistance = 0;
|
| 1695 |
+
});
|
| 1696 |
+
|
| 1697 |
+
// Mouse events for desktop testing
|
| 1698 |
+
let mouseDown = false;
|
| 1699 |
+
let mouseStartY = 0;
|
| 1700 |
+
|
| 1701 |
+
document.addEventListener('mousedown', (e) => {
|
| 1702 |
+
if (window.scrollY === 0) {
|
| 1703 |
+
mouseStartY = e.pageY;
|
| 1704 |
+
mouseDown = true;
|
| 1705 |
+
}
|
| 1706 |
+
});
|
| 1707 |
+
|
| 1708 |
+
document.addEventListener('mousemove', (e) => {
|
| 1709 |
+
if (!mouseDown) return;
|
| 1710 |
+
|
| 1711 |
+
const currentY = e.pageY;
|
| 1712 |
+
const distance = currentY - mouseStartY;
|
| 1713 |
+
|
| 1714 |
+
if (distance > 0 && window.scrollY === 0) {
|
| 1715 |
+
e.preventDefault();
|
| 1716 |
+
|
| 1717 |
+
if (distance > 20) {
|
| 1718 |
+
pullToRefreshEl.classList.add('show');
|
| 1719 |
+
|
| 1720 |
+
const pullText = pullToRefreshEl.querySelector('.pull-to-refresh-text');
|
| 1721 |
+
if (distance > pullThreshold) {
|
| 1722 |
+
pullText.textContent = 'Release to refresh';
|
| 1723 |
+
pullToRefreshEl.classList.add('ready');
|
| 1724 |
+
} else {
|
| 1725 |
+
pullText.textContent = 'Pull to refresh';
|
| 1726 |
+
pullToRefreshEl.classList.remove('ready');
|
| 1727 |
+
}
|
| 1728 |
+
|
| 1729 |
+
const resistance = Math.min(distance / 3, 60);
|
| 1730 |
+
pullToRefreshEl.style.transform = `translateY(${60 + resistance}px)`;
|
| 1731 |
+
}
|
| 1732 |
+
}
|
| 1733 |
+
});
|
| 1734 |
+
|
| 1735 |
+
document.addEventListener('mouseup', () => {
|
| 1736 |
+
if (!mouseDown) return;
|
| 1737 |
+
|
| 1738 |
+
const distance = mouseStartY ? event.pageY - mouseStartY : 0;
|
| 1739 |
+
|
| 1740 |
+
if (distance > pullThreshold) {
|
| 1741 |
+
pullToRefreshEl.classList.add('refreshing');
|
| 1742 |
+
pullToRefreshEl.querySelector('.pull-to-refresh-text').textContent = 'Refreshing...';
|
| 1743 |
+
|
| 1744 |
+
setTimeout(() => {
|
| 1745 |
+
window.location.reload();
|
| 1746 |
+
}, 1000);
|
| 1747 |
+
} else {
|
| 1748 |
+
pullToRefreshEl.classList.remove('show', 'ready');
|
| 1749 |
+
pullToRefreshEl.style.transform = '';
|
| 1750 |
+
}
|
| 1751 |
+
|
| 1752 |
+
mouseDown = false;
|
| 1753 |
+
mouseStartY = 0;
|
| 1754 |
+
});
|
| 1755 |
+
|
| 1756 |
+
// Clear messages function
|
| 1757 |
+
function clearMessages() {
|
| 1758 |
+
const messagesList = document.getElementById('messagesList');
|
| 1759 |
+
if (messagesList) {
|
| 1760 |
+
messagesList.innerHTML = '<p style="margin: 0; text-align: center; color: var(--ios-gray-3);">No messages yet. Connect and start talking!</p>';
|
| 1761 |
+
}
|
| 1762 |
+
}
|
| 1763 |
+
|
| 1764 |
+
// Copy all logs function
|
| 1765 |
+
function copyAllLogs() {
|
| 1766 |
+
const logContainer = document.getElementById('log');
|
| 1767 |
+
if (!logContainer) {
|
| 1768 |
+
alert('No logs to copy');
|
| 1769 |
+
return;
|
| 1770 |
+
}
|
| 1771 |
+
|
| 1772 |
+
// Get all log entries
|
| 1773 |
+
const logEntries = logContainer.querySelectorAll('.log-entry');
|
| 1774 |
+
let logsText = '';
|
| 1775 |
+
|
| 1776 |
+
// Build text from all log entries
|
| 1777 |
+
logEntries.forEach(entry => {
|
| 1778 |
+
const time = entry.querySelector('.log-time')?.textContent || '';
|
| 1779 |
+
const message = entry.querySelector('.log-message')?.textContent || '';
|
| 1780 |
+
logsText += `${time} ${message}\n`;
|
| 1781 |
+
});
|
| 1782 |
+
|
| 1783 |
+
if (!logsText) {
|
| 1784 |
+
alert('No logs to copy');
|
| 1785 |
+
return;
|
| 1786 |
+
}
|
| 1787 |
+
|
| 1788 |
+
// Copy to clipboard
|
| 1789 |
+
if (navigator.clipboard && navigator.clipboard.writeText) {
|
| 1790 |
+
navigator.clipboard.writeText(logsText).then(() => {
|
| 1791 |
+
// Visual feedback
|
| 1792 |
+
const copyBtn = event.target.closest('button');
|
| 1793 |
+
const originalHTML = copyBtn.innerHTML;
|
| 1794 |
+
copyBtn.innerHTML = '<span class="material-icons">check</span>Copied!';
|
| 1795 |
+
copyBtn.style.background = 'linear-gradient(145deg, #34C759, #30D158)';
|
| 1796 |
+
|
| 1797 |
+
setTimeout(() => {
|
| 1798 |
+
copyBtn.innerHTML = originalHTML;
|
| 1799 |
+
copyBtn.style.background = 'linear-gradient(145deg, #007AFF, #0051D5)';
|
| 1800 |
+
}, 2000);
|
| 1801 |
+
|
| 1802 |
+
log('Logs copied to clipboard', 'success');
|
| 1803 |
+
}).catch(err => {
|
| 1804 |
+
// Fallback method
|
| 1805 |
+
fallbackCopyToClipboard(logsText);
|
| 1806 |
+
});
|
| 1807 |
+
} else {
|
| 1808 |
+
// Fallback for older browsers
|
| 1809 |
+
fallbackCopyToClipboard(logsText);
|
| 1810 |
+
}
|
| 1811 |
+
}
|
| 1812 |
+
|
| 1813 |
+
// Fallback copy method for older browsers
|
| 1814 |
+
function fallbackCopyToClipboard(text) {
|
| 1815 |
+
const textArea = document.createElement('textarea');
|
| 1816 |
+
textArea.value = text;
|
| 1817 |
+
textArea.style.position = 'fixed';
|
| 1818 |
+
textArea.style.top = '-9999px';
|
| 1819 |
+
document.body.appendChild(textArea);
|
| 1820 |
+
textArea.focus();
|
| 1821 |
+
textArea.select();
|
| 1822 |
+
|
| 1823 |
+
try {
|
| 1824 |
+
const successful = document.execCommand('copy');
|
| 1825 |
+
if (successful) {
|
| 1826 |
+
log('Logs copied to clipboard (fallback)', 'success');
|
| 1827 |
+
} else {
|
| 1828 |
+
alert('Failed to copy logs');
|
| 1829 |
+
}
|
| 1830 |
+
} catch (err) {
|
| 1831 |
+
alert('Failed to copy logs: ' + err);
|
| 1832 |
+
}
|
| 1833 |
+
|
| 1834 |
+
document.body.removeChild(textArea);
|
| 1835 |
+
}
|
| 1836 |
+
|
| 1837 |
+
// Initialize
|
| 1838 |
+
log('Ultravox AI Assistant initialized', 'info');
|
| 1839 |
+
log('Format: PCM 16-bit @ 24kHz', 'info');
|
| 1840 |
+
log('Pull down to refresh the page', 'info');
|
| 1841 |
+
</script>
|
| 1842 |
+
</body>
|
| 1843 |
+
</html>
|
services/webrtc_gateway/ultravox-chat-material.html
ADDED
|
@@ -0,0 +1,1116 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Ultravox Chat PCM - Material Design</title>
|
| 7 |
+
|
| 8 |
+
<!-- Material Design CSS via CDN -->
|
| 9 |
+
<link href="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.css" rel="stylesheet">
|
| 10 |
+
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
|
| 11 |
+
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700&display=swap" rel="stylesheet">
|
| 12 |
+
|
| 13 |
+
<!-- Opus Decoder -->
|
| 14 |
+
<script src="opus-decoder.js"></script>
|
| 15 |
+
|
| 16 |
+
<style>
|
| 17 |
+
:root {
|
| 18 |
+
--mdc-theme-primary: #6200ee;
|
| 19 |
+
--mdc-theme-secondary: #03dac6;
|
| 20 |
+
--mdc-theme-error: #b00020;
|
| 21 |
+
--mdc-theme-surface: #ffffff;
|
| 22 |
+
--mdc-theme-background: #f5f5f5;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
* {
|
| 26 |
+
margin: 0;
|
| 27 |
+
padding: 0;
|
| 28 |
+
box-sizing: border-box;
|
| 29 |
+
}
|
| 30 |
+
|
| 31 |
+
body {
|
| 32 |
+
font-family: 'Roboto', sans-serif;
|
| 33 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 34 |
+
min-height: 100vh;
|
| 35 |
+
padding: 20px;
|
| 36 |
+
}
|
| 37 |
+
|
| 38 |
+
.main-container {
|
| 39 |
+
max-width: 1200px;
|
| 40 |
+
margin: 0 auto;
|
| 41 |
+
}
|
| 42 |
+
|
| 43 |
+
.mdc-card {
|
| 44 |
+
margin-bottom: 20px;
|
| 45 |
+
padding: 24px;
|
| 46 |
+
}
|
| 47 |
+
|
| 48 |
+
.header-title {
|
| 49 |
+
font-size: 28px;
|
| 50 |
+
font-weight: 500;
|
| 51 |
+
color: #333;
|
| 52 |
+
margin-bottom: 24px;
|
| 53 |
+
display: flex;
|
| 54 |
+
align-items: center;
|
| 55 |
+
gap: 12px;
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
.status-chip {
|
| 59 |
+
display: inline-flex;
|
| 60 |
+
align-items: center;
|
| 61 |
+
padding: 8px 16px;
|
| 62 |
+
border-radius: 16px;
|
| 63 |
+
background: #f5f5f5;
|
| 64 |
+
margin-bottom: 16px;
|
| 65 |
+
}
|
| 66 |
+
|
| 67 |
+
.status-dot {
|
| 68 |
+
width: 12px;
|
| 69 |
+
height: 12px;
|
| 70 |
+
border-radius: 50%;
|
| 71 |
+
background: #dc3545;
|
| 72 |
+
margin-right: 8px;
|
| 73 |
+
display: inline-block;
|
| 74 |
+
}
|
| 75 |
+
|
| 76 |
+
.status-dot.connected {
|
| 77 |
+
background: #28a745;
|
| 78 |
+
animation: pulse 2s infinite;
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
+
@keyframes pulse {
|
| 82 |
+
0% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0.7); }
|
| 83 |
+
70% { box-shadow: 0 0 0 10px rgba(40, 167, 69, 0); }
|
| 84 |
+
100% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0); }
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
.controls-grid {
|
| 88 |
+
display: grid;
|
| 89 |
+
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
|
| 90 |
+
gap: 16px;
|
| 91 |
+
margin-bottom: 24px;
|
| 92 |
+
}
|
| 93 |
+
|
| 94 |
+
.voice-selector-container {
|
| 95 |
+
margin-bottom: 24px;
|
| 96 |
+
}
|
| 97 |
+
|
| 98 |
+
.metrics-grid {
|
| 99 |
+
display: grid;
|
| 100 |
+
grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
|
| 101 |
+
gap: 16px;
|
| 102 |
+
margin-bottom: 24px;
|
| 103 |
+
}
|
| 104 |
+
|
| 105 |
+
.metric-card {
|
| 106 |
+
background: #f8f9fa;
|
| 107 |
+
padding: 16px;
|
| 108 |
+
border-radius: 8px;
|
| 109 |
+
text-align: center;
|
| 110 |
+
}
|
| 111 |
+
|
| 112 |
+
.metric-label {
|
| 113 |
+
font-size: 12px;
|
| 114 |
+
color: #6c757d;
|
| 115 |
+
margin-bottom: 8px;
|
| 116 |
+
text-transform: uppercase;
|
| 117 |
+
letter-spacing: 0.5px;
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
+
.metric-value {
|
| 121 |
+
font-size: 24px;
|
| 122 |
+
font-weight: 500;
|
| 123 |
+
color: #333;
|
| 124 |
+
}
|
| 125 |
+
|
| 126 |
+
.log-container {
|
| 127 |
+
background: #1e1e1e;
|
| 128 |
+
border-radius: 8px;
|
| 129 |
+
padding: 16px;
|
| 130 |
+
height: 300px;
|
| 131 |
+
overflow-y: auto;
|
| 132 |
+
font-family: 'Monaco', 'Menlo', monospace;
|
| 133 |
+
font-size: 12px;
|
| 134 |
+
}
|
| 135 |
+
|
| 136 |
+
.log-entry {
|
| 137 |
+
padding: 4px 0;
|
| 138 |
+
display: flex;
|
| 139 |
+
align-items: flex-start;
|
| 140 |
+
color: #e0e0e0;
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
+
.log-time {
|
| 144 |
+
color: #6c757d;
|
| 145 |
+
margin-right: 10px;
|
| 146 |
+
flex-shrink: 0;
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
.log-message {
|
| 150 |
+
flex: 1;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
.log-entry.error { color: #ff5252; }
|
| 154 |
+
.log-entry.success { color: #69f0ae; }
|
| 155 |
+
.log-entry.info { color: #448aff; }
|
| 156 |
+
.log-entry.warning { color: #ffd740; }
|
| 157 |
+
|
| 158 |
+
.tts-textarea {
|
| 159 |
+
width: 100%;
|
| 160 |
+
min-height: 120px;
|
| 161 |
+
padding: 12px;
|
| 162 |
+
border: 1px solid #ddd;
|
| 163 |
+
border-radius: 4px;
|
| 164 |
+
font-family: 'Roboto', sans-serif;
|
| 165 |
+
font-size: 14px;
|
| 166 |
+
resize: vertical;
|
| 167 |
+
margin-bottom: 16px;
|
| 168 |
+
}
|
| 169 |
+
|
| 170 |
+
.tts-textarea:focus {
|
| 171 |
+
outline: none;
|
| 172 |
+
border-color: var(--mdc-theme-primary);
|
| 173 |
+
}
|
| 174 |
+
|
| 175 |
+
.audio-player {
|
| 176 |
+
display: inline-flex;
|
| 177 |
+
align-items: center;
|
| 178 |
+
gap: 10px;
|
| 179 |
+
margin-left: 10px;
|
| 180 |
+
}
|
| 181 |
+
|
| 182 |
+
.play-btn {
|
| 183 |
+
background: #007bff;
|
| 184 |
+
color: white;
|
| 185 |
+
border: none;
|
| 186 |
+
border-radius: 4px;
|
| 187 |
+
padding: 4px 8px;
|
| 188 |
+
cursor: pointer;
|
| 189 |
+
font-size: 12px;
|
| 190 |
+
}
|
| 191 |
+
|
| 192 |
+
.play-btn:hover {
|
| 193 |
+
background: #0056b3;
|
| 194 |
+
}
|
| 195 |
+
|
| 196 |
+
.mdc-button.recording {
|
| 197 |
+
background: #dc3545 !important;
|
| 198 |
+
animation: recordPulse 1s infinite;
|
| 199 |
+
}
|
| 200 |
+
|
| 201 |
+
@keyframes recordPulse {
|
| 202 |
+
0%, 100% { opacity: 1; }
|
| 203 |
+
50% { opacity: 0.7; }
|
| 204 |
+
}
|
| 205 |
+
|
| 206 |
+
#ttsStatus {
|
| 207 |
+
padding: 16px;
|
| 208 |
+
background: #f5f5f5;
|
| 209 |
+
border-radius: 8px;
|
| 210 |
+
margin-top: 16px;
|
| 211 |
+
}
|
| 212 |
+
|
| 213 |
+
#ttsPlayer {
|
| 214 |
+
margin-top: 16px;
|
| 215 |
+
}
|
| 216 |
+
|
| 217 |
+
#ttsPlayer audio {
|
| 218 |
+
width: 100%;
|
| 219 |
+
}
|
| 220 |
+
|
| 221 |
+
/* Mobile responsive */
|
| 222 |
+
@media (max-width: 600px) {
|
| 223 |
+
.main-container {
|
| 224 |
+
padding: 0;
|
| 225 |
+
}
|
| 226 |
+
|
| 227 |
+
.mdc-card {
|
| 228 |
+
border-radius: 0;
|
| 229 |
+
margin-bottom: 8px;
|
| 230 |
+
}
|
| 231 |
+
|
| 232 |
+
.header-title {
|
| 233 |
+
font-size: 24px;
|
| 234 |
+
}
|
| 235 |
+
|
| 236 |
+
.metrics-grid {
|
| 237 |
+
grid-template-columns: repeat(2, 1fr);
|
| 238 |
+
}
|
| 239 |
+
}
|
| 240 |
+
</style>
|
| 241 |
+
</head>
|
| 242 |
+
<body>
|
| 243 |
+
<div class="main-container">
|
| 244 |
+
<!-- Main Card -->
|
| 245 |
+
<div class="mdc-card mdc-elevation--z8">
|
| 246 |
+
<h1 class="header-title">
|
| 247 |
+
<span class="material-icons">rocket_launch</span>
|
| 248 |
+
Ultravox PCM - Otimizado
|
| 249 |
+
</h1>
|
| 250 |
+
|
| 251 |
+
<!-- Status -->
|
| 252 |
+
<div class="status-chip">
|
| 253 |
+
<span class="status-dot" id="statusDot"></span>
|
| 254 |
+
<span id="statusText">Desconectado</span>
|
| 255 |
+
<span style="margin-left: auto; margin-right: 8px;" id="latencyText">Latência: --ms</span>
|
| 256 |
+
</div>
|
| 257 |
+
|
| 258 |
+
<!-- Voice Selector -->
|
| 259 |
+
<div class="voice-selector-container">
|
| 260 |
+
<div class="mdc-select mdc-select--filled" style="width: 100%;">
|
| 261 |
+
<div class="mdc-select__anchor" role="button" aria-haspopup="listbox" aria-expanded="false">
|
| 262 |
+
<span class="mdc-select__ripple"></span>
|
| 263 |
+
<span class="mdc-floating-label">Voz TTS</span>
|
| 264 |
+
<span class="mdc-select__selected-text"></span>
|
| 265 |
+
<span class="mdc-select__dropdown-icon">
|
| 266 |
+
<span class="material-icons">arrow_drop_down</span>
|
| 267 |
+
</span>
|
| 268 |
+
<span class="mdc-line-ripple"></span>
|
| 269 |
+
</div>
|
| 270 |
+
<div class="mdc-select__menu mdc-menu mdc-menu-surface mdc-menu-surface--fullwidth">
|
| 271 |
+
<ul class="mdc-list" role="listbox">
|
| 272 |
+
<li class="mdc-list-item mdc-list-item--selected" data-value="pf_dora" role="option">
|
| 273 |
+
<span class="mdc-list-item__ripple"></span>
|
| 274 |
+
<span class="mdc-list-item__text">🇧🇷 [pf_dora] Português Feminino (Dora)</span>
|
| 275 |
+
</li>
|
| 276 |
+
<li class="mdc-list-item" data-value="pm_alex" role="option">
|
| 277 |
+
<span class="mdc-list-item__ripple"></span>
|
| 278 |
+
<span class="mdc-list-item__text">🇧🇷 [pm_alex] Português Masculino (Alex)</span>
|
| 279 |
+
</li>
|
| 280 |
+
<li class="mdc-list-item" data-value="af_heart" role="option">
|
| 281 |
+
<span class="mdc-list-item__ripple"></span>
|
| 282 |
+
<span class="mdc-list-item__text">🌍 [af_heart] Alternativa Feminina (Heart)</span>
|
| 283 |
+
</li>
|
| 284 |
+
<li class="mdc-list-item" data-value="af_bella" role="option">
|
| 285 |
+
<span class="mdc-list-item__ripple"></span>
|
| 286 |
+
<span class="mdc-list-item__text">🌍 [af_bella] Alternativa Feminina (Bella)</span>
|
| 287 |
+
</li>
|
| 288 |
+
</ul>
|
| 289 |
+
</div>
|
| 290 |
+
</div>
|
| 291 |
+
<!-- Hidden select for compatibility with existing JS -->
|
| 292 |
+
<select id="voiceSelect" style="display: none;">
|
| 293 |
+
<option value="pf_dora" selected>Português Feminino (Dora)</option>
|
| 294 |
+
<option value="pm_alex">Português Masculino (Alex)</option>
|
| 295 |
+
<option value="af_heart">Alternativa Feminina (Heart)</option>
|
| 296 |
+
<option value="af_bella">Alternativa Feminina (Bella)</option>
|
| 297 |
+
</select>
|
| 298 |
+
</div>
|
| 299 |
+
|
| 300 |
+
<!-- Control Buttons -->
|
| 301 |
+
<div class="controls-grid">
|
| 302 |
+
<button id="connectBtn" class="mdc-button mdc-button--raised">
|
| 303 |
+
<span class="mdc-button__ripple"></span>
|
| 304 |
+
<i class="material-icons mdc-button__icon" aria-hidden="true">power_settings_new</i>
|
| 305 |
+
<span class="mdc-button__label">Conectar</span>
|
| 306 |
+
</button>
|
| 307 |
+
|
| 308 |
+
<button id="talkBtn" class="mdc-button mdc-button--raised" disabled>
|
| 309 |
+
<span class="mdc-button__ripple"></span>
|
| 310 |
+
<i class="material-icons mdc-button__icon" aria-hidden="true">mic</i>
|
| 311 |
+
<span class="mdc-button__label">Push to Talk</span>
|
| 312 |
+
</button>
|
| 313 |
+
</div>
|
| 314 |
+
|
| 315 |
+
<!-- Metrics -->
|
| 316 |
+
<div class="metrics-grid">
|
| 317 |
+
<div class="metric-card mdc-elevation--z2">
|
| 318 |
+
<div class="metric-label">Enviado</div>
|
| 319 |
+
<div class="metric-value" id="sentBytes">0 KB</div>
|
| 320 |
+
</div>
|
| 321 |
+
<div class="metric-card mdc-elevation--z2">
|
| 322 |
+
<div class="metric-label">Recebido</div>
|
| 323 |
+
<div class="metric-value" id="receivedBytes">0 KB</div>
|
| 324 |
+
</div>
|
| 325 |
+
<div class="metric-card mdc-elevation--z2">
|
| 326 |
+
<div class="metric-label">Formato</div>
|
| 327 |
+
<div class="metric-value" id="format">PCM</div>
|
| 328 |
+
</div>
|
| 329 |
+
<div class="metric-card mdc-elevation--z2">
|
| 330 |
+
<div class="metric-label">🎤 Voz</div>
|
| 331 |
+
<div class="metric-value" id="currentVoice" style="font-family: monospace; color: #4CAF50; font-weight: bold;">pf_dora</div>
|
| 332 |
+
</div>
|
| 333 |
+
</div>
|
| 334 |
+
|
| 335 |
+
<!-- Log -->
|
| 336 |
+
<div class="log-container" id="log"></div>
|
| 337 |
+
</div>
|
| 338 |
+
|
| 339 |
+
<!-- TTS Direct Card -->
|
| 340 |
+
<div class="mdc-card mdc-elevation--z8">
|
| 341 |
+
<h2 class="header-title">
|
| 342 |
+
<span class="material-icons">record_voice_over</span>
|
| 343 |
+
Text-to-Speech Direto
|
| 344 |
+
</h2>
|
| 345 |
+
<p style="color: #666; margin-bottom: 16px;">Digite ou edite o texto abaixo e escolha uma voz para converter em áudio</p>
|
| 346 |
+
|
| 347 |
+
<!-- TTS Text Area -->
|
| 348 |
+
<textarea id="ttsText" class="tts-textarea" placeholder="Digite seu texto aqui...">Olá! Teste de voz.</textarea>
|
| 349 |
+
|
| 350 |
+
<!-- TTS Voice Selector -->
|
| 351 |
+
<div style="display: flex; gap: 16px; align-items: center; margin-bottom: 16px;">
|
| 352 |
+
<div class="mdc-select mdc-select--filled" style="flex: 1;">
|
| 353 |
+
<div class="mdc-select__anchor" role="button" aria-haspopup="listbox" aria-expanded="false">
|
| 354 |
+
<span class="mdc-select__ripple"></span>
|
| 355 |
+
<span class="mdc-floating-label">Voz TTS</span>
|
| 356 |
+
<span class="mdc-select__selected-text"></span>
|
| 357 |
+
<span class="mdc-select__dropdown-icon">
|
| 358 |
+
<span class="material-icons">arrow_drop_down</span>
|
| 359 |
+
</span>
|
| 360 |
+
<span class="mdc-line-ripple"></span>
|
| 361 |
+
</div>
|
| 362 |
+
<div class="mdc-select__menu mdc-menu mdc-menu-surface mdc-menu-surface--fullwidth">
|
| 363 |
+
<ul class="mdc-list" role="listbox" style="max-height: 400px; overflow-y: auto;">
|
| 364 |
+
<!-- Portuguese voices -->
|
| 365 |
+
<li class="mdc-list-divider" role="separator">🇧🇷 Português</li>
|
| 366 |
+
<li class="mdc-list-item mdc-list-item--selected" data-value="pf_dora" role="option">
|
| 367 |
+
<span class="mdc-list-item__text">[pf_dora] Feminino - Dora</span>
|
| 368 |
+
</li>
|
| 369 |
+
<li class="mdc-list-item" data-value="pm_alex" role="option">
|
| 370 |
+
<span class="mdc-list-item__text">[pm_alex] Masculino - Alex</span>
|
| 371 |
+
</li>
|
| 372 |
+
<li class="mdc-list-item" data-value="pm_santa" role="option">
|
| 373 |
+
<span class="mdc-list-item__text">[pm_santa] Masculino - Santa</span>
|
| 374 |
+
</li>
|
| 375 |
+
<!-- Other languages - keeping all voices from original -->
|
| 376 |
+
<li class="mdc-list-divider" role="separator">🇺🇸 Inglês Americano</li>
|
| 377 |
+
<li class="mdc-list-item" data-value="af_alloy" role="option">
|
| 378 |
+
<span class="mdc-list-item__text">Feminino - Alloy</span>
|
| 379 |
+
</li>
|
| 380 |
+
<li class="mdc-list-item" data-value="af_bella" role="option">
|
| 381 |
+
<span class="mdc-list-item__text">Feminino - Bella</span>
|
| 382 |
+
</li>
|
| 383 |
+
<li class="mdc-list-item" data-value="af_heart" role="option">
|
| 384 |
+
<span class="mdc-list-item__text">Feminino - Heart</span>
|
| 385 |
+
</li>
|
| 386 |
+
<li class="mdc-list-item" data-value="am_adam" role="option">
|
| 387 |
+
<span class="mdc-list-item__text">Masculino - Adam</span>
|
| 388 |
+
</li>
|
| 389 |
+
<li class="mdc-list-item" data-value="am_echo" role="option">
|
| 390 |
+
<span class="mdc-list-item__text">Masculino - Echo</span>
|
| 391 |
+
</li>
|
| 392 |
+
</ul>
|
| 393 |
+
</div>
|
| 394 |
+
</div>
|
| 395 |
+
|
| 396 |
+
<!-- Hidden select for compatibility -->
|
| 397 |
+
<select id="ttsVoiceSelect" style="display: none;">
|
| 398 |
+
<optgroup label="🇧🇷 Português">
|
| 399 |
+
<option value="pf_dora" selected>[pf_dora] Feminino - Dora</option>
|
| 400 |
+
<option value="pm_alex">[pm_alex] Masculino - Alex</option>
|
| 401 |
+
<option value="pm_santa">[pm_santa] Masculino - Santa (Festivo)</option>
|
| 402 |
+
</optgroup>
|
| 403 |
+
<optgroup label="🇫🇷 Francês">
|
| 404 |
+
<option value="ff_siwis">[ff_siwis] Feminino - Siwis (Nativa)</option>
|
| 405 |
+
</optgroup>
|
| 406 |
+
<optgroup label="🇺🇸 Inglês Americano">
|
| 407 |
+
<option value="af_alloy">Feminino - Alloy</option>
|
| 408 |
+
<option value="af_aoede">Feminino - Aoede</option>
|
| 409 |
+
<option value="af_bella">Feminino - Bella</option>
|
| 410 |
+
<option value="af_heart">Feminino - Heart</option>
|
| 411 |
+
<option value="af_jessica">Feminino - Jessica</option>
|
| 412 |
+
<option value="af_kore">Feminino - Kore</option>
|
| 413 |
+
<option value="af_nicole">Feminino - Nicole</option>
|
| 414 |
+
<option value="af_nova">Feminino - Nova</option>
|
| 415 |
+
<option value="af_river">Feminino - River</option>
|
| 416 |
+
<option value="af_sarah">Feminino - Sarah</option>
|
| 417 |
+
<option value="af_sky">Feminino - Sky</option>
|
| 418 |
+
<option value="am_adam">Masculino - Adam</option>
|
| 419 |
+
<option value="am_echo">Masculino - Echo</option>
|
| 420 |
+
<option value="am_eric">Masculino - Eric</option>
|
| 421 |
+
<option value="am_fenrir">Masculino - Fenrir</option>
|
| 422 |
+
<option value="am_liam">Masculino - Liam</option>
|
| 423 |
+
<option value="am_michael">Masculino - Michael</option>
|
| 424 |
+
<option value="am_onyx">Masculino - Onyx</option>
|
| 425 |
+
<option value="am_puck">Masculino - Puck</option>
|
| 426 |
+
<option value="am_santa">Masculino - Santa</option>
|
| 427 |
+
</optgroup>
|
| 428 |
+
<optgroup label="🇬🇧 Inglês Britânico">
|
| 429 |
+
<option value="bf_alice">Feminino - Alice</option>
|
| 430 |
+
<option value="bf_emma">Feminino - Emma</option>
|
| 431 |
+
<option value="bf_isabella">Feminino - Isabella</option>
|
| 432 |
+
<option value="bf_lily">Feminino - Lily</option>
|
| 433 |
+
<option value="bm_daniel">Masculino - Daniel</option>
|
| 434 |
+
<option value="bm_fable">Masculino - Fable</option>
|
| 435 |
+
<option value="bm_george">Masculino - George</option>
|
| 436 |
+
<option value="bm_lewis">Masculino - Lewis</option>
|
| 437 |
+
</optgroup>
|
| 438 |
+
<optgroup label="🇪🇸 Espanhol">
|
| 439 |
+
<option value="ef_dora">Feminino - Dora</option>
|
| 440 |
+
<option value="em_alex">Masculino - Alex</option>
|
| 441 |
+
<option value="em_santa">Masculino - Santa</option>
|
| 442 |
+
</optgroup>
|
| 443 |
+
<optgroup label="🇮🇹 Italiano">
|
| 444 |
+
<option value="if_sara">Feminino - Sara</option>
|
| 445 |
+
<option value="im_nicola">Masculino - Nicola</option>
|
| 446 |
+
</optgroup>
|
| 447 |
+
<optgroup label="🇯🇵 Japonês">
|
| 448 |
+
<option value="jf_alpha">Feminino - Alpha</option>
|
| 449 |
+
<option value="jf_gongitsune">Feminino - Gongitsune</option>
|
| 450 |
+
<option value="jf_nezumi">Feminino - Nezumi</option>
|
| 451 |
+
<option value="jf_tebukuro">Feminino - Tebukuro</option>
|
| 452 |
+
<option value="jm_kumo">Masculino - Kumo</option>
|
| 453 |
+
</optgroup>
|
| 454 |
+
<optgroup label="🇨🇳 Chinês">
|
| 455 |
+
<option value="zf_xiaobei">Feminino - Xiaobei</option>
|
| 456 |
+
<option value="zf_xiaoni">Feminino - Xiaoni</option>
|
| 457 |
+
<option value="zf_xiaoxiao">Feminino - Xiaoxiao</option>
|
| 458 |
+
<option value="zf_xiaoyi">Feminino - Xiaoyi</option>
|
| 459 |
+
<option value="zm_yunjian">Masculino - Yunjian</option>
|
| 460 |
+
<option value="zm_yunxi">Masculino - Yunxi</option>
|
| 461 |
+
<option value="zm_yunxia">Masculino - Yunxia</option>
|
| 462 |
+
<option value="zm_yunyang">Masculino - Yunyang</option>
|
| 463 |
+
</optgroup>
|
| 464 |
+
<optgroup label="🇮🇳 Hindi">
|
| 465 |
+
<option value="hf_alpha">Feminino - Alpha</option>
|
| 466 |
+
<option value="hf_beta">Feminino - Beta</option>
|
| 467 |
+
<option value="hm_omega">Masculino - Omega</option>
|
| 468 |
+
<option value="hm_psi">Masculino - Psi</option>
|
| 469 |
+
</optgroup>
|
| 470 |
+
</select>
|
| 471 |
+
|
| 472 |
+
<button id="ttsPlayBtn" class="mdc-button mdc-button--raised" disabled>
|
| 473 |
+
<span class="mdc-button__ripple"></span>
|
| 474 |
+
<i class="material-icons mdc-button__icon" aria-hidden="true">play_arrow</i>
|
| 475 |
+
<span class="mdc-button__label">Gerar Áudio</span>
|
| 476 |
+
</button>
|
| 477 |
+
</div>
|
| 478 |
+
|
| 479 |
+
<!-- TTS Status -->
|
| 480 |
+
<div id="ttsStatus" style="display: none;">
|
| 481 |
+
<div class="mdc-linear-progress mdc-linear-progress--indeterminate" role="progressbar">
|
| 482 |
+
<div class="mdc-linear-progress__buffer">
|
| 483 |
+
<div class="mdc-linear-progress__buffer-bar"></div>
|
| 484 |
+
<div class="mdc-linear-progress__buffer-dots"></div>
|
| 485 |
+
</div>
|
| 486 |
+
<div class="mdc-linear-progress__bar mdc-linear-progress__primary-bar">
|
| 487 |
+
<span class="mdc-linear-progress__bar-inner"></span>
|
| 488 |
+
</div>
|
| 489 |
+
<div class="mdc-linear-progress__bar mdc-linear-progress__secondary-bar">
|
| 490 |
+
<span class="mdc-linear-progress__bar-inner"></span>
|
| 491 |
+
</div>
|
| 492 |
+
</div>
|
| 493 |
+
<p id="ttsStatusText" style="margin-top: 8px;">⏳ Processando...</p>
|
| 494 |
+
</div>
|
| 495 |
+
|
| 496 |
+
<!-- TTS Player -->
|
| 497 |
+
<div id="ttsPlayer" style="display: none;">
|
| 498 |
+
<audio id="ttsAudio" controls style="width: 100%;"></audio>
|
| 499 |
+
</div>
|
| 500 |
+
</div>
|
| 501 |
+
</div>
|
| 502 |
+
|
| 503 |
+
<!-- Material Design JavaScript via CDN -->
|
| 504 |
+
<script src="https://unpkg.com/material-components-web@latest/dist/material-components-web.min.js"></script>
|
| 505 |
+
|
| 506 |
+
<!-- Original JavaScript (preserved completely) -->
|
| 507 |
+
<script>
|
| 508 |
+
// Initialize Material Design Components
|
| 509 |
+
mdc.autoInit();
|
| 510 |
+
|
| 511 |
+
// Initialize specific MDC components
|
| 512 |
+
const mdcSelects = document.querySelectorAll('.mdc-select');
|
| 513 |
+
mdcSelects.forEach((selectEl, index) => {
|
| 514 |
+
const select = mdc.select.MDCSelect.attachTo(selectEl);
|
| 515 |
+
|
| 516 |
+
// Sync with hidden selects
|
| 517 |
+
select.listen('MDCSelect:change', () => {
|
| 518 |
+
const value = select.value;
|
| 519 |
+
if (index === 0) {
|
| 520 |
+
// Main voice selector
|
| 521 |
+
document.getElementById('voiceSelect').value = value;
|
| 522 |
+
document.getElementById('voiceSelect').dispatchEvent(new Event('change'));
|
| 523 |
+
} else {
|
| 524 |
+
// TTS voice selector
|
| 525 |
+
document.getElementById('ttsVoiceSelect').value = value;
|
| 526 |
+
document.getElementById('ttsVoiceSelect').dispatchEvent(new Event('change'));
|
| 527 |
+
}
|
| 528 |
+
});
|
| 529 |
+
});
|
| 530 |
+
|
| 531 |
+
// Initialize buttons
|
| 532 |
+
const buttons = document.querySelectorAll('.mdc-button');
|
| 533 |
+
buttons.forEach(buttonEl => {
|
| 534 |
+
mdc.ripple.MDCRipple.attachTo(buttonEl);
|
| 535 |
+
});
|
| 536 |
+
|
| 537 |
+
// ========= ORIGINAL JAVASCRIPT CODE (PRESERVED COMPLETELY) =========
|
| 538 |
+
|
| 539 |
+
// Estado da aplicação
|
| 540 |
+
let ws = null;
|
| 541 |
+
let isConnected = false;
|
| 542 |
+
let isRecording = false;
|
| 543 |
+
let audioContext = null;
|
| 544 |
+
let stream = null;
|
| 545 |
+
let audioSource = null;
|
| 546 |
+
let audioProcessor = null;
|
| 547 |
+
let pcmBuffer = [];
|
| 548 |
+
|
| 549 |
+
// Métricas
|
| 550 |
+
const metrics = {
|
| 551 |
+
sentBytes: 0,
|
| 552 |
+
receivedBytes: 0,
|
| 553 |
+
latency: 0,
|
| 554 |
+
recordingStartTime: 0
|
| 555 |
+
};
|
| 556 |
+
|
| 557 |
+
// Elementos DOM
|
| 558 |
+
const elements = {
|
| 559 |
+
statusDot: document.getElementById('statusDot'),
|
| 560 |
+
statusText: document.getElementById('statusText'),
|
| 561 |
+
latencyText: document.getElementById('latencyText'),
|
| 562 |
+
connectBtn: document.getElementById('connectBtn'),
|
| 563 |
+
talkBtn: document.getElementById('talkBtn'),
|
| 564 |
+
voiceSelect: document.getElementById('voiceSelect'),
|
| 565 |
+
sentBytes: document.getElementById('sentBytes'),
|
| 566 |
+
receivedBytes: document.getElementById('receivedBytes'),
|
| 567 |
+
format: document.getElementById('format'),
|
| 568 |
+
log: document.getElementById('log'),
|
| 569 |
+
// TTS elements
|
| 570 |
+
ttsText: document.getElementById('ttsText'),
|
| 571 |
+
ttsVoiceSelect: document.getElementById('ttsVoiceSelect'),
|
| 572 |
+
ttsPlayBtn: document.getElementById('ttsPlayBtn'),
|
| 573 |
+
ttsStatus: document.getElementById('ttsStatus'),
|
| 574 |
+
ttsStatusText: document.getElementById('ttsStatusText'),
|
| 575 |
+
ttsPlayer: document.getElementById('ttsPlayer'),
|
| 576 |
+
ttsAudio: document.getElementById('ttsAudio')
|
| 577 |
+
};
|
| 578 |
+
|
| 579 |
+
// Log no console visual
|
| 580 |
+
function log(message, type = 'info') {
|
| 581 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 582 |
+
const entry = document.createElement('div');
|
| 583 |
+
entry.className = `log-entry ${type}`;
|
| 584 |
+
entry.innerHTML = `
|
| 585 |
+
<span class="log-time">[${time}]</span>
|
| 586 |
+
<span class="log-message">${message}</span>
|
| 587 |
+
`;
|
| 588 |
+
elements.log.appendChild(entry);
|
| 589 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 590 |
+
console.log(`[${type}] ${message}`);
|
| 591 |
+
}
|
| 592 |
+
|
| 593 |
+
// Atualizar métricas
|
| 594 |
+
function updateMetrics() {
|
| 595 |
+
elements.sentBytes.textContent = `${(metrics.sentBytes / 1024).toFixed(1)} KB`;
|
| 596 |
+
elements.receivedBytes.textContent = `${(metrics.receivedBytes / 1024).toFixed(1)} KB`;
|
| 597 |
+
elements.latencyText.textContent = `Latência: ${metrics.latency}ms`;
|
| 598 |
+
}
|
| 599 |
+
|
| 600 |
+
// Conectar ao WebSocket
|
| 601 |
+
async function connect() {
|
| 602 |
+
try {
|
| 603 |
+
// Solicitar acesso ao microfone
|
| 604 |
+
stream = await navigator.mediaDevices.getUserMedia({
|
| 605 |
+
audio: {
|
| 606 |
+
echoCancellation: true,
|
| 607 |
+
noiseSuppression: true,
|
| 608 |
+
sampleRate: 24000 // High quality 24kHz
|
| 609 |
+
}
|
| 610 |
+
});
|
| 611 |
+
|
| 612 |
+
log('✅ Microfone acessado', 'success');
|
| 613 |
+
|
| 614 |
+
// Conectar WebSocket com suporte binário
|
| 615 |
+
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 616 |
+
const wsUrl = `${protocol}//${window.location.host}/ws`;
|
| 617 |
+
ws = new WebSocket(wsUrl);
|
| 618 |
+
ws.binaryType = 'arraybuffer';
|
| 619 |
+
|
| 620 |
+
ws.onopen = () => {
|
| 621 |
+
isConnected = true;
|
| 622 |
+
elements.statusDot.classList.add('connected');
|
| 623 |
+
elements.statusText.textContent = 'Conectado';
|
| 624 |
+
|
| 625 |
+
// Update button appearance
|
| 626 |
+
elements.connectBtn.querySelector('.mdc-button__label').textContent = 'Desconectar';
|
| 627 |
+
elements.connectBtn.querySelector('.material-icons').textContent = 'power_settings_new';
|
| 628 |
+
elements.talkBtn.disabled = false;
|
| 629 |
+
|
| 630 |
+
// Enviar voz selecionada ao conectar
|
| 631 |
+
const currentVoice = elements.voiceSelect.value || elements.ttsVoiceSelect.value || 'pf_dora';
|
| 632 |
+
ws.send(JSON.stringify({
|
| 633 |
+
type: 'set-voice',
|
| 634 |
+
voice_id: currentVoice
|
| 635 |
+
}));
|
| 636 |
+
log(`🔊 Voz configurada: ${currentVoice}`, 'info');
|
| 637 |
+
elements.ttsPlayBtn.disabled = false; // Habilitar TTS button
|
| 638 |
+
log('✅ Conectado ao servidor', 'success');
|
| 639 |
+
};
|
| 640 |
+
|
| 641 |
+
ws.onmessage = (event) => {
|
| 642 |
+
if (event.data instanceof ArrayBuffer) {
|
| 643 |
+
// Áudio PCM binário recebido
|
| 644 |
+
handlePCMAudio(event.data);
|
| 645 |
+
} else {
|
| 646 |
+
// Mensagem JSON
|
| 647 |
+
const data = JSON.parse(event.data);
|
| 648 |
+
handleMessage(data);
|
| 649 |
+
}
|
| 650 |
+
};
|
| 651 |
+
|
| 652 |
+
ws.onerror = (error) => {
|
| 653 |
+
log(`❌ Erro WebSocket: ${error}`, 'error');
|
| 654 |
+
};
|
| 655 |
+
|
| 656 |
+
ws.onclose = () => {
|
| 657 |
+
disconnect();
|
| 658 |
+
};
|
| 659 |
+
|
| 660 |
+
} catch (error) {
|
| 661 |
+
log(`❌ Erro ao conectar: ${error.message}`, 'error');
|
| 662 |
+
}
|
| 663 |
+
}
|
| 664 |
+
|
| 665 |
+
// Desconectar
|
| 666 |
+
function disconnect() {
|
| 667 |
+
isConnected = false;
|
| 668 |
+
|
| 669 |
+
if (ws) {
|
| 670 |
+
ws.close();
|
| 671 |
+
ws = null;
|
| 672 |
+
}
|
| 673 |
+
|
| 674 |
+
if (stream) {
|
| 675 |
+
stream.getTracks().forEach(track => track.stop());
|
| 676 |
+
stream = null;
|
| 677 |
+
}
|
| 678 |
+
|
| 679 |
+
if (audioContext) {
|
| 680 |
+
audioContext.close();
|
| 681 |
+
audioContext = null;
|
| 682 |
+
}
|
| 683 |
+
|
| 684 |
+
elements.statusDot.classList.remove('connected');
|
| 685 |
+
elements.statusText.textContent = 'Desconectado';
|
| 686 |
+
elements.connectBtn.querySelector('.mdc-button__label').textContent = 'Conectar';
|
| 687 |
+
elements.talkBtn.disabled = true;
|
| 688 |
+
|
| 689 |
+
log('👋 Desconectado', 'warning');
|
| 690 |
+
}
|
| 691 |
+
|
| 692 |
+
// Iniciar gravação PCM
|
| 693 |
+
function startRecording() {
|
| 694 |
+
if (isRecording) return;
|
| 695 |
+
|
| 696 |
+
isRecording = true;
|
| 697 |
+
metrics.recordingStartTime = Date.now();
|
| 698 |
+
elements.talkBtn.classList.add('recording');
|
| 699 |
+
elements.talkBtn.querySelector('.mdc-button__label').textContent = 'Gravando...';
|
| 700 |
+
elements.talkBtn.querySelector('.material-icons').textContent = 'mic_off';
|
| 701 |
+
pcmBuffer = [];
|
| 702 |
+
|
| 703 |
+
const sampleRate = 24000; // Sempre usar melhor qualidade
|
| 704 |
+
log(`🎤 Gravando PCM 16-bit @ ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 705 |
+
|
| 706 |
+
// Criar AudioContext se necessário
|
| 707 |
+
if (!audioContext) {
|
| 708 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 709 |
+
const sampleRate = 24000;
|
| 710 |
+
|
| 711 |
+
audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 712 |
+
sampleRate: sampleRate
|
| 713 |
+
});
|
| 714 |
+
|
| 715 |
+
log(`🎧 AudioContext criado: ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 716 |
+
}
|
| 717 |
+
|
| 718 |
+
// Criar processador de áudio
|
| 719 |
+
audioSource = audioContext.createMediaStreamSource(stream);
|
| 720 |
+
audioProcessor = audioContext.createScriptProcessor(4096, 1, 1);
|
| 721 |
+
|
| 722 |
+
audioProcessor.onaudioprocess = (e) => {
|
| 723 |
+
if (!isRecording) return;
|
| 724 |
+
|
| 725 |
+
const inputData = e.inputBuffer.getChannelData(0);
|
| 726 |
+
|
| 727 |
+
// Calcular RMS (Root Mean Square) para melhor detecção de volume
|
| 728 |
+
let sumSquares = 0;
|
| 729 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 730 |
+
sumSquares += inputData[i] * inputData[i];
|
| 731 |
+
}
|
| 732 |
+
const rms = Math.sqrt(sumSquares / inputData.length);
|
| 733 |
+
|
| 734 |
+
// Calcular amplitude máxima também
|
| 735 |
+
let maxAmplitude = 0;
|
| 736 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 737 |
+
maxAmplitude = Math.max(maxAmplitude, Math.abs(inputData[i]));
|
| 738 |
+
}
|
| 739 |
+
|
| 740 |
+
// Detecção de voz baseada em RMS (mais confiável que amplitude máxima)
|
| 741 |
+
const voiceThreshold = 0.01; // Threshold para detectar voz
|
| 742 |
+
const hasVoice = rms > voiceThreshold;
|
| 743 |
+
|
| 744 |
+
// Aplicar ganho suave apenas se necessário
|
| 745 |
+
let gain = 1.0;
|
| 746 |
+
if (hasVoice && rms < 0.05) {
|
| 747 |
+
// Ganho suave baseado em RMS, máximo 5x
|
| 748 |
+
gain = Math.min(5.0, 0.05 / rms);
|
| 749 |
+
if (gain > 1.2) {
|
| 750 |
+
log(`🎤 Volume baixo detectado, aplicando ganho: ${gain.toFixed(1)}x`, 'info');
|
| 751 |
+
}
|
| 752 |
+
}
|
| 753 |
+
|
| 754 |
+
// Converter Float32 para Int16 com processamento melhorado
|
| 755 |
+
const pcmData = new Int16Array(inputData.length);
|
| 756 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 757 |
+
// Aplicar ganho suave
|
| 758 |
+
let sample = inputData[i] * gain;
|
| 759 |
+
|
| 760 |
+
// Soft clipping para evitar distorção
|
| 761 |
+
if (Math.abs(sample) > 0.95) {
|
| 762 |
+
sample = Math.sign(sample) * (0.95 + 0.05 * Math.tanh((Math.abs(sample) - 0.95) * 10));
|
| 763 |
+
}
|
| 764 |
+
|
| 765 |
+
// Converter para Int16
|
| 766 |
+
sample = Math.max(-1, Math.min(1, sample));
|
| 767 |
+
pcmData[i] = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
|
| 768 |
+
}
|
| 769 |
+
|
| 770 |
+
// Adicionar ao buffer apenas se detectar voz
|
| 771 |
+
if (hasVoice) {
|
| 772 |
+
pcmBuffer.push(pcmData);
|
| 773 |
+
}
|
| 774 |
+
};
|
| 775 |
+
|
| 776 |
+
audioSource.connect(audioProcessor);
|
| 777 |
+
audioProcessor.connect(audioContext.destination);
|
| 778 |
+
}
|
| 779 |
+
|
| 780 |
+
// Parar gravação e enviar
|
| 781 |
+
function stopRecording() {
|
| 782 |
+
if (!isRecording) return;
|
| 783 |
+
|
| 784 |
+
isRecording = false;
|
| 785 |
+
const duration = Date.now() - metrics.recordingStartTime;
|
| 786 |
+
elements.talkBtn.classList.remove('recording');
|
| 787 |
+
elements.talkBtn.querySelector('.mdc-button__label').textContent = 'Push to Talk';
|
| 788 |
+
elements.talkBtn.querySelector('.material-icons').textContent = 'mic';
|
| 789 |
+
|
| 790 |
+
// Desconectar processador
|
| 791 |
+
if (audioProcessor) {
|
| 792 |
+
audioProcessor.disconnect();
|
| 793 |
+
audioProcessor = null;
|
| 794 |
+
}
|
| 795 |
+
if (audioSource) {
|
| 796 |
+
audioSource.disconnect();
|
| 797 |
+
audioSource = null;
|
| 798 |
+
}
|
| 799 |
+
|
| 800 |
+
// Verificar se há áudio para enviar
|
| 801 |
+
if (pcmBuffer.length === 0) {
|
| 802 |
+
log(`⚠️ Nenhum áudio capturado (silêncio ou volume muito baixo)`, 'warning');
|
| 803 |
+
pcmBuffer = [];
|
| 804 |
+
return;
|
| 805 |
+
}
|
| 806 |
+
|
| 807 |
+
// Combinar todos os chunks PCM
|
| 808 |
+
const totalLength = pcmBuffer.reduce((acc, chunk) => acc + chunk.length, 0);
|
| 809 |
+
|
| 810 |
+
// Verificar tamanho mínimo (0.5 segundos)
|
| 811 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 812 |
+
const minSamples = sampleRate * 0.5;
|
| 813 |
+
|
| 814 |
+
if (totalLength < minSamples) {
|
| 815 |
+
log(`⚠️ Áudio muito curto: ${(totalLength/sampleRate).toFixed(2)}s (mínimo 0.5s)`, 'warning');
|
| 816 |
+
pcmBuffer = [];
|
| 817 |
+
return;
|
| 818 |
+
}
|
| 819 |
+
|
| 820 |
+
const fullPCM = new Int16Array(totalLength);
|
| 821 |
+
let offset = 0;
|
| 822 |
+
for (const chunk of pcmBuffer) {
|
| 823 |
+
fullPCM.set(chunk, offset);
|
| 824 |
+
offset += chunk.length;
|
| 825 |
+
}
|
| 826 |
+
|
| 827 |
+
// Calcular amplitude final para debug
|
| 828 |
+
let maxAmp = 0;
|
| 829 |
+
for (let i = 0; i < Math.min(fullPCM.length, 1000); i++) {
|
| 830 |
+
maxAmp = Math.max(maxAmp, Math.abs(fullPCM[i] / 32768));
|
| 831 |
+
}
|
| 832 |
+
|
| 833 |
+
// Enviar PCM binário direto (sem Base64!)
|
| 834 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 835 |
+
// Enviar um header simples antes do áudio
|
| 836 |
+
const header = new ArrayBuffer(8);
|
| 837 |
+
const view = new DataView(header);
|
| 838 |
+
view.setUint32(0, 0x50434D16); // Magic: "PCM16"
|
| 839 |
+
view.setUint32(4, fullPCM.length * 2); // Tamanho em bytes
|
| 840 |
+
|
| 841 |
+
ws.send(header);
|
| 842 |
+
ws.send(fullPCM.buffer);
|
| 843 |
+
|
| 844 |
+
metrics.sentBytes += fullPCM.length * 2;
|
| 845 |
+
updateMetrics();
|
| 846 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 847 |
+
log(`📤 PCM enviado: ${(fullPCM.length * 2 / 1024).toFixed(1)}KB, ${(totalLength/sampleRate).toFixed(1)}s @ ${sampleRate}Hz, amp:${maxAmp.toFixed(3)}`, 'success');
|
| 848 |
+
}
|
| 849 |
+
|
| 850 |
+
// Limpar buffer após enviar
|
| 851 |
+
pcmBuffer = [];
|
| 852 |
+
}
|
| 853 |
+
|
| 854 |
+
// Processar mensagem JSON
|
| 855 |
+
function handleMessage(data) {
|
| 856 |
+
switch (data.type) {
|
| 857 |
+
case 'metrics':
|
| 858 |
+
metrics.latency = data.latency;
|
| 859 |
+
updateMetrics();
|
| 860 |
+
log(`📊 Resposta: "${data.response}" (${data.latency}ms)`, 'success');
|
| 861 |
+
break;
|
| 862 |
+
|
| 863 |
+
case 'error':
|
| 864 |
+
log(`❌ Erro: ${data.message}`, 'error');
|
| 865 |
+
break;
|
| 866 |
+
|
| 867 |
+
case 'tts-response':
|
| 868 |
+
// Resposta do TTS direto (Opus 24kHz ou PCM)
|
| 869 |
+
if (data.audio) {
|
| 870 |
+
// Decodificar base64 para arraybuffer
|
| 871 |
+
const binaryString = atob(data.audio);
|
| 872 |
+
const bytes = new Uint8Array(binaryString.length);
|
| 873 |
+
for (let i = 0; i < binaryString.length; i++) {
|
| 874 |
+
bytes[i] = binaryString.charCodeAt(i);
|
| 875 |
+
}
|
| 876 |
+
|
| 877 |
+
let audioData = bytes.buffer;
|
| 878 |
+
// IMPORTANTE: Usar a taxa enviada pelo servidor
|
| 879 |
+
const sampleRate = data.sampleRate || 24000;
|
| 880 |
+
|
| 881 |
+
console.log(`🎯 TTS Response - Taxa recebida: ${sampleRate}Hz, Formato: ${data.format}, Tamanho: ${bytes.length} bytes`);
|
| 882 |
+
|
| 883 |
+
// Se for Opus, usar WebAudio API para decodificar nativamente
|
| 884 |
+
let wavBuffer;
|
| 885 |
+
if (data.format === 'opus') {
|
| 886 |
+
console.log(`🗜️ Opus 24kHz recebido: ${(bytes.length/1024).toFixed(1)}KB`);
|
| 887 |
+
|
| 888 |
+
// Log de economia de banda
|
| 889 |
+
if (data.originalSize) {
|
| 890 |
+
const compression = Math.round(100 - (bytes.length / data.originalSize) * 100);
|
| 891 |
+
console.log(`📊 Economia de banda: ${compression}% (${(data.originalSize/1024).toFixed(1)}KB → ${(bytes.length/1024).toFixed(1)}KB)`);
|
| 892 |
+
}
|
| 893 |
+
|
| 894 |
+
// WebAudio API pode decodificar Opus nativamente
|
| 895 |
+
// Por agora, tratar como PCM até implementar decoder completo
|
| 896 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 897 |
+
} else {
|
| 898 |
+
// PCM - adicionar WAV header com a taxa correta
|
| 899 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 900 |
+
}
|
| 901 |
+
|
| 902 |
+
// Log da qualidade recebida
|
| 903 |
+
console.log(`🎵 TTS pronto: ${(audioData.byteLength/1024).toFixed(1)}KB @ ${sampleRate}Hz (${data.quality || 'high'} quality, ${data.format || 'pcm'})`);
|
| 904 |
+
|
| 905 |
+
// Criar blob e URL
|
| 906 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 907 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 908 |
+
|
| 909 |
+
// Atualizar player
|
| 910 |
+
elements.ttsAudio.src = audioUrl;
|
| 911 |
+
elements.ttsPlayer.style.display = 'block';
|
| 912 |
+
elements.ttsStatus.style.display = 'none';
|
| 913 |
+
elements.ttsPlayBtn.disabled = false;
|
| 914 |
+
elements.ttsPlayBtn.querySelector('.mdc-button__label').textContent = 'Gerar Áudio';
|
| 915 |
+
|
| 916 |
+
log('🎵 Áudio TTS gerado com sucesso!', 'success');
|
| 917 |
+
}
|
| 918 |
+
break;
|
| 919 |
+
}
|
| 920 |
+
}
|
| 921 |
+
|
| 922 |
+
// Processar áudio PCM recebido
|
| 923 |
+
function handlePCMAudio(arrayBuffer) {
|
| 924 |
+
metrics.receivedBytes += arrayBuffer.byteLength;
|
| 925 |
+
updateMetrics();
|
| 926 |
+
|
| 927 |
+
// Criar WAV header para reproduzir
|
| 928 |
+
const wavBuffer = addWavHeader(arrayBuffer);
|
| 929 |
+
|
| 930 |
+
// Criar blob e URL para o áudio
|
| 931 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 932 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 933 |
+
|
| 934 |
+
// Criar log com botão de play
|
| 935 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 936 |
+
const entry = document.createElement('div');
|
| 937 |
+
entry.className = 'log-entry success';
|
| 938 |
+
entry.innerHTML = `
|
| 939 |
+
<span class="log-time">[${time}]</span>
|
| 940 |
+
<span class="log-message">🔊 Áudio recebido: ${(arrayBuffer.byteLength / 1024).toFixed(1)}KB</span>
|
| 941 |
+
<div class="audio-player">
|
| 942 |
+
<button class="play-btn" onclick="playAudio('${audioUrl}')">▶️ Play</button>
|
| 943 |
+
<audio id="audio-${Date.now()}" src="${audioUrl}" style="display: none;"></audio>
|
| 944 |
+
</div>
|
| 945 |
+
`;
|
| 946 |
+
elements.log.appendChild(entry);
|
| 947 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 948 |
+
|
| 949 |
+
// Auto-play o áudio
|
| 950 |
+
const audio = new Audio(audioUrl);
|
| 951 |
+
audio.play().catch(err => {
|
| 952 |
+
console.log('Auto-play bloqueado, use o botão para reproduzir');
|
| 953 |
+
});
|
| 954 |
+
}
|
| 955 |
+
|
| 956 |
+
// Função para tocar áudio manualmente
|
| 957 |
+
function playAudio(url) {
|
| 958 |
+
const audio = new Audio(url);
|
| 959 |
+
audio.play();
|
| 960 |
+
}
|
| 961 |
+
|
| 962 |
+
// Adicionar header WAV ao PCM
|
| 963 |
+
function addWavHeader(pcmBuffer, customSampleRate) {
|
| 964 |
+
const pcmData = new Uint8Array(pcmBuffer);
|
| 965 |
+
const wavBuffer = new ArrayBuffer(44 + pcmData.length);
|
| 966 |
+
const view = new DataView(wavBuffer);
|
| 967 |
+
|
| 968 |
+
// WAV header
|
| 969 |
+
const writeString = (offset, string) => {
|
| 970 |
+
for (let i = 0; i < string.length; i++) {
|
| 971 |
+
view.setUint8(offset + i, string.charCodeAt(i));
|
| 972 |
+
}
|
| 973 |
+
};
|
| 974 |
+
|
| 975 |
+
writeString(0, 'RIFF');
|
| 976 |
+
view.setUint32(4, 36 + pcmData.length, true);
|
| 977 |
+
writeString(8, 'WAVE');
|
| 978 |
+
writeString(12, 'fmt ');
|
| 979 |
+
view.setUint32(16, 16, true); // fmt chunk size
|
| 980 |
+
view.setUint16(20, 1, true); // PCM format
|
| 981 |
+
view.setUint16(22, 1, true); // Mono
|
| 982 |
+
|
| 983 |
+
// Usar taxa customizada se fornecida, senão usar 24kHz
|
| 984 |
+
let sampleRate = customSampleRate || 24000;
|
| 985 |
+
|
| 986 |
+
console.log(`📝 WAV Header - Configurando taxa: ${sampleRate}Hz`);
|
| 987 |
+
|
| 988 |
+
view.setUint32(24, sampleRate, true); // Sample rate
|
| 989 |
+
view.setUint32(28, sampleRate * 2, true); // Byte rate: sampleRate * 1 * 2
|
| 990 |
+
view.setUint16(32, 2, true); // Block align: 1 * 2
|
| 991 |
+
view.setUint16(34, 16, true); // Bits per sample: 16-bit
|
| 992 |
+
writeString(36, 'data');
|
| 993 |
+
view.setUint32(40, pcmData.length, true);
|
| 994 |
+
|
| 995 |
+
// Copiar dados PCM
|
| 996 |
+
new Uint8Array(wavBuffer, 44).set(pcmData);
|
| 997 |
+
|
| 998 |
+
return wavBuffer;
|
| 999 |
+
}
|
| 1000 |
+
|
| 1001 |
+
// Event Listeners
|
| 1002 |
+
elements.connectBtn.addEventListener('click', () => {
|
| 1003 |
+
if (isConnected) {
|
| 1004 |
+
disconnect();
|
| 1005 |
+
} else {
|
| 1006 |
+
connect();
|
| 1007 |
+
}
|
| 1008 |
+
});
|
| 1009 |
+
|
| 1010 |
+
elements.talkBtn.addEventListener('mousedown', startRecording);
|
| 1011 |
+
elements.talkBtn.addEventListener('mouseup', stopRecording);
|
| 1012 |
+
elements.talkBtn.addEventListener('mouseleave', stopRecording);
|
| 1013 |
+
|
| 1014 |
+
// Voice selector listener
|
| 1015 |
+
elements.voiceSelect.addEventListener('change', (e) => {
|
| 1016 |
+
const voice_id = e.target.value;
|
| 1017 |
+
console.log('Voice select changed to:', voice_id);
|
| 1018 |
+
|
| 1019 |
+
// Update current voice display
|
| 1020 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 1021 |
+
if (currentVoiceElement) {
|
| 1022 |
+
currentVoiceElement.textContent = voice_id;
|
| 1023 |
+
}
|
| 1024 |
+
|
| 1025 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 1026 |
+
console.log('Sending set-voice command:', voice_id);
|
| 1027 |
+
ws.send(JSON.stringify({
|
| 1028 |
+
type: 'set-voice',
|
| 1029 |
+
voice_id: voice_id
|
| 1030 |
+
}));
|
| 1031 |
+
log(`🔊 Voz alterada para: ${voice_id} - ${e.target.options[e.target.selectedIndex].text}`, 'info');
|
| 1032 |
+
} else {
|
| 1033 |
+
console.log('WebSocket not connected, cannot send voice change');
|
| 1034 |
+
log(`⚠️ Conecte-se primeiro para mudar a voz`, 'warning');
|
| 1035 |
+
}
|
| 1036 |
+
});
|
| 1037 |
+
elements.talkBtn.addEventListener('touchstart', startRecording);
|
| 1038 |
+
elements.talkBtn.addEventListener('touchend', stopRecording);
|
| 1039 |
+
|
| 1040 |
+
// TTS Voice selector listener
|
| 1041 |
+
elements.ttsVoiceSelect.addEventListener('change', (e) => {
|
| 1042 |
+
const voice_id = e.target.value;
|
| 1043 |
+
|
| 1044 |
+
// Update main voice selector
|
| 1045 |
+
elements.voiceSelect.value = voice_id;
|
| 1046 |
+
|
| 1047 |
+
// Update current voice display
|
| 1048 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 1049 |
+
if (currentVoiceElement) {
|
| 1050 |
+
currentVoiceElement.textContent = voice_id;
|
| 1051 |
+
}
|
| 1052 |
+
|
| 1053 |
+
// Send voice change to server
|
| 1054 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 1055 |
+
ws.send(JSON.stringify({
|
| 1056 |
+
type: 'set-voice',
|
| 1057 |
+
voice_id: voice_id
|
| 1058 |
+
}));
|
| 1059 |
+
log(`🎤 Voz TTS alterada para: ${voice_id}`, 'info');
|
| 1060 |
+
}
|
| 1061 |
+
});
|
| 1062 |
+
|
| 1063 |
+
// TTS Button Event Listener
|
| 1064 |
+
elements.ttsPlayBtn.addEventListener('click', (e) => {
|
| 1065 |
+
e.preventDefault();
|
| 1066 |
+
e.stopPropagation();
|
| 1067 |
+
|
| 1068 |
+
console.log('TTS Button clicked!');
|
| 1069 |
+
const text = elements.ttsText.value.trim();
|
| 1070 |
+
const voice = elements.ttsVoiceSelect.value;
|
| 1071 |
+
|
| 1072 |
+
console.log('TTS Text:', text);
|
| 1073 |
+
console.log('TTS Voice:', voice);
|
| 1074 |
+
|
| 1075 |
+
if (!text) {
|
| 1076 |
+
alert('Por favor, digite algum texto para converter em áudio');
|
| 1077 |
+
return;
|
| 1078 |
+
}
|
| 1079 |
+
|
| 1080 |
+
if (!ws || ws.readyState !== WebSocket.OPEN) {
|
| 1081 |
+
alert('Por favor, conecte-se primeiro clicando em "Conectar"');
|
| 1082 |
+
return;
|
| 1083 |
+
}
|
| 1084 |
+
|
| 1085 |
+
// Mostrar status
|
| 1086 |
+
elements.ttsStatus.style.display = 'block';
|
| 1087 |
+
elements.ttsStatusText.textContent = '⏳ Gerando áudio...';
|
| 1088 |
+
elements.ttsPlayBtn.disabled = true;
|
| 1089 |
+
elements.ttsPlayBtn.querySelector('.mdc-button__label').textContent = 'Processando...';
|
| 1090 |
+
elements.ttsPlayer.style.display = 'none';
|
| 1091 |
+
|
| 1092 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 1093 |
+
const quality = 'high';
|
| 1094 |
+
|
| 1095 |
+
// Enviar request para TTS com qualidade máxima
|
| 1096 |
+
const ttsRequest = {
|
| 1097 |
+
type: 'text-to-speech',
|
| 1098 |
+
text: text,
|
| 1099 |
+
voice_id: voice,
|
| 1100 |
+
quality: quality,
|
| 1101 |
+
format: 'opus' // Opus 24kHz @ 32kbps - máxima qualidade, mínima banda
|
| 1102 |
+
};
|
| 1103 |
+
|
| 1104 |
+
console.log('Sending TTS request:', ttsRequest);
|
| 1105 |
+
ws.send(JSON.stringify(ttsRequest));
|
| 1106 |
+
|
| 1107 |
+
log(`🎤 Solicitando TTS: voz=${voice}, texto="${text.substring(0, 50)}..."`, 'info');
|
| 1108 |
+
});
|
| 1109 |
+
|
| 1110 |
+
// Inicialização
|
| 1111 |
+
log('🚀 Ultravox Chat PCM Otimizado - Material Design', 'info');
|
| 1112 |
+
log('📊 Formato: PCM 16-bit @ 24kHz', 'info');
|
| 1113 |
+
log('⚡ Interface Material Design', 'success');
|
| 1114 |
+
</script>
|
| 1115 |
+
</body>
|
| 1116 |
+
</html>
|
services/webrtc_gateway/ultravox-chat-opus.html
ADDED
|
@@ -0,0 +1,581 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Ultravox Chat - Opus Edition</title>
|
| 7 |
+
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
|
| 8 |
+
<style>
|
| 9 |
+
body {
|
| 10 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 11 |
+
min-height: 100vh;
|
| 12 |
+
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
|
| 13 |
+
}
|
| 14 |
+
|
| 15 |
+
.container {
|
| 16 |
+
max-width: 1200px;
|
| 17 |
+
margin-top: 30px;
|
| 18 |
+
}
|
| 19 |
+
|
| 20 |
+
.card {
|
| 21 |
+
border: none;
|
| 22 |
+
border-radius: 15px;
|
| 23 |
+
box-shadow: 0 10px 40px rgba(0,0,0,0.1);
|
| 24 |
+
backdrop-filter: blur(10px);
|
| 25 |
+
background: rgba(255, 255, 255, 0.95);
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
.card-header {
|
| 29 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 30 |
+
color: white;
|
| 31 |
+
border-radius: 15px 15px 0 0 !important;
|
| 32 |
+
padding: 20px;
|
| 33 |
+
border: none;
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.status-indicator {
|
| 37 |
+
display: inline-block;
|
| 38 |
+
width: 10px;
|
| 39 |
+
height: 10px;
|
| 40 |
+
border-radius: 50%;
|
| 41 |
+
background: #dc3545;
|
| 42 |
+
margin-right: 8px;
|
| 43 |
+
animation: pulse 2s infinite;
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
.status-indicator.connected {
|
| 47 |
+
background: #28a745;
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
@keyframes pulse {
|
| 51 |
+
0% { opacity: 1; }
|
| 52 |
+
50% { opacity: 0.5; }
|
| 53 |
+
100% { opacity: 1; }
|
| 54 |
+
}
|
| 55 |
+
|
| 56 |
+
.btn-primary {
|
| 57 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 58 |
+
border: none;
|
| 59 |
+
border-radius: 25px;
|
| 60 |
+
padding: 10px 30px;
|
| 61 |
+
transition: all 0.3s;
|
| 62 |
+
}
|
| 63 |
+
|
| 64 |
+
.btn-primary:hover {
|
| 65 |
+
transform: translateY(-2px);
|
| 66 |
+
box-shadow: 0 5px 20px rgba(0,0,0,0.2);
|
| 67 |
+
}
|
| 68 |
+
|
| 69 |
+
.btn-talk {
|
| 70 |
+
width: 100px;
|
| 71 |
+
height: 100px;
|
| 72 |
+
border-radius: 50%;
|
| 73 |
+
font-size: 24px;
|
| 74 |
+
position: relative;
|
| 75 |
+
transition: all 0.3s;
|
| 76 |
+
background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
|
| 77 |
+
border: none;
|
| 78 |
+
color: white;
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
+
.btn-talk:disabled {
|
| 82 |
+
background: #ccc;
|
| 83 |
+
cursor: not-allowed;
|
| 84 |
+
}
|
| 85 |
+
|
| 86 |
+
.btn-talk.recording {
|
| 87 |
+
animation: recording-pulse 1s infinite;
|
| 88 |
+
background: linear-gradient(135deg, #fa709a 0%, #fee140 100%);
|
| 89 |
+
}
|
| 90 |
+
|
| 91 |
+
@keyframes recording-pulse {
|
| 92 |
+
0% { transform: scale(1); }
|
| 93 |
+
50% { transform: scale(1.1); }
|
| 94 |
+
100% { transform: scale(1); }
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
#chatLog {
|
| 98 |
+
height: 400px;
|
| 99 |
+
overflow-y: auto;
|
| 100 |
+
background: #f8f9fa;
|
| 101 |
+
border-radius: 10px;
|
| 102 |
+
padding: 15px;
|
| 103 |
+
font-family: 'Courier New', monospace;
|
| 104 |
+
font-size: 14px;
|
| 105 |
+
}
|
| 106 |
+
|
| 107 |
+
.log-entry {
|
| 108 |
+
margin-bottom: 8px;
|
| 109 |
+
padding: 8px;
|
| 110 |
+
border-radius: 5px;
|
| 111 |
+
animation: fadeIn 0.3s;
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
@keyframes fadeIn {
|
| 115 |
+
from { opacity: 0; transform: translateY(10px); }
|
| 116 |
+
to { opacity: 1; transform: translateY(0); }
|
| 117 |
+
}
|
| 118 |
+
|
| 119 |
+
.log-info { background: #d1ecf1; color: #0c5460; }
|
| 120 |
+
.log-success { background: #d4edda; color: #155724; }
|
| 121 |
+
.log-warning { background: #fff3cd; color: #856404; }
|
| 122 |
+
.log-error { background: #f8d7da; color: #721c24; }
|
| 123 |
+
.log-ai { background: #e7e3ff; color: #4a4a8a; }
|
| 124 |
+
|
| 125 |
+
.metrics-card {
|
| 126 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 127 |
+
color: white;
|
| 128 |
+
border-radius: 10px;
|
| 129 |
+
padding: 15px;
|
| 130 |
+
margin-top: 20px;
|
| 131 |
+
}
|
| 132 |
+
|
| 133 |
+
.metric-item {
|
| 134 |
+
display: flex;
|
| 135 |
+
justify-content: space-between;
|
| 136 |
+
padding: 5px 0;
|
| 137 |
+
border-bottom: 1px solid rgba(255,255,255,0.2);
|
| 138 |
+
}
|
| 139 |
+
|
| 140 |
+
.metric-item:last-child {
|
| 141 |
+
border-bottom: none;
|
| 142 |
+
}
|
| 143 |
+
|
| 144 |
+
.metric-value {
|
| 145 |
+
font-weight: bold;
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
.voice-select {
|
| 149 |
+
margin-top: 10px;
|
| 150 |
+
}
|
| 151 |
+
|
| 152 |
+
#debugLog {
|
| 153 |
+
height: 200px;
|
| 154 |
+
overflow-y: auto;
|
| 155 |
+
background: #2d2d2d;
|
| 156 |
+
color: #00ff00;
|
| 157 |
+
border-radius: 5px;
|
| 158 |
+
padding: 10px;
|
| 159 |
+
font-family: 'Courier New', monospace;
|
| 160 |
+
font-size: 12px;
|
| 161 |
+
margin-top: 20px;
|
| 162 |
+
}
|
| 163 |
+
|
| 164 |
+
.codec-indicator {
|
| 165 |
+
display: inline-block;
|
| 166 |
+
padding: 2px 8px;
|
| 167 |
+
border-radius: 12px;
|
| 168 |
+
background: #28a745;
|
| 169 |
+
color: white;
|
| 170 |
+
font-size: 12px;
|
| 171 |
+
margin-left: 10px;
|
| 172 |
+
}
|
| 173 |
+
</style>
|
| 174 |
+
</head>
|
| 175 |
+
<body>
|
| 176 |
+
<div class="container">
|
| 177 |
+
<div class="row">
|
| 178 |
+
<div class="col-lg-8">
|
| 179 |
+
<div class="card">
|
| 180 |
+
<div class="card-header">
|
| 181 |
+
<h4 class="mb-0">
|
| 182 |
+
🎙️ Ultravox Chat - WebRTC Pipeline
|
| 183 |
+
<span class="codec-indicator">OPUS</span>
|
| 184 |
+
</h4>
|
| 185 |
+
<small>Gravação e envio em Opus codec (compressão eficiente)</small>
|
| 186 |
+
</div>
|
| 187 |
+
<div class="card-body">
|
| 188 |
+
<div class="d-flex justify-content-between align-items-center mb-3">
|
| 189 |
+
<div>
|
| 190 |
+
<span class="status-indicator" id="statusDot"></span>
|
| 191 |
+
<span id="statusText">Desconectado</span>
|
| 192 |
+
</div>
|
| 193 |
+
<button class="btn btn-primary" id="connectBtn">Conectar</button>
|
| 194 |
+
</div>
|
| 195 |
+
|
| 196 |
+
<div class="text-center my-4">
|
| 197 |
+
<button class="btn btn-talk" id="talkBtn" disabled>
|
| 198 |
+
🎤
|
| 199 |
+
</button>
|
| 200 |
+
<div class="mt-2 text-muted">Segure para falar</div>
|
| 201 |
+
</div>
|
| 202 |
+
|
| 203 |
+
<div class="voice-select">
|
| 204 |
+
<label for="voiceSelect" class="form-label">🔊 Voz TTS:</label>
|
| 205 |
+
<select class="form-select" id="voiceSelect">
|
| 206 |
+
<optgroup label="🇧🇷 Português Brasileiro">
|
| 207 |
+
<option value="pf_dora" selected>[pf_dora] Feminino - Dora</option>
|
| 208 |
+
<option value="pm_alex">[pm_alex] Masculino - Alex</option>
|
| 209 |
+
</optgroup>
|
| 210 |
+
</select>
|
| 211 |
+
</div>
|
| 212 |
+
|
| 213 |
+
<div class="mt-3">
|
| 214 |
+
<label for="chatLog" class="form-label">📝 Log de Conversação:</label>
|
| 215 |
+
<div id="chatLog"></div>
|
| 216 |
+
</div>
|
| 217 |
+
</div>
|
| 218 |
+
</div>
|
| 219 |
+
</div>
|
| 220 |
+
|
| 221 |
+
<div class="col-lg-4">
|
| 222 |
+
<div class="metrics-card">
|
| 223 |
+
<h5 class="mb-3">📊 Métricas</h5>
|
| 224 |
+
<div class="metric-item">
|
| 225 |
+
<span>Codec:</span>
|
| 226 |
+
<span class="metric-value" id="codecType">Opus</span>
|
| 227 |
+
</div>
|
| 228 |
+
<div class="metric-item">
|
| 229 |
+
<span>Bitrate:</span>
|
| 230 |
+
<span class="metric-value" id="bitrate">32 kbps</span>
|
| 231 |
+
</div>
|
| 232 |
+
<div class="metric-item">
|
| 233 |
+
<span>Taxa de Compressão:</span>
|
| 234 |
+
<span class="metric-value" id="compressionRate">-</span>
|
| 235 |
+
</div>
|
| 236 |
+
<div class="metric-item">
|
| 237 |
+
<span>Latência Total:</span>
|
| 238 |
+
<span class="metric-value" id="totalLatency">-</span>
|
| 239 |
+
</div>
|
| 240 |
+
<div class="metric-item">
|
| 241 |
+
<span>Tempo de Gravação:</span>
|
| 242 |
+
<span class="metric-value" id="recordingTime">-</span>
|
| 243 |
+
</div>
|
| 244 |
+
<div class="metric-item">
|
| 245 |
+
<span>Taxa de Áudio:</span>
|
| 246 |
+
<span class="metric-value" id="audioRate">48 kHz</span>
|
| 247 |
+
</div>
|
| 248 |
+
<div class="metric-item">
|
| 249 |
+
<span>Tamanho do Áudio:</span>
|
| 250 |
+
<span class="metric-value" id="audioSize">-</span>
|
| 251 |
+
</div>
|
| 252 |
+
</div>
|
| 253 |
+
|
| 254 |
+
<div class="card mt-3">
|
| 255 |
+
<div class="card-body">
|
| 256 |
+
<h6>🐛 Debug Log</h6>
|
| 257 |
+
<div id="debugLog"></div>
|
| 258 |
+
</div>
|
| 259 |
+
</div>
|
| 260 |
+
</div>
|
| 261 |
+
</div>
|
| 262 |
+
</div>
|
| 263 |
+
|
| 264 |
+
<script>
|
| 265 |
+
// Elementos do DOM
|
| 266 |
+
const elements = {
|
| 267 |
+
connectBtn: document.getElementById('connectBtn'),
|
| 268 |
+
talkBtn: document.getElementById('talkBtn'),
|
| 269 |
+
statusDot: document.getElementById('statusDot'),
|
| 270 |
+
statusText: document.getElementById('statusText'),
|
| 271 |
+
chatLog: document.getElementById('chatLog'),
|
| 272 |
+
debugLog: document.getElementById('debugLog'),
|
| 273 |
+
voiceSelect: document.getElementById('voiceSelect'),
|
| 274 |
+
// Métricas
|
| 275 |
+
codecType: document.getElementById('codecType'),
|
| 276 |
+
bitrate: document.getElementById('bitrate'),
|
| 277 |
+
compressionRate: document.getElementById('compressionRate'),
|
| 278 |
+
totalLatency: document.getElementById('totalLatency'),
|
| 279 |
+
recordingTime: document.getElementById('recordingTime'),
|
| 280 |
+
audioRate: document.getElementById('audioRate'),
|
| 281 |
+
audioSize: document.getElementById('audioSize')
|
| 282 |
+
};
|
| 283 |
+
|
| 284 |
+
// Estado da aplicação
|
| 285 |
+
let ws = null;
|
| 286 |
+
let isConnected = false;
|
| 287 |
+
let isRecording = false;
|
| 288 |
+
let stream = null;
|
| 289 |
+
let mediaRecorder = null;
|
| 290 |
+
let audioChunks = [];
|
| 291 |
+
let sessionId = null;
|
| 292 |
+
|
| 293 |
+
// Métricas
|
| 294 |
+
let metrics = {
|
| 295 |
+
recordingStartTime: 0,
|
| 296 |
+
recordingEndTime: 0,
|
| 297 |
+
audioBytesSent: 0,
|
| 298 |
+
pcmBytesOriginal: 0
|
| 299 |
+
};
|
| 300 |
+
|
| 301 |
+
// Função de log
|
| 302 |
+
function log(message, type = 'info') {
|
| 303 |
+
const timestamp = new Date().toLocaleTimeString();
|
| 304 |
+
const entry = document.createElement('div');
|
| 305 |
+
entry.className = `log-entry log-${type}`;
|
| 306 |
+
entry.textContent = `[${timestamp}] ${message}`;
|
| 307 |
+
elements.chatLog.appendChild(entry);
|
| 308 |
+
elements.chatLog.scrollTop = elements.chatLog.scrollHeight;
|
| 309 |
+
}
|
| 310 |
+
|
| 311 |
+
// Debug log
|
| 312 |
+
function debug(message) {
|
| 313 |
+
const timestamp = new Date().toLocaleTimeString();
|
| 314 |
+
const entry = `[${timestamp}] ${message}\n`;
|
| 315 |
+
elements.debugLog.textContent += entry;
|
| 316 |
+
elements.debugLog.scrollTop = elements.debugLog.scrollHeight;
|
| 317 |
+
}
|
| 318 |
+
|
| 319 |
+
// Gerar ID de sessão único
|
| 320 |
+
function generateSessionId() {
|
| 321 |
+
return Math.random().toString(36).substring(2) + Date.now().toString(36);
|
| 322 |
+
}
|
| 323 |
+
|
| 324 |
+
// Conectar ao WebSocket
|
| 325 |
+
async function connect() {
|
| 326 |
+
if (isConnected) {
|
| 327 |
+
disconnect();
|
| 328 |
+
return;
|
| 329 |
+
}
|
| 330 |
+
|
| 331 |
+
try {
|
| 332 |
+
// Solicitar permissão de microfone
|
| 333 |
+
stream = await navigator.mediaDevices.getUserMedia({
|
| 334 |
+
audio: {
|
| 335 |
+
echoCancellation: true,
|
| 336 |
+
noiseSuppression: true,
|
| 337 |
+
autoGainControl: true,
|
| 338 |
+
sampleRate: 48000
|
| 339 |
+
}
|
| 340 |
+
});
|
| 341 |
+
|
| 342 |
+
// Conectar WebSocket
|
| 343 |
+
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 344 |
+
const wsUrl = `${protocol}//${window.location.hostname}:8082/ws`;
|
| 345 |
+
|
| 346 |
+
ws = new WebSocket(wsUrl);
|
| 347 |
+
|
| 348 |
+
ws.onopen = () => {
|
| 349 |
+
isConnected = true;
|
| 350 |
+
sessionId = generateSessionId();
|
| 351 |
+
elements.statusDot.classList.add('connected');
|
| 352 |
+
elements.statusText.textContent = 'Conectado';
|
| 353 |
+
elements.connectBtn.textContent = 'Desconectar';
|
| 354 |
+
elements.connectBtn.classList.remove('btn-primary');
|
| 355 |
+
elements.connectBtn.classList.add('btn-danger');
|
| 356 |
+
elements.talkBtn.disabled = false;
|
| 357 |
+
log('✅ Conectado ao servidor (Opus mode)', 'success');
|
| 358 |
+
debug('WebSocket conectado com suporte a Opus');
|
| 359 |
+
};
|
| 360 |
+
|
| 361 |
+
ws.onmessage = (event) => {
|
| 362 |
+
const data = JSON.parse(event.data);
|
| 363 |
+
|
| 364 |
+
if (data.type === 'transcription') {
|
| 365 |
+
log(`👂 Você: ${data.text}`, 'info');
|
| 366 |
+
} else if (data.type === 'response') {
|
| 367 |
+
log(`🤖 AI: ${data.text}`, 'ai');
|
| 368 |
+
const latency = Date.now() - metrics.recordingEndTime;
|
| 369 |
+
elements.totalLatency.textContent = `${latency}ms`;
|
| 370 |
+
} else if (data.type === 'audio') {
|
| 371 |
+
playAudio(data.audio);
|
| 372 |
+
} else if (data.type === 'error') {
|
| 373 |
+
log(`❌ Erro: ${data.message}`, 'error');
|
| 374 |
+
}
|
| 375 |
+
};
|
| 376 |
+
|
| 377 |
+
ws.onerror = (error) => {
|
| 378 |
+
log(`❌ Erro de conexão: ${error}`, 'error');
|
| 379 |
+
debug(`WebSocket error: ${error}`);
|
| 380 |
+
};
|
| 381 |
+
|
| 382 |
+
ws.onclose = () => {
|
| 383 |
+
if (isConnected) {
|
| 384 |
+
log('⚠️ Conexão perdida', 'warning');
|
| 385 |
+
disconnect();
|
| 386 |
+
}
|
| 387 |
+
};
|
| 388 |
+
|
| 389 |
+
} catch (error) {
|
| 390 |
+
log(`❌ Erro ao conectar: ${error.message}`, 'error');
|
| 391 |
+
debug(`Connection error: ${error.message}`);
|
| 392 |
+
}
|
| 393 |
+
}
|
| 394 |
+
|
| 395 |
+
// Desconectar
|
| 396 |
+
function disconnect() {
|
| 397 |
+
isConnected = false;
|
| 398 |
+
|
| 399 |
+
if (ws) {
|
| 400 |
+
ws.close();
|
| 401 |
+
ws = null;
|
| 402 |
+
}
|
| 403 |
+
|
| 404 |
+
if (stream) {
|
| 405 |
+
stream.getTracks().forEach(track => track.stop());
|
| 406 |
+
stream = null;
|
| 407 |
+
}
|
| 408 |
+
|
| 409 |
+
elements.statusDot.classList.remove('connected');
|
| 410 |
+
elements.statusText.textContent = 'Desconectado';
|
| 411 |
+
elements.connectBtn.textContent = 'Conectar';
|
| 412 |
+
elements.connectBtn.classList.remove('btn-danger');
|
| 413 |
+
elements.connectBtn.classList.add('btn-primary');
|
| 414 |
+
elements.talkBtn.disabled = true;
|
| 415 |
+
|
| 416 |
+
log('👋 Desconectado', 'warning');
|
| 417 |
+
}
|
| 418 |
+
|
| 419 |
+
// Iniciar gravação com MediaRecorder (Opus)
|
| 420 |
+
function startRecording() {
|
| 421 |
+
if (isRecording) return;
|
| 422 |
+
|
| 423 |
+
isRecording = true;
|
| 424 |
+
audioChunks = [];
|
| 425 |
+
metrics.recordingStartTime = Date.now();
|
| 426 |
+
metrics.audioBytesSent = 0;
|
| 427 |
+
metrics.pcmBytesOriginal = 0;
|
| 428 |
+
|
| 429 |
+
elements.talkBtn.classList.add('recording');
|
| 430 |
+
elements.talkBtn.textContent = '⏺️';
|
| 431 |
+
|
| 432 |
+
// Configurar MediaRecorder para Opus
|
| 433 |
+
const mimeType = 'audio/webm;codecs=opus';
|
| 434 |
+
|
| 435 |
+
if (!MediaRecorder.isTypeSupported(mimeType)) {
|
| 436 |
+
log('⚠️ Opus não suportado, usando codec padrão', 'warning');
|
| 437 |
+
debug('Opus codec not supported, falling back');
|
| 438 |
+
}
|
| 439 |
+
|
| 440 |
+
const options = {
|
| 441 |
+
mimeType: MediaRecorder.isTypeSupported(mimeType) ? mimeType : 'audio/webm',
|
| 442 |
+
audioBitsPerSecond: 32000 // 32 kbps para Opus
|
| 443 |
+
};
|
| 444 |
+
|
| 445 |
+
mediaRecorder = new MediaRecorder(stream, options);
|
| 446 |
+
|
| 447 |
+
debug(`MediaRecorder iniciado: ${mediaRecorder.mimeType}`);
|
| 448 |
+
log(`🎤 Gravando com ${mediaRecorder.mimeType}`, 'info');
|
| 449 |
+
|
| 450 |
+
// Coletar chunks de áudio
|
| 451 |
+
mediaRecorder.ondataavailable = (event) => {
|
| 452 |
+
if (event.data.size > 0) {
|
| 453 |
+
audioChunks.push(event.data);
|
| 454 |
+
metrics.audioBytesSent += event.data.size;
|
| 455 |
+
|
| 456 |
+
// Estimar tamanho original (PCM 16-bit @ 48kHz)
|
| 457 |
+
const duration = (Date.now() - metrics.recordingStartTime) / 1000;
|
| 458 |
+
metrics.pcmBytesOriginal = duration * 48000 * 2; // 2 bytes per sample
|
| 459 |
+
|
| 460 |
+
updateMetrics();
|
| 461 |
+
}
|
| 462 |
+
};
|
| 463 |
+
|
| 464 |
+
// Enviar áudio quando parar
|
| 465 |
+
mediaRecorder.onstop = async () => {
|
| 466 |
+
const audioBlob = new Blob(audioChunks, { type: mediaRecorder.mimeType });
|
| 467 |
+
await sendAudioToServer(audioBlob);
|
| 468 |
+
};
|
| 469 |
+
|
| 470 |
+
// Iniciar gravação com timeslice de 100ms para streaming
|
| 471 |
+
mediaRecorder.start(100);
|
| 472 |
+
|
| 473 |
+
elements.codecType.textContent = mediaRecorder.mimeType.includes('opus') ? 'Opus' : 'WebM';
|
| 474 |
+
}
|
| 475 |
+
|
| 476 |
+
// Parar gravação
|
| 477 |
+
function stopRecording() {
|
| 478 |
+
if (!isRecording) return;
|
| 479 |
+
|
| 480 |
+
isRecording = false;
|
| 481 |
+
metrics.recordingEndTime = Date.now();
|
| 482 |
+
elements.talkBtn.classList.remove('recording');
|
| 483 |
+
elements.talkBtn.textContent = '🎤';
|
| 484 |
+
|
| 485 |
+
if (mediaRecorder && mediaRecorder.state !== 'inactive') {
|
| 486 |
+
mediaRecorder.stop();
|
| 487 |
+
}
|
| 488 |
+
|
| 489 |
+
const duration = ((metrics.recordingEndTime - metrics.recordingStartTime) / 1000).toFixed(1);
|
| 490 |
+
elements.recordingTime.textContent = `${duration}s`;
|
| 491 |
+
|
| 492 |
+
log(`⏹️ Gravação finalizada (${duration}s)`, 'info');
|
| 493 |
+
debug(`Recording stopped: ${duration}s, ${metrics.audioBytesSent} bytes`);
|
| 494 |
+
}
|
| 495 |
+
|
| 496 |
+
// Enviar áudio para o servidor
|
| 497 |
+
async function sendAudioToServer(audioBlob) {
|
| 498 |
+
if (!ws || ws.readyState !== WebSocket.OPEN) {
|
| 499 |
+
log('❌ WebSocket não conectado', 'error');
|
| 500 |
+
return;
|
| 501 |
+
}
|
| 502 |
+
|
| 503 |
+
try {
|
| 504 |
+
// Converter blob para base64
|
| 505 |
+
const reader = new FileReader();
|
| 506 |
+
reader.onloadend = () => {
|
| 507 |
+
const base64Audio = reader.result.split(',')[1];
|
| 508 |
+
|
| 509 |
+
// Enviar via WebSocket
|
| 510 |
+
ws.send(JSON.stringify({
|
| 511 |
+
type: 'audio',
|
| 512 |
+
sessionId: sessionId,
|
| 513 |
+
audio: base64Audio,
|
| 514 |
+
format: 'opus',
|
| 515 |
+
mimeType: audioBlob.type,
|
| 516 |
+
voice: elements.voiceSelect.value,
|
| 517 |
+
sampleRate: 48000
|
| 518 |
+
}));
|
| 519 |
+
|
| 520 |
+
log(`📤 Áudio enviado: ${(audioBlob.size / 1024).toFixed(1)}KB (Opus)`, 'success');
|
| 521 |
+
debug(`Audio sent: ${audioBlob.size} bytes, type: ${audioBlob.type}`);
|
| 522 |
+
|
| 523 |
+
elements.audioSize.textContent = `${(audioBlob.size / 1024).toFixed(1)}KB`;
|
| 524 |
+
};
|
| 525 |
+
|
| 526 |
+
reader.readAsDataURL(audioBlob);
|
| 527 |
+
|
| 528 |
+
} catch (error) {
|
| 529 |
+
log(`❌ Erro ao enviar áudio: ${error.message}`, 'error');
|
| 530 |
+
debug(`Send error: ${error.message}`);
|
| 531 |
+
}
|
| 532 |
+
}
|
| 533 |
+
|
| 534 |
+
// Atualizar métricas
|
| 535 |
+
function updateMetrics() {
|
| 536 |
+
if (metrics.pcmBytesOriginal > 0 && metrics.audioBytesSent > 0) {
|
| 537 |
+
const compressionRate = (metrics.pcmBytesOriginal / metrics.audioBytesSent).toFixed(1);
|
| 538 |
+
elements.compressionRate.textContent = `${compressionRate}:1`;
|
| 539 |
+
|
| 540 |
+
const bitrate = (metrics.audioBytesSent * 8 / ((Date.now() - metrics.recordingStartTime) / 1000) / 1000).toFixed(0);
|
| 541 |
+
elements.bitrate.textContent = `${bitrate} kbps`;
|
| 542 |
+
}
|
| 543 |
+
}
|
| 544 |
+
|
| 545 |
+
// Reproduzir áudio recebido
|
| 546 |
+
function playAudio(base64Audio) {
|
| 547 |
+
try {
|
| 548 |
+
const audio = new Audio(`data:audio/wav;base64,${base64Audio}`);
|
| 549 |
+
audio.play();
|
| 550 |
+
debug('Playing TTS audio response');
|
| 551 |
+
} catch (error) {
|
| 552 |
+
log(`❌ Erro ao reproduzir áudio: ${error.message}`, 'error');
|
| 553 |
+
}
|
| 554 |
+
}
|
| 555 |
+
|
| 556 |
+
// Event Listeners
|
| 557 |
+
elements.connectBtn.addEventListener('click', connect);
|
| 558 |
+
|
| 559 |
+
// Push-to-talk
|
| 560 |
+
elements.talkBtn.addEventListener('mousedown', startRecording);
|
| 561 |
+
elements.talkBtn.addEventListener('mouseup', stopRecording);
|
| 562 |
+
elements.talkBtn.addEventListener('mouseleave', stopRecording);
|
| 563 |
+
|
| 564 |
+
// Touch events para mobile
|
| 565 |
+
elements.talkBtn.addEventListener('touchstart', (e) => {
|
| 566 |
+
e.preventDefault();
|
| 567 |
+
startRecording();
|
| 568 |
+
});
|
| 569 |
+
|
| 570 |
+
elements.talkBtn.addEventListener('touchend', (e) => {
|
| 571 |
+
e.preventDefault();
|
| 572 |
+
stopRecording();
|
| 573 |
+
});
|
| 574 |
+
|
| 575 |
+
// Inicialização
|
| 576 |
+
log('🎯 Ultravox Chat (Opus Edition) pronto!', 'info');
|
| 577 |
+
debug('Sistema inicializado com suporte a gravação Opus');
|
| 578 |
+
debug('Codec preferencial: audio/webm;codecs=opus @ 32kbps');
|
| 579 |
+
</script>
|
| 580 |
+
</body>
|
| 581 |
+
</html>
|
services/webrtc_gateway/ultravox-chat-original.html
ADDED
|
@@ -0,0 +1,964 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Ultravox Chat PCM - Otimizado</title>
|
| 7 |
+
<script src="opus-decoder.js"></script>
|
| 8 |
+
<style>
|
| 9 |
+
* {
|
| 10 |
+
margin: 0;
|
| 11 |
+
padding: 0;
|
| 12 |
+
box-sizing: border-box;
|
| 13 |
+
}
|
| 14 |
+
|
| 15 |
+
body {
|
| 16 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, sans-serif;
|
| 17 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 18 |
+
min-height: 100vh;
|
| 19 |
+
display: flex;
|
| 20 |
+
justify-content: center;
|
| 21 |
+
align-items: center;
|
| 22 |
+
padding: 20px;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
.container {
|
| 26 |
+
background: white;
|
| 27 |
+
border-radius: 20px;
|
| 28 |
+
box-shadow: 0 20px 60px rgba(0,0,0,0.3);
|
| 29 |
+
padding: 40px;
|
| 30 |
+
max-width: 600px;
|
| 31 |
+
width: 100%;
|
| 32 |
+
}
|
| 33 |
+
|
| 34 |
+
h1 {
|
| 35 |
+
text-align: center;
|
| 36 |
+
color: #333;
|
| 37 |
+
margin-bottom: 30px;
|
| 38 |
+
font-size: 28px;
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
.status {
|
| 42 |
+
background: #f8f9fa;
|
| 43 |
+
border-radius: 10px;
|
| 44 |
+
padding: 15px;
|
| 45 |
+
margin-bottom: 20px;
|
| 46 |
+
display: flex;
|
| 47 |
+
align-items: center;
|
| 48 |
+
justify-content: space-between;
|
| 49 |
+
}
|
| 50 |
+
|
| 51 |
+
.status-dot {
|
| 52 |
+
width: 12px;
|
| 53 |
+
height: 12px;
|
| 54 |
+
border-radius: 50%;
|
| 55 |
+
background: #dc3545;
|
| 56 |
+
margin-right: 10px;
|
| 57 |
+
display: inline-block;
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
.status-dot.connected {
|
| 61 |
+
background: #28a745;
|
| 62 |
+
animation: pulse 2s infinite;
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
@keyframes pulse {
|
| 66 |
+
0% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0.7); }
|
| 67 |
+
70% { box-shadow: 0 0 0 10px rgba(40, 167, 69, 0); }
|
| 68 |
+
100% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0); }
|
| 69 |
+
}
|
| 70 |
+
|
| 71 |
+
.controls {
|
| 72 |
+
display: flex;
|
| 73 |
+
gap: 10px;
|
| 74 |
+
margin-bottom: 20px;
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
.voice-selector {
|
| 78 |
+
display: flex;
|
| 79 |
+
align-items: center;
|
| 80 |
+
gap: 10px;
|
| 81 |
+
margin-bottom: 20px;
|
| 82 |
+
padding: 10px;
|
| 83 |
+
background: #f8f9fa;
|
| 84 |
+
border-radius: 10px;
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
.voice-selector label {
|
| 88 |
+
font-weight: 600;
|
| 89 |
+
color: #555;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
.voice-selector select {
|
| 93 |
+
flex: 1;
|
| 94 |
+
padding: 8px;
|
| 95 |
+
border: 2px solid #ddd;
|
| 96 |
+
border-radius: 5px;
|
| 97 |
+
font-size: 14px;
|
| 98 |
+
background: white;
|
| 99 |
+
cursor: pointer;
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
+
.voice-selector select:focus {
|
| 103 |
+
outline: none;
|
| 104 |
+
border-color: #667eea;
|
| 105 |
+
}
|
| 106 |
+
|
| 107 |
+
button {
|
| 108 |
+
flex: 1;
|
| 109 |
+
padding: 15px;
|
| 110 |
+
border: none;
|
| 111 |
+
border-radius: 10px;
|
| 112 |
+
font-size: 16px;
|
| 113 |
+
font-weight: 600;
|
| 114 |
+
cursor: pointer;
|
| 115 |
+
transition: all 0.3s ease;
|
| 116 |
+
}
|
| 117 |
+
|
| 118 |
+
button:disabled {
|
| 119 |
+
opacity: 0.5;
|
| 120 |
+
cursor: not-allowed;
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
.btn-primary {
|
| 124 |
+
background: #007bff;
|
| 125 |
+
color: white;
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
.btn-primary:hover:not(:disabled) {
|
| 129 |
+
background: #0056b3;
|
| 130 |
+
transform: translateY(-2px);
|
| 131 |
+
box-shadow: 0 5px 15px rgba(0,123,255,0.3);
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
.btn-danger {
|
| 135 |
+
background: #dc3545;
|
| 136 |
+
color: white;
|
| 137 |
+
}
|
| 138 |
+
|
| 139 |
+
.btn-danger:hover:not(:disabled) {
|
| 140 |
+
background: #c82333;
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
+
.btn-success {
|
| 144 |
+
background: #28a745;
|
| 145 |
+
color: white;
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
.btn-success.recording {
|
| 149 |
+
background: #dc3545;
|
| 150 |
+
animation: recordPulse 1s infinite;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
@keyframes recordPulse {
|
| 154 |
+
0%, 100% { opacity: 1; }
|
| 155 |
+
50% { opacity: 0.7; }
|
| 156 |
+
}
|
| 157 |
+
|
| 158 |
+
.metrics {
|
| 159 |
+
display: grid;
|
| 160 |
+
grid-template-columns: repeat(3, 1fr);
|
| 161 |
+
gap: 15px;
|
| 162 |
+
margin-bottom: 20px;
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
.metric {
|
| 166 |
+
background: #f8f9fa;
|
| 167 |
+
padding: 15px;
|
| 168 |
+
border-radius: 10px;
|
| 169 |
+
text-align: center;
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
.metric-label {
|
| 173 |
+
font-size: 12px;
|
| 174 |
+
color: #6c757d;
|
| 175 |
+
margin-bottom: 5px;
|
| 176 |
+
}
|
| 177 |
+
|
| 178 |
+
.metric-value {
|
| 179 |
+
font-size: 24px;
|
| 180 |
+
font-weight: bold;
|
| 181 |
+
color: #333;
|
| 182 |
+
}
|
| 183 |
+
|
| 184 |
+
.log {
|
| 185 |
+
background: #f8f9fa;
|
| 186 |
+
border-radius: 10px;
|
| 187 |
+
padding: 20px;
|
| 188 |
+
height: 300px;
|
| 189 |
+
overflow-y: auto;
|
| 190 |
+
font-family: 'Monaco', 'Menlo', monospace;
|
| 191 |
+
font-size: 12px;
|
| 192 |
+
}
|
| 193 |
+
|
| 194 |
+
.log-entry {
|
| 195 |
+
padding: 5px 0;
|
| 196 |
+
border-bottom: 1px solid #e9ecef;
|
| 197 |
+
display: flex;
|
| 198 |
+
align-items: flex-start;
|
| 199 |
+
}
|
| 200 |
+
|
| 201 |
+
.log-time {
|
| 202 |
+
color: #6c757d;
|
| 203 |
+
margin-right: 10px;
|
| 204 |
+
flex-shrink: 0;
|
| 205 |
+
}
|
| 206 |
+
|
| 207 |
+
.log-message {
|
| 208 |
+
flex: 1;
|
| 209 |
+
}
|
| 210 |
+
|
| 211 |
+
.log-entry.error { color: #dc3545; }
|
| 212 |
+
.log-entry.success { color: #28a745; }
|
| 213 |
+
.log-entry.info { color: #007bff; }
|
| 214 |
+
.log-entry.warning { color: #ffc107; }
|
| 215 |
+
|
| 216 |
+
.audio-player {
|
| 217 |
+
display: inline-flex;
|
| 218 |
+
align-items: center;
|
| 219 |
+
gap: 10px;
|
| 220 |
+
margin-left: 10px;
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
.play-btn {
|
| 224 |
+
background: #007bff;
|
| 225 |
+
color: white;
|
| 226 |
+
border: none;
|
| 227 |
+
border-radius: 5px;
|
| 228 |
+
padding: 5px 10px;
|
| 229 |
+
cursor: pointer;
|
| 230 |
+
font-size: 12px;
|
| 231 |
+
}
|
| 232 |
+
|
| 233 |
+
.play-btn:hover {
|
| 234 |
+
background: #0056b3;
|
| 235 |
+
}
|
| 236 |
+
</style>
|
| 237 |
+
</head>
|
| 238 |
+
<body>
|
| 239 |
+
<div class="container">
|
| 240 |
+
<h1>🚀 Ultravox PCM - Otimizado</h1>
|
| 241 |
+
|
| 242 |
+
<div class="status">
|
| 243 |
+
<div>
|
| 244 |
+
<span class="status-dot" id="statusDot"></span>
|
| 245 |
+
<span id="statusText">Desconectado</span>
|
| 246 |
+
</div>
|
| 247 |
+
<span id="latencyText">Latência: --ms</span>
|
| 248 |
+
</div>
|
| 249 |
+
|
| 250 |
+
<div class="voice-selector">
|
| 251 |
+
<label for="voiceSelect">🔊 Voz TTS:</label>
|
| 252 |
+
<select id="voiceSelect">
|
| 253 |
+
<option value="pf_dora" selected>🇧🇷 [pf_dora] Português Feminino (Dora)</option>
|
| 254 |
+
<option value="pm_alex">🇧🇷 [pm_alex] Português Masculino (Alex)</option>
|
| 255 |
+
<option value="af_heart">🌍 [af_heart] Alternativa Feminina (Heart)</option>
|
| 256 |
+
<option value="af_bella">🌍 [af_bella] Alternativa Feminina (Bella)</option>
|
| 257 |
+
</select>
|
| 258 |
+
</div>
|
| 259 |
+
|
| 260 |
+
<div class="controls">
|
| 261 |
+
<button id="connectBtn" class="btn-primary">Conectar</button>
|
| 262 |
+
<button id="talkBtn" class="btn-success" disabled>Push to Talk</button>
|
| 263 |
+
</div>
|
| 264 |
+
|
| 265 |
+
<div class="metrics">
|
| 266 |
+
<div class="metric">
|
| 267 |
+
<div class="metric-label">Enviado</div>
|
| 268 |
+
<div class="metric-value" id="sentBytes">0 KB</div>
|
| 269 |
+
</div>
|
| 270 |
+
<div class="metric">
|
| 271 |
+
<div class="metric-label">Recebido</div>
|
| 272 |
+
<div class="metric-value" id="receivedBytes">0 KB</div>
|
| 273 |
+
</div>
|
| 274 |
+
<div class="metric">
|
| 275 |
+
<div class="metric-label">Formato</div>
|
| 276 |
+
<div class="metric-value" id="format">PCM</div>
|
| 277 |
+
</div>
|
| 278 |
+
<div class="metric">
|
| 279 |
+
<div class="metric-label">🎤 Voz</div>
|
| 280 |
+
<div class="metric-value" id="currentVoice" style="font-family: monospace; color: #4CAF50; font-weight: bold;">pf_dora</div>
|
| 281 |
+
</div>
|
| 282 |
+
</div>
|
| 283 |
+
|
| 284 |
+
<div class="log" id="log"></div>
|
| 285 |
+
</div>
|
| 286 |
+
|
| 287 |
+
<!-- Seção TTS Direto -->
|
| 288 |
+
<div class="container" style="margin-top: 20px;">
|
| 289 |
+
<h2>🎵 Text-to-Speech Direto</h2>
|
| 290 |
+
<p>Digite ou edite o texto abaixo e escolha uma voz para converter em áudio</p>
|
| 291 |
+
|
| 292 |
+
<div class="section">
|
| 293 |
+
<textarea id="ttsText" style="width: 100%; height: 120px; padding: 10px; border: 1px solid #333; border-radius: 8px; background: #1e1e1e; color: #e0e0e0; font-family: 'Segoe UI', system-ui, sans-serif; font-size: 14px; resize: vertical;">Olá! Teste de voz.</textarea>
|
| 294 |
+
</div>
|
| 295 |
+
|
| 296 |
+
<div class="section" style="display: flex; gap: 10px; align-items: center; margin-top: 15px;">
|
| 297 |
+
<label for="ttsVoiceSelect" style="font-weight: 600;">🔊 Voz:</label>
|
| 298 |
+
<select id="ttsVoiceSelect" style="flex: 1; padding: 8px; border: 1px solid #333; border-radius: 5px; background: #2a2a2a; color: #e0e0e0;">
|
| 299 |
+
<optgroup label="🇧🇷 Português">
|
| 300 |
+
<option value="pf_dora" selected>[pf_dora] Feminino - Dora</option>
|
| 301 |
+
<option value="pm_alex">[pm_alex] Masculino - Alex</option>
|
| 302 |
+
<option value="pm_santa">[pm_santa] Masculino - Santa (Festivo)</option>
|
| 303 |
+
</optgroup>
|
| 304 |
+
<optgroup label="🇫🇷 Francês">
|
| 305 |
+
<option value="ff_siwis">[ff_siwis] Feminino - Siwis (Nativa)</option>
|
| 306 |
+
</optgroup>
|
| 307 |
+
<optgroup label="🇺🇸 Inglês Americano">
|
| 308 |
+
<option value="af_alloy">Feminino - Alloy</option>
|
| 309 |
+
<option value="af_aoede">Feminino - Aoede</option>
|
| 310 |
+
<option value="af_bella">Feminino - Bella</option>
|
| 311 |
+
<option value="af_heart">Feminino - Heart</option>
|
| 312 |
+
<option value="af_jessica">Feminino - Jessica</option>
|
| 313 |
+
<option value="af_kore">Feminino - Kore</option>
|
| 314 |
+
<option value="af_nicole">Feminino - Nicole</option>
|
| 315 |
+
<option value="af_nova">Feminino - Nova</option>
|
| 316 |
+
<option value="af_river">Feminino - River</option>
|
| 317 |
+
<option value="af_sarah">Feminino - Sarah</option>
|
| 318 |
+
<option value="af_sky">Feminino - Sky</option>
|
| 319 |
+
<option value="am_adam">Masculino - Adam</option>
|
| 320 |
+
<option value="am_echo">Masculino - Echo</option>
|
| 321 |
+
<option value="am_eric">Masculino - Eric</option>
|
| 322 |
+
<option value="am_fenrir">Masculino - Fenrir</option>
|
| 323 |
+
<option value="am_liam">Masculino - Liam</option>
|
| 324 |
+
<option value="am_michael">Masculino - Michael</option>
|
| 325 |
+
<option value="am_onyx">Masculino - Onyx</option>
|
| 326 |
+
<option value="am_puck">Masculino - Puck</option>
|
| 327 |
+
<option value="am_santa">Masculino - Santa</option>
|
| 328 |
+
</optgroup>
|
| 329 |
+
<optgroup label="🇬🇧 Inglês Britânico">
|
| 330 |
+
<option value="bf_alice">Feminino - Alice</option>
|
| 331 |
+
<option value="bf_emma">Feminino - Emma</option>
|
| 332 |
+
<option value="bf_isabella">Feminino - Isabella</option>
|
| 333 |
+
<option value="bf_lily">Feminino - Lily</option>
|
| 334 |
+
<option value="bm_daniel">Masculino - Daniel</option>
|
| 335 |
+
<option value="bm_fable">Masculino - Fable</option>
|
| 336 |
+
<option value="bm_george">Masculino - George</option>
|
| 337 |
+
<option value="bm_lewis">Masculino - Lewis</option>
|
| 338 |
+
</optgroup>
|
| 339 |
+
<optgroup label="🇪🇸 Espanhol">
|
| 340 |
+
<option value="ef_dora">Feminino - Dora</option>
|
| 341 |
+
<option value="em_alex">Masculino - Alex</option>
|
| 342 |
+
<option value="em_santa">Masculino - Santa</option>
|
| 343 |
+
</optgroup>
|
| 344 |
+
<optgroup label="🇮🇹 Italiano">
|
| 345 |
+
<option value="if_sara">Feminino - Sara</option>
|
| 346 |
+
<option value="im_nicola">Masculino - Nicola</option>
|
| 347 |
+
</optgroup>
|
| 348 |
+
<optgroup label="🇯🇵 Japonês">
|
| 349 |
+
<option value="jf_alpha">Feminino - Alpha</option>
|
| 350 |
+
<option value="jf_gongitsune">Feminino - Gongitsune</option>
|
| 351 |
+
<option value="jf_nezumi">Feminino - Nezumi</option>
|
| 352 |
+
<option value="jf_tebukuro">Feminino - Tebukuro</option>
|
| 353 |
+
<option value="jm_kumo">Masculino - Kumo</option>
|
| 354 |
+
</optgroup>
|
| 355 |
+
<optgroup label="🇨🇳 Chinês">
|
| 356 |
+
<option value="zf_xiaobei">Feminino - Xiaobei</option>
|
| 357 |
+
<option value="zf_xiaoni">Feminino - Xiaoni</option>
|
| 358 |
+
<option value="zf_xiaoxiao">Feminino - Xiaoxiao</option>
|
| 359 |
+
<option value="zf_xiaoyi">Feminino - Xiaoyi</option>
|
| 360 |
+
<option value="zm_yunjian">Masculino - Yunjian</option>
|
| 361 |
+
<option value="zm_yunxi">Masculino - Yunxi</option>
|
| 362 |
+
<option value="zm_yunxia">Masculino - Yunxia</option>
|
| 363 |
+
<option value="zm_yunyang">Masculino - Yunyang</option>
|
| 364 |
+
</optgroup>
|
| 365 |
+
<optgroup label="🇮🇳 Hindi">
|
| 366 |
+
<option value="hf_alpha">Feminino - Alpha</option>
|
| 367 |
+
<option value="hf_beta">Feminino - Beta</option>
|
| 368 |
+
<option value="hm_omega">Masculino - Omega</option>
|
| 369 |
+
<option value="hm_psi">Masculino - Psi</option>
|
| 370 |
+
</optgroup>
|
| 371 |
+
</select>
|
| 372 |
+
|
| 373 |
+
<button id="ttsPlayBtn" class="btn-success" disabled style="padding: 10px 20px;">
|
| 374 |
+
▶️ Gerar Áudio
|
| 375 |
+
</button>
|
| 376 |
+
</div>
|
| 377 |
+
|
| 378 |
+
<div id="ttsStatus" style="display: none; margin-top: 15px; padding: 15px; background: #2a2a2a; border-radius: 8px;">
|
| 379 |
+
<span id="ttsStatusText">⏳ Processando...</span>
|
| 380 |
+
</div>
|
| 381 |
+
|
| 382 |
+
<div id="ttsPlayer" style="display: none; margin-top: 15px;">
|
| 383 |
+
<audio id="ttsAudio" controls style="width: 100%;"></audio>
|
| 384 |
+
</div>
|
| 385 |
+
</div>
|
| 386 |
+
|
| 387 |
+
<script>
|
| 388 |
+
// Estado da aplicação
|
| 389 |
+
let ws = null;
|
| 390 |
+
let isConnected = false;
|
| 391 |
+
let isRecording = false;
|
| 392 |
+
let audioContext = null;
|
| 393 |
+
let stream = null;
|
| 394 |
+
let audioSource = null;
|
| 395 |
+
let audioProcessor = null;
|
| 396 |
+
let pcmBuffer = [];
|
| 397 |
+
|
| 398 |
+
// Métricas
|
| 399 |
+
const metrics = {
|
| 400 |
+
sentBytes: 0,
|
| 401 |
+
receivedBytes: 0,
|
| 402 |
+
latency: 0,
|
| 403 |
+
recordingStartTime: 0
|
| 404 |
+
};
|
| 405 |
+
|
| 406 |
+
// Elementos DOM
|
| 407 |
+
const elements = {
|
| 408 |
+
statusDot: document.getElementById('statusDot'),
|
| 409 |
+
statusText: document.getElementById('statusText'),
|
| 410 |
+
latencyText: document.getElementById('latencyText'),
|
| 411 |
+
connectBtn: document.getElementById('connectBtn'),
|
| 412 |
+
talkBtn: document.getElementById('talkBtn'),
|
| 413 |
+
voiceSelect: document.getElementById('voiceSelect'),
|
| 414 |
+
sentBytes: document.getElementById('sentBytes'),
|
| 415 |
+
receivedBytes: document.getElementById('receivedBytes'),
|
| 416 |
+
format: document.getElementById('format'),
|
| 417 |
+
log: document.getElementById('log'),
|
| 418 |
+
// TTS elements
|
| 419 |
+
ttsText: document.getElementById('ttsText'),
|
| 420 |
+
ttsVoiceSelect: document.getElementById('ttsVoiceSelect'),
|
| 421 |
+
ttsPlayBtn: document.getElementById('ttsPlayBtn'),
|
| 422 |
+
ttsStatus: document.getElementById('ttsStatus'),
|
| 423 |
+
ttsStatusText: document.getElementById('ttsStatusText'),
|
| 424 |
+
ttsPlayer: document.getElementById('ttsPlayer'),
|
| 425 |
+
ttsAudio: document.getElementById('ttsAudio')
|
| 426 |
+
};
|
| 427 |
+
|
| 428 |
+
// Log no console visual
|
| 429 |
+
function log(message, type = 'info') {
|
| 430 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 431 |
+
const entry = document.createElement('div');
|
| 432 |
+
entry.className = `log-entry ${type}`;
|
| 433 |
+
entry.innerHTML = `
|
| 434 |
+
<span class="log-time">[${time}]</span>
|
| 435 |
+
<span class="log-message">${message}</span>
|
| 436 |
+
`;
|
| 437 |
+
elements.log.appendChild(entry);
|
| 438 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 439 |
+
console.log(`[${type}] ${message}`);
|
| 440 |
+
}
|
| 441 |
+
|
| 442 |
+
// Atualizar métricas
|
| 443 |
+
function updateMetrics() {
|
| 444 |
+
elements.sentBytes.textContent = `${(metrics.sentBytes / 1024).toFixed(1)} KB`;
|
| 445 |
+
elements.receivedBytes.textContent = `${(metrics.receivedBytes / 1024).toFixed(1)} KB`;
|
| 446 |
+
elements.latencyText.textContent = `Latência: ${metrics.latency}ms`;
|
| 447 |
+
}
|
| 448 |
+
|
| 449 |
+
// Conectar ao WebSocket
|
| 450 |
+
async function connect() {
|
| 451 |
+
try {
|
| 452 |
+
// Solicitar acesso ao microfone
|
| 453 |
+
stream = await navigator.mediaDevices.getUserMedia({
|
| 454 |
+
audio: {
|
| 455 |
+
echoCancellation: true,
|
| 456 |
+
noiseSuppression: true,
|
| 457 |
+
sampleRate: 24000 // High quality 24kHz
|
| 458 |
+
}
|
| 459 |
+
});
|
| 460 |
+
|
| 461 |
+
log('✅ Microfone acessado', 'success');
|
| 462 |
+
|
| 463 |
+
// Conectar WebSocket com suporte binário
|
| 464 |
+
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 465 |
+
const wsUrl = `${protocol}//${window.location.host}/ws`;
|
| 466 |
+
ws = new WebSocket(wsUrl);
|
| 467 |
+
ws.binaryType = 'arraybuffer';
|
| 468 |
+
|
| 469 |
+
ws.onopen = () => {
|
| 470 |
+
isConnected = true;
|
| 471 |
+
elements.statusDot.classList.add('connected');
|
| 472 |
+
elements.statusText.textContent = 'Conectado';
|
| 473 |
+
elements.connectBtn.textContent = 'Desconectar';
|
| 474 |
+
elements.connectBtn.classList.remove('btn-primary');
|
| 475 |
+
elements.connectBtn.classList.add('btn-danger');
|
| 476 |
+
elements.talkBtn.disabled = false;
|
| 477 |
+
|
| 478 |
+
// Enviar voz selecionada ao conectar
|
| 479 |
+
const currentVoice = elements.voiceSelect.value || elements.ttsVoiceSelect.value || 'pf_dora';
|
| 480 |
+
ws.send(JSON.stringify({
|
| 481 |
+
type: 'set-voice',
|
| 482 |
+
voice_id: currentVoice
|
| 483 |
+
}));
|
| 484 |
+
log(`🔊 Voz configurada: ${currentVoice}`, 'info');
|
| 485 |
+
elements.ttsPlayBtn.disabled = false; // Habilitar TTS button
|
| 486 |
+
log('✅ Conectado ao servidor', 'success');
|
| 487 |
+
};
|
| 488 |
+
|
| 489 |
+
ws.onmessage = (event) => {
|
| 490 |
+
if (event.data instanceof ArrayBuffer) {
|
| 491 |
+
// Áudio PCM binário recebido
|
| 492 |
+
handlePCMAudio(event.data);
|
| 493 |
+
} else {
|
| 494 |
+
// Mensagem JSON
|
| 495 |
+
const data = JSON.parse(event.data);
|
| 496 |
+
handleMessage(data);
|
| 497 |
+
}
|
| 498 |
+
};
|
| 499 |
+
|
| 500 |
+
ws.onerror = (error) => {
|
| 501 |
+
log(`❌ Erro WebSocket: ${error}`, 'error');
|
| 502 |
+
};
|
| 503 |
+
|
| 504 |
+
ws.onclose = () => {
|
| 505 |
+
disconnect();
|
| 506 |
+
};
|
| 507 |
+
|
| 508 |
+
} catch (error) {
|
| 509 |
+
log(`❌ Erro ao conectar: ${error.message}`, 'error');
|
| 510 |
+
}
|
| 511 |
+
}
|
| 512 |
+
|
| 513 |
+
// Desconectar
|
| 514 |
+
function disconnect() {
|
| 515 |
+
isConnected = false;
|
| 516 |
+
|
| 517 |
+
if (ws) {
|
| 518 |
+
ws.close();
|
| 519 |
+
ws = null;
|
| 520 |
+
}
|
| 521 |
+
|
| 522 |
+
if (stream) {
|
| 523 |
+
stream.getTracks().forEach(track => track.stop());
|
| 524 |
+
stream = null;
|
| 525 |
+
}
|
| 526 |
+
|
| 527 |
+
if (audioContext) {
|
| 528 |
+
audioContext.close();
|
| 529 |
+
audioContext = null;
|
| 530 |
+
}
|
| 531 |
+
|
| 532 |
+
elements.statusDot.classList.remove('connected');
|
| 533 |
+
elements.statusText.textContent = 'Desconectado';
|
| 534 |
+
elements.connectBtn.textContent = 'Conectar';
|
| 535 |
+
elements.connectBtn.classList.remove('btn-danger');
|
| 536 |
+
elements.connectBtn.classList.add('btn-primary');
|
| 537 |
+
elements.talkBtn.disabled = true;
|
| 538 |
+
|
| 539 |
+
log('👋 Desconectado', 'warning');
|
| 540 |
+
}
|
| 541 |
+
|
| 542 |
+
// Iniciar gravação PCM
|
| 543 |
+
function startRecording() {
|
| 544 |
+
if (isRecording) return;
|
| 545 |
+
|
| 546 |
+
isRecording = true;
|
| 547 |
+
metrics.recordingStartTime = Date.now();
|
| 548 |
+
elements.talkBtn.classList.add('recording');
|
| 549 |
+
elements.talkBtn.textContent = 'Gravando...';
|
| 550 |
+
pcmBuffer = [];
|
| 551 |
+
|
| 552 |
+
const sampleRate = 24000; // Sempre usar melhor qualidade
|
| 553 |
+
log(`🎤 Gravando PCM 16-bit @ ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 554 |
+
|
| 555 |
+
// Criar AudioContext se necessário
|
| 556 |
+
if (!audioContext) {
|
| 557 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 558 |
+
const sampleRate = 24000;
|
| 559 |
+
|
| 560 |
+
audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 561 |
+
sampleRate: sampleRate
|
| 562 |
+
});
|
| 563 |
+
|
| 564 |
+
log(`🎧 AudioContext criado: ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 565 |
+
}
|
| 566 |
+
|
| 567 |
+
// Criar processador de áudio
|
| 568 |
+
audioSource = audioContext.createMediaStreamSource(stream);
|
| 569 |
+
audioProcessor = audioContext.createScriptProcessor(4096, 1, 1);
|
| 570 |
+
|
| 571 |
+
audioProcessor.onaudioprocess = (e) => {
|
| 572 |
+
if (!isRecording) return;
|
| 573 |
+
|
| 574 |
+
const inputData = e.inputBuffer.getChannelData(0);
|
| 575 |
+
|
| 576 |
+
// Calcular RMS (Root Mean Square) para melhor detecção de volume
|
| 577 |
+
let sumSquares = 0;
|
| 578 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 579 |
+
sumSquares += inputData[i] * inputData[i];
|
| 580 |
+
}
|
| 581 |
+
const rms = Math.sqrt(sumSquares / inputData.length);
|
| 582 |
+
|
| 583 |
+
// Calcular amplitude máxima também
|
| 584 |
+
let maxAmplitude = 0;
|
| 585 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 586 |
+
maxAmplitude = Math.max(maxAmplitude, Math.abs(inputData[i]));
|
| 587 |
+
}
|
| 588 |
+
|
| 589 |
+
// Detecção de voz baseada em RMS (mais confiável que amplitude máxima)
|
| 590 |
+
const voiceThreshold = 0.01; // Threshold para detectar voz
|
| 591 |
+
const hasVoice = rms > voiceThreshold;
|
| 592 |
+
|
| 593 |
+
// Aplicar ganho suave apenas se necessário
|
| 594 |
+
let gain = 1.0;
|
| 595 |
+
if (hasVoice && rms < 0.05) {
|
| 596 |
+
// Ganho suave baseado em RMS, máximo 5x
|
| 597 |
+
gain = Math.min(5.0, 0.05 / rms);
|
| 598 |
+
if (gain > 1.2) {
|
| 599 |
+
log(`🎤 Volume baixo detectado, aplicando ganho: ${gain.toFixed(1)}x`, 'info');
|
| 600 |
+
}
|
| 601 |
+
}
|
| 602 |
+
|
| 603 |
+
// Converter Float32 para Int16 com processamento melhorado
|
| 604 |
+
const pcmData = new Int16Array(inputData.length);
|
| 605 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 606 |
+
// Aplicar ganho suave
|
| 607 |
+
let sample = inputData[i] * gain;
|
| 608 |
+
|
| 609 |
+
// Soft clipping para evitar distorção
|
| 610 |
+
if (Math.abs(sample) > 0.95) {
|
| 611 |
+
sample = Math.sign(sample) * (0.95 + 0.05 * Math.tanh((Math.abs(sample) - 0.95) * 10));
|
| 612 |
+
}
|
| 613 |
+
|
| 614 |
+
// Converter para Int16
|
| 615 |
+
sample = Math.max(-1, Math.min(1, sample));
|
| 616 |
+
pcmData[i] = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
|
| 617 |
+
}
|
| 618 |
+
|
| 619 |
+
// Adicionar ao buffer apenas se detectar voz
|
| 620 |
+
if (hasVoice) {
|
| 621 |
+
pcmBuffer.push(pcmData);
|
| 622 |
+
}
|
| 623 |
+
};
|
| 624 |
+
|
| 625 |
+
audioSource.connect(audioProcessor);
|
| 626 |
+
audioProcessor.connect(audioContext.destination);
|
| 627 |
+
}
|
| 628 |
+
|
| 629 |
+
// Parar gravação e enviar
|
| 630 |
+
function stopRecording() {
|
| 631 |
+
if (!isRecording) return;
|
| 632 |
+
|
| 633 |
+
isRecording = false;
|
| 634 |
+
const duration = Date.now() - metrics.recordingStartTime;
|
| 635 |
+
elements.talkBtn.classList.remove('recording');
|
| 636 |
+
elements.talkBtn.textContent = 'Push to Talk';
|
| 637 |
+
|
| 638 |
+
// Desconectar processador
|
| 639 |
+
if (audioProcessor) {
|
| 640 |
+
audioProcessor.disconnect();
|
| 641 |
+
audioProcessor = null;
|
| 642 |
+
}
|
| 643 |
+
if (audioSource) {
|
| 644 |
+
audioSource.disconnect();
|
| 645 |
+
audioSource = null;
|
| 646 |
+
}
|
| 647 |
+
|
| 648 |
+
// Verificar se há áudio para enviar
|
| 649 |
+
if (pcmBuffer.length === 0) {
|
| 650 |
+
log(`⚠️ Nenhum áudio capturado (silêncio ou volume muito baixo)`, 'warning');
|
| 651 |
+
pcmBuffer = [];
|
| 652 |
+
return;
|
| 653 |
+
}
|
| 654 |
+
|
| 655 |
+
// Combinar todos os chunks PCM
|
| 656 |
+
const totalLength = pcmBuffer.reduce((acc, chunk) => acc + chunk.length, 0);
|
| 657 |
+
|
| 658 |
+
// Verificar tamanho mínimo (0.5 segundos)
|
| 659 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 660 |
+
const minSamples = sampleRate * 0.5;
|
| 661 |
+
|
| 662 |
+
if (totalLength < minSamples) {
|
| 663 |
+
log(`⚠️ Áudio muito curto: ${(totalLength/sampleRate).toFixed(2)}s (mínimo 0.5s)`, 'warning');
|
| 664 |
+
pcmBuffer = [];
|
| 665 |
+
return;
|
| 666 |
+
}
|
| 667 |
+
|
| 668 |
+
const fullPCM = new Int16Array(totalLength);
|
| 669 |
+
let offset = 0;
|
| 670 |
+
for (const chunk of pcmBuffer) {
|
| 671 |
+
fullPCM.set(chunk, offset);
|
| 672 |
+
offset += chunk.length;
|
| 673 |
+
}
|
| 674 |
+
|
| 675 |
+
// Calcular amplitude final para debug
|
| 676 |
+
let maxAmp = 0;
|
| 677 |
+
for (let i = 0; i < Math.min(fullPCM.length, 1000); i++) {
|
| 678 |
+
maxAmp = Math.max(maxAmp, Math.abs(fullPCM[i] / 32768));
|
| 679 |
+
}
|
| 680 |
+
|
| 681 |
+
// Enviar PCM binário direto (sem Base64!)
|
| 682 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 683 |
+
// Enviar um header simples antes do áudio
|
| 684 |
+
const header = new ArrayBuffer(8);
|
| 685 |
+
const view = new DataView(header);
|
| 686 |
+
view.setUint32(0, 0x50434D16); // Magic: "PCM16"
|
| 687 |
+
view.setUint32(4, fullPCM.length * 2); // Tamanho em bytes
|
| 688 |
+
|
| 689 |
+
ws.send(header);
|
| 690 |
+
ws.send(fullPCM.buffer);
|
| 691 |
+
|
| 692 |
+
metrics.sentBytes += fullPCM.length * 2;
|
| 693 |
+
updateMetrics();
|
| 694 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 695 |
+
log(`📤 PCM enviado: ${(fullPCM.length * 2 / 1024).toFixed(1)}KB, ${(totalLength/sampleRate).toFixed(1)}s @ ${sampleRate}Hz, amp:${maxAmp.toFixed(3)}`, 'success');
|
| 696 |
+
}
|
| 697 |
+
|
| 698 |
+
// Limpar buffer após enviar
|
| 699 |
+
pcmBuffer = [];
|
| 700 |
+
}
|
| 701 |
+
|
| 702 |
+
// Processar mensagem JSON
|
| 703 |
+
function handleMessage(data) {
|
| 704 |
+
switch (data.type) {
|
| 705 |
+
case 'metrics':
|
| 706 |
+
metrics.latency = data.latency;
|
| 707 |
+
updateMetrics();
|
| 708 |
+
log(`📊 Resposta: "${data.response}" (${data.latency}ms)`, 'success');
|
| 709 |
+
break;
|
| 710 |
+
|
| 711 |
+
case 'error':
|
| 712 |
+
log(`❌ Erro: ${data.message}`, 'error');
|
| 713 |
+
break;
|
| 714 |
+
|
| 715 |
+
case 'tts-response':
|
| 716 |
+
// Resposta do TTS direto (Opus 24kHz ou PCM)
|
| 717 |
+
if (data.audio) {
|
| 718 |
+
// Decodificar base64 para arraybuffer
|
| 719 |
+
const binaryString = atob(data.audio);
|
| 720 |
+
const bytes = new Uint8Array(binaryString.length);
|
| 721 |
+
for (let i = 0; i < binaryString.length; i++) {
|
| 722 |
+
bytes[i] = binaryString.charCodeAt(i);
|
| 723 |
+
}
|
| 724 |
+
|
| 725 |
+
let audioData = bytes.buffer;
|
| 726 |
+
// IMPORTANTE: Usar a taxa enviada pelo servidor
|
| 727 |
+
const sampleRate = data.sampleRate || 24000;
|
| 728 |
+
|
| 729 |
+
console.log(`🎯 TTS Response - Taxa recebida: ${sampleRate}Hz, Formato: ${data.format}, Tamanho: ${bytes.length} bytes`);
|
| 730 |
+
|
| 731 |
+
// Se for Opus, usar WebAudio API para decodificar nativamente
|
| 732 |
+
let wavBuffer;
|
| 733 |
+
if (data.format === 'opus') {
|
| 734 |
+
console.log(`🗜️ Opus 24kHz recebido: ${(bytes.length/1024).toFixed(1)}KB`);
|
| 735 |
+
|
| 736 |
+
// Log de economia de banda
|
| 737 |
+
if (data.originalSize) {
|
| 738 |
+
const compression = Math.round(100 - (bytes.length / data.originalSize) * 100);
|
| 739 |
+
console.log(`📊 Economia de banda: ${compression}% (${(data.originalSize/1024).toFixed(1)}KB → ${(bytes.length/1024).toFixed(1)}KB)`);
|
| 740 |
+
}
|
| 741 |
+
|
| 742 |
+
// WebAudio API pode decodificar Opus nativamente
|
| 743 |
+
// Por agora, tratar como PCM até implementar decoder completo
|
| 744 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 745 |
+
} else {
|
| 746 |
+
// PCM - adicionar WAV header com a taxa correta
|
| 747 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 748 |
+
}
|
| 749 |
+
|
| 750 |
+
// Log da qualidade recebida
|
| 751 |
+
console.log(`🎵 TTS pronto: ${(audioData.byteLength/1024).toFixed(1)}KB @ ${sampleRate}Hz (${data.quality || 'high'} quality, ${data.format || 'pcm'})`);
|
| 752 |
+
|
| 753 |
+
// Criar blob e URL
|
| 754 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 755 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 756 |
+
|
| 757 |
+
// Atualizar player
|
| 758 |
+
elements.ttsAudio.src = audioUrl;
|
| 759 |
+
elements.ttsPlayer.style.display = 'block';
|
| 760 |
+
elements.ttsStatus.style.display = 'none';
|
| 761 |
+
elements.ttsPlayBtn.disabled = false;
|
| 762 |
+
elements.ttsPlayBtn.textContent = '▶️ Gerar Áudio';
|
| 763 |
+
|
| 764 |
+
log('🎵 Áudio TTS gerado com sucesso!', 'success');
|
| 765 |
+
}
|
| 766 |
+
break;
|
| 767 |
+
}
|
| 768 |
+
}
|
| 769 |
+
|
| 770 |
+
// Processar áudio PCM recebido
|
| 771 |
+
function handlePCMAudio(arrayBuffer) {
|
| 772 |
+
metrics.receivedBytes += arrayBuffer.byteLength;
|
| 773 |
+
updateMetrics();
|
| 774 |
+
|
| 775 |
+
// Criar WAV header para reproduzir
|
| 776 |
+
const wavBuffer = addWavHeader(arrayBuffer);
|
| 777 |
+
|
| 778 |
+
// Criar blob e URL para o áudio
|
| 779 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 780 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 781 |
+
|
| 782 |
+
// Criar log com botão de play
|
| 783 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 784 |
+
const entry = document.createElement('div');
|
| 785 |
+
entry.className = 'log-entry success';
|
| 786 |
+
entry.innerHTML = `
|
| 787 |
+
<span class="log-time">[${time}]</span>
|
| 788 |
+
<span class="log-message">🔊 Áudio recebido: ${(arrayBuffer.byteLength / 1024).toFixed(1)}KB</span>
|
| 789 |
+
<div class="audio-player">
|
| 790 |
+
<button class="play-btn" onclick="playAudio('${audioUrl}')">▶️ Play</button>
|
| 791 |
+
<audio id="audio-${Date.now()}" src="${audioUrl}" style="display: none;"></audio>
|
| 792 |
+
</div>
|
| 793 |
+
`;
|
| 794 |
+
elements.log.appendChild(entry);
|
| 795 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 796 |
+
|
| 797 |
+
// Auto-play o áudio
|
| 798 |
+
const audio = new Audio(audioUrl);
|
| 799 |
+
audio.play().catch(err => {
|
| 800 |
+
console.log('Auto-play bloqueado, use o botão para reproduzir');
|
| 801 |
+
});
|
| 802 |
+
}
|
| 803 |
+
|
| 804 |
+
// Função para tocar áudio manualmente
|
| 805 |
+
function playAudio(url) {
|
| 806 |
+
const audio = new Audio(url);
|
| 807 |
+
audio.play();
|
| 808 |
+
}
|
| 809 |
+
|
| 810 |
+
// Adicionar header WAV ao PCM
|
| 811 |
+
function addWavHeader(pcmBuffer, customSampleRate) {
|
| 812 |
+
const pcmData = new Uint8Array(pcmBuffer);
|
| 813 |
+
const wavBuffer = new ArrayBuffer(44 + pcmData.length);
|
| 814 |
+
const view = new DataView(wavBuffer);
|
| 815 |
+
|
| 816 |
+
// WAV header
|
| 817 |
+
const writeString = (offset, string) => {
|
| 818 |
+
for (let i = 0; i < string.length; i++) {
|
| 819 |
+
view.setUint8(offset + i, string.charCodeAt(i));
|
| 820 |
+
}
|
| 821 |
+
};
|
| 822 |
+
|
| 823 |
+
writeString(0, 'RIFF');
|
| 824 |
+
view.setUint32(4, 36 + pcmData.length, true);
|
| 825 |
+
writeString(8, 'WAVE');
|
| 826 |
+
writeString(12, 'fmt ');
|
| 827 |
+
view.setUint32(16, 16, true); // fmt chunk size
|
| 828 |
+
view.setUint16(20, 1, true); // PCM format
|
| 829 |
+
view.setUint16(22, 1, true); // Mono
|
| 830 |
+
|
| 831 |
+
// Usar taxa customizada se fornecida, senão usar 24kHz
|
| 832 |
+
let sampleRate = customSampleRate || 24000;
|
| 833 |
+
|
| 834 |
+
console.log(`📝 WAV Header - Configurando taxa: ${sampleRate}Hz`);
|
| 835 |
+
|
| 836 |
+
view.setUint32(24, sampleRate, true); // Sample rate
|
| 837 |
+
view.setUint32(28, sampleRate * 2, true); // Byte rate: sampleRate * 1 * 2
|
| 838 |
+
view.setUint16(32, 2, true); // Block align: 1 * 2
|
| 839 |
+
view.setUint16(34, 16, true); // Bits per sample: 16-bit
|
| 840 |
+
writeString(36, 'data');
|
| 841 |
+
view.setUint32(40, pcmData.length, true);
|
| 842 |
+
|
| 843 |
+
// Copiar dados PCM
|
| 844 |
+
new Uint8Array(wavBuffer, 44).set(pcmData);
|
| 845 |
+
|
| 846 |
+
return wavBuffer;
|
| 847 |
+
}
|
| 848 |
+
|
| 849 |
+
// Event Listeners
|
| 850 |
+
elements.connectBtn.addEventListener('click', () => {
|
| 851 |
+
if (isConnected) {
|
| 852 |
+
disconnect();
|
| 853 |
+
} else {
|
| 854 |
+
connect();
|
| 855 |
+
}
|
| 856 |
+
});
|
| 857 |
+
|
| 858 |
+
elements.talkBtn.addEventListener('mousedown', startRecording);
|
| 859 |
+
elements.talkBtn.addEventListener('mouseup', stopRecording);
|
| 860 |
+
elements.talkBtn.addEventListener('mouseleave', stopRecording);
|
| 861 |
+
|
| 862 |
+
// Voice selector listener
|
| 863 |
+
elements.voiceSelect.addEventListener('change', (e) => {
|
| 864 |
+
const voice_id = e.target.value;
|
| 865 |
+
console.log('Voice select changed to:', voice_id);
|
| 866 |
+
|
| 867 |
+
// Update current voice display
|
| 868 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 869 |
+
if (currentVoiceElement) {
|
| 870 |
+
currentVoiceElement.textContent = voice_id;
|
| 871 |
+
}
|
| 872 |
+
|
| 873 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 874 |
+
console.log('Sending set-voice command:', voice_id);
|
| 875 |
+
ws.send(JSON.stringify({
|
| 876 |
+
type: 'set-voice',
|
| 877 |
+
voice_id: voice_id
|
| 878 |
+
}));
|
| 879 |
+
log(`🔊 Voz alterada para: ${voice_id} - ${e.target.options[e.target.selectedIndex].text}`, 'info');
|
| 880 |
+
} else {
|
| 881 |
+
console.log('WebSocket not connected, cannot send voice change');
|
| 882 |
+
log(`⚠️ Conecte-se primeiro para mudar a voz`, 'warning');
|
| 883 |
+
}
|
| 884 |
+
});
|
| 885 |
+
elements.talkBtn.addEventListener('touchstart', startRecording);
|
| 886 |
+
elements.talkBtn.addEventListener('touchend', stopRecording);
|
| 887 |
+
|
| 888 |
+
// TTS Voice selector listener
|
| 889 |
+
elements.ttsVoiceSelect.addEventListener('change', (e) => {
|
| 890 |
+
const voice_id = e.target.value;
|
| 891 |
+
|
| 892 |
+
// Update main voice selector
|
| 893 |
+
elements.voiceSelect.value = voice_id;
|
| 894 |
+
|
| 895 |
+
// Update current voice display
|
| 896 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 897 |
+
if (currentVoiceElement) {
|
| 898 |
+
currentVoiceElement.textContent = voice_id;
|
| 899 |
+
}
|
| 900 |
+
|
| 901 |
+
// Send voice change to server
|
| 902 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 903 |
+
ws.send(JSON.stringify({
|
| 904 |
+
type: 'set-voice',
|
| 905 |
+
voice_id: voice_id
|
| 906 |
+
}));
|
| 907 |
+
log(`🎤 Voz TTS alterada para: ${voice_id}`, 'info');
|
| 908 |
+
}
|
| 909 |
+
});
|
| 910 |
+
|
| 911 |
+
// TTS Button Event Listener
|
| 912 |
+
elements.ttsPlayBtn.addEventListener('click', (e) => {
|
| 913 |
+
e.preventDefault();
|
| 914 |
+
e.stopPropagation();
|
| 915 |
+
|
| 916 |
+
console.log('TTS Button clicked!');
|
| 917 |
+
const text = elements.ttsText.value.trim();
|
| 918 |
+
const voice = elements.ttsVoiceSelect.value;
|
| 919 |
+
|
| 920 |
+
console.log('TTS Text:', text);
|
| 921 |
+
console.log('TTS Voice:', voice);
|
| 922 |
+
|
| 923 |
+
if (!text) {
|
| 924 |
+
alert('Por favor, digite algum texto para converter em áudio');
|
| 925 |
+
return;
|
| 926 |
+
}
|
| 927 |
+
|
| 928 |
+
if (!ws || ws.readyState !== WebSocket.OPEN) {
|
| 929 |
+
alert('Por favor, conecte-se primeiro clicando em "Conectar"');
|
| 930 |
+
return;
|
| 931 |
+
}
|
| 932 |
+
|
| 933 |
+
// Mostrar status
|
| 934 |
+
elements.ttsStatus.style.display = 'block';
|
| 935 |
+
elements.ttsStatusText.textContent = '⏳ Gerando áudio...';
|
| 936 |
+
elements.ttsPlayBtn.disabled = true;
|
| 937 |
+
elements.ttsPlayBtn.textContent = '⏳ Processando...';
|
| 938 |
+
elements.ttsPlayer.style.display = 'none';
|
| 939 |
+
|
| 940 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 941 |
+
const quality = 'high';
|
| 942 |
+
|
| 943 |
+
// Enviar request para TTS com qualidade máxima
|
| 944 |
+
const ttsRequest = {
|
| 945 |
+
type: 'text-to-speech',
|
| 946 |
+
text: text,
|
| 947 |
+
voice_id: voice,
|
| 948 |
+
quality: quality,
|
| 949 |
+
format: 'opus' // Opus 24kHz @ 32kbps - máxima qualidade, mínima banda
|
| 950 |
+
};
|
| 951 |
+
|
| 952 |
+
console.log('Sending TTS request:', ttsRequest);
|
| 953 |
+
ws.send(JSON.stringify(ttsRequest));
|
| 954 |
+
|
| 955 |
+
log(`🎤 Solicitando TTS: voz=${voice}, texto="${text.substring(0, 50)}..."`, 'info');
|
| 956 |
+
});
|
| 957 |
+
|
| 958 |
+
// Inicialização
|
| 959 |
+
log('🚀 Ultravox Chat PCM Otimizado', 'info');
|
| 960 |
+
log('📊 Formato: PCM 16-bit @ 16kHz', 'info');
|
| 961 |
+
log('⚡ Sem FFmpeg, sem Base64!', 'success');
|
| 962 |
+
</script>
|
| 963 |
+
</body>
|
| 964 |
+
</html>
|
services/webrtc_gateway/ultravox-chat-server.js
CHANGED
|
@@ -317,6 +317,22 @@ function handleMessage(clientId, data) {
|
|
| 317 |
handleAudioData(clientId, data.audio);
|
| 318 |
break;
|
| 319 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 320 |
case 'broadcast':
|
| 321 |
handleBroadcast(clientId, data.message);
|
| 322 |
break;
|
|
@@ -561,27 +577,146 @@ const pcmBuffers = new Map();
|
|
| 561 |
function handleBinaryMessage(clientId, buffer) {
|
| 562 |
// Verificar se é header ou dados
|
| 563 |
if (buffer.length === 8) {
|
| 564 |
-
// Header PCM
|
| 565 |
const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.length);
|
| 566 |
const magic = view.getUint32(0);
|
| 567 |
const size = view.getUint32(4);
|
| 568 |
|
| 569 |
if (magic === 0x50434D16) { // "PCM16"
|
| 570 |
console.log(`🎤 PCM header: ${size} bytes esperados`);
|
| 571 |
-
pcmBuffers.set(clientId, { expectedSize: size, data: Buffer.alloc(0) });
|
|
|
|
|
|
|
|
|
|
| 572 |
}
|
| 573 |
} else {
|
| 574 |
-
//
|
| 575 |
-
|
| 576 |
-
handlePCMData(clientId, buffer);
|
| 577 |
|
| 578 |
-
|
| 579 |
-
|
| 580 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 581 |
}
|
| 582 |
}
|
| 583 |
}
|
| 584 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 585 |
// Processar dados PCM direto (sem conversão!)
|
| 586 |
async function handlePCMData(clientId, pcmBuffer) {
|
| 587 |
const client = clients.get(clientId);
|
|
@@ -655,13 +790,26 @@ async function handlePCMData(clientId, pcmBuffer) {
|
|
| 655 |
});
|
| 656 |
}
|
| 657 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 658 |
// Sintetizar áudio com TTS
|
| 659 |
const ttsResult = await synthesizeWithTTS(clientId, response, session);
|
| 660 |
const responseAudio = ttsResult.audioData;
|
| 661 |
console.log(` 🔊 Áudio sintetizado: ${responseAudio.length} bytes @ ${ttsResult.sampleRate}Hz`);
|
| 662 |
|
| 663 |
-
// Enviar
|
| 664 |
-
client.ws.send(
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 665 |
|
| 666 |
const totalLatency = Date.now() - startTime;
|
| 667 |
console.log(`⏱️ Latência total: ${totalLatency}ms`);
|
|
|
|
| 317 |
handleAudioData(clientId, data.audio);
|
| 318 |
break;
|
| 319 |
|
| 320 |
+
case 'audio':
|
| 321 |
+
// Processar áudio enviado em formato JSON (como no teste)
|
| 322 |
+
if (data.data && data.format) {
|
| 323 |
+
const audioBuffer = Buffer.from(data.data, 'base64');
|
| 324 |
+
console.log(`🎤 Received audio JSON: ${audioBuffer.length} bytes, format: ${data.format}`);
|
| 325 |
+
|
| 326 |
+
if (data.format === 'float32') {
|
| 327 |
+
// Áudio já está em Float32, processar diretamente sem conversão
|
| 328 |
+
handleFloat32Audio(clientId, audioBuffer);
|
| 329 |
+
} else {
|
| 330 |
+
// Processar como PCM int16
|
| 331 |
+
handlePCMData(clientId, audioBuffer);
|
| 332 |
+
}
|
| 333 |
+
}
|
| 334 |
+
break;
|
| 335 |
+
|
| 336 |
case 'broadcast':
|
| 337 |
handleBroadcast(clientId, data.message);
|
| 338 |
break;
|
|
|
|
| 577 |
function handleBinaryMessage(clientId, buffer) {
|
| 578 |
// Verificar se é header ou dados
|
| 579 |
if (buffer.length === 8) {
|
| 580 |
+
// Header PCM ou Opus
|
| 581 |
const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.length);
|
| 582 |
const magic = view.getUint32(0);
|
| 583 |
const size = view.getUint32(4);
|
| 584 |
|
| 585 |
if (magic === 0x50434D16) { // "PCM16"
|
| 586 |
console.log(`🎤 PCM header: ${size} bytes esperados`);
|
| 587 |
+
pcmBuffers.set(clientId, { expectedSize: size, data: Buffer.alloc(0), type: 'pcm' });
|
| 588 |
+
} else if (magic === 0x4F505553) { // "OPUS"
|
| 589 |
+
console.log(`🎵 Opus header: ${size} bytes esperados`);
|
| 590 |
+
pcmBuffers.set(clientId, { expectedSize: size, data: Buffer.alloc(0), type: 'opus' });
|
| 591 |
}
|
| 592 |
} else {
|
| 593 |
+
// Verificar se temos um buffer esperando dados
|
| 594 |
+
const bufferInfo = pcmBuffers.get(clientId);
|
|
|
|
| 595 |
|
| 596 |
+
if (bufferInfo) {
|
| 597 |
+
// Adicionar dados ao buffer
|
| 598 |
+
bufferInfo.data = Buffer.concat([bufferInfo.data, buffer]);
|
| 599 |
+
console.log(`📦 Buffer acumulado: ${bufferInfo.data.length}/${bufferInfo.expectedSize} bytes`);
|
| 600 |
+
|
| 601 |
+
// Se recebemos todos os dados esperados
|
| 602 |
+
if (bufferInfo.data.length >= bufferInfo.expectedSize) {
|
| 603 |
+
if (bufferInfo.type === 'opus') {
|
| 604 |
+
console.log(`🎵 Processando Opus: ${bufferInfo.data.length} bytes`);
|
| 605 |
+
handleOpusData(clientId, bufferInfo.data);
|
| 606 |
+
} else {
|
| 607 |
+
console.log(`🎤 Processando PCM: ${bufferInfo.data.length} bytes`);
|
| 608 |
+
handlePCMData(clientId, bufferInfo.data);
|
| 609 |
+
}
|
| 610 |
+
pcmBuffers.delete(clientId);
|
| 611 |
+
}
|
| 612 |
+
} else {
|
| 613 |
+
// Processar PCM diretamente (sem header)
|
| 614 |
+
console.log(`🎵 Processando PCM direto: ${buffer.length} bytes`);
|
| 615 |
+
handlePCMData(clientId, buffer);
|
| 616 |
}
|
| 617 |
}
|
| 618 |
}
|
| 619 |
|
| 620 |
+
// Processar dados Opus
|
| 621 |
+
async function handleOpusData(clientId, opusBuffer) {
|
| 622 |
+
try {
|
| 623 |
+
// Descomprimir Opus para PCM
|
| 624 |
+
const pcmBuffer = decompressOpusToPCM(opusBuffer);
|
| 625 |
+
console.log(`🎵 Opus descomprimido: ${opusBuffer.length} bytes -> ${pcmBuffer.length} bytes PCM`);
|
| 626 |
+
|
| 627 |
+
// Processar como PCM
|
| 628 |
+
await handlePCMData(clientId, pcmBuffer);
|
| 629 |
+
} catch (error) {
|
| 630 |
+
console.error(`❌ Erro ao processar Opus: ${error.message}`);
|
| 631 |
+
}
|
| 632 |
+
}
|
| 633 |
+
|
| 634 |
+
// Processar áudio que já está em Float32
|
| 635 |
+
async function handleFloat32Audio(clientId, float32Buffer) {
|
| 636 |
+
const client = clients.get(clientId);
|
| 637 |
+
const session = sessions.get(clientId);
|
| 638 |
+
|
| 639 |
+
if (!client || !session) return;
|
| 640 |
+
|
| 641 |
+
if (client.isProcessing) {
|
| 642 |
+
console.log('⚠️ Já processando áudio, ignorando...');
|
| 643 |
+
return;
|
| 644 |
+
}
|
| 645 |
+
|
| 646 |
+
client.isProcessing = true;
|
| 647 |
+
const startTime = Date.now();
|
| 648 |
+
|
| 649 |
+
try {
|
| 650 |
+
console.log(`\n🎤 FLOAT32 AUDIO RECEBIDO [${clientId}]`);
|
| 651 |
+
console.log(` Tamanho: ${float32Buffer.length} bytes`);
|
| 652 |
+
console.log(` Formato: Float32 normalizado`);
|
| 653 |
+
|
| 654 |
+
// Áudio já está em Float32, apenas passar adiante
|
| 655 |
+
console.log(` 📊 Áudio Float32 pronto: ${float32Buffer.length} bytes`);
|
| 656 |
+
|
| 657 |
+
// Processar com Ultravox
|
| 658 |
+
const response = await processWithUltravox(clientId, float32Buffer, session);
|
| 659 |
+
console.log(` 📝 Resposta: "${response}"`);
|
| 660 |
+
|
| 661 |
+
// Armazenar na memória de conversação
|
| 662 |
+
const conversationId = client.conversationId;
|
| 663 |
+
if (conversationId) {
|
| 664 |
+
conversationMemory.addMessage(conversationId, {
|
| 665 |
+
role: 'user',
|
| 666 |
+
content: '[Áudio processado]',
|
| 667 |
+
audioSize: float32Buffer.length,
|
| 668 |
+
timestamp: startTime
|
| 669 |
+
});
|
| 670 |
+
|
| 671 |
+
conversationMemory.addMessage(conversationId, {
|
| 672 |
+
role: 'assistant',
|
| 673 |
+
content: response,
|
| 674 |
+
latency: Date.now() - startTime
|
| 675 |
+
});
|
| 676 |
+
}
|
| 677 |
+
|
| 678 |
+
// Enviar transcrição primeiro
|
| 679 |
+
client.ws.send(JSON.stringify({
|
| 680 |
+
type: 'transcription',
|
| 681 |
+
text: response,
|
| 682 |
+
timestamp: Date.now()
|
| 683 |
+
}));
|
| 684 |
+
|
| 685 |
+
// Sintetizar áudio com TTS
|
| 686 |
+
const ttsResult = await synthesizeWithTTS(clientId, response, session);
|
| 687 |
+
const responseAudio = ttsResult.audioData;
|
| 688 |
+
console.log(` 🔊 Áudio sintetizado: ${responseAudio.length} bytes @ ${ttsResult.sampleRate}Hz`);
|
| 689 |
+
|
| 690 |
+
// Enviar áudio como JSON
|
| 691 |
+
client.ws.send(JSON.stringify({
|
| 692 |
+
type: 'audio',
|
| 693 |
+
data: responseAudio.toString('base64'),
|
| 694 |
+
format: 'pcm',
|
| 695 |
+
sampleRate: ttsResult.sampleRate || 16000,
|
| 696 |
+
isFinal: true
|
| 697 |
+
}));
|
| 698 |
+
|
| 699 |
+
const totalLatency = Date.now() - startTime;
|
| 700 |
+
console.log(`⏱️ Latência total: ${totalLatency}ms`);
|
| 701 |
+
|
| 702 |
+
// Enviar métricas
|
| 703 |
+
client.ws.send(JSON.stringify({
|
| 704 |
+
type: 'metrics',
|
| 705 |
+
latency: totalLatency,
|
| 706 |
+
response: response
|
| 707 |
+
}));
|
| 708 |
+
|
| 709 |
+
} catch (error) {
|
| 710 |
+
console.error('❌ Erro ao processar áudio Float32:', error);
|
| 711 |
+
client.ws.send(JSON.stringify({
|
| 712 |
+
type: 'error',
|
| 713 |
+
message: error.message
|
| 714 |
+
}));
|
| 715 |
+
} finally {
|
| 716 |
+
client.isProcessing = false;
|
| 717 |
+
}
|
| 718 |
+
}
|
| 719 |
+
|
| 720 |
// Processar dados PCM direto (sem conversão!)
|
| 721 |
async function handlePCMData(clientId, pcmBuffer) {
|
| 722 |
const client = clients.get(clientId);
|
|
|
|
| 790 |
});
|
| 791 |
}
|
| 792 |
|
| 793 |
+
// Enviar transcrição primeiro
|
| 794 |
+
client.ws.send(JSON.stringify({
|
| 795 |
+
type: 'transcription',
|
| 796 |
+
text: response,
|
| 797 |
+
timestamp: Date.now()
|
| 798 |
+
}));
|
| 799 |
+
|
| 800 |
// Sintetizar áudio com TTS
|
| 801 |
const ttsResult = await synthesizeWithTTS(clientId, response, session);
|
| 802 |
const responseAudio = ttsResult.audioData;
|
| 803 |
console.log(` 🔊 Áudio sintetizado: ${responseAudio.length} bytes @ ${ttsResult.sampleRate}Hz`);
|
| 804 |
|
| 805 |
+
// Enviar áudio como JSON
|
| 806 |
+
client.ws.send(JSON.stringify({
|
| 807 |
+
type: 'audio',
|
| 808 |
+
data: responseAudio.toString('base64'),
|
| 809 |
+
format: 'pcm',
|
| 810 |
+
sampleRate: ttsResult.sampleRate || 16000,
|
| 811 |
+
isFinal: true
|
| 812 |
+
}));
|
| 813 |
|
| 814 |
const totalLatency = Date.now() - startTime;
|
| 815 |
console.log(`⏱️ Latência total: ${totalLatency}ms`);
|
services/webrtc_gateway/ultravox-chat-tailwind.html
ADDED
|
@@ -0,0 +1,393 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Ultravox Chat - Real-time Voice Assistant</title>
|
| 7 |
+
<script src="https://cdn.tailwindcss.com"></script>
|
| 8 |
+
<script src="opus-decoder.js"></script>
|
| 9 |
+
<script>
|
| 10 |
+
tailwind.config = {
|
| 11 |
+
theme: {
|
| 12 |
+
extend: {
|
| 13 |
+
animation: {
|
| 14 |
+
'pulse-slow': 'pulse 3s cubic-bezier(0.4, 0, 0.6, 1) infinite',
|
| 15 |
+
}
|
| 16 |
+
}
|
| 17 |
+
}
|
| 18 |
+
}
|
| 19 |
+
</script>
|
| 20 |
+
</head>
|
| 21 |
+
<body class="min-h-screen bg-gradient-to-br from-purple-600 via-purple-500 to-pink-500 p-4 flex items-center justify-center">
|
| 22 |
+
<div class="w-full max-w-2xl bg-white/95 backdrop-blur-sm rounded-2xl shadow-2xl p-6 md:p-8 space-y-6">
|
| 23 |
+
<!-- Header -->
|
| 24 |
+
<div class="text-center space-y-2">
|
| 25 |
+
<h1 class="text-3xl md:text-4xl font-bold bg-gradient-to-r from-purple-600 to-pink-600 bg-clip-text text-transparent">
|
| 26 |
+
Ultravox Chat
|
| 27 |
+
</h1>
|
| 28 |
+
<p class="text-gray-600 text-sm md:text-base">Real-time Voice Assistant</p>
|
| 29 |
+
</div>
|
| 30 |
+
|
| 31 |
+
<!-- Status Card -->
|
| 32 |
+
<div class="bg-gray-50 rounded-xl p-4 space-y-3">
|
| 33 |
+
<div class="flex items-center justify-between">
|
| 34 |
+
<span class="text-gray-700 font-medium">Connection Status</span>
|
| 35 |
+
<span id="status" class="inline-flex items-center px-3 py-1 rounded-full text-xs font-medium bg-gray-200 text-gray-800">
|
| 36 |
+
Disconnected
|
| 37 |
+
</span>
|
| 38 |
+
</div>
|
| 39 |
+
|
| 40 |
+
<!-- Voice Selection -->
|
| 41 |
+
<div class="flex flex-col sm:flex-row gap-3">
|
| 42 |
+
<div class="flex-1">
|
| 43 |
+
<label class="block text-sm font-medium text-gray-700 mb-1">Voice</label>
|
| 44 |
+
<select id="voiceSelect" class="w-full px-3 py-2 border border-gray-300 rounded-lg focus:ring-2 focus:ring-purple-500 focus:border-transparent transition">
|
| 45 |
+
<option value="pf_dora">Dora (Portuguese Female)</option>
|
| 46 |
+
<option value="pm_alex">Alex (Portuguese Male)</option>
|
| 47 |
+
<option value="pm_santa">Santa (Portuguese Male)</option>
|
| 48 |
+
</select>
|
| 49 |
+
</div>
|
| 50 |
+
</div>
|
| 51 |
+
</div>
|
| 52 |
+
|
| 53 |
+
<!-- Controls -->
|
| 54 |
+
<div class="space-y-4">
|
| 55 |
+
<!-- Connect Button -->
|
| 56 |
+
<button id="connectBtn"
|
| 57 |
+
class="w-full py-3 px-6 bg-gradient-to-r from-purple-600 to-pink-600 text-white font-semibold rounded-lg hover:shadow-lg transform hover:scale-[1.02] transition-all duration-200">
|
| 58 |
+
Connect to Server
|
| 59 |
+
</button>
|
| 60 |
+
|
| 61 |
+
<!-- Push to Talk Button -->
|
| 62 |
+
<button id="talkBtn"
|
| 63 |
+
disabled
|
| 64 |
+
class="w-full py-4 px-6 bg-gray-100 text-gray-400 font-semibold rounded-lg disabled:opacity-50 disabled:cursor-not-allowed transition-all duration-200 relative overflow-hidden group">
|
| 65 |
+
<span class="relative z-10 flex items-center justify-center gap-2">
|
| 66 |
+
<svg class="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
| 67 |
+
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M19 11a7 7 0 01-7 7m0 0a7 7 0 01-7-7m7 7v4m0 0H8m4 0h4m-4-8a3 3 0 01-3-3V5a3 3 0 116 0v6a3 3 0 01-3 3z"></path>
|
| 68 |
+
</svg>
|
| 69 |
+
<span id="talkBtnText">Push to Talk</span>
|
| 70 |
+
</span>
|
| 71 |
+
<div class="absolute inset-0 bg-gradient-to-r from-purple-600 to-pink-600 transform scale-x-0 group-enabled:group-active:scale-x-100 transition-transform duration-200 origin-left"></div>
|
| 72 |
+
</button>
|
| 73 |
+
</div>
|
| 74 |
+
|
| 75 |
+
<!-- Activity Logs -->
|
| 76 |
+
<div class="bg-gray-50 rounded-xl p-4">
|
| 77 |
+
<h3 class="text-sm font-semibold text-gray-700 mb-3 flex items-center gap-2">
|
| 78 |
+
<svg class="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
| 79 |
+
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z"></path>
|
| 80 |
+
</svg>
|
| 81 |
+
Activity Log
|
| 82 |
+
</h3>
|
| 83 |
+
<div id="logs" class="space-y-2 max-h-40 overflow-y-auto text-xs text-gray-600 font-mono">
|
| 84 |
+
<div class="text-gray-400">Waiting for connection...</div>
|
| 85 |
+
</div>
|
| 86 |
+
</div>
|
| 87 |
+
|
| 88 |
+
<!-- Debug Info (Hidden by default) -->
|
| 89 |
+
<details class="bg-gray-50 rounded-xl p-4">
|
| 90 |
+
<summary class="cursor-pointer text-sm font-medium text-gray-700 hover:text-purple-600">
|
| 91 |
+
Debug Information
|
| 92 |
+
</summary>
|
| 93 |
+
<div class="mt-3 space-y-2 text-xs text-gray-600">
|
| 94 |
+
<div>Sample Rate: <span id="debugSampleRate" class="font-mono">24000 Hz</span></div>
|
| 95 |
+
<div>Buffer Size: <span id="debugBufferSize" class="font-mono">4096</span></div>
|
| 96 |
+
<div>Latency: <span id="debugLatency" class="font-mono">--</span></div>
|
| 97 |
+
</div>
|
| 98 |
+
</details>
|
| 99 |
+
</div>
|
| 100 |
+
|
| 101 |
+
<script>
|
| 102 |
+
const elements = {
|
| 103 |
+
status: document.getElementById('status'),
|
| 104 |
+
connectBtn: document.getElementById('connectBtn'),
|
| 105 |
+
talkBtn: document.getElementById('talkBtn'),
|
| 106 |
+
talkBtnText: document.getElementById('talkBtnText'),
|
| 107 |
+
logs: document.getElementById('logs'),
|
| 108 |
+
voiceSelect: document.getElementById('voiceSelect'),
|
| 109 |
+
debugLatency: document.getElementById('debugLatency')
|
| 110 |
+
};
|
| 111 |
+
|
| 112 |
+
let ws = null;
|
| 113 |
+
let audioContext = null;
|
| 114 |
+
let mediaStream = null;
|
| 115 |
+
let processor = null;
|
| 116 |
+
let isRecording = false;
|
| 117 |
+
let audioQueue = [];
|
| 118 |
+
let isPlaying = false;
|
| 119 |
+
let startTime = null;
|
| 120 |
+
|
| 121 |
+
function updateStatus(status, type = 'info') {
|
| 122 |
+
const statusClasses = {
|
| 123 |
+
'success': 'bg-green-100 text-green-800',
|
| 124 |
+
'error': 'bg-red-100 text-red-800',
|
| 125 |
+
'warning': 'bg-yellow-100 text-yellow-800',
|
| 126 |
+
'info': 'bg-blue-100 text-blue-800',
|
| 127 |
+
'default': 'bg-gray-200 text-gray-800'
|
| 128 |
+
};
|
| 129 |
+
|
| 130 |
+
elements.status.className = `inline-flex items-center px-3 py-1 rounded-full text-xs font-medium ${statusClasses[type] || statusClasses.default}`;
|
| 131 |
+
elements.status.textContent = status;
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
function log(message, type = 'info') {
|
| 135 |
+
const timestamp = new Date().toLocaleTimeString('pt-BR');
|
| 136 |
+
const colorClasses = {
|
| 137 |
+
'success': 'text-green-600',
|
| 138 |
+
'error': 'text-red-600',
|
| 139 |
+
'warning': 'text-yellow-600',
|
| 140 |
+
'info': 'text-blue-600'
|
| 141 |
+
};
|
| 142 |
+
|
| 143 |
+
const div = document.createElement('div');
|
| 144 |
+
div.className = colorClasses[type] || 'text-gray-600';
|
| 145 |
+
div.innerHTML = `<span class="text-gray-400">[${timestamp}]</span> ${message}`;
|
| 146 |
+
elements.logs.appendChild(div);
|
| 147 |
+
elements.logs.scrollTop = elements.logs.scrollHeight;
|
| 148 |
+
|
| 149 |
+
// Keep only last 50 logs
|
| 150 |
+
while (elements.logs.children.length > 50) {
|
| 151 |
+
elements.logs.removeChild(elements.logs.firstChild);
|
| 152 |
+
}
|
| 153 |
+
}
|
| 154 |
+
|
| 155 |
+
async function initAudioContext() {
|
| 156 |
+
if (!audioContext) {
|
| 157 |
+
audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 158 |
+
sampleRate: 24000,
|
| 159 |
+
latencyHint: 'interactive'
|
| 160 |
+
});
|
| 161 |
+
log('Audio context initialized', 'success');
|
| 162 |
+
}
|
| 163 |
+
|
| 164 |
+
if (audioContext.state === 'suspended') {
|
| 165 |
+
await audioContext.resume();
|
| 166 |
+
}
|
| 167 |
+
}
|
| 168 |
+
|
| 169 |
+
async function playAudioChunk(audioData) {
|
| 170 |
+
if (!audioContext) return;
|
| 171 |
+
|
| 172 |
+
try {
|
| 173 |
+
const audioBuffer = audioContext.createBuffer(1, audioData.length, 24000);
|
| 174 |
+
audioBuffer.getChannelData(0).set(audioData);
|
| 175 |
+
|
| 176 |
+
const source = audioContext.createBufferSource();
|
| 177 |
+
source.buffer = audioBuffer;
|
| 178 |
+
source.connect(audioContext.destination);
|
| 179 |
+
|
| 180 |
+
return new Promise((resolve) => {
|
| 181 |
+
source.onended = resolve;
|
| 182 |
+
source.start();
|
| 183 |
+
});
|
| 184 |
+
} catch (error) {
|
| 185 |
+
console.error('Error playing audio:', error);
|
| 186 |
+
}
|
| 187 |
+
}
|
| 188 |
+
|
| 189 |
+
async function processAudioQueue() {
|
| 190 |
+
if (isPlaying || audioQueue.length === 0) return;
|
| 191 |
+
|
| 192 |
+
isPlaying = true;
|
| 193 |
+
while (audioQueue.length > 0) {
|
| 194 |
+
const audioData = audioQueue.shift();
|
| 195 |
+
await playAudioChunk(audioData);
|
| 196 |
+
}
|
| 197 |
+
isPlaying = false;
|
| 198 |
+
|
| 199 |
+
// Update latency
|
| 200 |
+
if (startTime) {
|
| 201 |
+
const latency = Date.now() - startTime;
|
| 202 |
+
elements.debugLatency.textContent = `${latency}ms`;
|
| 203 |
+
startTime = null;
|
| 204 |
+
}
|
| 205 |
+
}
|
| 206 |
+
|
| 207 |
+
function connectWebSocket() {
|
| 208 |
+
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 209 |
+
const wsUrl = `${protocol}//${window.location.host}/ultravox`;
|
| 210 |
+
|
| 211 |
+
log(`Connecting to ${wsUrl}...`);
|
| 212 |
+
ws = new WebSocket(wsUrl);
|
| 213 |
+
ws.binaryType = 'arraybuffer';
|
| 214 |
+
|
| 215 |
+
ws.onopen = () => {
|
| 216 |
+
updateStatus('Connected', 'success');
|
| 217 |
+
log('WebSocket connected', 'success');
|
| 218 |
+
elements.connectBtn.textContent = 'Disconnect';
|
| 219 |
+
elements.connectBtn.classList.remove('from-purple-600', 'to-pink-600');
|
| 220 |
+
elements.connectBtn.classList.add('from-red-500', 'to-red-600');
|
| 221 |
+
elements.talkBtn.disabled = false;
|
| 222 |
+
elements.talkBtn.classList.remove('bg-gray-100', 'text-gray-400');
|
| 223 |
+
elements.talkBtn.classList.add('bg-white', 'text-purple-600', 'border', 'border-purple-300', 'hover:border-purple-400');
|
| 224 |
+
|
| 225 |
+
// Send selected voice immediately after connection
|
| 226 |
+
const currentVoice = elements.voiceSelect.value || 'pf_dora';
|
| 227 |
+
ws.send(JSON.stringify({
|
| 228 |
+
type: 'set-voice',
|
| 229 |
+
voice_id: currentVoice
|
| 230 |
+
}));
|
| 231 |
+
log(`Voice set to: ${currentVoice}`, 'info');
|
| 232 |
+
};
|
| 233 |
+
|
| 234 |
+
ws.onmessage = async (event) => {
|
| 235 |
+
if (event.data instanceof ArrayBuffer) {
|
| 236 |
+
const int16Array = new Int16Array(event.data);
|
| 237 |
+
const float32Array = new Float32Array(int16Array.length);
|
| 238 |
+
for (let i = 0; i < int16Array.length; i++) {
|
| 239 |
+
float32Array[i] = int16Array[i] / 32768.0;
|
| 240 |
+
}
|
| 241 |
+
|
| 242 |
+
audioQueue.push(float32Array);
|
| 243 |
+
processAudioQueue();
|
| 244 |
+
} else {
|
| 245 |
+
try {
|
| 246 |
+
const data = JSON.parse(event.data);
|
| 247 |
+
if (data.type === 'transcription') {
|
| 248 |
+
log(`Transcription: ${data.text}`, 'info');
|
| 249 |
+
} else if (data.type === 'response') {
|
| 250 |
+
log(`Response: ${data.text}`, 'success');
|
| 251 |
+
} else if (data.type === 'voice-changed') {
|
| 252 |
+
log(`Voice changed to: ${data.voice_id}`, 'info');
|
| 253 |
+
}
|
| 254 |
+
} catch (e) {
|
| 255 |
+
log(`Server: ${event.data}`, 'info');
|
| 256 |
+
}
|
| 257 |
+
}
|
| 258 |
+
};
|
| 259 |
+
|
| 260 |
+
ws.onerror = (error) => {
|
| 261 |
+
log('WebSocket error', 'error');
|
| 262 |
+
updateStatus('Error', 'error');
|
| 263 |
+
};
|
| 264 |
+
|
| 265 |
+
ws.onclose = () => {
|
| 266 |
+
updateStatus('Disconnected', 'default');
|
| 267 |
+
log('WebSocket disconnected', 'warning');
|
| 268 |
+
elements.connectBtn.textContent = 'Connect to Server';
|
| 269 |
+
elements.connectBtn.classList.remove('from-red-500', 'to-red-600');
|
| 270 |
+
elements.connectBtn.classList.add('from-purple-600', 'to-pink-600');
|
| 271 |
+
elements.talkBtn.disabled = true;
|
| 272 |
+
elements.talkBtn.classList.remove('bg-white', 'text-purple-600', 'border', 'border-purple-300', 'hover:border-purple-400');
|
| 273 |
+
elements.talkBtn.classList.add('bg-gray-100', 'text-gray-400');
|
| 274 |
+
ws = null;
|
| 275 |
+
};
|
| 276 |
+
}
|
| 277 |
+
|
| 278 |
+
async function startRecording() {
|
| 279 |
+
try {
|
| 280 |
+
await initAudioContext();
|
| 281 |
+
|
| 282 |
+
mediaStream = await navigator.mediaDevices.getUserMedia({
|
| 283 |
+
audio: {
|
| 284 |
+
channelCount: 1,
|
| 285 |
+
sampleRate: 24000,
|
| 286 |
+
echoCancellation: true,
|
| 287 |
+
noiseSuppression: true,
|
| 288 |
+
autoGainControl: true
|
| 289 |
+
}
|
| 290 |
+
});
|
| 291 |
+
|
| 292 |
+
const source = audioContext.createMediaStreamSource(mediaStream);
|
| 293 |
+
processor = audioContext.createScriptProcessor(4096, 1, 1);
|
| 294 |
+
|
| 295 |
+
processor.onaudioprocess = (e) => {
|
| 296 |
+
if (!isRecording) return;
|
| 297 |
+
|
| 298 |
+
const inputData = e.inputBuffer.getChannelData(0);
|
| 299 |
+
const pcmData = new Int16Array(inputData.length);
|
| 300 |
+
|
| 301 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 302 |
+
const s = Math.max(-1, Math.min(1, inputData[i]));
|
| 303 |
+
pcmData[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
|
| 304 |
+
}
|
| 305 |
+
|
| 306 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 307 |
+
ws.send(pcmData.buffer);
|
| 308 |
+
}
|
| 309 |
+
};
|
| 310 |
+
|
| 311 |
+
source.connect(processor);
|
| 312 |
+
processor.connect(audioContext.destination);
|
| 313 |
+
|
| 314 |
+
isRecording = true;
|
| 315 |
+
startTime = Date.now();
|
| 316 |
+
elements.talkBtn.classList.add('animate-pulse-slow');
|
| 317 |
+
elements.talkBtn.querySelector('span').classList.add('text-white');
|
| 318 |
+
elements.talkBtnText.textContent = 'Recording... Release to send';
|
| 319 |
+
updateStatus('Recording', 'error');
|
| 320 |
+
log('Recording started', 'info');
|
| 321 |
+
|
| 322 |
+
} catch (error) {
|
| 323 |
+
console.error('Error starting recording:', error);
|
| 324 |
+
log('Failed to start recording', 'error');
|
| 325 |
+
}
|
| 326 |
+
}
|
| 327 |
+
|
| 328 |
+
function stopRecording() {
|
| 329 |
+
isRecording = false;
|
| 330 |
+
elements.talkBtn.classList.remove('animate-pulse-slow');
|
| 331 |
+
elements.talkBtn.querySelector('span').classList.remove('text-white');
|
| 332 |
+
elements.talkBtnText.textContent = 'Push to Talk';
|
| 333 |
+
updateStatus('Connected', 'success');
|
| 334 |
+
|
| 335 |
+
if (processor) {
|
| 336 |
+
processor.disconnect();
|
| 337 |
+
processor = null;
|
| 338 |
+
}
|
| 339 |
+
|
| 340 |
+
if (mediaStream) {
|
| 341 |
+
mediaStream.getTracks().forEach(track => track.stop());
|
| 342 |
+
mediaStream = null;
|
| 343 |
+
}
|
| 344 |
+
|
| 345 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 346 |
+
ws.send(JSON.stringify({ type: 'end_audio' }));
|
| 347 |
+
}
|
| 348 |
+
|
| 349 |
+
log('Recording stopped', 'info');
|
| 350 |
+
}
|
| 351 |
+
|
| 352 |
+
// Event Listeners
|
| 353 |
+
elements.connectBtn.addEventListener('click', () => {
|
| 354 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 355 |
+
ws.close();
|
| 356 |
+
} else {
|
| 357 |
+
connectWebSocket();
|
| 358 |
+
}
|
| 359 |
+
});
|
| 360 |
+
|
| 361 |
+
elements.talkBtn.addEventListener('mousedown', startRecording);
|
| 362 |
+
elements.talkBtn.addEventListener('mouseup', stopRecording);
|
| 363 |
+
elements.talkBtn.addEventListener('mouseleave', () => {
|
| 364 |
+
if (isRecording) stopRecording();
|
| 365 |
+
});
|
| 366 |
+
|
| 367 |
+
// Touch events for mobile
|
| 368 |
+
elements.talkBtn.addEventListener('touchstart', (e) => {
|
| 369 |
+
e.preventDefault();
|
| 370 |
+
startRecording();
|
| 371 |
+
});
|
| 372 |
+
elements.talkBtn.addEventListener('touchend', (e) => {
|
| 373 |
+
e.preventDefault();
|
| 374 |
+
stopRecording();
|
| 375 |
+
});
|
| 376 |
+
|
| 377 |
+
// Voice selection change
|
| 378 |
+
elements.voiceSelect.addEventListener('change', () => {
|
| 379 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 380 |
+
const voice = elements.voiceSelect.value;
|
| 381 |
+
ws.send(JSON.stringify({
|
| 382 |
+
type: 'set-voice',
|
| 383 |
+
voice_id: voice
|
| 384 |
+
}));
|
| 385 |
+
log(`Voice changed to: ${voice}`, 'info');
|
| 386 |
+
}
|
| 387 |
+
});
|
| 388 |
+
|
| 389 |
+
// Initialize
|
| 390 |
+
log('Application ready', 'success');
|
| 391 |
+
</script>
|
| 392 |
+
</body>
|
| 393 |
+
</html>
|
services/webrtc_gateway/ultravox-chat.html
ADDED
|
@@ -0,0 +1,964 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Ultravox Chat PCM - Otimizado</title>
|
| 7 |
+
<script src="opus-decoder.js"></script>
|
| 8 |
+
<style>
|
| 9 |
+
* {
|
| 10 |
+
margin: 0;
|
| 11 |
+
padding: 0;
|
| 12 |
+
box-sizing: border-box;
|
| 13 |
+
}
|
| 14 |
+
|
| 15 |
+
body {
|
| 16 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, sans-serif;
|
| 17 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 18 |
+
min-height: 100vh;
|
| 19 |
+
display: flex;
|
| 20 |
+
justify-content: center;
|
| 21 |
+
align-items: center;
|
| 22 |
+
padding: 20px;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
.container {
|
| 26 |
+
background: white;
|
| 27 |
+
border-radius: 20px;
|
| 28 |
+
box-shadow: 0 20px 60px rgba(0,0,0,0.3);
|
| 29 |
+
padding: 40px;
|
| 30 |
+
max-width: 600px;
|
| 31 |
+
width: 100%;
|
| 32 |
+
}
|
| 33 |
+
|
| 34 |
+
h1 {
|
| 35 |
+
text-align: center;
|
| 36 |
+
color: #333;
|
| 37 |
+
margin-bottom: 30px;
|
| 38 |
+
font-size: 28px;
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
.status {
|
| 42 |
+
background: #f8f9fa;
|
| 43 |
+
border-radius: 10px;
|
| 44 |
+
padding: 15px;
|
| 45 |
+
margin-bottom: 20px;
|
| 46 |
+
display: flex;
|
| 47 |
+
align-items: center;
|
| 48 |
+
justify-content: space-between;
|
| 49 |
+
}
|
| 50 |
+
|
| 51 |
+
.status-dot {
|
| 52 |
+
width: 12px;
|
| 53 |
+
height: 12px;
|
| 54 |
+
border-radius: 50%;
|
| 55 |
+
background: #dc3545;
|
| 56 |
+
margin-right: 10px;
|
| 57 |
+
display: inline-block;
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
.status-dot.connected {
|
| 61 |
+
background: #28a745;
|
| 62 |
+
animation: pulse 2s infinite;
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
@keyframes pulse {
|
| 66 |
+
0% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0.7); }
|
| 67 |
+
70% { box-shadow: 0 0 0 10px rgba(40, 167, 69, 0); }
|
| 68 |
+
100% { box-shadow: 0 0 0 0 rgba(40, 167, 69, 0); }
|
| 69 |
+
}
|
| 70 |
+
|
| 71 |
+
.controls {
|
| 72 |
+
display: flex;
|
| 73 |
+
gap: 10px;
|
| 74 |
+
margin-bottom: 20px;
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
.voice-selector {
|
| 78 |
+
display: flex;
|
| 79 |
+
align-items: center;
|
| 80 |
+
gap: 10px;
|
| 81 |
+
margin-bottom: 20px;
|
| 82 |
+
padding: 10px;
|
| 83 |
+
background: #f8f9fa;
|
| 84 |
+
border-radius: 10px;
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
.voice-selector label {
|
| 88 |
+
font-weight: 600;
|
| 89 |
+
color: #555;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
.voice-selector select {
|
| 93 |
+
flex: 1;
|
| 94 |
+
padding: 8px;
|
| 95 |
+
border: 2px solid #ddd;
|
| 96 |
+
border-radius: 5px;
|
| 97 |
+
font-size: 14px;
|
| 98 |
+
background: white;
|
| 99 |
+
cursor: pointer;
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
+
.voice-selector select:focus {
|
| 103 |
+
outline: none;
|
| 104 |
+
border-color: #667eea;
|
| 105 |
+
}
|
| 106 |
+
|
| 107 |
+
button {
|
| 108 |
+
flex: 1;
|
| 109 |
+
padding: 15px;
|
| 110 |
+
border: none;
|
| 111 |
+
border-radius: 10px;
|
| 112 |
+
font-size: 16px;
|
| 113 |
+
font-weight: 600;
|
| 114 |
+
cursor: pointer;
|
| 115 |
+
transition: all 0.3s ease;
|
| 116 |
+
}
|
| 117 |
+
|
| 118 |
+
button:disabled {
|
| 119 |
+
opacity: 0.5;
|
| 120 |
+
cursor: not-allowed;
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
.btn-primary {
|
| 124 |
+
background: #007bff;
|
| 125 |
+
color: white;
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
.btn-primary:hover:not(:disabled) {
|
| 129 |
+
background: #0056b3;
|
| 130 |
+
transform: translateY(-2px);
|
| 131 |
+
box-shadow: 0 5px 15px rgba(0,123,255,0.3);
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
.btn-danger {
|
| 135 |
+
background: #dc3545;
|
| 136 |
+
color: white;
|
| 137 |
+
}
|
| 138 |
+
|
| 139 |
+
.btn-danger:hover:not(:disabled) {
|
| 140 |
+
background: #c82333;
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
+
.btn-success {
|
| 144 |
+
background: #28a745;
|
| 145 |
+
color: white;
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
.btn-success.recording {
|
| 149 |
+
background: #dc3545;
|
| 150 |
+
animation: recordPulse 1s infinite;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
@keyframes recordPulse {
|
| 154 |
+
0%, 100% { opacity: 1; }
|
| 155 |
+
50% { opacity: 0.7; }
|
| 156 |
+
}
|
| 157 |
+
|
| 158 |
+
.metrics {
|
| 159 |
+
display: grid;
|
| 160 |
+
grid-template-columns: repeat(3, 1fr);
|
| 161 |
+
gap: 15px;
|
| 162 |
+
margin-bottom: 20px;
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
.metric {
|
| 166 |
+
background: #f8f9fa;
|
| 167 |
+
padding: 15px;
|
| 168 |
+
border-radius: 10px;
|
| 169 |
+
text-align: center;
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
.metric-label {
|
| 173 |
+
font-size: 12px;
|
| 174 |
+
color: #6c757d;
|
| 175 |
+
margin-bottom: 5px;
|
| 176 |
+
}
|
| 177 |
+
|
| 178 |
+
.metric-value {
|
| 179 |
+
font-size: 24px;
|
| 180 |
+
font-weight: bold;
|
| 181 |
+
color: #333;
|
| 182 |
+
}
|
| 183 |
+
|
| 184 |
+
.log {
|
| 185 |
+
background: #f8f9fa;
|
| 186 |
+
border-radius: 10px;
|
| 187 |
+
padding: 20px;
|
| 188 |
+
height: 300px;
|
| 189 |
+
overflow-y: auto;
|
| 190 |
+
font-family: 'Monaco', 'Menlo', monospace;
|
| 191 |
+
font-size: 12px;
|
| 192 |
+
}
|
| 193 |
+
|
| 194 |
+
.log-entry {
|
| 195 |
+
padding: 5px 0;
|
| 196 |
+
border-bottom: 1px solid #e9ecef;
|
| 197 |
+
display: flex;
|
| 198 |
+
align-items: flex-start;
|
| 199 |
+
}
|
| 200 |
+
|
| 201 |
+
.log-time {
|
| 202 |
+
color: #6c757d;
|
| 203 |
+
margin-right: 10px;
|
| 204 |
+
flex-shrink: 0;
|
| 205 |
+
}
|
| 206 |
+
|
| 207 |
+
.log-message {
|
| 208 |
+
flex: 1;
|
| 209 |
+
}
|
| 210 |
+
|
| 211 |
+
.log-entry.error { color: #dc3545; }
|
| 212 |
+
.log-entry.success { color: #28a745; }
|
| 213 |
+
.log-entry.info { color: #007bff; }
|
| 214 |
+
.log-entry.warning { color: #ffc107; }
|
| 215 |
+
|
| 216 |
+
.audio-player {
|
| 217 |
+
display: inline-flex;
|
| 218 |
+
align-items: center;
|
| 219 |
+
gap: 10px;
|
| 220 |
+
margin-left: 10px;
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
.play-btn {
|
| 224 |
+
background: #007bff;
|
| 225 |
+
color: white;
|
| 226 |
+
border: none;
|
| 227 |
+
border-radius: 5px;
|
| 228 |
+
padding: 5px 10px;
|
| 229 |
+
cursor: pointer;
|
| 230 |
+
font-size: 12px;
|
| 231 |
+
}
|
| 232 |
+
|
| 233 |
+
.play-btn:hover {
|
| 234 |
+
background: #0056b3;
|
| 235 |
+
}
|
| 236 |
+
</style>
|
| 237 |
+
</head>
|
| 238 |
+
<body>
|
| 239 |
+
<div class="container">
|
| 240 |
+
<h1>🚀 Ultravox PCM - Otimizado</h1>
|
| 241 |
+
|
| 242 |
+
<div class="status">
|
| 243 |
+
<div>
|
| 244 |
+
<span class="status-dot" id="statusDot"></span>
|
| 245 |
+
<span id="statusText">Desconectado</span>
|
| 246 |
+
</div>
|
| 247 |
+
<span id="latencyText">Latência: --ms</span>
|
| 248 |
+
</div>
|
| 249 |
+
|
| 250 |
+
<div class="voice-selector">
|
| 251 |
+
<label for="voiceSelect">🔊 Voz TTS:</label>
|
| 252 |
+
<select id="voiceSelect">
|
| 253 |
+
<option value="pf_dora" selected>🇧🇷 [pf_dora] Português Feminino (Dora)</option>
|
| 254 |
+
<option value="pm_alex">🇧🇷 [pm_alex] Português Masculino (Alex)</option>
|
| 255 |
+
<option value="af_heart">🌍 [af_heart] Alternativa Feminina (Heart)</option>
|
| 256 |
+
<option value="af_bella">🌍 [af_bella] Alternativa Feminina (Bella)</option>
|
| 257 |
+
</select>
|
| 258 |
+
</div>
|
| 259 |
+
|
| 260 |
+
<div class="controls">
|
| 261 |
+
<button id="connectBtn" class="btn-primary">Conectar</button>
|
| 262 |
+
<button id="talkBtn" class="btn-success" disabled>Push to Talk</button>
|
| 263 |
+
</div>
|
| 264 |
+
|
| 265 |
+
<div class="metrics">
|
| 266 |
+
<div class="metric">
|
| 267 |
+
<div class="metric-label">Enviado</div>
|
| 268 |
+
<div class="metric-value" id="sentBytes">0 KB</div>
|
| 269 |
+
</div>
|
| 270 |
+
<div class="metric">
|
| 271 |
+
<div class="metric-label">Recebido</div>
|
| 272 |
+
<div class="metric-value" id="receivedBytes">0 KB</div>
|
| 273 |
+
</div>
|
| 274 |
+
<div class="metric">
|
| 275 |
+
<div class="metric-label">Formato</div>
|
| 276 |
+
<div class="metric-value" id="format">PCM</div>
|
| 277 |
+
</div>
|
| 278 |
+
<div class="metric">
|
| 279 |
+
<div class="metric-label">🎤 Voz</div>
|
| 280 |
+
<div class="metric-value" id="currentVoice" style="font-family: monospace; color: #4CAF50; font-weight: bold;">pf_dora</div>
|
| 281 |
+
</div>
|
| 282 |
+
</div>
|
| 283 |
+
|
| 284 |
+
<div class="log" id="log"></div>
|
| 285 |
+
</div>
|
| 286 |
+
|
| 287 |
+
<!-- Seção TTS Direto -->
|
| 288 |
+
<div class="container" style="margin-top: 20px;">
|
| 289 |
+
<h2>🎵 Text-to-Speech Direto</h2>
|
| 290 |
+
<p>Digite ou edite o texto abaixo e escolha uma voz para converter em áudio</p>
|
| 291 |
+
|
| 292 |
+
<div class="section">
|
| 293 |
+
<textarea id="ttsText" style="width: 100%; height: 120px; padding: 10px; border: 1px solid #333; border-radius: 8px; background: #1e1e1e; color: #e0e0e0; font-family: 'Segoe UI', system-ui, sans-serif; font-size: 14px; resize: vertical;">Olá! Teste de voz.</textarea>
|
| 294 |
+
</div>
|
| 295 |
+
|
| 296 |
+
<div class="section" style="display: flex; gap: 10px; align-items: center; margin-top: 15px;">
|
| 297 |
+
<label for="ttsVoiceSelect" style="font-weight: 600;">🔊 Voz:</label>
|
| 298 |
+
<select id="ttsVoiceSelect" style="flex: 1; padding: 8px; border: 1px solid #333; border-radius: 5px; background: #2a2a2a; color: #e0e0e0;">
|
| 299 |
+
<optgroup label="🇧🇷 Português">
|
| 300 |
+
<option value="pf_dora" selected>[pf_dora] Feminino - Dora</option>
|
| 301 |
+
<option value="pm_alex">[pm_alex] Masculino - Alex</option>
|
| 302 |
+
<option value="pm_santa">[pm_santa] Masculino - Santa (Festivo)</option>
|
| 303 |
+
</optgroup>
|
| 304 |
+
<optgroup label="🇫🇷 Francês">
|
| 305 |
+
<option value="ff_siwis">[ff_siwis] Feminino - Siwis (Nativa)</option>
|
| 306 |
+
</optgroup>
|
| 307 |
+
<optgroup label="🇺🇸 Inglês Americano">
|
| 308 |
+
<option value="af_alloy">Feminino - Alloy</option>
|
| 309 |
+
<option value="af_aoede">Feminino - Aoede</option>
|
| 310 |
+
<option value="af_bella">Feminino - Bella</option>
|
| 311 |
+
<option value="af_heart">Feminino - Heart</option>
|
| 312 |
+
<option value="af_jessica">Feminino - Jessica</option>
|
| 313 |
+
<option value="af_kore">Feminino - Kore</option>
|
| 314 |
+
<option value="af_nicole">Feminino - Nicole</option>
|
| 315 |
+
<option value="af_nova">Feminino - Nova</option>
|
| 316 |
+
<option value="af_river">Feminino - River</option>
|
| 317 |
+
<option value="af_sarah">Feminino - Sarah</option>
|
| 318 |
+
<option value="af_sky">Feminino - Sky</option>
|
| 319 |
+
<option value="am_adam">Masculino - Adam</option>
|
| 320 |
+
<option value="am_echo">Masculino - Echo</option>
|
| 321 |
+
<option value="am_eric">Masculino - Eric</option>
|
| 322 |
+
<option value="am_fenrir">Masculino - Fenrir</option>
|
| 323 |
+
<option value="am_liam">Masculino - Liam</option>
|
| 324 |
+
<option value="am_michael">Masculino - Michael</option>
|
| 325 |
+
<option value="am_onyx">Masculino - Onyx</option>
|
| 326 |
+
<option value="am_puck">Masculino - Puck</option>
|
| 327 |
+
<option value="am_santa">Masculino - Santa</option>
|
| 328 |
+
</optgroup>
|
| 329 |
+
<optgroup label="🇬🇧 Inglês Britânico">
|
| 330 |
+
<option value="bf_alice">Feminino - Alice</option>
|
| 331 |
+
<option value="bf_emma">Feminino - Emma</option>
|
| 332 |
+
<option value="bf_isabella">Feminino - Isabella</option>
|
| 333 |
+
<option value="bf_lily">Feminino - Lily</option>
|
| 334 |
+
<option value="bm_daniel">Masculino - Daniel</option>
|
| 335 |
+
<option value="bm_fable">Masculino - Fable</option>
|
| 336 |
+
<option value="bm_george">Masculino - George</option>
|
| 337 |
+
<option value="bm_lewis">Masculino - Lewis</option>
|
| 338 |
+
</optgroup>
|
| 339 |
+
<optgroup label="🇪🇸 Espanhol">
|
| 340 |
+
<option value="ef_dora">Feminino - Dora</option>
|
| 341 |
+
<option value="em_alex">Masculino - Alex</option>
|
| 342 |
+
<option value="em_santa">Masculino - Santa</option>
|
| 343 |
+
</optgroup>
|
| 344 |
+
<optgroup label="🇮🇹 Italiano">
|
| 345 |
+
<option value="if_sara">Feminino - Sara</option>
|
| 346 |
+
<option value="im_nicola">Masculino - Nicola</option>
|
| 347 |
+
</optgroup>
|
| 348 |
+
<optgroup label="🇯🇵 Japonês">
|
| 349 |
+
<option value="jf_alpha">Feminino - Alpha</option>
|
| 350 |
+
<option value="jf_gongitsune">Feminino - Gongitsune</option>
|
| 351 |
+
<option value="jf_nezumi">Feminino - Nezumi</option>
|
| 352 |
+
<option value="jf_tebukuro">Feminino - Tebukuro</option>
|
| 353 |
+
<option value="jm_kumo">Masculino - Kumo</option>
|
| 354 |
+
</optgroup>
|
| 355 |
+
<optgroup label="🇨🇳 Chinês">
|
| 356 |
+
<option value="zf_xiaobei">Feminino - Xiaobei</option>
|
| 357 |
+
<option value="zf_xiaoni">Feminino - Xiaoni</option>
|
| 358 |
+
<option value="zf_xiaoxiao">Feminino - Xiaoxiao</option>
|
| 359 |
+
<option value="zf_xiaoyi">Feminino - Xiaoyi</option>
|
| 360 |
+
<option value="zm_yunjian">Masculino - Yunjian</option>
|
| 361 |
+
<option value="zm_yunxi">Masculino - Yunxi</option>
|
| 362 |
+
<option value="zm_yunxia">Masculino - Yunxia</option>
|
| 363 |
+
<option value="zm_yunyang">Masculino - Yunyang</option>
|
| 364 |
+
</optgroup>
|
| 365 |
+
<optgroup label="🇮🇳 Hindi">
|
| 366 |
+
<option value="hf_alpha">Feminino - Alpha</option>
|
| 367 |
+
<option value="hf_beta">Feminino - Beta</option>
|
| 368 |
+
<option value="hm_omega">Masculino - Omega</option>
|
| 369 |
+
<option value="hm_psi">Masculino - Psi</option>
|
| 370 |
+
</optgroup>
|
| 371 |
+
</select>
|
| 372 |
+
|
| 373 |
+
<button id="ttsPlayBtn" class="btn-success" disabled style="padding: 10px 20px;">
|
| 374 |
+
▶️ Gerar Áudio
|
| 375 |
+
</button>
|
| 376 |
+
</div>
|
| 377 |
+
|
| 378 |
+
<div id="ttsStatus" style="display: none; margin-top: 15px; padding: 15px; background: #2a2a2a; border-radius: 8px;">
|
| 379 |
+
<span id="ttsStatusText">⏳ Processando...</span>
|
| 380 |
+
</div>
|
| 381 |
+
|
| 382 |
+
<div id="ttsPlayer" style="display: none; margin-top: 15px;">
|
| 383 |
+
<audio id="ttsAudio" controls style="width: 100%;"></audio>
|
| 384 |
+
</div>
|
| 385 |
+
</div>
|
| 386 |
+
|
| 387 |
+
<script>
|
| 388 |
+
// Estado da aplicação
|
| 389 |
+
let ws = null;
|
| 390 |
+
let isConnected = false;
|
| 391 |
+
let isRecording = false;
|
| 392 |
+
let audioContext = null;
|
| 393 |
+
let stream = null;
|
| 394 |
+
let audioSource = null;
|
| 395 |
+
let audioProcessor = null;
|
| 396 |
+
let pcmBuffer = [];
|
| 397 |
+
|
| 398 |
+
// Métricas
|
| 399 |
+
const metrics = {
|
| 400 |
+
sentBytes: 0,
|
| 401 |
+
receivedBytes: 0,
|
| 402 |
+
latency: 0,
|
| 403 |
+
recordingStartTime: 0
|
| 404 |
+
};
|
| 405 |
+
|
| 406 |
+
// Elementos DOM
|
| 407 |
+
const elements = {
|
| 408 |
+
statusDot: document.getElementById('statusDot'),
|
| 409 |
+
statusText: document.getElementById('statusText'),
|
| 410 |
+
latencyText: document.getElementById('latencyText'),
|
| 411 |
+
connectBtn: document.getElementById('connectBtn'),
|
| 412 |
+
talkBtn: document.getElementById('talkBtn'),
|
| 413 |
+
voiceSelect: document.getElementById('voiceSelect'),
|
| 414 |
+
sentBytes: document.getElementById('sentBytes'),
|
| 415 |
+
receivedBytes: document.getElementById('receivedBytes'),
|
| 416 |
+
format: document.getElementById('format'),
|
| 417 |
+
log: document.getElementById('log'),
|
| 418 |
+
// TTS elements
|
| 419 |
+
ttsText: document.getElementById('ttsText'),
|
| 420 |
+
ttsVoiceSelect: document.getElementById('ttsVoiceSelect'),
|
| 421 |
+
ttsPlayBtn: document.getElementById('ttsPlayBtn'),
|
| 422 |
+
ttsStatus: document.getElementById('ttsStatus'),
|
| 423 |
+
ttsStatusText: document.getElementById('ttsStatusText'),
|
| 424 |
+
ttsPlayer: document.getElementById('ttsPlayer'),
|
| 425 |
+
ttsAudio: document.getElementById('ttsAudio')
|
| 426 |
+
};
|
| 427 |
+
|
| 428 |
+
// Log no console visual
|
| 429 |
+
function log(message, type = 'info') {
|
| 430 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 431 |
+
const entry = document.createElement('div');
|
| 432 |
+
entry.className = `log-entry ${type}`;
|
| 433 |
+
entry.innerHTML = `
|
| 434 |
+
<span class="log-time">[${time}]</span>
|
| 435 |
+
<span class="log-message">${message}</span>
|
| 436 |
+
`;
|
| 437 |
+
elements.log.appendChild(entry);
|
| 438 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 439 |
+
console.log(`[${type}] ${message}`);
|
| 440 |
+
}
|
| 441 |
+
|
| 442 |
+
// Atualizar métricas
|
| 443 |
+
function updateMetrics() {
|
| 444 |
+
elements.sentBytes.textContent = `${(metrics.sentBytes / 1024).toFixed(1)} KB`;
|
| 445 |
+
elements.receivedBytes.textContent = `${(metrics.receivedBytes / 1024).toFixed(1)} KB`;
|
| 446 |
+
elements.latencyText.textContent = `Latência: ${metrics.latency}ms`;
|
| 447 |
+
}
|
| 448 |
+
|
| 449 |
+
// Conectar ao WebSocket
|
| 450 |
+
async function connect() {
|
| 451 |
+
try {
|
| 452 |
+
// Solicitar acesso ao microfone
|
| 453 |
+
stream = await navigator.mediaDevices.getUserMedia({
|
| 454 |
+
audio: {
|
| 455 |
+
echoCancellation: true,
|
| 456 |
+
noiseSuppression: true,
|
| 457 |
+
sampleRate: 24000 // High quality 24kHz
|
| 458 |
+
}
|
| 459 |
+
});
|
| 460 |
+
|
| 461 |
+
log('✅ Microfone acessado', 'success');
|
| 462 |
+
|
| 463 |
+
// Conectar WebSocket com suporte binário
|
| 464 |
+
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
|
| 465 |
+
const wsUrl = `${protocol}//${window.location.host}/ws`;
|
| 466 |
+
ws = new WebSocket(wsUrl);
|
| 467 |
+
ws.binaryType = 'arraybuffer';
|
| 468 |
+
|
| 469 |
+
ws.onopen = () => {
|
| 470 |
+
isConnected = true;
|
| 471 |
+
elements.statusDot.classList.add('connected');
|
| 472 |
+
elements.statusText.textContent = 'Conectado';
|
| 473 |
+
elements.connectBtn.textContent = 'Desconectar';
|
| 474 |
+
elements.connectBtn.classList.remove('btn-primary');
|
| 475 |
+
elements.connectBtn.classList.add('btn-danger');
|
| 476 |
+
elements.talkBtn.disabled = false;
|
| 477 |
+
|
| 478 |
+
// Enviar voz selecionada ao conectar
|
| 479 |
+
const currentVoice = elements.voiceSelect.value || elements.ttsVoiceSelect.value || 'pf_dora';
|
| 480 |
+
ws.send(JSON.stringify({
|
| 481 |
+
type: 'set-voice',
|
| 482 |
+
voice_id: currentVoice
|
| 483 |
+
}));
|
| 484 |
+
log(`🔊 Voz configurada: ${currentVoice}`, 'info');
|
| 485 |
+
elements.ttsPlayBtn.disabled = false; // Habilitar TTS button
|
| 486 |
+
log('✅ Conectado ao servidor', 'success');
|
| 487 |
+
};
|
| 488 |
+
|
| 489 |
+
ws.onmessage = (event) => {
|
| 490 |
+
if (event.data instanceof ArrayBuffer) {
|
| 491 |
+
// Áudio PCM binário recebido
|
| 492 |
+
handlePCMAudio(event.data);
|
| 493 |
+
} else {
|
| 494 |
+
// Mensagem JSON
|
| 495 |
+
const data = JSON.parse(event.data);
|
| 496 |
+
handleMessage(data);
|
| 497 |
+
}
|
| 498 |
+
};
|
| 499 |
+
|
| 500 |
+
ws.onerror = (error) => {
|
| 501 |
+
log(`❌ Erro WebSocket: ${error}`, 'error');
|
| 502 |
+
};
|
| 503 |
+
|
| 504 |
+
ws.onclose = () => {
|
| 505 |
+
disconnect();
|
| 506 |
+
};
|
| 507 |
+
|
| 508 |
+
} catch (error) {
|
| 509 |
+
log(`❌ Erro ao conectar: ${error.message}`, 'error');
|
| 510 |
+
}
|
| 511 |
+
}
|
| 512 |
+
|
| 513 |
+
// Desconectar
|
| 514 |
+
function disconnect() {
|
| 515 |
+
isConnected = false;
|
| 516 |
+
|
| 517 |
+
if (ws) {
|
| 518 |
+
ws.close();
|
| 519 |
+
ws = null;
|
| 520 |
+
}
|
| 521 |
+
|
| 522 |
+
if (stream) {
|
| 523 |
+
stream.getTracks().forEach(track => track.stop());
|
| 524 |
+
stream = null;
|
| 525 |
+
}
|
| 526 |
+
|
| 527 |
+
if (audioContext) {
|
| 528 |
+
audioContext.close();
|
| 529 |
+
audioContext = null;
|
| 530 |
+
}
|
| 531 |
+
|
| 532 |
+
elements.statusDot.classList.remove('connected');
|
| 533 |
+
elements.statusText.textContent = 'Desconectado';
|
| 534 |
+
elements.connectBtn.textContent = 'Conectar';
|
| 535 |
+
elements.connectBtn.classList.remove('btn-danger');
|
| 536 |
+
elements.connectBtn.classList.add('btn-primary');
|
| 537 |
+
elements.talkBtn.disabled = true;
|
| 538 |
+
|
| 539 |
+
log('👋 Desconectado', 'warning');
|
| 540 |
+
}
|
| 541 |
+
|
| 542 |
+
// Iniciar gravação PCM
|
| 543 |
+
function startRecording() {
|
| 544 |
+
if (isRecording) return;
|
| 545 |
+
|
| 546 |
+
isRecording = true;
|
| 547 |
+
metrics.recordingStartTime = Date.now();
|
| 548 |
+
elements.talkBtn.classList.add('recording');
|
| 549 |
+
elements.talkBtn.textContent = 'Gravando...';
|
| 550 |
+
pcmBuffer = [];
|
| 551 |
+
|
| 552 |
+
const sampleRate = 24000; // Sempre usar melhor qualidade
|
| 553 |
+
log(`🎤 Gravando PCM 16-bit @ ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 554 |
+
|
| 555 |
+
// Criar AudioContext se necessário
|
| 556 |
+
if (!audioContext) {
|
| 557 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 558 |
+
const sampleRate = 24000;
|
| 559 |
+
|
| 560 |
+
audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 561 |
+
sampleRate: sampleRate
|
| 562 |
+
});
|
| 563 |
+
|
| 564 |
+
log(`🎧 AudioContext criado: ${sampleRate}Hz (alta qualidade)`, 'info');
|
| 565 |
+
}
|
| 566 |
+
|
| 567 |
+
// Criar processador de áudio
|
| 568 |
+
audioSource = audioContext.createMediaStreamSource(stream);
|
| 569 |
+
audioProcessor = audioContext.createScriptProcessor(4096, 1, 1);
|
| 570 |
+
|
| 571 |
+
audioProcessor.onaudioprocess = (e) => {
|
| 572 |
+
if (!isRecording) return;
|
| 573 |
+
|
| 574 |
+
const inputData = e.inputBuffer.getChannelData(0);
|
| 575 |
+
|
| 576 |
+
// Calcular RMS (Root Mean Square) para melhor detecção de volume
|
| 577 |
+
let sumSquares = 0;
|
| 578 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 579 |
+
sumSquares += inputData[i] * inputData[i];
|
| 580 |
+
}
|
| 581 |
+
const rms = Math.sqrt(sumSquares / inputData.length);
|
| 582 |
+
|
| 583 |
+
// Calcular amplitude máxima também
|
| 584 |
+
let maxAmplitude = 0;
|
| 585 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 586 |
+
maxAmplitude = Math.max(maxAmplitude, Math.abs(inputData[i]));
|
| 587 |
+
}
|
| 588 |
+
|
| 589 |
+
// Detecção de voz baseada em RMS (mais confiável que amplitude máxima)
|
| 590 |
+
const voiceThreshold = 0.01; // Threshold para detectar voz
|
| 591 |
+
const hasVoice = rms > voiceThreshold;
|
| 592 |
+
|
| 593 |
+
// Aplicar ganho suave apenas se necessário
|
| 594 |
+
let gain = 1.0;
|
| 595 |
+
if (hasVoice && rms < 0.05) {
|
| 596 |
+
// Ganho suave baseado em RMS, máximo 5x
|
| 597 |
+
gain = Math.min(5.0, 0.05 / rms);
|
| 598 |
+
if (gain > 1.2) {
|
| 599 |
+
log(`🎤 Volume baixo detectado, aplicando ganho: ${gain.toFixed(1)}x`, 'info');
|
| 600 |
+
}
|
| 601 |
+
}
|
| 602 |
+
|
| 603 |
+
// Converter Float32 para Int16 com processamento melhorado
|
| 604 |
+
const pcmData = new Int16Array(inputData.length);
|
| 605 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 606 |
+
// Aplicar ganho suave
|
| 607 |
+
let sample = inputData[i] * gain;
|
| 608 |
+
|
| 609 |
+
// Soft clipping para evitar distorção
|
| 610 |
+
if (Math.abs(sample) > 0.95) {
|
| 611 |
+
sample = Math.sign(sample) * (0.95 + 0.05 * Math.tanh((Math.abs(sample) - 0.95) * 10));
|
| 612 |
+
}
|
| 613 |
+
|
| 614 |
+
// Converter para Int16
|
| 615 |
+
sample = Math.max(-1, Math.min(1, sample));
|
| 616 |
+
pcmData[i] = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
|
| 617 |
+
}
|
| 618 |
+
|
| 619 |
+
// Adicionar ao buffer apenas se detectar voz
|
| 620 |
+
if (hasVoice) {
|
| 621 |
+
pcmBuffer.push(pcmData);
|
| 622 |
+
}
|
| 623 |
+
};
|
| 624 |
+
|
| 625 |
+
audioSource.connect(audioProcessor);
|
| 626 |
+
audioProcessor.connect(audioContext.destination);
|
| 627 |
+
}
|
| 628 |
+
|
| 629 |
+
// Parar gravação e enviar
|
| 630 |
+
function stopRecording() {
|
| 631 |
+
if (!isRecording) return;
|
| 632 |
+
|
| 633 |
+
isRecording = false;
|
| 634 |
+
const duration = Date.now() - metrics.recordingStartTime;
|
| 635 |
+
elements.talkBtn.classList.remove('recording');
|
| 636 |
+
elements.talkBtn.textContent = 'Push to Talk';
|
| 637 |
+
|
| 638 |
+
// Desconectar processador
|
| 639 |
+
if (audioProcessor) {
|
| 640 |
+
audioProcessor.disconnect();
|
| 641 |
+
audioProcessor = null;
|
| 642 |
+
}
|
| 643 |
+
if (audioSource) {
|
| 644 |
+
audioSource.disconnect();
|
| 645 |
+
audioSource = null;
|
| 646 |
+
}
|
| 647 |
+
|
| 648 |
+
// Verificar se há áudio para enviar
|
| 649 |
+
if (pcmBuffer.length === 0) {
|
| 650 |
+
log(`⚠️ Nenhum áudio capturado (silêncio ou volume muito baixo)`, 'warning');
|
| 651 |
+
pcmBuffer = [];
|
| 652 |
+
return;
|
| 653 |
+
}
|
| 654 |
+
|
| 655 |
+
// Combinar todos os chunks PCM
|
| 656 |
+
const totalLength = pcmBuffer.reduce((acc, chunk) => acc + chunk.length, 0);
|
| 657 |
+
|
| 658 |
+
// Verificar tamanho mínimo (0.5 segundos)
|
| 659 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 660 |
+
const minSamples = sampleRate * 0.5;
|
| 661 |
+
|
| 662 |
+
if (totalLength < minSamples) {
|
| 663 |
+
log(`⚠️ Áudio muito curto: ${(totalLength/sampleRate).toFixed(2)}s (mínimo 0.5s)`, 'warning');
|
| 664 |
+
pcmBuffer = [];
|
| 665 |
+
return;
|
| 666 |
+
}
|
| 667 |
+
|
| 668 |
+
const fullPCM = new Int16Array(totalLength);
|
| 669 |
+
let offset = 0;
|
| 670 |
+
for (const chunk of pcmBuffer) {
|
| 671 |
+
fullPCM.set(chunk, offset);
|
| 672 |
+
offset += chunk.length;
|
| 673 |
+
}
|
| 674 |
+
|
| 675 |
+
// Calcular amplitude final para debug
|
| 676 |
+
let maxAmp = 0;
|
| 677 |
+
for (let i = 0; i < Math.min(fullPCM.length, 1000); i++) {
|
| 678 |
+
maxAmp = Math.max(maxAmp, Math.abs(fullPCM[i] / 32768));
|
| 679 |
+
}
|
| 680 |
+
|
| 681 |
+
// Enviar PCM binário direto (sem Base64!)
|
| 682 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 683 |
+
// Enviar um header simples antes do áudio
|
| 684 |
+
const header = new ArrayBuffer(8);
|
| 685 |
+
const view = new DataView(header);
|
| 686 |
+
view.setUint32(0, 0x50434D16); // Magic: "PCM16"
|
| 687 |
+
view.setUint32(4, fullPCM.length * 2); // Tamanho em bytes
|
| 688 |
+
|
| 689 |
+
ws.send(header);
|
| 690 |
+
ws.send(fullPCM.buffer);
|
| 691 |
+
|
| 692 |
+
metrics.sentBytes += fullPCM.length * 2;
|
| 693 |
+
updateMetrics();
|
| 694 |
+
const sampleRate = 24000; // Sempre 24kHz
|
| 695 |
+
log(`📤 PCM enviado: ${(fullPCM.length * 2 / 1024).toFixed(1)}KB, ${(totalLength/sampleRate).toFixed(1)}s @ ${sampleRate}Hz, amp:${maxAmp.toFixed(3)}`, 'success');
|
| 696 |
+
}
|
| 697 |
+
|
| 698 |
+
// Limpar buffer após enviar
|
| 699 |
+
pcmBuffer = [];
|
| 700 |
+
}
|
| 701 |
+
|
| 702 |
+
// Processar mensagem JSON
|
| 703 |
+
function handleMessage(data) {
|
| 704 |
+
switch (data.type) {
|
| 705 |
+
case 'metrics':
|
| 706 |
+
metrics.latency = data.latency;
|
| 707 |
+
updateMetrics();
|
| 708 |
+
log(`📊 Resposta: "${data.response}" (${data.latency}ms)`, 'success');
|
| 709 |
+
break;
|
| 710 |
+
|
| 711 |
+
case 'error':
|
| 712 |
+
log(`❌ Erro: ${data.message}`, 'error');
|
| 713 |
+
break;
|
| 714 |
+
|
| 715 |
+
case 'tts-response':
|
| 716 |
+
// Resposta do TTS direto (Opus 24kHz ou PCM)
|
| 717 |
+
if (data.audio) {
|
| 718 |
+
// Decodificar base64 para arraybuffer
|
| 719 |
+
const binaryString = atob(data.audio);
|
| 720 |
+
const bytes = new Uint8Array(binaryString.length);
|
| 721 |
+
for (let i = 0; i < binaryString.length; i++) {
|
| 722 |
+
bytes[i] = binaryString.charCodeAt(i);
|
| 723 |
+
}
|
| 724 |
+
|
| 725 |
+
let audioData = bytes.buffer;
|
| 726 |
+
// IMPORTANTE: Usar a taxa enviada pelo servidor
|
| 727 |
+
const sampleRate = data.sampleRate || 24000;
|
| 728 |
+
|
| 729 |
+
console.log(`🎯 TTS Response - Taxa recebida: ${sampleRate}Hz, Formato: ${data.format}, Tamanho: ${bytes.length} bytes`);
|
| 730 |
+
|
| 731 |
+
// Se for Opus, usar WebAudio API para decodificar nativamente
|
| 732 |
+
let wavBuffer;
|
| 733 |
+
if (data.format === 'opus') {
|
| 734 |
+
console.log(`🗜️ Opus 24kHz recebido: ${(bytes.length/1024).toFixed(1)}KB`);
|
| 735 |
+
|
| 736 |
+
// Log de economia de banda
|
| 737 |
+
if (data.originalSize) {
|
| 738 |
+
const compression = Math.round(100 - (bytes.length / data.originalSize) * 100);
|
| 739 |
+
console.log(`📊 Economia de banda: ${compression}% (${(data.originalSize/1024).toFixed(1)}KB → ${(bytes.length/1024).toFixed(1)}KB)`);
|
| 740 |
+
}
|
| 741 |
+
|
| 742 |
+
// WebAudio API pode decodificar Opus nativamente
|
| 743 |
+
// Por agora, tratar como PCM até implementar decoder completo
|
| 744 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 745 |
+
} else {
|
| 746 |
+
// PCM - adicionar WAV header com a taxa correta
|
| 747 |
+
wavBuffer = addWavHeader(audioData, sampleRate);
|
| 748 |
+
}
|
| 749 |
+
|
| 750 |
+
// Log da qualidade recebida
|
| 751 |
+
console.log(`🎵 TTS pronto: ${(audioData.byteLength/1024).toFixed(1)}KB @ ${sampleRate}Hz (${data.quality || 'high'} quality, ${data.format || 'pcm'})`);
|
| 752 |
+
|
| 753 |
+
// Criar blob e URL
|
| 754 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 755 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 756 |
+
|
| 757 |
+
// Atualizar player
|
| 758 |
+
elements.ttsAudio.src = audioUrl;
|
| 759 |
+
elements.ttsPlayer.style.display = 'block';
|
| 760 |
+
elements.ttsStatus.style.display = 'none';
|
| 761 |
+
elements.ttsPlayBtn.disabled = false;
|
| 762 |
+
elements.ttsPlayBtn.textContent = '▶️ Gerar Áudio';
|
| 763 |
+
|
| 764 |
+
log('🎵 Áudio TTS gerado com sucesso!', 'success');
|
| 765 |
+
}
|
| 766 |
+
break;
|
| 767 |
+
}
|
| 768 |
+
}
|
| 769 |
+
|
| 770 |
+
// Processar áudio PCM recebido
|
| 771 |
+
function handlePCMAudio(arrayBuffer) {
|
| 772 |
+
metrics.receivedBytes += arrayBuffer.byteLength;
|
| 773 |
+
updateMetrics();
|
| 774 |
+
|
| 775 |
+
// Criar WAV header para reproduzir
|
| 776 |
+
const wavBuffer = addWavHeader(arrayBuffer);
|
| 777 |
+
|
| 778 |
+
// Criar blob e URL para o áudio
|
| 779 |
+
const blob = new Blob([wavBuffer], { type: 'audio/wav' });
|
| 780 |
+
const audioUrl = URL.createObjectURL(blob);
|
| 781 |
+
|
| 782 |
+
// Criar log com botão de play
|
| 783 |
+
const time = new Date().toLocaleTimeString('pt-BR');
|
| 784 |
+
const entry = document.createElement('div');
|
| 785 |
+
entry.className = 'log-entry success';
|
| 786 |
+
entry.innerHTML = `
|
| 787 |
+
<span class="log-time">[${time}]</span>
|
| 788 |
+
<span class="log-message">🔊 Áudio recebido: ${(arrayBuffer.byteLength / 1024).toFixed(1)}KB</span>
|
| 789 |
+
<div class="audio-player">
|
| 790 |
+
<button class="play-btn" onclick="playAudio('${audioUrl}')">▶️ Play</button>
|
| 791 |
+
<audio id="audio-${Date.now()}" src="${audioUrl}" style="display: none;"></audio>
|
| 792 |
+
</div>
|
| 793 |
+
`;
|
| 794 |
+
elements.log.appendChild(entry);
|
| 795 |
+
elements.log.scrollTop = elements.log.scrollHeight;
|
| 796 |
+
|
| 797 |
+
// Auto-play o áudio
|
| 798 |
+
const audio = new Audio(audioUrl);
|
| 799 |
+
audio.play().catch(err => {
|
| 800 |
+
console.log('Auto-play bloqueado, use o botão para reproduzir');
|
| 801 |
+
});
|
| 802 |
+
}
|
| 803 |
+
|
| 804 |
+
// Função para tocar áudio manualmente
|
| 805 |
+
function playAudio(url) {
|
| 806 |
+
const audio = new Audio(url);
|
| 807 |
+
audio.play();
|
| 808 |
+
}
|
| 809 |
+
|
| 810 |
+
// Adicionar header WAV ao PCM
|
| 811 |
+
function addWavHeader(pcmBuffer, customSampleRate) {
|
| 812 |
+
const pcmData = new Uint8Array(pcmBuffer);
|
| 813 |
+
const wavBuffer = new ArrayBuffer(44 + pcmData.length);
|
| 814 |
+
const view = new DataView(wavBuffer);
|
| 815 |
+
|
| 816 |
+
// WAV header
|
| 817 |
+
const writeString = (offset, string) => {
|
| 818 |
+
for (let i = 0; i < string.length; i++) {
|
| 819 |
+
view.setUint8(offset + i, string.charCodeAt(i));
|
| 820 |
+
}
|
| 821 |
+
};
|
| 822 |
+
|
| 823 |
+
writeString(0, 'RIFF');
|
| 824 |
+
view.setUint32(4, 36 + pcmData.length, true);
|
| 825 |
+
writeString(8, 'WAVE');
|
| 826 |
+
writeString(12, 'fmt ');
|
| 827 |
+
view.setUint32(16, 16, true); // fmt chunk size
|
| 828 |
+
view.setUint16(20, 1, true); // PCM format
|
| 829 |
+
view.setUint16(22, 1, true); // Mono
|
| 830 |
+
|
| 831 |
+
// Usar taxa customizada se fornecida, senão usar 24kHz
|
| 832 |
+
let sampleRate = customSampleRate || 24000;
|
| 833 |
+
|
| 834 |
+
console.log(`📝 WAV Header - Configurando taxa: ${sampleRate}Hz`);
|
| 835 |
+
|
| 836 |
+
view.setUint32(24, sampleRate, true); // Sample rate
|
| 837 |
+
view.setUint32(28, sampleRate * 2, true); // Byte rate: sampleRate * 1 * 2
|
| 838 |
+
view.setUint16(32, 2, true); // Block align: 1 * 2
|
| 839 |
+
view.setUint16(34, 16, true); // Bits per sample: 16-bit
|
| 840 |
+
writeString(36, 'data');
|
| 841 |
+
view.setUint32(40, pcmData.length, true);
|
| 842 |
+
|
| 843 |
+
// Copiar dados PCM
|
| 844 |
+
new Uint8Array(wavBuffer, 44).set(pcmData);
|
| 845 |
+
|
| 846 |
+
return wavBuffer;
|
| 847 |
+
}
|
| 848 |
+
|
| 849 |
+
// Event Listeners
|
| 850 |
+
elements.connectBtn.addEventListener('click', () => {
|
| 851 |
+
if (isConnected) {
|
| 852 |
+
disconnect();
|
| 853 |
+
} else {
|
| 854 |
+
connect();
|
| 855 |
+
}
|
| 856 |
+
});
|
| 857 |
+
|
| 858 |
+
elements.talkBtn.addEventListener('mousedown', startRecording);
|
| 859 |
+
elements.talkBtn.addEventListener('mouseup', stopRecording);
|
| 860 |
+
elements.talkBtn.addEventListener('mouseleave', stopRecording);
|
| 861 |
+
|
| 862 |
+
// Voice selector listener
|
| 863 |
+
elements.voiceSelect.addEventListener('change', (e) => {
|
| 864 |
+
const voice_id = e.target.value;
|
| 865 |
+
console.log('Voice select changed to:', voice_id);
|
| 866 |
+
|
| 867 |
+
// Update current voice display
|
| 868 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 869 |
+
if (currentVoiceElement) {
|
| 870 |
+
currentVoiceElement.textContent = voice_id;
|
| 871 |
+
}
|
| 872 |
+
|
| 873 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 874 |
+
console.log('Sending set-voice command:', voice_id);
|
| 875 |
+
ws.send(JSON.stringify({
|
| 876 |
+
type: 'set-voice',
|
| 877 |
+
voice_id: voice_id
|
| 878 |
+
}));
|
| 879 |
+
log(`🔊 Voz alterada para: ${voice_id} - ${e.target.options[e.target.selectedIndex].text}`, 'info');
|
| 880 |
+
} else {
|
| 881 |
+
console.log('WebSocket not connected, cannot send voice change');
|
| 882 |
+
log(`⚠️ Conecte-se primeiro para mudar a voz`, 'warning');
|
| 883 |
+
}
|
| 884 |
+
});
|
| 885 |
+
elements.talkBtn.addEventListener('touchstart', startRecording);
|
| 886 |
+
elements.talkBtn.addEventListener('touchend', stopRecording);
|
| 887 |
+
|
| 888 |
+
// TTS Voice selector listener
|
| 889 |
+
elements.ttsVoiceSelect.addEventListener('change', (e) => {
|
| 890 |
+
const voice_id = e.target.value;
|
| 891 |
+
|
| 892 |
+
// Update main voice selector
|
| 893 |
+
elements.voiceSelect.value = voice_id;
|
| 894 |
+
|
| 895 |
+
// Update current voice display
|
| 896 |
+
const currentVoiceElement = document.getElementById('currentVoice');
|
| 897 |
+
if (currentVoiceElement) {
|
| 898 |
+
currentVoiceElement.textContent = voice_id;
|
| 899 |
+
}
|
| 900 |
+
|
| 901 |
+
// Send voice change to server
|
| 902 |
+
if (ws && ws.readyState === WebSocket.OPEN) {
|
| 903 |
+
ws.send(JSON.stringify({
|
| 904 |
+
type: 'set-voice',
|
| 905 |
+
voice_id: voice_id
|
| 906 |
+
}));
|
| 907 |
+
log(`🎤 Voz TTS alterada para: ${voice_id}`, 'info');
|
| 908 |
+
}
|
| 909 |
+
});
|
| 910 |
+
|
| 911 |
+
// TTS Button Event Listener
|
| 912 |
+
elements.ttsPlayBtn.addEventListener('click', (e) => {
|
| 913 |
+
e.preventDefault();
|
| 914 |
+
e.stopPropagation();
|
| 915 |
+
|
| 916 |
+
console.log('TTS Button clicked!');
|
| 917 |
+
const text = elements.ttsText.value.trim();
|
| 918 |
+
const voice = elements.ttsVoiceSelect.value;
|
| 919 |
+
|
| 920 |
+
console.log('TTS Text:', text);
|
| 921 |
+
console.log('TTS Voice:', voice);
|
| 922 |
+
|
| 923 |
+
if (!text) {
|
| 924 |
+
alert('Por favor, digite algum texto para converter em áudio');
|
| 925 |
+
return;
|
| 926 |
+
}
|
| 927 |
+
|
| 928 |
+
if (!ws || ws.readyState !== WebSocket.OPEN) {
|
| 929 |
+
alert('Por favor, conecte-se primeiro clicando em "Conectar"');
|
| 930 |
+
return;
|
| 931 |
+
}
|
| 932 |
+
|
| 933 |
+
// Mostrar status
|
| 934 |
+
elements.ttsStatus.style.display = 'block';
|
| 935 |
+
elements.ttsStatusText.textContent = '⏳ Gerando áudio...';
|
| 936 |
+
elements.ttsPlayBtn.disabled = true;
|
| 937 |
+
elements.ttsPlayBtn.textContent = '⏳ Processando...';
|
| 938 |
+
elements.ttsPlayer.style.display = 'none';
|
| 939 |
+
|
| 940 |
+
// Sempre usar melhor qualidade (24kHz)
|
| 941 |
+
const quality = 'high';
|
| 942 |
+
|
| 943 |
+
// Enviar request para TTS com qualidade máxima
|
| 944 |
+
const ttsRequest = {
|
| 945 |
+
type: 'text-to-speech',
|
| 946 |
+
text: text,
|
| 947 |
+
voice_id: voice,
|
| 948 |
+
quality: quality,
|
| 949 |
+
format: 'opus' // Opus 24kHz @ 32kbps - máxima qualidade, mínima banda
|
| 950 |
+
};
|
| 951 |
+
|
| 952 |
+
console.log('Sending TTS request:', ttsRequest);
|
| 953 |
+
ws.send(JSON.stringify(ttsRequest));
|
| 954 |
+
|
| 955 |
+
log(`🎤 Solicitando TTS: voz=${voice}, texto="${text.substring(0, 50)}..."`, 'info');
|
| 956 |
+
});
|
| 957 |
+
|
| 958 |
+
// Inicialização
|
| 959 |
+
log('🚀 Ultravox Chat PCM Otimizado', 'info');
|
| 960 |
+
log('📊 Formato: PCM 16-bit @ 16kHz', 'info');
|
| 961 |
+
log('⚡ Sem FFmpeg, sem Base64!', 'success');
|
| 962 |
+
</script>
|
| 963 |
+
</body>
|
| 964 |
+
</html>
|
services/webrtc_gateway/webrtc.pid
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
5415
|
test-24khz-support.html
ADDED
|
@@ -0,0 +1,243 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<title>Teste: Suporte 24kHz vs 16kHz no Navegador</title>
|
| 6 |
+
<style>
|
| 7 |
+
body {
|
| 8 |
+
font-family: 'Segoe UI', system-ui, sans-serif;
|
| 9 |
+
max-width: 800px;
|
| 10 |
+
margin: 50px auto;
|
| 11 |
+
padding: 20px;
|
| 12 |
+
background: #1a1a1a;
|
| 13 |
+
color: #e0e0e0;
|
| 14 |
+
}
|
| 15 |
+
.test-section {
|
| 16 |
+
background: #2a2a2a;
|
| 17 |
+
padding: 20px;
|
| 18 |
+
border-radius: 10px;
|
| 19 |
+
margin: 20px 0;
|
| 20 |
+
}
|
| 21 |
+
h2 { color: #4CAF50; }
|
| 22 |
+
.result {
|
| 23 |
+
padding: 10px;
|
| 24 |
+
margin: 10px 0;
|
| 25 |
+
border-radius: 5px;
|
| 26 |
+
background: #333;
|
| 27 |
+
}
|
| 28 |
+
.success { background: #1e4620; }
|
| 29 |
+
.warning { background: #4a3c1e; }
|
| 30 |
+
.error { background: #4a1e1e; }
|
| 31 |
+
button {
|
| 32 |
+
background: #4CAF50;
|
| 33 |
+
color: white;
|
| 34 |
+
border: none;
|
| 35 |
+
padding: 10px 20px;
|
| 36 |
+
border-radius: 5px;
|
| 37 |
+
cursor: pointer;
|
| 38 |
+
margin: 5px;
|
| 39 |
+
font-size: 16px;
|
| 40 |
+
}
|
| 41 |
+
button:hover { background: #45a049; }
|
| 42 |
+
audio { width: 100%; margin: 10px 0; }
|
| 43 |
+
</style>
|
| 44 |
+
</head>
|
| 45 |
+
<body>
|
| 46 |
+
<h1>🎵 Teste de Qualidade: 24kHz vs 16kHz</h1>
|
| 47 |
+
|
| 48 |
+
<div class="test-section">
|
| 49 |
+
<h2>📊 Capacidades do Navegador</h2>
|
| 50 |
+
<div id="capabilities"></div>
|
| 51 |
+
</div>
|
| 52 |
+
|
| 53 |
+
<div class="test-section">
|
| 54 |
+
<h2>🎤 Teste de Reprodução</h2>
|
| 55 |
+
<button onclick="test16kHz()">▶️ Tocar 16kHz (Atual)</button>
|
| 56 |
+
<button onclick="test24kHz()">▶️ Tocar 24kHz (Alta Qualidade)</button>
|
| 57 |
+
<button onclick="test22kHz()">▶️ Tocar 22.05kHz (CD)</button>
|
| 58 |
+
<button onclick="test48kHz()">▶️ Tocar 48kHz (Studio)</button>
|
| 59 |
+
<div id="playback-result"></div>
|
| 60 |
+
</div>
|
| 61 |
+
|
| 62 |
+
<div class="test-section">
|
| 63 |
+
<h2>📈 Análise de Banda</h2>
|
| 64 |
+
<div id="bandwidth"></div>
|
| 65 |
+
</div>
|
| 66 |
+
|
| 67 |
+
<div class="test-section">
|
| 68 |
+
<h2>💡 Recomendação</h2>
|
| 69 |
+
<div id="recommendation"></div>
|
| 70 |
+
</div>
|
| 71 |
+
|
| 72 |
+
<script>
|
| 73 |
+
// Testar capacidades do navegador
|
| 74 |
+
function checkCapabilities() {
|
| 75 |
+
const cap = document.getElementById('capabilities');
|
| 76 |
+
let html = '';
|
| 77 |
+
|
| 78 |
+
// Verificar AudioContext
|
| 79 |
+
const AC = window.AudioContext || window.webkitAudioContext;
|
| 80 |
+
if (AC) {
|
| 81 |
+
const ctx = new AC();
|
| 82 |
+
html += `<div class="result success">✅ AudioContext suportado</div>`;
|
| 83 |
+
html += `<div class="result">📍 Taxa padrão do sistema: ${ctx.sampleRate}Hz</div>`;
|
| 84 |
+
|
| 85 |
+
// Testar diferentes sample rates
|
| 86 |
+
const rates = [16000, 22050, 24000, 44100, 48000];
|
| 87 |
+
html += '<div class="result">📊 Taxas testadas:</div>';
|
| 88 |
+
|
| 89 |
+
rates.forEach(rate => {
|
| 90 |
+
try {
|
| 91 |
+
const testCtx = new AC({ sampleRate: rate });
|
| 92 |
+
const actualRate = testCtx.sampleRate;
|
| 93 |
+
if (actualRate === rate) {
|
| 94 |
+
html += `<div class="result success">✅ ${rate}Hz: Suportado nativamente</div>`;
|
| 95 |
+
} else {
|
| 96 |
+
html += `<div class="result warning">⚠️ ${rate}Hz: Resampled para ${actualRate}Hz</div>`;
|
| 97 |
+
}
|
| 98 |
+
testCtx.close();
|
| 99 |
+
} catch (e) {
|
| 100 |
+
html += `<div class="result error">❌ ${rate}Hz: Erro - ${e.message}</div>`;
|
| 101 |
+
}
|
| 102 |
+
});
|
| 103 |
+
|
| 104 |
+
ctx.close();
|
| 105 |
+
} else {
|
| 106 |
+
html += `<div class="result error">❌ AudioContext não suportado</div>`;
|
| 107 |
+
}
|
| 108 |
+
|
| 109 |
+
// Verificar Web Audio API features
|
| 110 |
+
if (window.AudioBuffer) {
|
| 111 |
+
html += `<div class="result success">✅ AudioBuffer suportado</div>`;
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
cap.innerHTML = html;
|
| 115 |
+
}
|
| 116 |
+
|
| 117 |
+
// Gerar tom de teste
|
| 118 |
+
function generateTone(sampleRate, frequency = 440, duration = 1) {
|
| 119 |
+
const samples = sampleRate * duration;
|
| 120 |
+
const buffer = new Float32Array(samples);
|
| 121 |
+
|
| 122 |
+
for (let i = 0; i < samples; i++) {
|
| 123 |
+
buffer[i] = Math.sin(2 * Math.PI * frequency * i / sampleRate) * 0.3;
|
| 124 |
+
}
|
| 125 |
+
|
| 126 |
+
return buffer;
|
| 127 |
+
}
|
| 128 |
+
|
| 129 |
+
// Testar reprodução em diferentes taxas
|
| 130 |
+
async function testSampleRate(rate) {
|
| 131 |
+
const result = document.getElementById('playback-result');
|
| 132 |
+
|
| 133 |
+
try {
|
| 134 |
+
const audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
| 135 |
+
sampleRate: rate
|
| 136 |
+
});
|
| 137 |
+
|
| 138 |
+
// Criar buffer de teste
|
| 139 |
+
const audioBuffer = audioContext.createBuffer(1, rate, rate);
|
| 140 |
+
const channelData = generateTone(rate, 440, 0.5);
|
| 141 |
+
audioBuffer.copyToChannel(channelData, 0);
|
| 142 |
+
|
| 143 |
+
// Tocar
|
| 144 |
+
const source = audioContext.createBufferSource();
|
| 145 |
+
source.buffer = audioBuffer;
|
| 146 |
+
source.connect(audioContext.destination);
|
| 147 |
+
source.start();
|
| 148 |
+
|
| 149 |
+
result.innerHTML = `<div class="result success">🔊 Tocando em ${rate}Hz (taxa real: ${audioContext.sampleRate}Hz)</div>`;
|
| 150 |
+
|
| 151 |
+
// Cleanup
|
| 152 |
+
setTimeout(() => {
|
| 153 |
+
audioContext.close();
|
| 154 |
+
}, 600);
|
| 155 |
+
|
| 156 |
+
} catch (e) {
|
| 157 |
+
result.innerHTML = `<div class="result error">❌ Erro ao tocar ${rate}Hz: ${e.message}</div>`;
|
| 158 |
+
}
|
| 159 |
+
}
|
| 160 |
+
|
| 161 |
+
function test16kHz() { testSampleRate(16000); }
|
| 162 |
+
function test24kHz() { testSampleRate(24000); }
|
| 163 |
+
function test22kHz() { testSampleRate(22050); }
|
| 164 |
+
function test48kHz() { testSampleRate(48000); }
|
| 165 |
+
|
| 166 |
+
// Calcular uso de banda
|
| 167 |
+
function calculateBandwidth() {
|
| 168 |
+
const bw = document.getElementById('bandwidth');
|
| 169 |
+
|
| 170 |
+
const rates = [
|
| 171 |
+
{ rate: 16000, name: '16kHz (Atual)' },
|
| 172 |
+
{ rate: 22050, name: '22.05kHz (CD)' },
|
| 173 |
+
{ rate: 24000, name: '24kHz (Kokoro)' },
|
| 174 |
+
{ rate: 48000, name: '48kHz (Studio)' }
|
| 175 |
+
];
|
| 176 |
+
|
| 177 |
+
let html = '<h3>📊 Comparação de Banda (PCM 16-bit mono):</h3>';
|
| 178 |
+
|
| 179 |
+
rates.forEach(r => {
|
| 180 |
+
const bytesPerSec = r.rate * 2; // 16-bit = 2 bytes
|
| 181 |
+
const kbps = (bytesPerSec * 8) / 1000;
|
| 182 |
+
const mbPerMin = (bytesPerSec * 60) / (1024 * 1024);
|
| 183 |
+
|
| 184 |
+
html += `<div class="result">`;
|
| 185 |
+
html += `<strong>${r.name}:</strong><br>`;
|
| 186 |
+
html += `• ${kbps.toFixed(0)} kbps<br>`;
|
| 187 |
+
html += `• ${mbPerMin.toFixed(2)} MB/min<br>`;
|
| 188 |
+
html += `• ${((r.rate/16000 - 1) * 100).toFixed(0)}% maior que 16kHz`;
|
| 189 |
+
html += `</div>`;
|
| 190 |
+
});
|
| 191 |
+
|
| 192 |
+
bw.innerHTML = html;
|
| 193 |
+
}
|
| 194 |
+
|
| 195 |
+
// Gerar recomendação
|
| 196 |
+
function generateRecommendation() {
|
| 197 |
+
const rec = document.getElementById('recommendation');
|
| 198 |
+
|
| 199 |
+
let html = `
|
| 200 |
+
<h3>✅ Recomendações:</h3>
|
| 201 |
+
<div class="result success">
|
| 202 |
+
<strong>SIM, é possível e RECOMENDADO enviar 24kHz direto!</strong><br><br>
|
| 203 |
+
|
| 204 |
+
<strong>Vantagens:</strong><br>
|
| 205 |
+
• 🎵 Qualidade 50% superior (8kHz a mais de frequências)<br>
|
| 206 |
+
• 🎤 Melhor clareza em português (consoantes mais nítidas)<br>
|
| 207 |
+
• 💯 Preserva qualidade original do Kokoro<br>
|
| 208 |
+
• ✅ Todos navegadores modernos suportam<br><br>
|
| 209 |
+
|
| 210 |
+
<strong>Desvantagens:</strong><br>
|
| 211 |
+
• 📊 50% mais banda (384 kbps vs 256 kbps)<br>
|
| 212 |
+
• 💾 50% mais memória<br><br>
|
| 213 |
+
|
| 214 |
+
<strong>Implementação Ideal:</strong><br>
|
| 215 |
+
1. <strong>Opção Adaptativa:</strong> Detectar velocidade da conexão<br>
|
| 216 |
+
2. <strong>Configurável:</strong> Botão "Qualidade: Normal | Alta | Ultra"<br>
|
| 217 |
+
3. <strong>Padrão Inteligente:</strong><br>
|
| 218 |
+
• WiFi/Ethernet: 24kHz<br>
|
| 219 |
+
• 4G/5G: 22.05kHz<br>
|
| 220 |
+
• 3G/Slow: 16kHz<br>
|
| 221 |
+
</div>
|
| 222 |
+
|
| 223 |
+
<div class="result warning">
|
| 224 |
+
<strong>⚡ Para implementar agora (rápido):</strong><br>
|
| 225 |
+
1. Mudar AudioContext para 24000Hz na interface<br>
|
| 226 |
+
2. Remover downsampling no servidor<br>
|
| 227 |
+
3. Ajustar WAV header para 24000Hz<br>
|
| 228 |
+
4. Ganho imediato de 50% na qualidade!
|
| 229 |
+
</div>
|
| 230 |
+
`;
|
| 231 |
+
|
| 232 |
+
rec.innerHTML = html;
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
// Inicializar testes
|
| 236 |
+
window.onload = () => {
|
| 237 |
+
checkCapabilities();
|
| 238 |
+
calculateBandwidth();
|
| 239 |
+
generateRecommendation();
|
| 240 |
+
};
|
| 241 |
+
</script>
|
| 242 |
+
</body>
|
| 243 |
+
</html>
|
test-audio-cli.js
ADDED
|
@@ -0,0 +1,178 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env node
|
| 2 |
+
|
| 3 |
+
/**
|
| 4 |
+
* Teste CLI para simular envio de áudio PCM ao servidor
|
| 5 |
+
* Similar ao que o navegador faz, mas via linha de comando
|
| 6 |
+
*/
|
| 7 |
+
|
| 8 |
+
const WebSocket = require('ws');
|
| 9 |
+
const fs = require('fs');
|
| 10 |
+
const path = require('path');
|
| 11 |
+
|
| 12 |
+
const WS_URL = 'ws://localhost:8082/ws';
|
| 13 |
+
|
| 14 |
+
class AudioTester {
|
| 15 |
+
constructor() {
|
| 16 |
+
this.ws = null;
|
| 17 |
+
this.conversationId = null;
|
| 18 |
+
this.clientId = null;
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
connect() {
|
| 22 |
+
return new Promise((resolve, reject) => {
|
| 23 |
+
console.log('🔌 Conectando ao WebSocket...');
|
| 24 |
+
|
| 25 |
+
this.ws = new WebSocket(WS_URL);
|
| 26 |
+
|
| 27 |
+
this.ws.on('open', () => {
|
| 28 |
+
console.log('✅ Conectado ao servidor');
|
| 29 |
+
resolve();
|
| 30 |
+
});
|
| 31 |
+
|
| 32 |
+
this.ws.on('error', (error) => {
|
| 33 |
+
console.error('❌ Erro:', error.message);
|
| 34 |
+
reject(error);
|
| 35 |
+
});
|
| 36 |
+
|
| 37 |
+
this.ws.on('message', (data) => {
|
| 38 |
+
// Verificar se é binário (áudio) ou JSON (mensagem)
|
| 39 |
+
if (data instanceof Buffer) {
|
| 40 |
+
console.log(`🔊 Áudio recebido: ${(data.length / 1024).toFixed(1)}KB`);
|
| 41 |
+
// Salvar áudio para análise
|
| 42 |
+
const filename = `response_${Date.now()}.pcm`;
|
| 43 |
+
fs.writeFileSync(filename, data);
|
| 44 |
+
console.log(` Salvo como: ${filename}`);
|
| 45 |
+
} else {
|
| 46 |
+
try {
|
| 47 |
+
const msg = JSON.parse(data);
|
| 48 |
+
console.log('📨 Mensagem recebida:', msg);
|
| 49 |
+
|
| 50 |
+
if (msg.type === 'init') {
|
| 51 |
+
this.clientId = msg.clientId;
|
| 52 |
+
this.conversationId = msg.conversationId;
|
| 53 |
+
console.log(`🔑 Client ID: ${this.clientId}`);
|
| 54 |
+
console.log(`🔑 Conversation ID: ${this.conversationId}`);
|
| 55 |
+
} else if (msg.type === 'metrics') {
|
| 56 |
+
console.log(`📊 Resposta: "${msg.response}" (${msg.latency}ms)`);
|
| 57 |
+
}
|
| 58 |
+
} catch (e) {
|
| 59 |
+
console.log('📨 Dados recebidos:', data.toString());
|
| 60 |
+
}
|
| 61 |
+
}
|
| 62 |
+
});
|
| 63 |
+
});
|
| 64 |
+
}
|
| 65 |
+
|
| 66 |
+
/**
|
| 67 |
+
* Gera áudio PCM sintético com tom de 440Hz (nota Lá)
|
| 68 |
+
* @param {number} durationMs - Duração em milissegundos
|
| 69 |
+
* @returns {Buffer} - Buffer PCM 16-bit @ 16kHz
|
| 70 |
+
*/
|
| 71 |
+
generateTestAudio(durationMs = 2000) {
|
| 72 |
+
const sampleRate = 16000;
|
| 73 |
+
const frequency = 440; // Hz (nota Lá)
|
| 74 |
+
const samples = Math.floor(sampleRate * durationMs / 1000);
|
| 75 |
+
const buffer = Buffer.alloc(samples * 2); // 16-bit = 2 bytes por sample
|
| 76 |
+
|
| 77 |
+
for (let i = 0; i < samples; i++) {
|
| 78 |
+
// Gerar onda senoidal
|
| 79 |
+
const t = i / sampleRate;
|
| 80 |
+
const value = Math.sin(2 * Math.PI * frequency * t);
|
| 81 |
+
|
| 82 |
+
// Converter para int16
|
| 83 |
+
const int16Value = Math.floor(value * 32767);
|
| 84 |
+
|
| 85 |
+
// Escrever no buffer (little-endian)
|
| 86 |
+
buffer.writeInt16LE(int16Value, i * 2);
|
| 87 |
+
}
|
| 88 |
+
|
| 89 |
+
return buffer;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
/**
|
| 93 |
+
* Gera áudio de fala real usando espeak (se disponível)
|
| 94 |
+
*/
|
| 95 |
+
async generateSpeechAudio(text = "Olá, este é um teste de áudio") {
|
| 96 |
+
const { execSync } = require('child_process');
|
| 97 |
+
const tempFile = `/tmp/test_audio_${Date.now()}.raw`;
|
| 98 |
+
|
| 99 |
+
try {
|
| 100 |
+
// Usar espeak para gerar áudio
|
| 101 |
+
console.log(`🎤 Gerando áudio de fala: "${text}"`);
|
| 102 |
+
execSync(`espeak -s 150 -v pt-br "${text}" --stdout | sox - -r 16000 -b 16 -e signed-integer ${tempFile}`);
|
| 103 |
+
|
| 104 |
+
const audioBuffer = fs.readFileSync(tempFile);
|
| 105 |
+
fs.unlinkSync(tempFile); // Limpar arquivo temporário
|
| 106 |
+
|
| 107 |
+
return audioBuffer;
|
| 108 |
+
} catch (error) {
|
| 109 |
+
console.warn('⚠️ espeak/sox não disponível, usando áudio sintético');
|
| 110 |
+
return this.generateTestAudio(2000);
|
| 111 |
+
}
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
async sendAudio(audioBuffer) {
|
| 115 |
+
console.log(`\n📤 Enviando áudio PCM: ${(audioBuffer.length / 1024).toFixed(1)}KB`);
|
| 116 |
+
|
| 117 |
+
// Enviar como dados binários diretos (como o navegador faz)
|
| 118 |
+
this.ws.send(audioBuffer);
|
| 119 |
+
|
| 120 |
+
console.log('✅ Áudio enviado');
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
async testConversation() {
|
| 124 |
+
console.log('\n=== Iniciando teste de conversação ===\n');
|
| 125 |
+
|
| 126 |
+
// Teste 1: Enviar tom sintético
|
| 127 |
+
console.log('1️⃣ Teste com tom sintético (440Hz por 2s)');
|
| 128 |
+
const syntheticAudio = this.generateTestAudio(2000);
|
| 129 |
+
await this.sendAudio(syntheticAudio);
|
| 130 |
+
await this.wait(5000); // Aguardar resposta
|
| 131 |
+
|
| 132 |
+
// Teste 2: Enviar áudio de fala (se possível)
|
| 133 |
+
console.log('\n2️⃣ Teste com fala sintetizada');
|
| 134 |
+
const speechAudio = await this.generateSpeechAudio("Qual é o seu nome?");
|
| 135 |
+
await this.sendAudio(speechAudio);
|
| 136 |
+
await this.wait(5000); // Aguardar resposta
|
| 137 |
+
|
| 138 |
+
// Teste 3: Enviar silêncio
|
| 139 |
+
console.log('\n3️⃣ Teste com silêncio');
|
| 140 |
+
const silentAudio = Buffer.alloc(32000); // 1 segundo de silêncio
|
| 141 |
+
await this.sendAudio(silentAudio);
|
| 142 |
+
await this.wait(5000); // Aguardar resposta
|
| 143 |
+
}
|
| 144 |
+
|
| 145 |
+
wait(ms) {
|
| 146 |
+
return new Promise(resolve => setTimeout(resolve, ms));
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
disconnect() {
|
| 150 |
+
if (this.ws) {
|
| 151 |
+
console.log('\n👋 Desconectando...');
|
| 152 |
+
this.ws.close();
|
| 153 |
+
}
|
| 154 |
+
}
|
| 155 |
+
}
|
| 156 |
+
|
| 157 |
+
async function main() {
|
| 158 |
+
const tester = new AudioTester();
|
| 159 |
+
|
| 160 |
+
try {
|
| 161 |
+
await tester.connect();
|
| 162 |
+
await tester.wait(500);
|
| 163 |
+
await tester.testConversation();
|
| 164 |
+
await tester.wait(2000); // Aguardar últimas respostas
|
| 165 |
+
} catch (error) {
|
| 166 |
+
console.error('Erro fatal:', error);
|
| 167 |
+
} finally {
|
| 168 |
+
tester.disconnect();
|
| 169 |
+
}
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
console.log('╔═══════════════════════════════════════╗');
|
| 173 |
+
console.log('║ Teste CLI de Áudio PCM ║');
|
| 174 |
+
console.log('╚═══════════════════════════════════════╝\n');
|
| 175 |
+
console.log('Este teste simula o envio de áudio PCM');
|
| 176 |
+
console.log('como o navegador faz, mas via CLI.\n');
|
| 177 |
+
|
| 178 |
+
main().catch(console.error);
|
test-grpc-updated.py
ADDED
|
@@ -0,0 +1,161 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste do servidor Ultravox via gRPC com formato de áudio atualizado
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import grpc
|
| 7 |
+
import numpy as np
|
| 8 |
+
import librosa
|
| 9 |
+
import tempfile
|
| 10 |
+
from gtts import gTTS
|
| 11 |
+
import sys
|
| 12 |
+
import os
|
| 13 |
+
import time
|
| 14 |
+
|
| 15 |
+
# Adicionar paths para protos
|
| 16 |
+
sys.path.append('/workspace/ultravox-pipeline/services/ultravox')
|
| 17 |
+
sys.path.append('/workspace/ultravox-pipeline/protos/generated')
|
| 18 |
+
|
| 19 |
+
import speech_pb2
|
| 20 |
+
import speech_pb2_grpc
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
def generate_audio_for_grpc(text, lang='pt-br'):
|
| 24 |
+
"""Gera áudio TTS e retorna como bytes float32 para gRPC"""
|
| 25 |
+
print(f"🔊 Gerando TTS: '{text}'")
|
| 26 |
+
|
| 27 |
+
# Criar arquivo temporário para o TTS
|
| 28 |
+
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as tmp_file:
|
| 29 |
+
tmp_path = tmp_file.name
|
| 30 |
+
|
| 31 |
+
try:
|
| 32 |
+
# Gerar TTS como MP3
|
| 33 |
+
tts = gTTS(text=text, lang=lang)
|
| 34 |
+
tts.save(tmp_path)
|
| 35 |
+
|
| 36 |
+
# Carregar com librosa (converte automaticamente para float32 normalizado)
|
| 37 |
+
audio, sr = librosa.load(tmp_path, sr=16000)
|
| 38 |
+
|
| 39 |
+
print(f"📊 Áudio carregado:")
|
| 40 |
+
print(f" - Shape: {audio.shape}")
|
| 41 |
+
print(f" - Dtype: {audio.dtype}")
|
| 42 |
+
print(f" - Min: {audio.min():.3f}, Max: {audio.max():.3f}")
|
| 43 |
+
print(f" - Sample rate: {sr} Hz")
|
| 44 |
+
|
| 45 |
+
# Converter para bytes para enviar via gRPC
|
| 46 |
+
audio_bytes = audio.tobytes()
|
| 47 |
+
|
| 48 |
+
return audio_bytes, sr
|
| 49 |
+
|
| 50 |
+
finally:
|
| 51 |
+
# Limpar arquivo temporário
|
| 52 |
+
if os.path.exists(tmp_path):
|
| 53 |
+
os.unlink(tmp_path)
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
async def test_ultravox_grpc():
|
| 57 |
+
"""Testa o servidor Ultravox via gRPC"""
|
| 58 |
+
|
| 59 |
+
print("=" * 60)
|
| 60 |
+
print("🚀 TESTE ULTRAVOX gRPC COM FORMATO ATUALIZADO")
|
| 61 |
+
print("=" * 60)
|
| 62 |
+
|
| 63 |
+
# Conectar ao servidor gRPC
|
| 64 |
+
channel = grpc.aio.insecure_channel('localhost:50051')
|
| 65 |
+
stub = speech_pb2_grpc.SpeechServiceStub(channel)
|
| 66 |
+
|
| 67 |
+
# Lista de testes
|
| 68 |
+
tests = [
|
| 69 |
+
{
|
| 70 |
+
"audio_text": "Quanto é dois mais dois?",
|
| 71 |
+
"prompt": "Responda em português:",
|
| 72 |
+
"lang": "pt-br",
|
| 73 |
+
"expected": ["quatro", "4", "dois mais dois"]
|
| 74 |
+
},
|
| 75 |
+
{
|
| 76 |
+
"audio_text": "Qual é a capital do Brasil?",
|
| 77 |
+
"prompt": "", # Testar sem prompt customizado
|
| 78 |
+
"lang": "pt-br",
|
| 79 |
+
"expected": ["Brasília", "capital"]
|
| 80 |
+
},
|
| 81 |
+
{
|
| 82 |
+
"audio_text": "What is the capital of France?",
|
| 83 |
+
"prompt": "Answer the question:",
|
| 84 |
+
"lang": "en",
|
| 85 |
+
"expected": ["Paris", "capital", "France"]
|
| 86 |
+
}
|
| 87 |
+
]
|
| 88 |
+
|
| 89 |
+
for i, test in enumerate(tests, 1):
|
| 90 |
+
print(f"\n{'='*50}")
|
| 91 |
+
print(f"📝 Teste {i}: {test['audio_text']}")
|
| 92 |
+
if test['prompt']:
|
| 93 |
+
print(f" Prompt: {test['prompt']}")
|
| 94 |
+
print(f" Esperado: {', '.join(test['expected'])}")
|
| 95 |
+
|
| 96 |
+
# Gerar áudio
|
| 97 |
+
audio_bytes, sample_rate = generate_audio_for_grpc(test['audio_text'], test['lang'])
|
| 98 |
+
|
| 99 |
+
# Criar requisição gRPC
|
| 100 |
+
async def generate_requests():
|
| 101 |
+
# Primeiro chunk com metadados
|
| 102 |
+
chunk = speech_pb2.AudioChunk()
|
| 103 |
+
chunk.session_id = f"test_{i}"
|
| 104 |
+
chunk.audio_data = audio_bytes[:len(audio_bytes)//2] # Primeira metade
|
| 105 |
+
chunk.sample_rate = sample_rate
|
| 106 |
+
chunk.is_final_chunk = False
|
| 107 |
+
if test['prompt']:
|
| 108 |
+
chunk.system_prompt = test['prompt']
|
| 109 |
+
yield chunk
|
| 110 |
+
|
| 111 |
+
# Segundo chunk com resto do áudio
|
| 112 |
+
chunk = speech_pb2.AudioChunk()
|
| 113 |
+
chunk.session_id = f"test_{i}"
|
| 114 |
+
chunk.audio_data = audio_bytes[len(audio_bytes)//2:] # Segunda metade
|
| 115 |
+
chunk.sample_rate = sample_rate
|
| 116 |
+
chunk.is_final_chunk = True
|
| 117 |
+
yield chunk
|
| 118 |
+
|
| 119 |
+
# Enviar e receber resposta
|
| 120 |
+
print("⏳ Enviando para servidor...")
|
| 121 |
+
start_time = time.time()
|
| 122 |
+
|
| 123 |
+
try:
|
| 124 |
+
response_text = ""
|
| 125 |
+
token_count = 0
|
| 126 |
+
|
| 127 |
+
async for token in stub.StreamingRecognize(generate_requests()):
|
| 128 |
+
if token.text:
|
| 129 |
+
response_text += token.text
|
| 130 |
+
token_count += 1
|
| 131 |
+
|
| 132 |
+
if token.is_final:
|
| 133 |
+
break
|
| 134 |
+
|
| 135 |
+
elapsed = time.time() - start_time
|
| 136 |
+
|
| 137 |
+
# Verificar resposta
|
| 138 |
+
success = any(exp.lower() in response_text.lower() for exp in test['expected'])
|
| 139 |
+
|
| 140 |
+
print(f"💬 Resposta: '{response_text.strip()}'")
|
| 141 |
+
print(f"📊 Tokens: {token_count}")
|
| 142 |
+
print(f"⏱️ Tempo: {elapsed:.2f}s")
|
| 143 |
+
|
| 144 |
+
if success:
|
| 145 |
+
print(f"✅ SUCESSO! Resposta reconhecida")
|
| 146 |
+
else:
|
| 147 |
+
print(f"⚠️ Resposta não reconhecida")
|
| 148 |
+
|
| 149 |
+
except Exception as e:
|
| 150 |
+
print(f"❌ Erro: {e}")
|
| 151 |
+
|
| 152 |
+
await channel.close()
|
| 153 |
+
|
| 154 |
+
print("\n" + "=" * 60)
|
| 155 |
+
print("📊 TESTE CONCLUÍDO")
|
| 156 |
+
print("=" * 60)
|
| 157 |
+
|
| 158 |
+
|
| 159 |
+
if __name__ == "__main__":
|
| 160 |
+
import asyncio
|
| 161 |
+
asyncio.run(test_ultravox_grpc())
|
test-opus-support.html
ADDED
|
@@ -0,0 +1,337 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="pt-BR">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Opus Codec Test</title>
|
| 7 |
+
<style>
|
| 8 |
+
body {
|
| 9 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Helvetica', 'Arial', sans-serif;
|
| 10 |
+
padding: 20px;
|
| 11 |
+
max-width: 800px;
|
| 12 |
+
margin: 0 auto;
|
| 13 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 14 |
+
min-height: 100vh;
|
| 15 |
+
}
|
| 16 |
+
|
| 17 |
+
.container {
|
| 18 |
+
background: white;
|
| 19 |
+
border-radius: 12px;
|
| 20 |
+
padding: 30px;
|
| 21 |
+
box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
|
| 22 |
+
}
|
| 23 |
+
|
| 24 |
+
h1 {
|
| 25 |
+
color: #333;
|
| 26 |
+
margin-bottom: 30px;
|
| 27 |
+
}
|
| 28 |
+
|
| 29 |
+
.codec-info {
|
| 30 |
+
background: #f0f0f5;
|
| 31 |
+
padding: 15px;
|
| 32 |
+
border-radius: 8px;
|
| 33 |
+
margin-bottom: 20px;
|
| 34 |
+
font-family: monospace;
|
| 35 |
+
}
|
| 36 |
+
|
| 37 |
+
.status {
|
| 38 |
+
display: inline-block;
|
| 39 |
+
padding: 5px 10px;
|
| 40 |
+
border-radius: 4px;
|
| 41 |
+
font-weight: bold;
|
| 42 |
+
margin-left: 10px;
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
.supported {
|
| 46 |
+
background: #4CAF50;
|
| 47 |
+
color: white;
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
.not-supported {
|
| 51 |
+
background: #f44336;
|
| 52 |
+
color: white;
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
.test-section {
|
| 56 |
+
margin: 20px 0;
|
| 57 |
+
padding: 20px;
|
| 58 |
+
border: 2px solid #e0e0e0;
|
| 59 |
+
border-radius: 8px;
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
button {
|
| 63 |
+
background: linear-gradient(145deg, #667eea, #764ba2);
|
| 64 |
+
color: white;
|
| 65 |
+
border: none;
|
| 66 |
+
padding: 12px 24px;
|
| 67 |
+
border-radius: 8px;
|
| 68 |
+
font-size: 16px;
|
| 69 |
+
cursor: pointer;
|
| 70 |
+
margin: 5px;
|
| 71 |
+
transition: transform 0.2s;
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
+
button:hover {
|
| 75 |
+
transform: translateY(-2px);
|
| 76 |
+
}
|
| 77 |
+
|
| 78 |
+
button:disabled {
|
| 79 |
+
opacity: 0.5;
|
| 80 |
+
cursor: not-allowed;
|
| 81 |
+
}
|
| 82 |
+
|
| 83 |
+
.log {
|
| 84 |
+
background: #1e1e1e;
|
| 85 |
+
color: #d4d4d4;
|
| 86 |
+
padding: 15px;
|
| 87 |
+
border-radius: 8px;
|
| 88 |
+
font-family: monospace;
|
| 89 |
+
font-size: 12px;
|
| 90 |
+
max-height: 300px;
|
| 91 |
+
overflow-y: auto;
|
| 92 |
+
margin-top: 20px;
|
| 93 |
+
}
|
| 94 |
+
|
| 95 |
+
.log-entry {
|
| 96 |
+
margin: 5px 0;
|
| 97 |
+
padding: 5px;
|
| 98 |
+
border-left: 3px solid #667eea;
|
| 99 |
+
padding-left: 10px;
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
+
.log-entry.error {
|
| 103 |
+
border-left-color: #f44336;
|
| 104 |
+
color: #ff9999;
|
| 105 |
+
}
|
| 106 |
+
|
| 107 |
+
.log-entry.success {
|
| 108 |
+
border-left-color: #4CAF50;
|
| 109 |
+
color: #90ee90;
|
| 110 |
+
}
|
| 111 |
+
|
| 112 |
+
.log-entry.info {
|
| 113 |
+
border-left-color: #2196F3;
|
| 114 |
+
color: #87ceeb;
|
| 115 |
+
}
|
| 116 |
+
</style>
|
| 117 |
+
</head>
|
| 118 |
+
<body>
|
| 119 |
+
<div class="container">
|
| 120 |
+
<h1>🎵 Opus Codec Support Test</h1>
|
| 121 |
+
|
| 122 |
+
<div class="codec-info">
|
| 123 |
+
<h3>Codec Support Detection:</h3>
|
| 124 |
+
<div id="codecStatus"></div>
|
| 125 |
+
</div>
|
| 126 |
+
|
| 127 |
+
<div class="test-section">
|
| 128 |
+
<h3>🎤 Recording Test</h3>
|
| 129 |
+
<button id="startRecord">Start Recording (Opus)</button>
|
| 130 |
+
<button id="stopRecord" disabled>Stop Recording</button>
|
| 131 |
+
<button id="startPCM">Start Recording (PCM)</button>
|
| 132 |
+
<button id="stopPCM" disabled>Stop Recording</button>
|
| 133 |
+
</div>
|
| 134 |
+
|
| 135 |
+
<div class="test-section">
|
| 136 |
+
<h3>📊 Recording Info</h3>
|
| 137 |
+
<div id="recordingInfo">
|
| 138 |
+
<p>Format: <span id="format">-</span></p>
|
| 139 |
+
<p>Size: <span id="size">-</span></p>
|
| 140 |
+
<p>Duration: <span id="duration">-</span></p>
|
| 141 |
+
</div>
|
| 142 |
+
</div>
|
| 143 |
+
|
| 144 |
+
<div class="log" id="log"></div>
|
| 145 |
+
</div>
|
| 146 |
+
|
| 147 |
+
<script>
|
| 148 |
+
let mediaRecorder;
|
| 149 |
+
let audioChunks = [];
|
| 150 |
+
let stream;
|
| 151 |
+
let startTime;
|
| 152 |
+
|
| 153 |
+
function log(message, type = 'info') {
|
| 154 |
+
const logEl = document.getElementById('log');
|
| 155 |
+
const entry = document.createElement('div');
|
| 156 |
+
entry.className = `log-entry ${type}`;
|
| 157 |
+
const time = new Date().toLocaleTimeString();
|
| 158 |
+
entry.textContent = `[${time}] ${message}`;
|
| 159 |
+
logEl.appendChild(entry);
|
| 160 |
+
logEl.scrollTop = logEl.scrollHeight;
|
| 161 |
+
}
|
| 162 |
+
|
| 163 |
+
// Check codec support
|
| 164 |
+
function checkCodecSupport() {
|
| 165 |
+
const statusEl = document.getElementById('codecStatus');
|
| 166 |
+
const codecs = [
|
| 167 |
+
'audio/webm;codecs=opus',
|
| 168 |
+
'audio/ogg;codecs=opus',
|
| 169 |
+
'audio/webm',
|
| 170 |
+
'audio/ogg'
|
| 171 |
+
];
|
| 172 |
+
|
| 173 |
+
let html = '';
|
| 174 |
+
codecs.forEach(codec => {
|
| 175 |
+
const supported = MediaRecorder.isTypeSupported(codec);
|
| 176 |
+
html += `<div>${codec}: <span class="status ${supported ? 'supported' : 'not-supported'}">${supported ? 'SUPPORTED' : 'NOT SUPPORTED'}</span></div>`;
|
| 177 |
+
log(`Codec ${codec}: ${supported ? 'Supported' : 'Not Supported'}`, supported ? 'success' : 'error');
|
| 178 |
+
});
|
| 179 |
+
|
| 180 |
+
statusEl.innerHTML = html;
|
| 181 |
+
}
|
| 182 |
+
|
| 183 |
+
// Initialize
|
| 184 |
+
async function init() {
|
| 185 |
+
try {
|
| 186 |
+
stream = await navigator.mediaDevices.getUserMedia({ audio: true });
|
| 187 |
+
log('Microphone access granted', 'success');
|
| 188 |
+
checkCodecSupport();
|
| 189 |
+
} catch (error) {
|
| 190 |
+
log('Failed to get microphone access: ' + error.message, 'error');
|
| 191 |
+
}
|
| 192 |
+
}
|
| 193 |
+
|
| 194 |
+
// Start Opus recording
|
| 195 |
+
document.getElementById('startRecord').addEventListener('click', () => {
|
| 196 |
+
if (!stream) {
|
| 197 |
+
log('No stream available', 'error');
|
| 198 |
+
return;
|
| 199 |
+
}
|
| 200 |
+
|
| 201 |
+
audioChunks = [];
|
| 202 |
+
startTime = Date.now();
|
| 203 |
+
|
| 204 |
+
const mimeType = 'audio/webm;codecs=opus';
|
| 205 |
+
const options = {
|
| 206 |
+
mimeType: MediaRecorder.isTypeSupported(mimeType) ? mimeType : 'audio/webm',
|
| 207 |
+
audioBitsPerSecond: 32000
|
| 208 |
+
};
|
| 209 |
+
|
| 210 |
+
try {
|
| 211 |
+
mediaRecorder = new MediaRecorder(stream, options);
|
| 212 |
+
log(`Recording started with ${mediaRecorder.mimeType}`, 'success');
|
| 213 |
+
|
| 214 |
+
mediaRecorder.ondataavailable = (event) => {
|
| 215 |
+
if (event.data.size > 0) {
|
| 216 |
+
audioChunks.push(event.data);
|
| 217 |
+
log(`Chunk received: ${event.data.size} bytes`);
|
| 218 |
+
}
|
| 219 |
+
};
|
| 220 |
+
|
| 221 |
+
mediaRecorder.onstop = () => {
|
| 222 |
+
const duration = ((Date.now() - startTime) / 1000).toFixed(2);
|
| 223 |
+
const blob = new Blob(audioChunks, { type: mediaRecorder.mimeType });
|
| 224 |
+
|
| 225 |
+
document.getElementById('format').textContent = mediaRecorder.mimeType;
|
| 226 |
+
document.getElementById('size').textContent = `${(blob.size / 1024).toFixed(2)} KB`;
|
| 227 |
+
document.getElementById('duration').textContent = `${duration} seconds`;
|
| 228 |
+
|
| 229 |
+
log(`Recording stopped. Total size: ${(blob.size / 1024).toFixed(2)} KB`, 'success');
|
| 230 |
+
|
| 231 |
+
// Create download link
|
| 232 |
+
const url = URL.createObjectURL(blob);
|
| 233 |
+
const a = document.createElement('a');
|
| 234 |
+
a.href = url;
|
| 235 |
+
a.download = `opus-test-${Date.now()}.webm`;
|
| 236 |
+
a.click();
|
| 237 |
+
};
|
| 238 |
+
|
| 239 |
+
mediaRecorder.start(100);
|
| 240 |
+
|
| 241 |
+
document.getElementById('startRecord').disabled = true;
|
| 242 |
+
document.getElementById('stopRecord').disabled = false;
|
| 243 |
+
|
| 244 |
+
} catch (error) {
|
| 245 |
+
log('Failed to start recording: ' + error.message, 'error');
|
| 246 |
+
}
|
| 247 |
+
});
|
| 248 |
+
|
| 249 |
+
// Stop Opus recording
|
| 250 |
+
document.getElementById('stopRecord').addEventListener('click', () => {
|
| 251 |
+
if (mediaRecorder && mediaRecorder.state === 'recording') {
|
| 252 |
+
mediaRecorder.stop();
|
| 253 |
+
document.getElementById('startRecord').disabled = false;
|
| 254 |
+
document.getElementById('stopRecord').disabled = true;
|
| 255 |
+
}
|
| 256 |
+
});
|
| 257 |
+
|
| 258 |
+
// PCM recording (for comparison)
|
| 259 |
+
let audioContext;
|
| 260 |
+
let audioSource;
|
| 261 |
+
let audioProcessor;
|
| 262 |
+
let pcmBuffer = [];
|
| 263 |
+
|
| 264 |
+
document.getElementById('startPCM').addEventListener('click', () => {
|
| 265 |
+
if (!stream) {
|
| 266 |
+
log('No stream available', 'error');
|
| 267 |
+
return;
|
| 268 |
+
}
|
| 269 |
+
|
| 270 |
+
pcmBuffer = [];
|
| 271 |
+
startTime = Date.now();
|
| 272 |
+
|
| 273 |
+
if (!audioContext) {
|
| 274 |
+
audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 24000 });
|
| 275 |
+
}
|
| 276 |
+
|
| 277 |
+
audioSource = audioContext.createMediaStreamSource(stream);
|
| 278 |
+
audioProcessor = audioContext.createScriptProcessor(4096, 1, 1);
|
| 279 |
+
|
| 280 |
+
audioProcessor.onaudioprocess = (e) => {
|
| 281 |
+
const inputData = e.inputBuffer.getChannelData(0);
|
| 282 |
+
const pcmData = new Int16Array(inputData.length);
|
| 283 |
+
|
| 284 |
+
for (let i = 0; i < inputData.length; i++) {
|
| 285 |
+
const sample = Math.max(-1, Math.min(1, inputData[i]));
|
| 286 |
+
pcmData[i] = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
|
| 287 |
+
}
|
| 288 |
+
|
| 289 |
+
pcmBuffer.push(pcmData);
|
| 290 |
+
};
|
| 291 |
+
|
| 292 |
+
audioSource.connect(audioProcessor);
|
| 293 |
+
audioProcessor.connect(audioContext.destination);
|
| 294 |
+
|
| 295 |
+
log('PCM recording started (24kHz, 16-bit)', 'success');
|
| 296 |
+
|
| 297 |
+
document.getElementById('startPCM').disabled = true;
|
| 298 |
+
document.getElementById('stopPCM').disabled = false;
|
| 299 |
+
});
|
| 300 |
+
|
| 301 |
+
document.getElementById('stopPCM').addEventListener('click', () => {
|
| 302 |
+
if (audioProcessor) {
|
| 303 |
+
audioProcessor.disconnect();
|
| 304 |
+
audioProcessor = null;
|
| 305 |
+
}
|
| 306 |
+
if (audioSource) {
|
| 307 |
+
audioSource.disconnect();
|
| 308 |
+
audioSource = null;
|
| 309 |
+
}
|
| 310 |
+
|
| 311 |
+
const duration = ((Date.now() - startTime) / 1000).toFixed(2);
|
| 312 |
+
const totalLength = pcmBuffer.reduce((acc, chunk) => acc + chunk.length, 0);
|
| 313 |
+
const fullPCM = new Int16Array(totalLength);
|
| 314 |
+
let offset = 0;
|
| 315 |
+
|
| 316 |
+
for (const chunk of pcmBuffer) {
|
| 317 |
+
fullPCM.set(chunk, offset);
|
| 318 |
+
offset += chunk.length;
|
| 319 |
+
}
|
| 320 |
+
|
| 321 |
+
const sizeKB = (fullPCM.length * 2 / 1024).toFixed(2);
|
| 322 |
+
|
| 323 |
+
document.getElementById('format').textContent = 'PCM 16-bit 24kHz';
|
| 324 |
+
document.getElementById('size').textContent = `${sizeKB} KB`;
|
| 325 |
+
document.getElementById('duration').textContent = `${duration} seconds`;
|
| 326 |
+
|
| 327 |
+
log(`PCM recording stopped. Total size: ${sizeKB} KB`, 'success');
|
| 328 |
+
|
| 329 |
+
document.getElementById('startPCM').disabled = false;
|
| 330 |
+
document.getElementById('stopPCM').disabled = true;
|
| 331 |
+
});
|
| 332 |
+
|
| 333 |
+
// Initialize on load
|
| 334 |
+
init();
|
| 335 |
+
</script>
|
| 336 |
+
</body>
|
| 337 |
+
</html>
|
test-simple.py
ADDED
|
@@ -0,0 +1,70 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste simples do Ultravox com prompt básico
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import grpc
|
| 7 |
+
import numpy as np
|
| 8 |
+
import time
|
| 9 |
+
import sys
|
| 10 |
+
|
| 11 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 12 |
+
sys.path.append('/workspace/ultravox-pipeline/protos')
|
| 13 |
+
|
| 14 |
+
import speech_pb2
|
| 15 |
+
import speech_pb2_grpc
|
| 16 |
+
|
| 17 |
+
def test_ultravox():
|
| 18 |
+
"""Testa o Ultravox com áudio simples"""
|
| 19 |
+
|
| 20 |
+
print("📡 Conectando ao Ultravox...")
|
| 21 |
+
channel = grpc.insecure_channel('localhost:50051')
|
| 22 |
+
stub = speech_pb2_grpc.SpeechServiceStub(channel)
|
| 23 |
+
|
| 24 |
+
# Criar áudio simples de silêncio
|
| 25 |
+
# O modelo deveria processar mesmo sem áudio real
|
| 26 |
+
audio = np.zeros(16000, dtype=np.float32) # 1 segundo de silêncio
|
| 27 |
+
|
| 28 |
+
print(f"🎵 Áudio: {len(audio)} samples @ 16kHz")
|
| 29 |
+
|
| 30 |
+
# Criar requisição simples
|
| 31 |
+
def audio_generator():
|
| 32 |
+
chunk = speech_pb2.AudioChunk()
|
| 33 |
+
chunk.audio_data = audio.tobytes()
|
| 34 |
+
chunk.sample_rate = 16000
|
| 35 |
+
chunk.is_final_chunk = True
|
| 36 |
+
chunk.session_id = f"test_{int(time.time())}"
|
| 37 |
+
# Não enviar prompt - usar padrão <|audio|>
|
| 38 |
+
yield chunk
|
| 39 |
+
|
| 40 |
+
print("⏳ Processando...")
|
| 41 |
+
start_time = time.time()
|
| 42 |
+
|
| 43 |
+
try:
|
| 44 |
+
response_text = ""
|
| 45 |
+
token_count = 0
|
| 46 |
+
|
| 47 |
+
for response in stub.StreamingRecognize(audio_generator()):
|
| 48 |
+
if response.text:
|
| 49 |
+
response_text += response.text
|
| 50 |
+
token_count += 1
|
| 51 |
+
print(f" Token {token_count}: '{response.text.strip()}'")
|
| 52 |
+
|
| 53 |
+
if response.is_final:
|
| 54 |
+
print(" [FINAL]")
|
| 55 |
+
break
|
| 56 |
+
|
| 57 |
+
elapsed = time.time() - start_time
|
| 58 |
+
|
| 59 |
+
print(f"\n📊 Resultado:")
|
| 60 |
+
print(f" - Resposta: '{response_text.strip()}'")
|
| 61 |
+
print(f" - Tempo: {elapsed:.2f}s")
|
| 62 |
+
print(f" - Tokens: {token_count}")
|
| 63 |
+
|
| 64 |
+
except grpc.RpcError as e:
|
| 65 |
+
print(f"❌ Erro gRPC: {e.code()} - {e.details()}")
|
| 66 |
+
except Exception as e:
|
| 67 |
+
print(f"❌ Erro: {e}")
|
| 68 |
+
|
| 69 |
+
if __name__ == "__main__":
|
| 70 |
+
test_ultravox()
|
test-tts-button.html
ADDED
|
@@ -0,0 +1,65 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html>
|
| 3 |
+
<head>
|
| 4 |
+
<title>Test TTS Button</title>
|
| 5 |
+
</head>
|
| 6 |
+
<body>
|
| 7 |
+
<h1>Test TTS WebSocket</h1>
|
| 8 |
+
<button id="connectBtn">Connect</button>
|
| 9 |
+
<button id="testTTSBtn" disabled>Test TTS</button>
|
| 10 |
+
<div id="log"></div>
|
| 11 |
+
|
| 12 |
+
<script>
|
| 13 |
+
let ws = null;
|
| 14 |
+
const log = document.getElementById('log');
|
| 15 |
+
|
| 16 |
+
function addLog(msg) {
|
| 17 |
+
log.innerHTML += `<p>${msg}</p>`;
|
| 18 |
+
console.log(msg);
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
document.getElementById('connectBtn').onclick = () => {
|
| 22 |
+
ws = new WebSocket('ws://localhost:8082/ws');
|
| 23 |
+
ws.binaryType = 'arraybuffer';
|
| 24 |
+
|
| 25 |
+
ws.onopen = () => {
|
| 26 |
+
addLog('✅ Connected');
|
| 27 |
+
document.getElementById('testTTSBtn').disabled = false;
|
| 28 |
+
};
|
| 29 |
+
|
| 30 |
+
ws.onmessage = (event) => {
|
| 31 |
+
if (event.data instanceof ArrayBuffer) {
|
| 32 |
+
addLog(`📦 Received binary: ${event.data.byteLength} bytes`);
|
| 33 |
+
} else {
|
| 34 |
+
try {
|
| 35 |
+
const data = JSON.parse(event.data);
|
| 36 |
+
addLog(`📨 Received JSON: ${JSON.stringify(data)}`);
|
| 37 |
+
} catch (e) {
|
| 38 |
+
addLog(`📨 Received text: ${event.data}`);
|
| 39 |
+
}
|
| 40 |
+
}
|
| 41 |
+
};
|
| 42 |
+
|
| 43 |
+
ws.onerror = (error) => {
|
| 44 |
+
addLog(`❌ Error: ${error}`);
|
| 45 |
+
};
|
| 46 |
+
|
| 47 |
+
ws.onclose = () => {
|
| 48 |
+
addLog('❌ Disconnected');
|
| 49 |
+
document.getElementById('testTTSBtn').disabled = true;
|
| 50 |
+
};
|
| 51 |
+
};
|
| 52 |
+
|
| 53 |
+
document.getElementById('testTTSBtn').onclick = () => {
|
| 54 |
+
const ttsRequest = {
|
| 55 |
+
type: 'text-to-speech',
|
| 56 |
+
text: 'Teste de TTS direto',
|
| 57 |
+
voice_id: 'pf_dora'
|
| 58 |
+
};
|
| 59 |
+
|
| 60 |
+
addLog(`📤 Sending: ${JSON.stringify(ttsRequest)}`);
|
| 61 |
+
ws.send(JSON.stringify(ttsRequest));
|
| 62 |
+
};
|
| 63 |
+
</script>
|
| 64 |
+
</body>
|
| 65 |
+
</html>
|
test-ultravox-auto.py
ADDED
|
@@ -0,0 +1,172 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste automatizado do Ultravox com TTS
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import grpc
|
| 7 |
+
import numpy as np
|
| 8 |
+
import time
|
| 9 |
+
import sys
|
| 10 |
+
import os
|
| 11 |
+
from gtts import gTTS
|
| 12 |
+
from pydub import AudioSegment
|
| 13 |
+
import io
|
| 14 |
+
|
| 15 |
+
# Adicionar paths
|
| 16 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 17 |
+
sys.path.append('/workspace/ultravox-pipeline/protos')
|
| 18 |
+
|
| 19 |
+
# Importar os protobuffers compilados
|
| 20 |
+
import speech_pb2
|
| 21 |
+
import speech_pb2_grpc
|
| 22 |
+
|
| 23 |
+
def generate_tts_audio(text, lang='pt-br'):
|
| 24 |
+
"""Gera áudio TTS a partir de texto"""
|
| 25 |
+
print(f"🔊 Gerando TTS: '{text}'")
|
| 26 |
+
|
| 27 |
+
tts = gTTS(text=text, lang=lang)
|
| 28 |
+
mp3_buffer = io.BytesIO()
|
| 29 |
+
tts.write_to_fp(mp3_buffer)
|
| 30 |
+
mp3_buffer.seek(0)
|
| 31 |
+
|
| 32 |
+
# Converter MP3 para PCM 16kHz
|
| 33 |
+
audio = AudioSegment.from_mp3(mp3_buffer)
|
| 34 |
+
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
|
| 35 |
+
|
| 36 |
+
# Converter para numpy float32
|
| 37 |
+
samples = np.array(audio.get_array_of_samples()).astype(np.float32) / 32768.0
|
| 38 |
+
|
| 39 |
+
return samples
|
| 40 |
+
|
| 41 |
+
def test_ultravox(question, expected_answer=None):
|
| 42 |
+
"""Testa o Ultravox com uma pergunta"""
|
| 43 |
+
|
| 44 |
+
print(f"\n{'='*60}")
|
| 45 |
+
print(f"📝 Pergunta: {question}")
|
| 46 |
+
if expected_answer:
|
| 47 |
+
print(f"✅ Resposta esperada: {expected_answer}")
|
| 48 |
+
print(f"{'='*60}")
|
| 49 |
+
|
| 50 |
+
# Gerar áudio da pergunta
|
| 51 |
+
audio = generate_tts_audio(question)
|
| 52 |
+
print(f"🎵 Áudio gerado: {len(audio)} samples @ 16kHz ({len(audio)/16000:.2f}s)")
|
| 53 |
+
|
| 54 |
+
# Conectar ao Ultravox
|
| 55 |
+
print("📡 Conectando ao Ultravox...")
|
| 56 |
+
channel = grpc.insecure_channel('localhost:50051')
|
| 57 |
+
stub = speech_pb2_grpc.SpeechServiceStub(channel)
|
| 58 |
+
|
| 59 |
+
# Criar requisição
|
| 60 |
+
def audio_generator():
|
| 61 |
+
chunk = speech_pb2.AudioChunk()
|
| 62 |
+
chunk.audio_data = audio.tobytes()
|
| 63 |
+
chunk.sample_rate = 16000
|
| 64 |
+
chunk.is_final_chunk = True
|
| 65 |
+
chunk.session_id = f"test_{int(time.time())}"
|
| 66 |
+
# Não enviar system_prompt - deixar o servidor usar o padrão com <|audio|>
|
| 67 |
+
# chunk.system_prompt = ""
|
| 68 |
+
yield chunk
|
| 69 |
+
|
| 70 |
+
# Enviar e receber resposta
|
| 71 |
+
print("⏳ Processando...")
|
| 72 |
+
start_time = time.time()
|
| 73 |
+
|
| 74 |
+
try:
|
| 75 |
+
response_text = ""
|
| 76 |
+
token_count = 0
|
| 77 |
+
|
| 78 |
+
for response in stub.StreamingRecognize(audio_generator()):
|
| 79 |
+
if response.text:
|
| 80 |
+
response_text += response.text
|
| 81 |
+
token_count += 1
|
| 82 |
+
print(f" Token {token_count}: '{response.text.strip()}'", end="")
|
| 83 |
+
|
| 84 |
+
if response.is_final:
|
| 85 |
+
print(" [FINAL]")
|
| 86 |
+
break
|
| 87 |
+
else:
|
| 88 |
+
print()
|
| 89 |
+
|
| 90 |
+
elapsed = time.time() - start_time
|
| 91 |
+
|
| 92 |
+
print(f"\n📊 Estatísticas:")
|
| 93 |
+
print(f" - Resposta: '{response_text.strip()}'")
|
| 94 |
+
print(f" - Tempo: {elapsed:.2f}s")
|
| 95 |
+
print(f" - Tokens: {token_count}")
|
| 96 |
+
|
| 97 |
+
# Verificar resposta esperada
|
| 98 |
+
if expected_answer:
|
| 99 |
+
if expected_answer.lower() in response_text.lower():
|
| 100 |
+
print(f" ✅ SUCESSO! Resposta contém '{expected_answer}'")
|
| 101 |
+
return True
|
| 102 |
+
else:
|
| 103 |
+
print(f" ⚠️ AVISO: Resposta não contém '{expected_answer}'")
|
| 104 |
+
return False
|
| 105 |
+
|
| 106 |
+
return True
|
| 107 |
+
|
| 108 |
+
except grpc.RpcError as e:
|
| 109 |
+
print(f"❌ Erro gRPC: {e.code()} - {e.details()}")
|
| 110 |
+
return False
|
| 111 |
+
except Exception as e:
|
| 112 |
+
print(f"❌ Erro: {e}")
|
| 113 |
+
return False
|
| 114 |
+
|
| 115 |
+
def main():
|
| 116 |
+
"""Executa bateria de testes"""
|
| 117 |
+
|
| 118 |
+
print("\n" + "="*60)
|
| 119 |
+
print("🚀 TESTE AUTOMATIZADO DO ULTRAVOX")
|
| 120 |
+
print("="*60)
|
| 121 |
+
|
| 122 |
+
# Lista de testes
|
| 123 |
+
tests = [
|
| 124 |
+
{
|
| 125 |
+
"question": "Quanto é dois mais dois?",
|
| 126 |
+
"expected": "quatro"
|
| 127 |
+
},
|
| 128 |
+
{
|
| 129 |
+
"question": "Qual é a capital do Brasil?",
|
| 130 |
+
"expected": "Brasília"
|
| 131 |
+
},
|
| 132 |
+
{
|
| 133 |
+
"question": "Que dia é hoje?",
|
| 134 |
+
"expected": None # Resposta variável
|
| 135 |
+
},
|
| 136 |
+
{
|
| 137 |
+
"question": "Olá, como você está?",
|
| 138 |
+
"expected": None # Resposta variável
|
| 139 |
+
}
|
| 140 |
+
]
|
| 141 |
+
|
| 142 |
+
# Executar testes
|
| 143 |
+
results = []
|
| 144 |
+
for i, test in enumerate(tests, 1):
|
| 145 |
+
print(f"\n🧪 TESTE {i}/{len(tests)}")
|
| 146 |
+
success = test_ultravox(test["question"], test.get("expected"))
|
| 147 |
+
results.append(success)
|
| 148 |
+
time.sleep(2) # Pausa entre testes
|
| 149 |
+
|
| 150 |
+
# Resumo
|
| 151 |
+
print("\n" + "="*60)
|
| 152 |
+
print("📊 RESUMO DOS TESTES")
|
| 153 |
+
print("="*60)
|
| 154 |
+
|
| 155 |
+
total = len(results)
|
| 156 |
+
passed = sum(1 for r in results if r)
|
| 157 |
+
failed = total - passed
|
| 158 |
+
|
| 159 |
+
print(f"Total: {total}")
|
| 160 |
+
print(f"✅ Passou: {passed}")
|
| 161 |
+
print(f"❌ Falhou: {failed}")
|
| 162 |
+
print(f"Taxa de sucesso: {(passed/total)*100:.1f}%")
|
| 163 |
+
|
| 164 |
+
if passed == total:
|
| 165 |
+
print("\n🎉 TODOS OS TESTES PASSARAM!")
|
| 166 |
+
return 0
|
| 167 |
+
else:
|
| 168 |
+
print(f"\n⚠️ {failed} teste(s) falharam")
|
| 169 |
+
return 1
|
| 170 |
+
|
| 171 |
+
if __name__ == "__main__":
|
| 172 |
+
sys.exit(main())
|
test-ultravox-librosa.py
ADDED
|
@@ -0,0 +1,166 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste do Ultravox com formato de áudio correto usando librosa
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import sys
|
| 7 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 8 |
+
|
| 9 |
+
from vllm import LLM, SamplingParams
|
| 10 |
+
import numpy as np
|
| 11 |
+
import librosa
|
| 12 |
+
import soundfile as sf
|
| 13 |
+
import tempfile
|
| 14 |
+
from gtts import gTTS
|
| 15 |
+
import time
|
| 16 |
+
import os
|
| 17 |
+
|
| 18 |
+
def generate_audio_librosa(text, lang='pt-br'):
|
| 19 |
+
"""Gera áudio TTS e converte para formato esperado pelo Ultravox"""
|
| 20 |
+
print(f"🔊 Gerando TTS: '{text}'")
|
| 21 |
+
|
| 22 |
+
# Criar arquivo temporário para o TTS
|
| 23 |
+
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as tmp_file:
|
| 24 |
+
tmp_path = tmp_file.name
|
| 25 |
+
|
| 26 |
+
try:
|
| 27 |
+
# Gerar TTS como MP3
|
| 28 |
+
tts = gTTS(text=text, lang=lang)
|
| 29 |
+
tts.save(tmp_path)
|
| 30 |
+
|
| 31 |
+
# Carregar com librosa (converte automaticamente para float32 normalizado)
|
| 32 |
+
# librosa normaliza entre -1 e 1 automaticamente
|
| 33 |
+
audio, sr = librosa.load(tmp_path, sr=16000)
|
| 34 |
+
|
| 35 |
+
print(f"📊 Áudio carregado com librosa:")
|
| 36 |
+
print(f" - Shape: {audio.shape}")
|
| 37 |
+
print(f" - Dtype: {audio.dtype}")
|
| 38 |
+
print(f" - Min: {audio.min():.3f}, Max: {audio.max():.3f}")
|
| 39 |
+
print(f" - Sample rate: {sr} Hz")
|
| 40 |
+
|
| 41 |
+
return audio, sr
|
| 42 |
+
|
| 43 |
+
finally:
|
| 44 |
+
# Limpar arquivo temporário
|
| 45 |
+
if os.path.exists(tmp_path):
|
| 46 |
+
os.unlink(tmp_path)
|
| 47 |
+
|
| 48 |
+
def test_ultravox_librosa():
|
| 49 |
+
"""Testa Ultravox com formato de áudio correto"""
|
| 50 |
+
|
| 51 |
+
print("=" * 60)
|
| 52 |
+
print("🚀 TESTE ULTRAVOX COM LIBROSA (FORMATO CORRETO)")
|
| 53 |
+
print("=" * 60)
|
| 54 |
+
|
| 55 |
+
# Configurar modelo
|
| 56 |
+
model_name = "fixie-ai/ultravox-v0_5-llama-3_2-1b"
|
| 57 |
+
|
| 58 |
+
# Inicializar LLM
|
| 59 |
+
print(f"📡 Inicializando {model_name}...")
|
| 60 |
+
llm = LLM(
|
| 61 |
+
model=model_name,
|
| 62 |
+
trust_remote_code=True,
|
| 63 |
+
enforce_eager=True,
|
| 64 |
+
max_model_len=256,
|
| 65 |
+
gpu_memory_utilization=0.3
|
| 66 |
+
)
|
| 67 |
+
|
| 68 |
+
# Parâmetros de sampling
|
| 69 |
+
sampling_params = SamplingParams(
|
| 70 |
+
temperature=0.3,
|
| 71 |
+
max_tokens=50,
|
| 72 |
+
repetition_penalty=1.1
|
| 73 |
+
)
|
| 74 |
+
|
| 75 |
+
# Lista de testes
|
| 76 |
+
tests = [
|
| 77 |
+
("Quanto é dois mais dois?", "pt-br", "quatro"),
|
| 78 |
+
("Qual é a capital do Brasil?", "pt-br", "Brasília"),
|
| 79 |
+
("What is two plus two?", "en", "four"),
|
| 80 |
+
]
|
| 81 |
+
|
| 82 |
+
results = []
|
| 83 |
+
|
| 84 |
+
for question, lang, expected in tests:
|
| 85 |
+
print(f"\n{'='*50}")
|
| 86 |
+
print(f"📝 Pergunta: {question}")
|
| 87 |
+
print(f"✅ Esperado: {expected}")
|
| 88 |
+
|
| 89 |
+
# Gerar áudio com librosa
|
| 90 |
+
audio, sr = generate_audio_librosa(question, lang)
|
| 91 |
+
|
| 92 |
+
# Preparar prompt com token de áudio
|
| 93 |
+
prompt = "<|audio|>"
|
| 94 |
+
|
| 95 |
+
# Preparar entrada com áudio
|
| 96 |
+
llm_input = {
|
| 97 |
+
"prompt": prompt,
|
| 98 |
+
"multi_modal_data": {
|
| 99 |
+
"audio": audio # Agora no formato correto do librosa
|
| 100 |
+
}
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
# Fazer inferência
|
| 104 |
+
print("⏳ Processando...")
|
| 105 |
+
start_time = time.time()
|
| 106 |
+
|
| 107 |
+
try:
|
| 108 |
+
outputs = llm.generate(
|
| 109 |
+
prompts=[llm_input],
|
| 110 |
+
sampling_params=sampling_params
|
| 111 |
+
)
|
| 112 |
+
|
| 113 |
+
elapsed = time.time() - start_time
|
| 114 |
+
|
| 115 |
+
# Extrair resposta
|
| 116 |
+
response = outputs[0].outputs[0].text.strip()
|
| 117 |
+
|
| 118 |
+
# Verificar se a resposta contém o esperado
|
| 119 |
+
success = expected.lower() in response.lower() if expected else False
|
| 120 |
+
|
| 121 |
+
print(f"💬 Resposta: '{response}'")
|
| 122 |
+
print(f"⏱️ Tempo: {elapsed:.2f}s")
|
| 123 |
+
|
| 124 |
+
if success:
|
| 125 |
+
print(f"✅ SUCESSO! Resposta contém '{expected}'")
|
| 126 |
+
else:
|
| 127 |
+
print(f"⚠️ Resposta não contém '{expected}'")
|
| 128 |
+
|
| 129 |
+
results.append({
|
| 130 |
+
'question': question,
|
| 131 |
+
'expected': expected,
|
| 132 |
+
'response': response,
|
| 133 |
+
'success': success,
|
| 134 |
+
'time': elapsed
|
| 135 |
+
})
|
| 136 |
+
|
| 137 |
+
except Exception as e:
|
| 138 |
+
print(f"❌ Erro: {e}")
|
| 139 |
+
results.append({
|
| 140 |
+
'question': question,
|
| 141 |
+
'expected': expected,
|
| 142 |
+
'response': str(e),
|
| 143 |
+
'success': False,
|
| 144 |
+
'time': 0
|
| 145 |
+
})
|
| 146 |
+
|
| 147 |
+
# Resumo
|
| 148 |
+
print("\n" + "=" * 60)
|
| 149 |
+
print("📊 RESUMO DOS TESTES")
|
| 150 |
+
print("=" * 60)
|
| 151 |
+
|
| 152 |
+
total = len(results)
|
| 153 |
+
passed = sum(1 for r in results if r['success'])
|
| 154 |
+
|
| 155 |
+
for i, result in enumerate(results, 1):
|
| 156 |
+
status = "✅" if result['success'] else "❌"
|
| 157 |
+
print(f"{status} Teste {i}: {result['question'][:30]}...")
|
| 158 |
+
print(f" Resposta: {result['response'][:50]}...")
|
| 159 |
+
|
| 160 |
+
print(f"\nTotal: {total}")
|
| 161 |
+
print(f"✅ Passou: {passed}")
|
| 162 |
+
print(f"❌ Falhou: {total - passed}")
|
| 163 |
+
print(f"Taxa de sucesso: {(passed/total)*100:.1f}%")
|
| 164 |
+
|
| 165 |
+
if __name__ == "__main__":
|
| 166 |
+
test_ultravox_librosa()
|
test-ultravox-simple-prompt.py
ADDED
|
@@ -0,0 +1,206 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste do Ultravox com prompt simples sem chat template
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import sys
|
| 7 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 8 |
+
|
| 9 |
+
from vllm import LLM, SamplingParams
|
| 10 |
+
import numpy as np
|
| 11 |
+
import librosa
|
| 12 |
+
import tempfile
|
| 13 |
+
from gtts import gTTS
|
| 14 |
+
import time
|
| 15 |
+
import os
|
| 16 |
+
|
| 17 |
+
def generate_audio_tuple(text, lang='pt-br'):
|
| 18 |
+
"""Gera áudio TTS e retorna como tupla (audio, sample_rate)"""
|
| 19 |
+
print(f"🔊 Gerando TTS: '{text}'")
|
| 20 |
+
|
| 21 |
+
# Criar arquivo temporário para o TTS
|
| 22 |
+
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as tmp_file:
|
| 23 |
+
tmp_path = tmp_file.name
|
| 24 |
+
|
| 25 |
+
try:
|
| 26 |
+
# Gerar TTS como MP3
|
| 27 |
+
tts = gTTS(text=text, lang=lang)
|
| 28 |
+
tts.save(tmp_path)
|
| 29 |
+
|
| 30 |
+
# Carregar com librosa (converte automaticamente para float32 normalizado)
|
| 31 |
+
audio, sr = librosa.load(tmp_path, sr=16000)
|
| 32 |
+
|
| 33 |
+
print(f"📊 Áudio carregado:")
|
| 34 |
+
print(f" - Shape: {audio.shape}")
|
| 35 |
+
print(f" - Dtype: {audio.dtype}")
|
| 36 |
+
print(f" - Min: {audio.min():.3f}, Max: {audio.max():.3f}")
|
| 37 |
+
print(f" - Sample rate: {sr} Hz")
|
| 38 |
+
|
| 39 |
+
# Retornar como tupla (audio, sample_rate) - formato esperado pelo vLLM
|
| 40 |
+
return (audio, sr)
|
| 41 |
+
|
| 42 |
+
finally:
|
| 43 |
+
# Limpar arquivo temporário
|
| 44 |
+
if os.path.exists(tmp_path):
|
| 45 |
+
os.unlink(tmp_path)
|
| 46 |
+
|
| 47 |
+
def test_ultravox_simple():
|
| 48 |
+
"""Testa Ultravox com prompt simples"""
|
| 49 |
+
|
| 50 |
+
print("=" * 60)
|
| 51 |
+
print("🚀 TESTE ULTRAVOX COM PROMPT SIMPLES")
|
| 52 |
+
print("=" * 60)
|
| 53 |
+
|
| 54 |
+
# Configurar modelo
|
| 55 |
+
model_name = "fixie-ai/ultravox-v0_5-llama-3_2-1b"
|
| 56 |
+
|
| 57 |
+
# Inicializar LLM
|
| 58 |
+
print(f"📡 Inicializando {model_name}...")
|
| 59 |
+
llm = LLM(
|
| 60 |
+
model=model_name,
|
| 61 |
+
trust_remote_code=True,
|
| 62 |
+
enforce_eager=True,
|
| 63 |
+
max_model_len=4096,
|
| 64 |
+
gpu_memory_utilization=0.3
|
| 65 |
+
)
|
| 66 |
+
|
| 67 |
+
# Parâmetros de sampling
|
| 68 |
+
sampling_params = SamplingParams(
|
| 69 |
+
temperature=0.2,
|
| 70 |
+
max_tokens=64
|
| 71 |
+
)
|
| 72 |
+
|
| 73 |
+
# Lista de testes com diferentes formatos de prompt
|
| 74 |
+
tests = [
|
| 75 |
+
{
|
| 76 |
+
"audio_text": "Quanto é dois mais dois?",
|
| 77 |
+
"prompts": [
|
| 78 |
+
"<|audio|>", # Apenas o token
|
| 79 |
+
"<|audio|>\nResponda em português:", # Com instrução
|
| 80 |
+
"<|audio|>\nO que foi perguntado no áudio?", # Com pergunta
|
| 81 |
+
],
|
| 82 |
+
"lang": "pt-br",
|
| 83 |
+
"expected": ["quatro", "4", "dois mais dois", "2+2"]
|
| 84 |
+
},
|
| 85 |
+
{
|
| 86 |
+
"audio_text": "What is the capital of France?",
|
| 87 |
+
"prompts": [
|
| 88 |
+
"<|audio|>",
|
| 89 |
+
"<|audio|>\nAnswer the question:",
|
| 90 |
+
"<|audio|>\nWhat did you hear?",
|
| 91 |
+
],
|
| 92 |
+
"lang": "en",
|
| 93 |
+
"expected": ["Paris", "capital", "France"]
|
| 94 |
+
}
|
| 95 |
+
]
|
| 96 |
+
|
| 97 |
+
results = []
|
| 98 |
+
|
| 99 |
+
for test in tests:
|
| 100 |
+
audio_tuple = generate_audio_tuple(test['audio_text'], test['lang'])
|
| 101 |
+
|
| 102 |
+
for prompt in test['prompts']:
|
| 103 |
+
print(f"\n{'='*50}")
|
| 104 |
+
print(f"📝 Áudio: {test['audio_text']}")
|
| 105 |
+
print(f"📝 Prompt: {prompt[:50]}...")
|
| 106 |
+
print(f"✅ Esperado: {', '.join(test['expected'])}")
|
| 107 |
+
|
| 108 |
+
# Preparar entrada com áudio no formato de tupla
|
| 109 |
+
llm_input = {
|
| 110 |
+
"prompt": prompt,
|
| 111 |
+
"multi_modal_data": {
|
| 112 |
+
"audio": [audio_tuple] # Lista de tuplas (audio, sample_rate)
|
| 113 |
+
}
|
| 114 |
+
}
|
| 115 |
+
|
| 116 |
+
# Fazer inferência
|
| 117 |
+
print("⏳ Processando...")
|
| 118 |
+
start_time = time.time()
|
| 119 |
+
|
| 120 |
+
try:
|
| 121 |
+
outputs = llm.generate(
|
| 122 |
+
prompts=[llm_input],
|
| 123 |
+
sampling_params=sampling_params
|
| 124 |
+
)
|
| 125 |
+
|
| 126 |
+
elapsed = time.time() - start_time
|
| 127 |
+
|
| 128 |
+
# Extrair resposta
|
| 129 |
+
response = outputs[0].outputs[0].text.strip()
|
| 130 |
+
|
| 131 |
+
# Verificar se a resposta contém algum dos esperados
|
| 132 |
+
success = any(exp.lower() in response.lower() for exp in test['expected'])
|
| 133 |
+
|
| 134 |
+
print(f"💬 Resposta: '{response[:100]}...'")
|
| 135 |
+
print(f"⏱️ Tempo: {elapsed:.2f}s")
|
| 136 |
+
|
| 137 |
+
if success:
|
| 138 |
+
print(f"✅ SUCESSO! Resposta reconhecida")
|
| 139 |
+
else:
|
| 140 |
+
print(f"⚠️ Resposta não reconhecida")
|
| 141 |
+
|
| 142 |
+
results.append({
|
| 143 |
+
'audio': test['audio_text'],
|
| 144 |
+
'prompt': prompt[:30],
|
| 145 |
+
'response': response,
|
| 146 |
+
'success': success,
|
| 147 |
+
'time': elapsed
|
| 148 |
+
})
|
| 149 |
+
|
| 150 |
+
except Exception as e:
|
| 151 |
+
print(f"❌ Erro: {e}")
|
| 152 |
+
results.append({
|
| 153 |
+
'audio': test['audio_text'],
|
| 154 |
+
'prompt': prompt[:30],
|
| 155 |
+
'response': str(e),
|
| 156 |
+
'success': False,
|
| 157 |
+
'time': 0
|
| 158 |
+
})
|
| 159 |
+
|
| 160 |
+
# Resumo
|
| 161 |
+
print("\n" + "=" * 60)
|
| 162 |
+
print("📊 RESUMO DOS TESTES")
|
| 163 |
+
print("=" * 60)
|
| 164 |
+
|
| 165 |
+
total = len(results)
|
| 166 |
+
passed = sum(1 for r in results if r['success'])
|
| 167 |
+
|
| 168 |
+
# Agrupar por áudio
|
| 169 |
+
audio_groups = {}
|
| 170 |
+
for result in results:
|
| 171 |
+
if result['audio'] not in audio_groups:
|
| 172 |
+
audio_groups[result['audio']] = []
|
| 173 |
+
audio_groups[result['audio']].append(result)
|
| 174 |
+
|
| 175 |
+
for audio, group in audio_groups.items():
|
| 176 |
+
print(f"\n📝 Áudio: {audio}")
|
| 177 |
+
for result in group:
|
| 178 |
+
status = "✅" if result['success'] else "❌"
|
| 179 |
+
print(f" {status} Prompt: {result['prompt']}...")
|
| 180 |
+
print(f" Resposta: {result['response'][:60]}...")
|
| 181 |
+
|
| 182 |
+
print(f"\n📊 Estatísticas:")
|
| 183 |
+
print(f"Total de testes: {total}")
|
| 184 |
+
print(f"✅ Passou: {passed}")
|
| 185 |
+
print(f"❌ Falhou: {total - passed}")
|
| 186 |
+
print(f"Taxa de sucesso: {(passed/total)*100:.1f}%")
|
| 187 |
+
|
| 188 |
+
# Encontrar o melhor prompt
|
| 189 |
+
prompt_success = {}
|
| 190 |
+
for result in results:
|
| 191 |
+
prompt_key = result['prompt']
|
| 192 |
+
if prompt_key not in prompt_success:
|
| 193 |
+
prompt_success[prompt_key] = {'success': 0, 'total': 0}
|
| 194 |
+
prompt_success[prompt_key]['total'] += 1
|
| 195 |
+
if result['success']:
|
| 196 |
+
prompt_success[prompt_key]['success'] += 1
|
| 197 |
+
|
| 198 |
+
print(f"\n🏆 Melhor formato de prompt:")
|
| 199 |
+
for prompt, stats in sorted(prompt_success.items(),
|
| 200 |
+
key=lambda x: x[1]['success']/x[1]['total'],
|
| 201 |
+
reverse=True):
|
| 202 |
+
rate = (stats['success']/stats['total'])*100
|
| 203 |
+
print(f" {rate:.0f}% - {prompt}...")
|
| 204 |
+
|
| 205 |
+
if __name__ == "__main__":
|
| 206 |
+
test_ultravox_simple()
|
test-ultravox-tts.py
ADDED
|
@@ -0,0 +1,121 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Script de teste para Ultravox com TTS
|
| 4 |
+
Envia uma pergunta via áudio sintetizado e verifica a resposta
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import grpc
|
| 8 |
+
import numpy as np
|
| 9 |
+
import asyncio
|
| 10 |
+
import time
|
| 11 |
+
from gtts import gTTS
|
| 12 |
+
from pydub import AudioSegment
|
| 13 |
+
import io
|
| 14 |
+
import sys
|
| 15 |
+
import os
|
| 16 |
+
|
| 17 |
+
# Adicionar o path para os protobuffers
|
| 18 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 19 |
+
import speech_pb2
|
| 20 |
+
import speech_pb2_grpc
|
| 21 |
+
|
| 22 |
+
async def test_ultravox_with_tts():
|
| 23 |
+
"""Testa o Ultravox enviando áudio TTS com a pergunta 'Quanto é 2 + 2?'"""
|
| 24 |
+
|
| 25 |
+
print("🎤 Iniciando teste do Ultravox com TTS...")
|
| 26 |
+
|
| 27 |
+
# 1. Gerar áudio TTS com a pergunta
|
| 28 |
+
print("🔊 Gerando áudio TTS: 'Quanto é dois mais dois?'")
|
| 29 |
+
tts = gTTS(text="Quanto é dois mais dois?", lang='pt-br')
|
| 30 |
+
|
| 31 |
+
# Salvar em buffer de memória
|
| 32 |
+
mp3_buffer = io.BytesIO()
|
| 33 |
+
tts.write_to_fp(mp3_buffer)
|
| 34 |
+
mp3_buffer.seek(0)
|
| 35 |
+
|
| 36 |
+
# Converter MP3 para PCM 16kHz
|
| 37 |
+
audio = AudioSegment.from_mp3(mp3_buffer)
|
| 38 |
+
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
|
| 39 |
+
|
| 40 |
+
# Converter para numpy array float32
|
| 41 |
+
samples = np.array(audio.get_array_of_samples()).astype(np.float32) / 32768.0
|
| 42 |
+
|
| 43 |
+
print(f"✅ Áudio gerado: {len(samples)} samples @ 16kHz")
|
| 44 |
+
print(f" Duração: {len(samples)/16000:.2f} segundos")
|
| 45 |
+
|
| 46 |
+
# 2. Conectar ao servidor Ultravox
|
| 47 |
+
print("\n📡 Conectando ao Ultravox na porta 50051...")
|
| 48 |
+
|
| 49 |
+
try:
|
| 50 |
+
channel = grpc.aio.insecure_channel('localhost:50051')
|
| 51 |
+
stub = speech_pb2_grpc.UltravoxServiceStub(channel)
|
| 52 |
+
|
| 53 |
+
# 3. Criar request com o áudio
|
| 54 |
+
session_id = f"test_{int(time.time())}"
|
| 55 |
+
|
| 56 |
+
async def audio_generator():
|
| 57 |
+
"""Gera chunks de áudio para enviar"""
|
| 58 |
+
request = speech_pb2.AudioRequest()
|
| 59 |
+
request.session_id = session_id
|
| 60 |
+
request.audio_data = samples.tobytes()
|
| 61 |
+
request.sample_rate = 16000
|
| 62 |
+
request.is_final_chunk = True
|
| 63 |
+
request.system_prompt = "Responda em português de forma simples e direta"
|
| 64 |
+
|
| 65 |
+
print(f"📤 Enviando áudio para sessão: {session_id}")
|
| 66 |
+
yield request
|
| 67 |
+
|
| 68 |
+
# 4. Enviar e receber resposta
|
| 69 |
+
print("\n⏳ Aguardando resposta do Ultravox...")
|
| 70 |
+
start_time = time.time()
|
| 71 |
+
|
| 72 |
+
response_text = ""
|
| 73 |
+
token_count = 0
|
| 74 |
+
|
| 75 |
+
async for response in stub.TranscribeStream(audio_generator()):
|
| 76 |
+
if response.text:
|
| 77 |
+
response_text += response.text
|
| 78 |
+
token_count += 1
|
| 79 |
+
print(f" Token {token_count}: '{response.text.strip()}'")
|
| 80 |
+
|
| 81 |
+
if response.is_final:
|
| 82 |
+
break
|
| 83 |
+
|
| 84 |
+
elapsed = time.time() - start_time
|
| 85 |
+
|
| 86 |
+
# 5. Verificar resposta
|
| 87 |
+
print(f"\n📝 Resposta completa: '{response_text.strip()}'")
|
| 88 |
+
print(f"⏱️ Tempo de resposta: {elapsed:.2f}s")
|
| 89 |
+
print(f"📊 Tokens recebidos: {token_count}")
|
| 90 |
+
|
| 91 |
+
# Verificar se a resposta contém "4" ou "quatro"
|
| 92 |
+
if "4" in response_text.lower() or "quatro" in response_text.lower():
|
| 93 |
+
print("\n✅ SUCESSO! O Ultravox respondeu corretamente!")
|
| 94 |
+
else:
|
| 95 |
+
print("\n⚠️ AVISO: A resposta não contém '4' ou 'quatro'")
|
| 96 |
+
|
| 97 |
+
await channel.close()
|
| 98 |
+
|
| 99 |
+
except grpc.RpcError as e:
|
| 100 |
+
print(f"\n❌ Erro gRPC: {e.code()} - {e.details()}")
|
| 101 |
+
return False
|
| 102 |
+
except Exception as e:
|
| 103 |
+
print(f"\n❌ Erro: {e}")
|
| 104 |
+
return False
|
| 105 |
+
|
| 106 |
+
return True
|
| 107 |
+
|
| 108 |
+
if __name__ == "__main__":
|
| 109 |
+
print("=" * 60)
|
| 110 |
+
print("TESTE ULTRAVOX COM TTS")
|
| 111 |
+
print("=" * 60)
|
| 112 |
+
|
| 113 |
+
# Executar teste
|
| 114 |
+
success = asyncio.run(test_ultravox_with_tts())
|
| 115 |
+
|
| 116 |
+
if success:
|
| 117 |
+
print("\n🎉 Teste concluído com sucesso!")
|
| 118 |
+
else:
|
| 119 |
+
print("\n❌ Teste falhou!")
|
| 120 |
+
|
| 121 |
+
print("=" * 60)
|
test-ultravox-tuple.py
ADDED
|
@@ -0,0 +1,202 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste do Ultravox com formato correto de tupla (audio, sample_rate)
|
| 4 |
+
Baseado no exemplo oficial do vLLM
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import sys
|
| 8 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 9 |
+
|
| 10 |
+
from vllm import LLM, SamplingParams
|
| 11 |
+
import numpy as np
|
| 12 |
+
import librosa
|
| 13 |
+
import tempfile
|
| 14 |
+
from gtts import gTTS
|
| 15 |
+
import time
|
| 16 |
+
import os
|
| 17 |
+
from transformers import AutoTokenizer
|
| 18 |
+
|
| 19 |
+
def generate_audio_tuple(text, lang='pt-br'):
|
| 20 |
+
"""Gera áudio TTS e retorna como tupla (audio, sample_rate)"""
|
| 21 |
+
print(f"🔊 Gerando TTS: '{text}'")
|
| 22 |
+
|
| 23 |
+
# Criar arquivo temporário para o TTS
|
| 24 |
+
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as tmp_file:
|
| 25 |
+
tmp_path = tmp_file.name
|
| 26 |
+
|
| 27 |
+
try:
|
| 28 |
+
# Gerar TTS como MP3
|
| 29 |
+
tts = gTTS(text=text, lang=lang)
|
| 30 |
+
tts.save(tmp_path)
|
| 31 |
+
|
| 32 |
+
# Carregar com librosa (converte automaticamente para float32 normalizado)
|
| 33 |
+
audio, sr = librosa.load(tmp_path, sr=16000)
|
| 34 |
+
|
| 35 |
+
print(f"📊 Áudio carregado:")
|
| 36 |
+
print(f" - Shape: {audio.shape}")
|
| 37 |
+
print(f" - Dtype: {audio.dtype}")
|
| 38 |
+
print(f" - Min: {audio.min():.3f}, Max: {audio.max():.3f}")
|
| 39 |
+
print(f" - Sample rate: {sr} Hz")
|
| 40 |
+
|
| 41 |
+
# Retornar como tupla (audio, sample_rate) - formato esperado pelo vLLM
|
| 42 |
+
return (audio, sr)
|
| 43 |
+
|
| 44 |
+
finally:
|
| 45 |
+
# Limpar arquivo temporário
|
| 46 |
+
if os.path.exists(tmp_path):
|
| 47 |
+
os.unlink(tmp_path)
|
| 48 |
+
|
| 49 |
+
def test_ultravox_tuple():
|
| 50 |
+
"""Testa Ultravox com formato de tupla correto"""
|
| 51 |
+
|
| 52 |
+
print("=" * 60)
|
| 53 |
+
print("🚀 TESTE ULTRAVOX COM FORMATO DE TUPLA")
|
| 54 |
+
print("=" * 60)
|
| 55 |
+
|
| 56 |
+
# Configurar modelo
|
| 57 |
+
model_name = "fixie-ai/ultravox-v0_5-llama-3_2-1b"
|
| 58 |
+
|
| 59 |
+
# Inicializar tokenizer
|
| 60 |
+
print(f"📡 Inicializando tokenizer...")
|
| 61 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 62 |
+
|
| 63 |
+
# Inicializar LLM
|
| 64 |
+
print(f"📡 Inicializando {model_name}...")
|
| 65 |
+
llm = LLM(
|
| 66 |
+
model=model_name,
|
| 67 |
+
trust_remote_code=True,
|
| 68 |
+
enforce_eager=True,
|
| 69 |
+
max_model_len=4096, # Aumentando para 4096 como no exemplo oficial
|
| 70 |
+
gpu_memory_utilization=0.3
|
| 71 |
+
)
|
| 72 |
+
|
| 73 |
+
# Parâmetros de sampling
|
| 74 |
+
sampling_params = SamplingParams(
|
| 75 |
+
temperature=0.2, # Usando 0.2 como no exemplo oficial
|
| 76 |
+
max_tokens=64 # Usando 64 como no exemplo oficial
|
| 77 |
+
)
|
| 78 |
+
|
| 79 |
+
# Lista de testes
|
| 80 |
+
tests = [
|
| 81 |
+
{
|
| 82 |
+
"audio_text": "Quanto é dois mais dois?",
|
| 83 |
+
"question": "O que foi perguntado no áudio?",
|
| 84 |
+
"lang": "pt-br",
|
| 85 |
+
"expected": ["quatro", "2+2", "dois mais dois"]
|
| 86 |
+
},
|
| 87 |
+
{
|
| 88 |
+
"audio_text": "Qual é a capital do Brasil?",
|
| 89 |
+
"question": "Responda a pergunta que você ouviu.",
|
| 90 |
+
"lang": "pt-br",
|
| 91 |
+
"expected": ["Brasília", "capital", "Brasil"]
|
| 92 |
+
},
|
| 93 |
+
{
|
| 94 |
+
"audio_text": "What is two plus two?",
|
| 95 |
+
"question": "Answer the question you heard.",
|
| 96 |
+
"lang": "en",
|
| 97 |
+
"expected": ["four", "4", "two plus two"]
|
| 98 |
+
}
|
| 99 |
+
]
|
| 100 |
+
|
| 101 |
+
results = []
|
| 102 |
+
|
| 103 |
+
for test in tests:
|
| 104 |
+
print(f"\n{'='*50}")
|
| 105 |
+
print(f"📝 Áudio: {test['audio_text']}")
|
| 106 |
+
print(f"❓ Pergunta: {test['question']}")
|
| 107 |
+
print(f"✅ Esperado: {', '.join(test['expected'])}")
|
| 108 |
+
|
| 109 |
+
# Gerar áudio como tupla
|
| 110 |
+
audio_tuple = generate_audio_tuple(test['audio_text'], test['lang'])
|
| 111 |
+
|
| 112 |
+
# Criar mensagem com token de áudio
|
| 113 |
+
messages = [{
|
| 114 |
+
"role": "user",
|
| 115 |
+
"content": f"<|audio|>\n{test['question']}"
|
| 116 |
+
}]
|
| 117 |
+
|
| 118 |
+
# Aplicar chat template
|
| 119 |
+
prompt = tokenizer.apply_chat_template(
|
| 120 |
+
messages,
|
| 121 |
+
tokenize=False,
|
| 122 |
+
add_generation_prompt=True
|
| 123 |
+
)
|
| 124 |
+
|
| 125 |
+
print(f"📝 Prompt gerado: {prompt[:100]}...")
|
| 126 |
+
|
| 127 |
+
# Preparar entrada com áudio no formato de tupla
|
| 128 |
+
llm_input = {
|
| 129 |
+
"prompt": prompt,
|
| 130 |
+
"multi_modal_data": {
|
| 131 |
+
"audio": [audio_tuple] # Lista de tuplas (audio, sample_rate)
|
| 132 |
+
}
|
| 133 |
+
}
|
| 134 |
+
|
| 135 |
+
# Fazer inferência
|
| 136 |
+
print("⏳ Processando...")
|
| 137 |
+
start_time = time.time()
|
| 138 |
+
|
| 139 |
+
try:
|
| 140 |
+
outputs = llm.generate(
|
| 141 |
+
prompts=[llm_input],
|
| 142 |
+
sampling_params=sampling_params
|
| 143 |
+
)
|
| 144 |
+
|
| 145 |
+
elapsed = time.time() - start_time
|
| 146 |
+
|
| 147 |
+
# Extrair resposta
|
| 148 |
+
response = outputs[0].outputs[0].text.strip()
|
| 149 |
+
|
| 150 |
+
# Verificar se a resposta contém algum dos esperados
|
| 151 |
+
success = any(exp.lower() in response.lower() for exp in test['expected'])
|
| 152 |
+
|
| 153 |
+
print(f"💬 Resposta: '{response}'")
|
| 154 |
+
print(f"⏱️ Tempo: {elapsed:.2f}s")
|
| 155 |
+
|
| 156 |
+
if success:
|
| 157 |
+
print(f"✅ SUCESSO! Resposta reconhecida")
|
| 158 |
+
else:
|
| 159 |
+
print(f"���️ Resposta não reconhecida")
|
| 160 |
+
|
| 161 |
+
results.append({
|
| 162 |
+
'audio': test['audio_text'],
|
| 163 |
+
'question': test['question'],
|
| 164 |
+
'expected': test['expected'],
|
| 165 |
+
'response': response,
|
| 166 |
+
'success': success,
|
| 167 |
+
'time': elapsed
|
| 168 |
+
})
|
| 169 |
+
|
| 170 |
+
except Exception as e:
|
| 171 |
+
print(f"❌ Erro: {e}")
|
| 172 |
+
import traceback
|
| 173 |
+
traceback.print_exc()
|
| 174 |
+
results.append({
|
| 175 |
+
'audio': test['audio_text'],
|
| 176 |
+
'question': test['question'],
|
| 177 |
+
'expected': test['expected'],
|
| 178 |
+
'response': str(e),
|
| 179 |
+
'success': False,
|
| 180 |
+
'time': 0
|
| 181 |
+
})
|
| 182 |
+
|
| 183 |
+
# Resumo
|
| 184 |
+
print("\n" + "=" * 60)
|
| 185 |
+
print("📊 RESUMO DOS TESTES")
|
| 186 |
+
print("=" * 60)
|
| 187 |
+
|
| 188 |
+
total = len(results)
|
| 189 |
+
passed = sum(1 for r in results if r['success'])
|
| 190 |
+
|
| 191 |
+
for i, result in enumerate(results, 1):
|
| 192 |
+
status = "✅" if result['success'] else "❌"
|
| 193 |
+
print(f"{status} Teste {i}: {result['audio'][:30]}...")
|
| 194 |
+
print(f" Resposta: {result['response'][:80]}...")
|
| 195 |
+
|
| 196 |
+
print(f"\nTotal: {total}")
|
| 197 |
+
print(f"✅ Passou: {passed}")
|
| 198 |
+
print(f"❌ Falhou: {total - passed}")
|
| 199 |
+
print(f"Taxa de sucesso: {(passed/total)*100:.1f}%")
|
| 200 |
+
|
| 201 |
+
if __name__ == "__main__":
|
| 202 |
+
test_ultravox_tuple()
|
test-ultravox-vllm.py
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste do Ultravox usando vLLM diretamente
|
| 4 |
+
Baseado no exemplo oficial
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import sys
|
| 8 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 9 |
+
|
| 10 |
+
from vllm import LLM, SamplingParams
|
| 11 |
+
import numpy as np
|
| 12 |
+
from gtts import gTTS
|
| 13 |
+
from pydub import AudioSegment
|
| 14 |
+
import io
|
| 15 |
+
import time
|
| 16 |
+
|
| 17 |
+
def generate_audio(text, lang='pt-br'):
|
| 18 |
+
"""Gera áudio TTS"""
|
| 19 |
+
print(f"🔊 Gerando TTS: '{text}'")
|
| 20 |
+
|
| 21 |
+
tts = gTTS(text=text, lang=lang)
|
| 22 |
+
mp3_buffer = io.BytesIO()
|
| 23 |
+
tts.write_to_fp(mp3_buffer)
|
| 24 |
+
mp3_buffer.seek(0)
|
| 25 |
+
|
| 26 |
+
# Converter MP3 para PCM 16kHz
|
| 27 |
+
audio = AudioSegment.from_mp3(mp3_buffer)
|
| 28 |
+
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
|
| 29 |
+
|
| 30 |
+
# Converter para numpy float32
|
| 31 |
+
samples = np.array(audio.get_array_of_samples()).astype(np.float32) / 32768.0
|
| 32 |
+
|
| 33 |
+
return samples
|
| 34 |
+
|
| 35 |
+
def test_ultravox():
|
| 36 |
+
"""Testa Ultravox diretamente com vLLM"""
|
| 37 |
+
|
| 38 |
+
print("=" * 60)
|
| 39 |
+
print("🚀 TESTE DIRETO DO ULTRAVOX COM vLLM")
|
| 40 |
+
print("=" * 60)
|
| 41 |
+
|
| 42 |
+
# Configurar modelo
|
| 43 |
+
model_name = "fixie-ai/ultravox-v0_5-llama-3_2-1b"
|
| 44 |
+
|
| 45 |
+
# Inicializar LLM
|
| 46 |
+
print(f"📡 Inicializando {model_name}...")
|
| 47 |
+
llm = LLM(
|
| 48 |
+
model=model_name,
|
| 49 |
+
trust_remote_code=True,
|
| 50 |
+
enforce_eager=True,
|
| 51 |
+
max_model_len=256,
|
| 52 |
+
gpu_memory_utilization=0.3
|
| 53 |
+
)
|
| 54 |
+
|
| 55 |
+
# Parâmetros de sampling
|
| 56 |
+
sampling_params = SamplingParams(
|
| 57 |
+
temperature=0.3,
|
| 58 |
+
max_tokens=50,
|
| 59 |
+
repetition_penalty=1.1
|
| 60 |
+
)
|
| 61 |
+
|
| 62 |
+
# Lista de testes
|
| 63 |
+
tests = [
|
| 64 |
+
("What is 2 + 2?", "en"),
|
| 65 |
+
("Quanto é dois mais dois?", "pt-br"),
|
| 66 |
+
("What is the capital of Brazil?", "en")
|
| 67 |
+
]
|
| 68 |
+
|
| 69 |
+
for question, lang in tests:
|
| 70 |
+
print(f"\n📝 Pergunta: {question}")
|
| 71 |
+
|
| 72 |
+
# Gerar áudio
|
| 73 |
+
audio = generate_audio(question, lang)
|
| 74 |
+
print(f"🎵 Áudio: {len(audio)} samples @ 16kHz")
|
| 75 |
+
|
| 76 |
+
# Preparar prompt com token de áudio
|
| 77 |
+
prompt = "<|audio|>"
|
| 78 |
+
|
| 79 |
+
# Preparar entrada com áudio
|
| 80 |
+
llm_input = {
|
| 81 |
+
"prompt": prompt,
|
| 82 |
+
"multi_modal_data": {
|
| 83 |
+
"audio": audio
|
| 84 |
+
}
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
# Fazer inferência
|
| 88 |
+
print("⏳ Processando...")
|
| 89 |
+
start_time = time.time()
|
| 90 |
+
|
| 91 |
+
try:
|
| 92 |
+
outputs = llm.generate(
|
| 93 |
+
prompts=[llm_input],
|
| 94 |
+
sampling_params=sampling_params
|
| 95 |
+
)
|
| 96 |
+
|
| 97 |
+
elapsed = time.time() - start_time
|
| 98 |
+
|
| 99 |
+
# Extrair resposta
|
| 100 |
+
response = outputs[0].outputs[0].text
|
| 101 |
+
|
| 102 |
+
print(f"✅ Resposta: '{response}'")
|
| 103 |
+
print(f"⏱️ Tempo: {elapsed:.2f}s")
|
| 104 |
+
|
| 105 |
+
except Exception as e:
|
| 106 |
+
print(f"❌ Erro: {e}")
|
| 107 |
+
|
| 108 |
+
print("\n" + "=" * 60)
|
| 109 |
+
print("✅ TESTE CONCLUÍDO")
|
| 110 |
+
print("=" * 60)
|
| 111 |
+
|
| 112 |
+
if __name__ == "__main__":
|
| 113 |
+
test_ultravox()
|
test-vllm-openai.py
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Teste do Ultravox usando vLLM OpenAI API
|
| 4 |
+
Baseado no exemplo oficial
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import requests
|
| 8 |
+
import json
|
| 9 |
+
import numpy as np
|
| 10 |
+
import base64
|
| 11 |
+
from gtts import gTTS
|
| 12 |
+
from pydub import AudioSegment
|
| 13 |
+
import io
|
| 14 |
+
|
| 15 |
+
def generate_audio(text):
|
| 16 |
+
"""Gera áudio TTS"""
|
| 17 |
+
print(f"🔊 Gerando TTS: '{text}'")
|
| 18 |
+
|
| 19 |
+
tts = gTTS(text=text, lang='pt-br')
|
| 20 |
+
mp3_buffer = io.BytesIO()
|
| 21 |
+
tts.write_to_fp(mp3_buffer)
|
| 22 |
+
mp3_buffer.seek(0)
|
| 23 |
+
|
| 24 |
+
# Converter MP3 para PCM 16kHz
|
| 25 |
+
audio = AudioSegment.from_mp3(mp3_buffer)
|
| 26 |
+
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
|
| 27 |
+
|
| 28 |
+
# Converter para numpy float32
|
| 29 |
+
samples = np.array(audio.get_array_of_samples()).astype(np.float32) / 32768.0
|
| 30 |
+
|
| 31 |
+
return samples
|
| 32 |
+
|
| 33 |
+
def test_vllm_api():
|
| 34 |
+
"""Testa usando a API OpenAI do vLLM"""
|
| 35 |
+
|
| 36 |
+
# Gerar áudio de teste
|
| 37 |
+
audio = generate_audio("Quanto é dois mais dois?")
|
| 38 |
+
print(f"🎵 Áudio: {len(audio)} samples @ 16kHz")
|
| 39 |
+
|
| 40 |
+
# Codificar áudio em base64
|
| 41 |
+
audio_bytes = audio.tobytes()
|
| 42 |
+
audio_b64 = base64.b64encode(audio_bytes).decode('utf-8')
|
| 43 |
+
|
| 44 |
+
# Criar mensagem no formato OpenAI com áudio
|
| 45 |
+
messages = [
|
| 46 |
+
{
|
| 47 |
+
"role": "user",
|
| 48 |
+
"content": [
|
| 49 |
+
{
|
| 50 |
+
"type": "audio",
|
| 51 |
+
"audio": {
|
| 52 |
+
"data": audio_b64,
|
| 53 |
+
"format": "pcm16"
|
| 54 |
+
}
|
| 55 |
+
},
|
| 56 |
+
{
|
| 57 |
+
"type": "text",
|
| 58 |
+
"text": "What did you hear?"
|
| 59 |
+
}
|
| 60 |
+
]
|
| 61 |
+
}
|
| 62 |
+
]
|
| 63 |
+
|
| 64 |
+
# Fazer requisição para vLLM OpenAI API
|
| 65 |
+
url = "http://localhost:8000/v1/chat/completions"
|
| 66 |
+
|
| 67 |
+
payload = {
|
| 68 |
+
"model": "fixie-ai/ultravox-v0_5-llama-3_2-1b",
|
| 69 |
+
"messages": messages,
|
| 70 |
+
"temperature": 0.3,
|
| 71 |
+
"max_tokens": 50
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
+
print("📡 Enviando para vLLM API...")
|
| 75 |
+
|
| 76 |
+
try:
|
| 77 |
+
response = requests.post(url, json=payload)
|
| 78 |
+
|
| 79 |
+
if response.status_code == 200:
|
| 80 |
+
result = response.json()
|
| 81 |
+
print("✅ Resposta:", result['choices'][0]['message']['content'])
|
| 82 |
+
else:
|
| 83 |
+
print(f"❌ Erro: {response.status_code}")
|
| 84 |
+
print(response.text)
|
| 85 |
+
|
| 86 |
+
except Exception as e:
|
| 87 |
+
print(f"❌ Erro: {e}")
|
| 88 |
+
|
| 89 |
+
if __name__ == "__main__":
|
| 90 |
+
test_vllm_api()
|
tts_server_kokoro.py
ADDED
|
@@ -0,0 +1,255 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
TTS Server usando Kokoro para baixa latência
|
| 4 |
+
Retorna PCM direto sem conversões MP3/WAV
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import grpc
|
| 8 |
+
import asyncio
|
| 9 |
+
import sys
|
| 10 |
+
import os
|
| 11 |
+
import time
|
| 12 |
+
import logging
|
| 13 |
+
import numpy as np
|
| 14 |
+
from concurrent import futures
|
| 15 |
+
from pathlib import Path
|
| 16 |
+
import importlib.util
|
| 17 |
+
|
| 18 |
+
# Adicionar paths
|
| 19 |
+
sys.path.append('/workspace/ultravox-pipeline')
|
| 20 |
+
sys.path.append('/workspace/ultravox-pipeline/protos/generated')
|
| 21 |
+
sys.path.append('/workspace/tts-service-kokoro/engines/kokoro')
|
| 22 |
+
|
| 23 |
+
import tts_pb2
|
| 24 |
+
import tts_pb2_grpc
|
| 25 |
+
|
| 26 |
+
# Logging
|
| 27 |
+
logging.basicConfig(
|
| 28 |
+
level=logging.INFO,
|
| 29 |
+
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
|
| 30 |
+
)
|
| 31 |
+
logger = logging.getLogger(__name__)
|
| 32 |
+
|
| 33 |
+
class KokoroTTSService(tts_pb2_grpc.TTSServiceServicer):
|
| 34 |
+
"""Servidor TTS usando Kokoro para síntese de voz em português"""
|
| 35 |
+
|
| 36 |
+
def __init__(self):
|
| 37 |
+
logger.info("🚀 Inicializando Kokoro TTS Service...")
|
| 38 |
+
self.pipeline = None
|
| 39 |
+
self.is_loaded = False
|
| 40 |
+
self.total_requests = 0
|
| 41 |
+
self.load_model()
|
| 42 |
+
|
| 43 |
+
def load_model(self):
|
| 44 |
+
"""Carrega o modelo Kokoro uma vez e mantém em memória"""
|
| 45 |
+
if self.is_loaded:
|
| 46 |
+
return True
|
| 47 |
+
|
| 48 |
+
try:
|
| 49 |
+
logger.info("📚 Carregando modelo Kokoro...")
|
| 50 |
+
start_time = time.time()
|
| 51 |
+
|
| 52 |
+
# Importar módulo Kokoro dinamicamente
|
| 53 |
+
kokoro_path = Path('/workspace/tts-service-kokoro/engines/kokoro/gerar_audio.py')
|
| 54 |
+
|
| 55 |
+
if not kokoro_path.exists():
|
| 56 |
+
# Fallback para implementação simplificada
|
| 57 |
+
logger.warning("⚠️ Kokoro não encontrado, usando TTS simplificado")
|
| 58 |
+
self.use_simple_tts = True
|
| 59 |
+
self.is_loaded = True
|
| 60 |
+
return True
|
| 61 |
+
|
| 62 |
+
spec = importlib.util.spec_from_file_location("gerar_audio", kokoro_path)
|
| 63 |
+
gerar_audio = importlib.util.module_from_spec(spec)
|
| 64 |
+
spec.loader.exec_module(gerar_audio)
|
| 65 |
+
|
| 66 |
+
# Criar pipeline Kokoro
|
| 67 |
+
KPipeline = gerar_audio.KPipeline
|
| 68 |
+
self.pipeline = KPipeline(lang_code='p') # Português
|
| 69 |
+
self.use_simple_tts = False
|
| 70 |
+
|
| 71 |
+
load_time = time.time() - start_time
|
| 72 |
+
logger.info(f"✅ Kokoro carregado em {load_time:.2f}s")
|
| 73 |
+
|
| 74 |
+
self.is_loaded = True
|
| 75 |
+
|
| 76 |
+
# Warm-up
|
| 77 |
+
self.warmup()
|
| 78 |
+
|
| 79 |
+
return True
|
| 80 |
+
|
| 81 |
+
except Exception as e:
|
| 82 |
+
logger.error(f"❌ Erro ao carregar Kokoro: {e}")
|
| 83 |
+
logger.info("📌 Usando TTS simplificado como fallback")
|
| 84 |
+
self.use_simple_tts = True
|
| 85 |
+
self.is_loaded = True
|
| 86 |
+
return True
|
| 87 |
+
|
| 88 |
+
def warmup(self):
|
| 89 |
+
"""Aquece o modelo com uma síntese teste"""
|
| 90 |
+
try:
|
| 91 |
+
if not self.use_simple_tts:
|
| 92 |
+
logger.info("🔥 Aquecendo modelo Kokoro...")
|
| 93 |
+
start = time.time()
|
| 94 |
+
_ = self.synthesize_text("Teste")
|
| 95 |
+
logger.info(f"✅ Warm-up completo em {time.time() - start:.2f}s")
|
| 96 |
+
except Exception as e:
|
| 97 |
+
logger.error(f"⚠️ Erro no warm-up: {e}")
|
| 98 |
+
|
| 99 |
+
def synthesize_text(self, text: str) -> bytes:
|
| 100 |
+
"""Sintetiza texto para áudio PCM"""
|
| 101 |
+
try:
|
| 102 |
+
if self.use_simple_tts or not self.pipeline:
|
| 103 |
+
# Fallback para síntese simples
|
| 104 |
+
return self._generate_simple_pcm(text)
|
| 105 |
+
|
| 106 |
+
# Usar Kokoro
|
| 107 |
+
start = time.time()
|
| 108 |
+
|
| 109 |
+
# Gerar áudio com Kokoro (retorna numpy array)
|
| 110 |
+
audio_array = self.pipeline.generate(
|
| 111 |
+
text,
|
| 112 |
+
voice='p_gemidao', # Voz portuguesa
|
| 113 |
+
speed=1.0
|
| 114 |
+
)
|
| 115 |
+
|
| 116 |
+
# Converter para PCM 16-bit
|
| 117 |
+
if audio_array.dtype != np.int16:
|
| 118 |
+
# Normalizar e converter
|
| 119 |
+
audio_array = np.clip(audio_array * 32767, -32768, 32767).astype(np.int16)
|
| 120 |
+
|
| 121 |
+
synthesis_time = time.time() - start
|
| 122 |
+
logger.info(f"🎵 Kokoro synthesis: {synthesis_time*1000:.1f}ms")
|
| 123 |
+
|
| 124 |
+
return audio_array.tobytes()
|
| 125 |
+
|
| 126 |
+
except Exception as e:
|
| 127 |
+
logger.error(f"❌ Erro na síntese Kokoro: {e}")
|
| 128 |
+
# Fallback para síntese simples
|
| 129 |
+
return self._generate_simple_pcm(text)
|
| 130 |
+
|
| 131 |
+
def _generate_simple_pcm(self, text: str) -> bytes:
|
| 132 |
+
"""Gera PCM sintético simples como fallback"""
|
| 133 |
+
try:
|
| 134 |
+
# Parâmetros de áudio
|
| 135 |
+
sample_rate = 16000
|
| 136 |
+
duration = max(0.5, len(text) * 0.08) # ~80ms por caractere
|
| 137 |
+
|
| 138 |
+
# Gerar samples
|
| 139 |
+
num_samples = int(sample_rate * duration)
|
| 140 |
+
t = np.linspace(0, duration, num_samples)
|
| 141 |
+
|
| 142 |
+
# Frequência base (voz feminina)
|
| 143 |
+
base_freq = 220 + (hash(text) % 50)
|
| 144 |
+
|
| 145 |
+
# Gerar onda com harmônicos para som mais natural
|
| 146 |
+
signal = np.sin(2 * np.pi * base_freq * t) * 0.5
|
| 147 |
+
signal += np.sin(2 * np.pi * base_freq * 2 * t) * 0.3 # 2º harmônico
|
| 148 |
+
signal += np.sin(2 * np.pi * base_freq * 3 * t) * 0.2 # 3º harmônico
|
| 149 |
+
|
| 150 |
+
# Adicionar modulação para variação natural
|
| 151 |
+
modulation = np.sin(2 * np.pi * 3 * t) * 0.2
|
| 152 |
+
signal = signal * (0.8 + modulation)
|
| 153 |
+
|
| 154 |
+
# Envelope ADSR
|
| 155 |
+
fade_samples = int(0.02 * sample_rate) # 20ms fade
|
| 156 |
+
signal[:fade_samples] *= np.linspace(0, 1, fade_samples)
|
| 157 |
+
signal[-fade_samples:] *= np.linspace(1, 0, fade_samples)
|
| 158 |
+
|
| 159 |
+
# Converter para PCM 16-bit
|
| 160 |
+
pcm_data = np.clip(signal * 32767, -32768, 32767).astype(np.int16)
|
| 161 |
+
|
| 162 |
+
return pcm_data.tobytes()
|
| 163 |
+
|
| 164 |
+
except Exception as e:
|
| 165 |
+
logger.error(f"❌ Erro no TTS simples: {e}")
|
| 166 |
+
# Retornar silêncio
|
| 167 |
+
return np.zeros(16000, dtype=np.int16).tobytes()
|
| 168 |
+
|
| 169 |
+
def StreamingSynthesize(self, request, context):
|
| 170 |
+
"""
|
| 171 |
+
Implementação de streaming synthesis
|
| 172 |
+
Retorna PCM 16-bit @ 16kHz direto, sem conversões
|
| 173 |
+
"""
|
| 174 |
+
try:
|
| 175 |
+
text = request.text
|
| 176 |
+
voice_id = request.voice_id or "kokoro_pt"
|
| 177 |
+
|
| 178 |
+
logger.info(f"🎤 TTS Request: '{text}' [{len(text)} chars]")
|
| 179 |
+
start_time = time.time()
|
| 180 |
+
|
| 181 |
+
# Sintetizar áudio
|
| 182 |
+
pcm_data = self.synthesize_text(text)
|
| 183 |
+
|
| 184 |
+
# Enviar em chunks para streaming
|
| 185 |
+
chunk_size = 4096 # 4KB chunks
|
| 186 |
+
total_chunks = len(pcm_data) // chunk_size + 1
|
| 187 |
+
|
| 188 |
+
for i in range(total_chunks):
|
| 189 |
+
start_idx = i * chunk_size
|
| 190 |
+
end_idx = min((i + 1) * chunk_size, len(pcm_data))
|
| 191 |
+
|
| 192 |
+
if start_idx < len(pcm_data):
|
| 193 |
+
chunk_data = pcm_data[start_idx:end_idx]
|
| 194 |
+
|
| 195 |
+
response = tts_pb2.AudioResponse(
|
| 196 |
+
audio_data=chunk_data,
|
| 197 |
+
samples_count=len(chunk_data) // 2, # int16 = 2 bytes
|
| 198 |
+
is_final_chunk=(i == total_chunks - 1),
|
| 199 |
+
timestamp_ms=int(time.time() * 1000)
|
| 200 |
+
)
|
| 201 |
+
|
| 202 |
+
yield response
|
| 203 |
+
|
| 204 |
+
# Simular streaming realista (sem await pois não é async)
|
| 205 |
+
if not self.use_simple_tts:
|
| 206 |
+
time.sleep(0.001) # 1ms entre chunks
|
| 207 |
+
|
| 208 |
+
total_time = (time.time() - start_time) * 1000
|
| 209 |
+
self.total_requests += 1
|
| 210 |
+
|
| 211 |
+
logger.info(f"✅ TTS completo: {total_time:.1f}ms, {len(pcm_data)/1024:.1f}KB")
|
| 212 |
+
logger.info(f"📊 Total requests: {self.total_requests}")
|
| 213 |
+
|
| 214 |
+
except Exception as e:
|
| 215 |
+
logger.error(f"❌ TTS Synthesis error: {e}")
|
| 216 |
+
context.set_code(grpc.StatusCode.INTERNAL)
|
| 217 |
+
context.set_details(f"Synthesis failed: {e}")
|
| 218 |
+
|
| 219 |
+
async def serve():
|
| 220 |
+
"""Iniciar servidor TTS com Kokoro"""
|
| 221 |
+
|
| 222 |
+
logger.info("🚀 Iniciando Kokoro TTS Server...")
|
| 223 |
+
|
| 224 |
+
# Criar servidor gRPC assíncrono
|
| 225 |
+
server = grpc.aio.server(
|
| 226 |
+
futures.ThreadPoolExecutor(max_workers=10),
|
| 227 |
+
options=[
|
| 228 |
+
('grpc.max_send_message_length', 50 * 1024 * 1024), # 50MB
|
| 229 |
+
('grpc.max_receive_message_length', 50 * 1024 * 1024),
|
| 230 |
+
]
|
| 231 |
+
)
|
| 232 |
+
|
| 233 |
+
# Adicionar serviço
|
| 234 |
+
tts_service = KokoroTTSService()
|
| 235 |
+
tts_pb2_grpc.add_TTSServiceServicer_to_server(tts_service, server)
|
| 236 |
+
|
| 237 |
+
# Configurar porta
|
| 238 |
+
listen_addr = '[::]:50054'
|
| 239 |
+
server.add_insecure_port(listen_addr)
|
| 240 |
+
|
| 241 |
+
# Iniciar servidor
|
| 242 |
+
await server.start()
|
| 243 |
+
logger.info(f"🎵 Kokoro TTS Server rodando em {listen_addr}")
|
| 244 |
+
logger.info("💡 Latência esperada: <100ms para síntese")
|
| 245 |
+
logger.info("🔊 Formato: PCM 16-bit @ 16kHz (sem conversões!)")
|
| 246 |
+
|
| 247 |
+
# Manter rodando
|
| 248 |
+
try:
|
| 249 |
+
await server.wait_for_termination()
|
| 250 |
+
except KeyboardInterrupt:
|
| 251 |
+
logger.info("🛑 Parando servidor...")
|
| 252 |
+
await server.stop(5)
|
| 253 |
+
|
| 254 |
+
if __name__ == '__main__':
|
| 255 |
+
asyncio.run(serve())
|
tunnel-macbook.sh
ADDED
|
@@ -0,0 +1,70 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
|
| 3 |
+
# Script simplificado para MacBook - SSH Tunnel para Ultravox WebRTC
|
| 4 |
+
# Copie este script para seu MacBook e execute localmente
|
| 5 |
+
|
| 6 |
+
# Configurações - EDITE ESTAS VARIÁVEIS
|
| 7 |
+
REMOTE_HOST="SEU_SERVIDOR_AQUI" # Coloque o IP ou hostname do servidor
|
| 8 |
+
REMOTE_USER="ubuntu" # Seu usuário SSH
|
| 9 |
+
SSH_KEY="~/.ssh/id_rsa" # Caminho da sua chave SSH (opcional)
|
| 10 |
+
|
| 11 |
+
# Portas (não precisa mudar)
|
| 12 |
+
WEBRTC_PORT=8082
|
| 13 |
+
ULTRAVOX_PORT=50051
|
| 14 |
+
TTS_PORT=50054
|
| 15 |
+
|
| 16 |
+
# Cores
|
| 17 |
+
GREEN='\033[0;32m'
|
| 18 |
+
YELLOW='\033[1;33m'
|
| 19 |
+
RED='\033[0;31m'
|
| 20 |
+
BLUE='\033[0;34m'
|
| 21 |
+
NC='\033[0m'
|
| 22 |
+
|
| 23 |
+
clear
|
| 24 |
+
echo -e "${BLUE}╔═══════════════════════════════════════════════════════╗${NC}"
|
| 25 |
+
echo -e "${BLUE}║ 🚇 Ultravox WebRTC - Túnel SSH para MacBook ║${NC}"
|
| 26 |
+
echo -e "${BLUE}╚═══════════════════════════════════════════════════════╝${NC}"
|
| 27 |
+
echo
|
| 28 |
+
|
| 29 |
+
# Verificar se o host foi configurado
|
| 30 |
+
if [ "$REMOTE_HOST" = "SEU_SERVIDOR_AQUI" ]; then
|
| 31 |
+
echo -e "${RED}❌ Erro: Configure o REMOTE_HOST no script primeiro!${NC}"
|
| 32 |
+
echo -e "${YELLOW} Edite a linha 6 e coloque o IP ou hostname do seu servidor${NC}"
|
| 33 |
+
exit 1
|
| 34 |
+
fi
|
| 35 |
+
|
| 36 |
+
# Matar túneis existentes
|
| 37 |
+
echo -e "${YELLOW}🔍 Verificando túneis existentes...${NC}"
|
| 38 |
+
pkill -f "ssh.*$REMOTE_HOST.*8082:localhost:8082" 2>/dev/null
|
| 39 |
+
sleep 1
|
| 40 |
+
|
| 41 |
+
echo -e "${YELLOW}📡 Criando túnel SSH...${NC}"
|
| 42 |
+
echo -e " Servidor: ${GREEN}$REMOTE_USER@$REMOTE_HOST${NC}"
|
| 43 |
+
echo
|
| 44 |
+
|
| 45 |
+
# Criar túnel SSH (encaminha apenas a porta do WebRTC)
|
| 46 |
+
if [ -f "$SSH_KEY" ]; then
|
| 47 |
+
ssh -f -N -L 8082:localhost:8082 -i "$SSH_KEY" $REMOTE_USER@$REMOTE_HOST
|
| 48 |
+
else
|
| 49 |
+
ssh -f -N -L 8082:localhost:8082 $REMOTE_USER@$REMOTE_HOST
|
| 50 |
+
fi
|
| 51 |
+
|
| 52 |
+
if [ $? -eq 0 ]; then
|
| 53 |
+
echo -e "${GREEN}✅ Túnel SSH criado com sucesso!${NC}"
|
| 54 |
+
echo
|
| 55 |
+
echo -e "${BLUE}╔═══════════════════════════════════════════════════════╗${NC}"
|
| 56 |
+
echo -e "${BLUE}║ ACESSE NO SEU MACBOOK ║${NC}"
|
| 57 |
+
echo -e "${BLUE}╚═══════════════════════════════════════════════════════╝${NC}"
|
| 58 |
+
echo
|
| 59 |
+
echo -e " ${GREEN}➜ http://localhost:8082${NC}"
|
| 60 |
+
echo -e " ${GREEN}➜ http://localhost:8082/ultravox-chat.html${NC}"
|
| 61 |
+
echo -e " ${GREEN}➜ http://localhost:8082/ultravox-chat-ios.html${NC}"
|
| 62 |
+
echo
|
| 63 |
+
echo -e "${YELLOW}Para fechar o túnel:${NC}"
|
| 64 |
+
echo -e " ${BLUE}pkill -f 'ssh.*8082:localhost:8082'${NC}"
|
| 65 |
+
echo
|
| 66 |
+
else
|
| 67 |
+
echo -e "${RED}❌ Erro ao criar túnel SSH${NC}"
|
| 68 |
+
echo -e "${YELLOW}Verifique suas credenciais SSH e conexão${NC}"
|
| 69 |
+
exit 1
|
| 70 |
+
fi
|
tunnel.sh
ADDED
|
@@ -0,0 +1,95 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
|
| 3 |
+
# SSH Tunnel Script para acessar Ultravox WebRTC do MacBook local
|
| 4 |
+
# Este script cria um túnel SSH para encaminhar a porta 8082 do servidor remoto para sua máquina local
|
| 5 |
+
|
| 6 |
+
# Cores para output
|
| 7 |
+
GREEN='\033[0;32m'
|
| 8 |
+
YELLOW='\033[1;33m'
|
| 9 |
+
RED='\033[0;31m'
|
| 10 |
+
BLUE='\033[0;34m'
|
| 11 |
+
NC='\033[0m' # No Color
|
| 12 |
+
|
| 13 |
+
# Configurações
|
| 14 |
+
REMOTE_HOST="${SSH_HOST:-seu-servidor.com}" # Substitua com o endereço do seu servidor
|
| 15 |
+
REMOTE_USER="${SSH_USER:-ubuntu}" # Substitua com seu usuário SSH
|
| 16 |
+
REMOTE_PORT="${REMOTE_PORT:-8082}" # Porta do WebRTC no servidor remoto
|
| 17 |
+
LOCAL_PORT="${LOCAL_PORT:-8082}" # Porta local no seu MacBook
|
| 18 |
+
SSH_KEY="${SSH_KEY:-~/.ssh/id_rsa}" # Caminho para sua chave SSH
|
| 19 |
+
|
| 20 |
+
echo -e "${BLUE}═══════════════════════════════════════════════════════${NC}"
|
| 21 |
+
echo -e "${BLUE} 🚇 Ultravox WebRTC SSH Tunnel - MacBook Access${NC}"
|
| 22 |
+
echo -e "${BLUE}═══════════════════════════════════════════════════════${NC}"
|
| 23 |
+
echo
|
| 24 |
+
|
| 25 |
+
# Verificar se já existe um túnel na porta
|
| 26 |
+
if lsof -Pi :$LOCAL_PORT -sTCP:LISTEN -t >/dev/null 2>&1; then
|
| 27 |
+
echo -e "${YELLOW}⚠️ Porta $LOCAL_PORT já está em uso${NC}"
|
| 28 |
+
echo -e "${YELLOW}Matando processo existente...${NC}"
|
| 29 |
+
lsof -ti:$LOCAL_PORT | xargs kill -9 2>/dev/null
|
| 30 |
+
sleep 1
|
| 31 |
+
fi
|
| 32 |
+
|
| 33 |
+
# Função para limpar ao sair
|
| 34 |
+
cleanup() {
|
| 35 |
+
echo
|
| 36 |
+
echo -e "${YELLOW}Fechando túnel SSH...${NC}"
|
| 37 |
+
exit 0
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
# Capturar Ctrl+C
|
| 41 |
+
trap cleanup INT
|
| 42 |
+
|
| 43 |
+
echo -e "${YELLOW}📡 Configuração do Túnel:${NC}"
|
| 44 |
+
echo -e " Servidor Remoto: ${GREEN}$REMOTE_USER@$REMOTE_HOST${NC}"
|
| 45 |
+
echo -e " Porta Remota: ${GREEN}$REMOTE_PORT${NC}"
|
| 46 |
+
echo -e " Porta Local: ${GREEN}$LOCAL_PORT${NC}"
|
| 47 |
+
echo
|
| 48 |
+
|
| 49 |
+
echo -e "${YELLOW}🔗 Estabelecendo túnel SSH...${NC}"
|
| 50 |
+
|
| 51 |
+
# Criar o túnel SSH
|
| 52 |
+
# -N: Não executar comando remoto
|
| 53 |
+
# -L: Port forwarding local
|
| 54 |
+
# -o: Opções SSH para reconexão automática
|
| 55 |
+
ssh -N \
|
| 56 |
+
-L $LOCAL_PORT:localhost:$REMOTE_PORT \
|
| 57 |
+
-o ServerAliveInterval=60 \
|
| 58 |
+
-o ServerAliveCountMax=3 \
|
| 59 |
+
-o ExitOnForwardFailure=yes \
|
| 60 |
+
-o StrictHostKeyChecking=no \
|
| 61 |
+
-i $SSH_KEY \
|
| 62 |
+
$REMOTE_USER@$REMOTE_HOST &
|
| 63 |
+
|
| 64 |
+
SSH_PID=$!
|
| 65 |
+
|
| 66 |
+
# Aguardar conexão
|
| 67 |
+
sleep 2
|
| 68 |
+
|
| 69 |
+
# Verificar se o túnel foi estabelecido
|
| 70 |
+
if kill -0 $SSH_PID 2>/dev/null; then
|
| 71 |
+
echo -e "${GREEN}✅ Túnel SSH estabelecido com sucesso!${NC}"
|
| 72 |
+
echo
|
| 73 |
+
echo -e "${BLUE}═══════════════════════════════════════════════════════${NC}"
|
| 74 |
+
echo -e "${GREEN}🎉 Acesse o Ultravox Chat no seu MacBook:${NC}"
|
| 75 |
+
echo
|
| 76 |
+
echo -e " ${BLUE}➜${NC} ${GREEN}http://localhost:$LOCAL_PORT${NC}"
|
| 77 |
+
echo -e " ${BLUE}➜${NC} ${GREEN}http://localhost:$LOCAL_PORT/ultravox-chat.html${NC}"
|
| 78 |
+
echo -e " ${BLUE}➜${NC} ${GREEN}http://localhost:$LOCAL_PORT/ultravox-chat-ios.html${NC}"
|
| 79 |
+
echo
|
| 80 |
+
echo -e "${BLUE}═══════════════════════════════════════════════════════${NC}"
|
| 81 |
+
echo
|
| 82 |
+
echo -e "${YELLOW}📌 Pressione Ctrl+C para fechar o túnel${NC}"
|
| 83 |
+
echo
|
| 84 |
+
|
| 85 |
+
# Manter o túnel aberto
|
| 86 |
+
wait $SSH_PID
|
| 87 |
+
else
|
| 88 |
+
echo -e "${RED}❌ Falha ao estabelecer túnel SSH${NC}"
|
| 89 |
+
echo -e "${YELLOW}Verifique:${NC}"
|
| 90 |
+
echo -e " 1. O endereço do servidor está correto: $REMOTE_HOST"
|
| 91 |
+
echo -e " 2. Suas credenciais SSH estão configuradas"
|
| 92 |
+
echo -e " 3. O servidor está acessível"
|
| 93 |
+
echo -e " 4. A porta $REMOTE_PORT está ativa no servidor"
|
| 94 |
+
exit 1
|
| 95 |
+
fi
|
ultravox/restart_ultravox.sh
ADDED
|
@@ -0,0 +1,39 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
|
| 3 |
+
# Script para reiniciar o servidor Ultravox com limpeza completa
|
| 4 |
+
|
| 5 |
+
echo "🔄 Reiniciando servidor Ultravox..."
|
| 6 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 7 |
+
echo ""
|
| 8 |
+
|
| 9 |
+
# 1. Executar script de parada
|
| 10 |
+
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
|
| 11 |
+
echo "📍 Parando servidor atual..."
|
| 12 |
+
bash "$SCRIPT_DIR/stop_ultravox.sh"
|
| 13 |
+
|
| 14 |
+
# 2. Aguardar um pouco mais para garantir liberação completa
|
| 15 |
+
echo ""
|
| 16 |
+
echo "⏳ Aguardando liberação completa de recursos..."
|
| 17 |
+
sleep 5
|
| 18 |
+
|
| 19 |
+
# 3. Verificar se realmente liberou
|
| 20 |
+
echo "🔍 Verificando liberação..."
|
| 21 |
+
if lsof -i :50051 >/dev/null 2>&1; then
|
| 22 |
+
echo " ⚠️ Porta 50051 ainda ocupada, forçando limpeza..."
|
| 23 |
+
kill -9 $(lsof -t -i:50051) 2>/dev/null
|
| 24 |
+
sleep 2
|
| 25 |
+
fi
|
| 26 |
+
|
| 27 |
+
# 4. Verificar GPU uma última vez
|
| 28 |
+
GPU_FREE=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 29 |
+
if [ -n "$GPU_FREE" ] && [ "$GPU_FREE" -lt "20000" ]; then
|
| 30 |
+
echo " ⚠️ GPU com menos de 20GB livres, limpeza adicional..."
|
| 31 |
+
pkill -9 -f "python" 2>/dev/null
|
| 32 |
+
sleep 3
|
| 33 |
+
fi
|
| 34 |
+
|
| 35 |
+
# 5. Iniciar servidor
|
| 36 |
+
echo ""
|
| 37 |
+
echo "🚀 Iniciando novo servidor..."
|
| 38 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 39 |
+
bash "$SCRIPT_DIR/start_ultravox.sh"
|
ultravox/server.py
CHANGED
|
@@ -118,12 +118,10 @@ class UltravoxServicer(speech_pb2_grpc.SpeechServiceServicer):
|
|
| 118 |
enforce_eager=True, # Desabilitar CUDA graphs para modelos customizados
|
| 119 |
enable_prefix_caching=False, # Desabilitar cache de prefixo
|
| 120 |
)
|
| 121 |
-
# Parâmetros otimizados baseados nos testes
|
| 122 |
self.sampling_params = SamplingParams(
|
| 123 |
-
temperature=0.
|
| 124 |
-
max_tokens=
|
| 125 |
-
repetition_penalty=1.1, # Evitar repetições
|
| 126 |
-
stop=[".", "!", "?", "\n\n"] # Parar em pontuação natural
|
| 127 |
)
|
| 128 |
self.pipeline = None # Não usar pipeline do Transformers
|
| 129 |
|
|
@@ -216,11 +214,14 @@ class UltravoxServicer(speech_pb2_grpc.SpeechServiceServicer):
|
|
| 216 |
logger.warning(f"Nenhum áudio recebido para sessão {session_id}")
|
| 217 |
return
|
| 218 |
|
| 219 |
-
#
|
| 220 |
if not prompt:
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
|
|
|
|
|
|
|
|
|
| 224 |
|
| 225 |
# Concatenar todo o áudio
|
| 226 |
full_audio = np.concatenate(audio_chunks)
|
|
@@ -244,26 +245,43 @@ class UltravoxServicer(speech_pb2_grpc.SpeechServiceServicer):
|
|
| 244 |
from vllm import SamplingParams
|
| 245 |
|
| 246 |
# Preparar entrada para vLLM com áudio
|
| 247 |
-
#
|
| 248 |
-
|
| 249 |
-
|
| 250 |
-
|
| 251 |
-
|
| 252 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 253 |
else:
|
| 254 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 255 |
|
| 256 |
# 🔍 LOG DETALHADO DO PROMPT PARA DEBUG
|
| 257 |
logger.info(f"🔍 PROMPT COMPLETO enviado para vLLM:")
|
| 258 |
-
logger.info(f" 📝 Prompt original
|
| 259 |
-
logger.info(f" 🎯 Prompt formatado
|
| 260 |
logger.info(f" 🎵 Áudio shape: {full_audio.shape}, dtype: {full_audio.dtype}")
|
| 261 |
logger.info(f" 📊 Áudio stats: min={full_audio.min():.3f}, max={full_audio.max():.3f}")
|
| 262 |
logger.info("=" * 80)
|
|
|
|
| 263 |
vllm_input = {
|
| 264 |
-
"prompt":
|
| 265 |
"multi_modal_data": {
|
| 266 |
-
"audio":
|
| 267 |
}
|
| 268 |
}
|
| 269 |
|
|
|
|
| 118 |
enforce_eager=True, # Desabilitar CUDA graphs para modelos customizados
|
| 119 |
enable_prefix_caching=False, # Desabilitar cache de prefixo
|
| 120 |
)
|
| 121 |
+
# Parâmetros otimizados baseados nos testes bem-sucedidos
|
| 122 |
self.sampling_params = SamplingParams(
|
| 123 |
+
temperature=0.2, # Temperatura baixa para respostas mais precisas
|
| 124 |
+
max_tokens=64 # Tokens suficientes para respostas completas
|
|
|
|
|
|
|
| 125 |
)
|
| 126 |
self.pipeline = None # Não usar pipeline do Transformers
|
| 127 |
|
|
|
|
| 214 |
logger.warning(f"Nenhum áudio recebido para sessão {session_id}")
|
| 215 |
return
|
| 216 |
|
| 217 |
+
# SEMPRE incluir o token de áudio no prompt
|
| 218 |
if not prompt:
|
| 219 |
+
prompt = "<|audio|>"
|
| 220 |
+
logger.info("Usando prompt simples com token de áudio")
|
| 221 |
+
elif "<|audio|>" not in prompt:
|
| 222 |
+
# Se tem prompt mas não tem o token de áudio, adicionar
|
| 223 |
+
prompt = f"{prompt}\n<|audio|>"
|
| 224 |
+
logger.info(f"Adicionando token <|audio|> ao prompt customizado")
|
| 225 |
|
| 226 |
# Concatenar todo o áudio
|
| 227 |
full_audio = np.concatenate(audio_chunks)
|
|
|
|
| 245 |
from vllm import SamplingParams
|
| 246 |
|
| 247 |
# Preparar entrada para vLLM com áudio
|
| 248 |
+
# Importar tokenizer para chat template
|
| 249 |
+
from transformers import AutoTokenizer
|
| 250 |
+
model_name = self.model_config['model_path']
|
| 251 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 252 |
+
|
| 253 |
+
# Criar mensagem com token de áudio
|
| 254 |
+
if prompt and "<|audio|>" not in prompt:
|
| 255 |
+
user_content = f"<|audio|>\n{prompt}"
|
| 256 |
+
elif not prompt:
|
| 257 |
+
user_content = "<|audio|>\nResponda em português:"
|
| 258 |
else:
|
| 259 |
+
user_content = prompt
|
| 260 |
+
|
| 261 |
+
messages = [{"role": "user", "content": user_content}]
|
| 262 |
+
|
| 263 |
+
# Aplicar chat template
|
| 264 |
+
formatted_prompt = tokenizer.apply_chat_template(
|
| 265 |
+
messages,
|
| 266 |
+
tokenize=False,
|
| 267 |
+
add_generation_prompt=True
|
| 268 |
+
)
|
| 269 |
+
|
| 270 |
+
# Criar tupla (audio, sample_rate) - formato esperado pelo vLLM
|
| 271 |
+
audio_tuple = (full_audio, sample_rate)
|
| 272 |
|
| 273 |
# 🔍 LOG DETALHADO DO PROMPT PARA DEBUG
|
| 274 |
logger.info(f"🔍 PROMPT COMPLETO enviado para vLLM:")
|
| 275 |
+
logger.info(f" 📝 Prompt original: '{prompt[:100]}...'")
|
| 276 |
+
logger.info(f" 🎯 Prompt formatado: '{formatted_prompt[:100]}...'")
|
| 277 |
logger.info(f" 🎵 Áudio shape: {full_audio.shape}, dtype: {full_audio.dtype}")
|
| 278 |
logger.info(f" 📊 Áudio stats: min={full_audio.min():.3f}, max={full_audio.max():.3f}")
|
| 279 |
logger.info("=" * 80)
|
| 280 |
+
|
| 281 |
vllm_input = {
|
| 282 |
+
"prompt": formatted_prompt,
|
| 283 |
"multi_modal_data": {
|
| 284 |
+
"audio": [audio_tuple] # Lista de tuplas (audio, sample_rate)
|
| 285 |
}
|
| 286 |
}
|
| 287 |
|
ultravox/server_backup.py
ADDED
|
@@ -0,0 +1,446 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Servidor Ultravox gRPC - Implementação com vLLM para aceleração
|
| 4 |
+
Usa vLLM quando disponível, fallback para Transformers
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import grpc
|
| 8 |
+
import asyncio
|
| 9 |
+
import logging
|
| 10 |
+
import numpy as np
|
| 11 |
+
import time
|
| 12 |
+
import sys
|
| 13 |
+
import os
|
| 14 |
+
import torch
|
| 15 |
+
import transformers
|
| 16 |
+
from typing import Iterator, Optional
|
| 17 |
+
from concurrent import futures
|
| 18 |
+
|
| 19 |
+
# Tentar importar vLLM
|
| 20 |
+
try:
|
| 21 |
+
from vllm import LLM, SamplingParams
|
| 22 |
+
VLLM_AVAILABLE = True
|
| 23 |
+
logger_vllm = logging.getLogger("vllm")
|
| 24 |
+
logger_vllm.info("✅ vLLM disponível - usando inferência acelerada")
|
| 25 |
+
except ImportError:
|
| 26 |
+
VLLM_AVAILABLE = False
|
| 27 |
+
logger_vllm = logging.getLogger("vllm")
|
| 28 |
+
logger_vllm.warning("⚠️ vLLM não disponível - usando Transformers padrão")
|
| 29 |
+
|
| 30 |
+
# Adicionar paths para protos
|
| 31 |
+
sys.path.append('/workspace/ultravox-pipeline/services/ultravox')
|
| 32 |
+
sys.path.append('/workspace/ultravox-pipeline/protos/generated')
|
| 33 |
+
|
| 34 |
+
import speech_pb2
|
| 35 |
+
import speech_pb2_grpc
|
| 36 |
+
|
| 37 |
+
logging.basicConfig(
|
| 38 |
+
level=logging.INFO,
|
| 39 |
+
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
|
| 40 |
+
)
|
| 41 |
+
logger = logging.getLogger(__name__)
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
class UltravoxServicer(speech_pb2_grpc.SpeechServiceServicer):
|
| 45 |
+
"""Implementação gRPC do Ultravox usando a arquitetura correta"""
|
| 46 |
+
|
| 47 |
+
def __init__(self):
|
| 48 |
+
"""Inicializa o serviço"""
|
| 49 |
+
logger.info("Inicializando Ultravox Service...")
|
| 50 |
+
|
| 51 |
+
# Verificar GPU antes de inicializar
|
| 52 |
+
if not torch.cuda.is_available():
|
| 53 |
+
logger.error("❌ GPU não disponível! Ultravox requer GPU para funcionar.")
|
| 54 |
+
logger.error("Verifique se CUDA está instalado e funcionando.")
|
| 55 |
+
raise RuntimeError("GPU não disponível. Ultravox não pode funcionar sem GPU.")
|
| 56 |
+
|
| 57 |
+
# Forçar uso da GPU com mais memória livre
|
| 58 |
+
best_gpu = 0
|
| 59 |
+
best_free = 0
|
| 60 |
+
for i in range(torch.cuda.device_count()):
|
| 61 |
+
total = torch.cuda.get_device_properties(i).total_memory / (1024**3)
|
| 62 |
+
allocated = torch.cuda.memory_allocated(i) / (1024**3)
|
| 63 |
+
free = total - allocated
|
| 64 |
+
logger.info(f"GPU {i}: {torch.cuda.get_device_name(i)} - {free:.1f}GB livre de {total:.1f}GB")
|
| 65 |
+
if free > best_free:
|
| 66 |
+
best_free = free
|
| 67 |
+
best_gpu = i
|
| 68 |
+
|
| 69 |
+
torch.cuda.set_device(best_gpu)
|
| 70 |
+
logger.info(f"✅ Usando GPU {best_gpu}: {torch.cuda.get_device_name(best_gpu)}")
|
| 71 |
+
logger.info(f" Memória livre: {best_free:.1f}GB")
|
| 72 |
+
|
| 73 |
+
if best_free < 3.0: # Ultravox 1B precisa ~3GB
|
| 74 |
+
logger.warning(f"⚠️ Pouca memória GPU disponível ({best_free:.1f}GB). Recomendado: 3GB+")
|
| 75 |
+
|
| 76 |
+
# Configuração do modelo usando Transformers Pipeline
|
| 77 |
+
self.model_config = {
|
| 78 |
+
'model_path': "fixie-ai/ultravox-v0_5-llama-3_2-1b", # Modelo v0.5 com Llama-3.2-1B (funcionando com vLLM)
|
| 79 |
+
'device': f"cuda:{best_gpu}", # GPU específica
|
| 80 |
+
'max_new_tokens': 200,
|
| 81 |
+
'temperature': 0.7, # Temperatura para respostas mais naturais
|
| 82 |
+
'token': os.getenv('HF_TOKEN', '') # Token HuggingFace via env var
|
| 83 |
+
}
|
| 84 |
+
|
| 85 |
+
# Pipeline de transformers (API estável)
|
| 86 |
+
self.pipeline = None
|
| 87 |
+
self.conversation_states = {} # Estado por sessão
|
| 88 |
+
|
| 89 |
+
# Métricas
|
| 90 |
+
self.total_requests = 0
|
| 91 |
+
self.active_sessions = 0
|
| 92 |
+
self.total_tokens_generated = 0
|
| 93 |
+
self._start_time = time.time()
|
| 94 |
+
|
| 95 |
+
# Inicializar modelo
|
| 96 |
+
self._initialize_model()
|
| 97 |
+
|
| 98 |
+
def _initialize_model(self):
|
| 99 |
+
"""Inicializa o modelo Ultravox usando vLLM ou Transformers"""
|
| 100 |
+
try:
|
| 101 |
+
start_time = time.time()
|
| 102 |
+
|
| 103 |
+
if not VLLM_AVAILABLE:
|
| 104 |
+
logger.error("❌ vLLM NÃO está instalado! Este servidor REQUER vLLM.")
|
| 105 |
+
logger.error("Instale com: pip install vllm")
|
| 106 |
+
raise RuntimeError("vLLM é obrigatório para este servidor")
|
| 107 |
+
|
| 108 |
+
# USAR APENAS vLLM - SEM FALLBACK
|
| 109 |
+
logger.info("🚀 Carregando modelo Ultravox via vLLM (OBRIGATÓRIO)...")
|
| 110 |
+
|
| 111 |
+
# vLLM para modelos multimodais
|
| 112 |
+
self.vllm_model = LLM(
|
| 113 |
+
model=self.model_config['model_path'],
|
| 114 |
+
trust_remote_code=True,
|
| 115 |
+
dtype="bfloat16",
|
| 116 |
+
gpu_memory_utilization=0.30, # 30% (~7.2GB) para ter memória suficiente
|
| 117 |
+
max_model_len=128, # Reduzir contexto para 128 tokens para economizar memória
|
| 118 |
+
enforce_eager=True, # Desabilitar CUDA graphs para modelos customizados
|
| 119 |
+
enable_prefix_caching=False, # Desabilitar cache de prefixo
|
| 120 |
+
)
|
| 121 |
+
# Parâmetros otimizados baseados nos testes
|
| 122 |
+
self.sampling_params = SamplingParams(
|
| 123 |
+
temperature=0.3, # Mais conservador para respostas consistentes
|
| 124 |
+
max_tokens=50, # Respostas mais concisas
|
| 125 |
+
repetition_penalty=1.1, # Evitar repetições
|
| 126 |
+
stop=[".", "!", "?", "\n\n"] # Parar em pontuação natural
|
| 127 |
+
)
|
| 128 |
+
self.pipeline = None # Não usar pipeline do Transformers
|
| 129 |
+
|
| 130 |
+
load_time = time.time() - start_time
|
| 131 |
+
logger.info(f"✅ Modelo carregado em {load_time:.2f}s via vLLM")
|
| 132 |
+
logger.info("🎯 Usando vLLM para inferência acelerada!")
|
| 133 |
+
|
| 134 |
+
except Exception as e:
|
| 135 |
+
logger.error(f"Erro ao carregar modelo: {e}")
|
| 136 |
+
raise
|
| 137 |
+
|
| 138 |
+
def _get_conversation_state(self, session_id: str):
|
| 139 |
+
"""Obtém ou cria estado de conversação para sessão"""
|
| 140 |
+
if session_id not in self.conversation_states:
|
| 141 |
+
self.conversation_states[session_id] = {
|
| 142 |
+
'created_at': time.time(),
|
| 143 |
+
'turn_count': 0,
|
| 144 |
+
'conversation_history': []
|
| 145 |
+
}
|
| 146 |
+
logger.info(f"Estado de conversação criado para sessão: {session_id}")
|
| 147 |
+
|
| 148 |
+
return self.conversation_states[session_id]
|
| 149 |
+
|
| 150 |
+
def _cleanup_old_sessions(self, max_age: int = 1800): # 30 minutos
|
| 151 |
+
"""Remove sessões antigas"""
|
| 152 |
+
current_time = time.time()
|
| 153 |
+
expired_sessions = [
|
| 154 |
+
sid for sid, state in self.conversation_states.items()
|
| 155 |
+
if current_time - state['created_at'] > max_age
|
| 156 |
+
]
|
| 157 |
+
|
| 158 |
+
for sid in expired_sessions:
|
| 159 |
+
del self.conversation_states[sid]
|
| 160 |
+
logger.info(f"Sessão expirada removida: {sid}")
|
| 161 |
+
|
| 162 |
+
async def StreamingRecognize(self,
|
| 163 |
+
request_iterator,
|
| 164 |
+
context: grpc.ServicerContext) -> Iterator[speech_pb2.TranscriptToken]:
|
| 165 |
+
"""
|
| 166 |
+
Processa stream de áudio usando a arquitetura Ultravox completa
|
| 167 |
+
|
| 168 |
+
Args:
|
| 169 |
+
request_iterator: Iterator de chunks de áudio
|
| 170 |
+
context: Contexto gRPC
|
| 171 |
+
|
| 172 |
+
Yields:
|
| 173 |
+
Tokens de transcrição + resposta do LLM
|
| 174 |
+
"""
|
| 175 |
+
session_id = None
|
| 176 |
+
start_time = time.time()
|
| 177 |
+
self.total_requests += 1
|
| 178 |
+
|
| 179 |
+
try:
|
| 180 |
+
# Coletar todo o áudio primeiro (como no Gradio)
|
| 181 |
+
audio_chunks = []
|
| 182 |
+
sample_rate = 16000
|
| 183 |
+
prompt = None # Será obtido do metadata ou usado padrão
|
| 184 |
+
|
| 185 |
+
# Processar chunks de entrada
|
| 186 |
+
async for audio_chunk in request_iterator:
|
| 187 |
+
if not session_id:
|
| 188 |
+
session_id = audio_chunk.session_id or f"session_{self.total_requests}"
|
| 189 |
+
logger.info(f"Nova sessão Ultravox: {session_id}")
|
| 190 |
+
self.active_sessions += 1
|
| 191 |
+
|
| 192 |
+
# DEBUG: Log todos os campos recebidos
|
| 193 |
+
logger.info(f"DEBUG - Chunk recebido para {session_id}:")
|
| 194 |
+
logger.info(f" - audio_data: {len(audio_chunk.audio_data)} bytes")
|
| 195 |
+
logger.info(f" - sample_rate: {audio_chunk.sample_rate}")
|
| 196 |
+
logger.info(f" - is_final_chunk: {audio_chunk.is_final_chunk}")
|
| 197 |
+
|
| 198 |
+
# Obter prompt do campo system_prompt
|
| 199 |
+
if not prompt and audio_chunk.system_prompt:
|
| 200 |
+
prompt = audio_chunk.system_prompt
|
| 201 |
+
logger.info(f"✅ PROMPT DINÂMICO recebido: {prompt[:100]}...")
|
| 202 |
+
elif not audio_chunk.system_prompt:
|
| 203 |
+
logger.info(f"DEBUG - Sem system_prompt no chunk")
|
| 204 |
+
|
| 205 |
+
sample_rate = audio_chunk.sample_rate or 16000
|
| 206 |
+
|
| 207 |
+
# CRUCIAL: Converter de bytes para numpy float32 (como descoberto no Gradio)
|
| 208 |
+
audio_data = np.frombuffer(audio_chunk.audio_data, dtype=np.float32)
|
| 209 |
+
audio_chunks.append(audio_data)
|
| 210 |
+
|
| 211 |
+
# Se é chunk final, processar
|
| 212 |
+
if audio_chunk.is_final_chunk:
|
| 213 |
+
break
|
| 214 |
+
|
| 215 |
+
if not audio_chunks:
|
| 216 |
+
logger.warning(f"Nenhum áudio recebido para sessão {session_id}")
|
| 217 |
+
return
|
| 218 |
+
|
| 219 |
+
# SEMPRE incluir o token de áudio, mesmo com system_prompt
|
| 220 |
+
if prompt and "<|audio|>" not in prompt:
|
| 221 |
+
# Se tem prompt mas não tem o token de áudio, adicionar
|
| 222 |
+
prompt = f"{prompt}\n<|audio|>"
|
| 223 |
+
logger.info(f"Adicionando token <|audio|> ao prompt customizado")
|
| 224 |
+
elif not prompt:
|
| 225 |
+
# ⚠️ FORMATO SIMPLES QUE FUNCIONA COM ULTRAVOX v0.5! ⚠️
|
| 226 |
+
# O token <|audio|> é substituído pelo áudio automaticamente
|
| 227 |
+
prompt = "<|audio|>"
|
| 228 |
+
logger.info("Usando prompt simples com apenas token de áudio")
|
| 229 |
+
|
| 230 |
+
# Concatenar todo o áudio
|
| 231 |
+
full_audio = np.concatenate(audio_chunks)
|
| 232 |
+
logger.info(f"Áudio processado: {len(full_audio)} samples @ {sample_rate}Hz para sessão {session_id}")
|
| 233 |
+
|
| 234 |
+
# Obter estado de conversação
|
| 235 |
+
conv_state = self._get_conversation_state(session_id)
|
| 236 |
+
conv_state['turn_count'] += 1
|
| 237 |
+
|
| 238 |
+
# Processar com vLLM ou Transformers
|
| 239 |
+
backend = "vLLM" if self.vllm_model else "Transformers"
|
| 240 |
+
logger.info(f"Iniciando inferência {backend} para sessão {session_id}")
|
| 241 |
+
inference_start = time.time()
|
| 242 |
+
|
| 243 |
+
try:
|
| 244 |
+
# USAR APENAS vLLM - SEM FALLBACK
|
| 245 |
+
if not self.vllm_model:
|
| 246 |
+
raise RuntimeError("vLLM não está carregado! Este servidor REQUER vLLM.")
|
| 247 |
+
|
| 248 |
+
# Usar vLLM para inferência acelerada (v0.10+ suporta Ultravox!)
|
| 249 |
+
from vllm import SamplingParams
|
| 250 |
+
|
| 251 |
+
# USAR PROMPT DIRETO - Ultravox v0.5 com vLLM funciona melhor assim
|
| 252 |
+
# O token <|audio|> é substituído automaticamente pelo áudio
|
| 253 |
+
vllm_prompt = prompt
|
| 254 |
+
|
| 255 |
+
# 🔍 LOG DETALHADO DO PROMPT PARA DEBUG
|
| 256 |
+
logger.info(f"🔍 PROMPT COMPLETO enviado para vLLM:")
|
| 257 |
+
logger.info(f" 🎯 Prompt: '{vllm_prompt[:200]}...'")
|
| 258 |
+
logger.info(f" 🎵 Áudio shape: {full_audio.shape}, dtype: {full_audio.dtype}")
|
| 259 |
+
logger.info(f" 📊 Áudio stats: min={full_audio.min():.3f}, max={full_audio.max():.3f}")
|
| 260 |
+
logger.info("=" * 80)
|
| 261 |
+
|
| 262 |
+
vllm_input = {
|
| 263 |
+
"prompt": vllm_prompt,
|
| 264 |
+
"multi_modal_data": {
|
| 265 |
+
"audio": full_audio # numpy array já em 16kHz
|
| 266 |
+
}
|
| 267 |
+
}
|
| 268 |
+
|
| 269 |
+
# Fazer inferência com vLLM
|
| 270 |
+
outputs = self.vllm_model.generate(
|
| 271 |
+
prompts=[vllm_input],
|
| 272 |
+
sampling_params=self.sampling_params
|
| 273 |
+
)
|
| 274 |
+
|
| 275 |
+
inference_time = time.time() - inference_start
|
| 276 |
+
logger.info(f"⚡ Inferência vLLM concluída em {inference_time*1000:.0f}ms")
|
| 277 |
+
|
| 278 |
+
# 🔍 LOG DETALHADO DA RESPOSTA vLLM
|
| 279 |
+
logger.info(f"🔍 RESPOSTA DETALHADA do vLLM:")
|
| 280 |
+
logger.info(f" 📤 Outputs count: {len(outputs)}")
|
| 281 |
+
logger.info(f" 📤 Outputs[0].outputs count: {len(outputs[0].outputs)}")
|
| 282 |
+
|
| 283 |
+
# Extrair resposta
|
| 284 |
+
response_text = outputs[0].outputs[0].text
|
| 285 |
+
logger.info(f" 📝 Resposta RAW: '{response_text}'")
|
| 286 |
+
logger.info(f" 📏 Tamanho resposta: {len(response_text)} chars")
|
| 287 |
+
|
| 288 |
+
if not response_text:
|
| 289 |
+
response_text = "Desculpe, não consegui processar o áudio. Poderia repetir?"
|
| 290 |
+
logger.info(f" ⚠️ Resposta vazia, usando fallback")
|
| 291 |
+
else:
|
| 292 |
+
logger.info(f" ✅ Resposta válida recebida")
|
| 293 |
+
|
| 294 |
+
logger.info(f" 🎯 Resposta final: '{response_text[:100]}...'")
|
| 295 |
+
logger.info("=" * 80)
|
| 296 |
+
|
| 297 |
+
# Sem else - SEMPRE usar vLLM
|
| 298 |
+
|
| 299 |
+
# Simular streaming dividindo a resposta em tokens
|
| 300 |
+
words = response_text.split()
|
| 301 |
+
token_count = 0
|
| 302 |
+
|
| 303 |
+
for word in words:
|
| 304 |
+
# Criar token de resposta
|
| 305 |
+
token = speech_pb2.TranscriptToken()
|
| 306 |
+
token.text = word + " "
|
| 307 |
+
token.confidence = 0.95
|
| 308 |
+
token.is_final = False
|
| 309 |
+
token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 310 |
+
|
| 311 |
+
# Metadados de emoção
|
| 312 |
+
token.emotion.emotion = speech_pb2.EmotionMetadata.NEUTRAL
|
| 313 |
+
token.emotion.confidence = 0.8
|
| 314 |
+
|
| 315 |
+
# Metadados de prosódia
|
| 316 |
+
token.prosody.speech_rate = 120.0
|
| 317 |
+
token.prosody.pitch_mean = 150.0
|
| 318 |
+
token.prosody.energy = -20.0
|
| 319 |
+
token.prosody.pitch_variance = 50.0
|
| 320 |
+
|
| 321 |
+
token_count += 1
|
| 322 |
+
self.total_tokens_generated += 1
|
| 323 |
+
|
| 324 |
+
logger.debug(f"Token {token_count}: '{word}' para sessão {session_id}")
|
| 325 |
+
|
| 326 |
+
yield token
|
| 327 |
+
|
| 328 |
+
# Pequena pausa para simular streaming
|
| 329 |
+
await asyncio.sleep(0.05)
|
| 330 |
+
|
| 331 |
+
# Token final
|
| 332 |
+
final_token = speech_pb2.TranscriptToken()
|
| 333 |
+
final_token.text = "" # Token vazio indica fim
|
| 334 |
+
final_token.confidence = 1.0
|
| 335 |
+
final_token.is_final = True
|
| 336 |
+
final_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 337 |
+
|
| 338 |
+
logger.info(f"✅ Processamento completo: {token_count} tokens, {inference_time*1000:.0f}ms")
|
| 339 |
+
|
| 340 |
+
yield final_token
|
| 341 |
+
|
| 342 |
+
except Exception as model_error:
|
| 343 |
+
logger.error(f"Erro no modelo Transformers: {model_error}")
|
| 344 |
+
# Retornar erro como token
|
| 345 |
+
error_token = speech_pb2.TranscriptToken()
|
| 346 |
+
error_token.text = f"Erro no processamento: {str(model_error)}"
|
| 347 |
+
error_token.confidence = 0.0
|
| 348 |
+
error_token.is_final = True
|
| 349 |
+
error_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 350 |
+
|
| 351 |
+
yield error_token
|
| 352 |
+
|
| 353 |
+
# Limpar sessões antigas periodicamente
|
| 354 |
+
if self.total_requests % 10 == 0:
|
| 355 |
+
self._cleanup_old_sessions()
|
| 356 |
+
|
| 357 |
+
except Exception as e:
|
| 358 |
+
logger.error(f"Erro na transcrição para sessão {session_id}: {e}")
|
| 359 |
+
# Enviar token de erro
|
| 360 |
+
error_token = speech_pb2.TranscriptToken()
|
| 361 |
+
error_token.text = ""
|
| 362 |
+
error_token.confidence = 0.0
|
| 363 |
+
error_token.is_final = True
|
| 364 |
+
error_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 365 |
+
yield error_token
|
| 366 |
+
|
| 367 |
+
finally:
|
| 368 |
+
if session_id:
|
| 369 |
+
self.active_sessions = max(0, self.active_sessions - 1)
|
| 370 |
+
processing_time = time.time() - start_time
|
| 371 |
+
logger.info(f"Sessão {session_id} concluída. Latência: {processing_time*1000:.2f}ms")
|
| 372 |
+
|
| 373 |
+
async def GetMetrics(self, request: speech_pb2.Empty,
|
| 374 |
+
context: grpc.ServicerContext) -> speech_pb2.Metrics:
|
| 375 |
+
"""Retorna métricas do serviço"""
|
| 376 |
+
import psutil
|
| 377 |
+
import torch
|
| 378 |
+
|
| 379 |
+
metrics = speech_pb2.Metrics()
|
| 380 |
+
metrics.total_requests = self.total_requests
|
| 381 |
+
metrics.active_sessions = self.active_sessions
|
| 382 |
+
|
| 383 |
+
# Latência média (placeholder)
|
| 384 |
+
metrics.average_latency_ms = 500.0
|
| 385 |
+
|
| 386 |
+
# Uso de GPU (sempre GPU conforme solicitado)
|
| 387 |
+
try:
|
| 388 |
+
metrics.gpu_usage_percent = float(torch.cuda.utilization())
|
| 389 |
+
metrics.memory_usage_mb = float(torch.cuda.memory_allocated() / (1024 * 1024))
|
| 390 |
+
except:
|
| 391 |
+
metrics.gpu_usage_percent = 0.0
|
| 392 |
+
metrics.memory_usage_mb = 0.0
|
| 393 |
+
|
| 394 |
+
# Tokens por segundo (deve ser int64 conforme protobuf)
|
| 395 |
+
metrics.tokens_per_second = int(self.total_tokens_generated / max(1, time.time() - self._start_time))
|
| 396 |
+
|
| 397 |
+
return metrics
|
| 398 |
+
|
| 399 |
+
|
| 400 |
+
async def serve():
|
| 401 |
+
"""Inicia servidor gRPC"""
|
| 402 |
+
# Configurar servidor
|
| 403 |
+
server = grpc.aio.server(
|
| 404 |
+
futures.ThreadPoolExecutor(max_workers=10),
|
| 405 |
+
options=[
|
| 406 |
+
('grpc.max_send_message_length', 10 * 1024 * 1024),
|
| 407 |
+
('grpc.max_receive_message_length', 10 * 1024 * 1024),
|
| 408 |
+
('grpc.keepalive_time_ms', 30000),
|
| 409 |
+
('grpc.keepalive_timeout_ms', 10000),
|
| 410 |
+
('grpc.http2.min_time_between_pings_ms', 30000),
|
| 411 |
+
]
|
| 412 |
+
)
|
| 413 |
+
|
| 414 |
+
# Adicionar serviço
|
| 415 |
+
speech_pb2_grpc.add_SpeechServiceServicer_to_server(
|
| 416 |
+
UltravoxServicer(), server
|
| 417 |
+
)
|
| 418 |
+
|
| 419 |
+
# Configurar porta
|
| 420 |
+
port = os.getenv('ULTRAVOX_PORT', '50051')
|
| 421 |
+
# Bind dual stack - IPv4 e IPv6 para compatibilidade
|
| 422 |
+
server.add_insecure_port(f'0.0.0.0:{port}') # IPv4
|
| 423 |
+
server.add_insecure_port(f'[::]:{port}') # IPv6
|
| 424 |
+
|
| 425 |
+
logger.info(f"Ultravox Server iniciando na porta {port}...")
|
| 426 |
+
await server.start()
|
| 427 |
+
logger.info(f"Ultravox Server rodando na porta {port}")
|
| 428 |
+
|
| 429 |
+
try:
|
| 430 |
+
await server.wait_for_termination()
|
| 431 |
+
except KeyboardInterrupt:
|
| 432 |
+
logger.info("Parando servidor...")
|
| 433 |
+
await server.stop(grace_period=5)
|
| 434 |
+
|
| 435 |
+
|
| 436 |
+
def main():
|
| 437 |
+
"""Função principal"""
|
| 438 |
+
try:
|
| 439 |
+
asyncio.run(serve())
|
| 440 |
+
except Exception as e:
|
| 441 |
+
logger.error(f"Erro fatal: {e}")
|
| 442 |
+
sys.exit(1)
|
| 443 |
+
|
| 444 |
+
|
| 445 |
+
if __name__ == "__main__":
|
| 446 |
+
main()
|
ultravox/server_vllm_090_broken.py
ADDED
|
@@ -0,0 +1,447 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Servidor Ultravox gRPC - Implementação com vLLM para aceleração
|
| 4 |
+
Usa vLLM quando disponível, fallback para Transformers
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import grpc
|
| 8 |
+
import asyncio
|
| 9 |
+
import logging
|
| 10 |
+
import numpy as np
|
| 11 |
+
import time
|
| 12 |
+
import sys
|
| 13 |
+
import os
|
| 14 |
+
import torch
|
| 15 |
+
import transformers
|
| 16 |
+
from typing import Iterator, Optional
|
| 17 |
+
from concurrent import futures
|
| 18 |
+
|
| 19 |
+
# Tentar importar vLLM
|
| 20 |
+
try:
|
| 21 |
+
from vllm import LLM, SamplingParams
|
| 22 |
+
VLLM_AVAILABLE = True
|
| 23 |
+
logger_vllm = logging.getLogger("vllm")
|
| 24 |
+
logger_vllm.info("✅ vLLM disponível - usando inferência acelerada")
|
| 25 |
+
except ImportError:
|
| 26 |
+
VLLM_AVAILABLE = False
|
| 27 |
+
logger_vllm = logging.getLogger("vllm")
|
| 28 |
+
logger_vllm.warning("⚠️ vLLM não disponível - usando Transformers padrão")
|
| 29 |
+
|
| 30 |
+
# Adicionar paths para protos
|
| 31 |
+
sys.path.append('/workspace/ultravox-pipeline/services/ultravox')
|
| 32 |
+
sys.path.append('/workspace/ultravox-pipeline/protos/generated')
|
| 33 |
+
|
| 34 |
+
import speech_pb2
|
| 35 |
+
import speech_pb2_grpc
|
| 36 |
+
|
| 37 |
+
logging.basicConfig(
|
| 38 |
+
level=logging.INFO,
|
| 39 |
+
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
|
| 40 |
+
)
|
| 41 |
+
logger = logging.getLogger(__name__)
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
class UltravoxServicer(speech_pb2_grpc.SpeechServiceServicer):
|
| 45 |
+
"""Implementação gRPC do Ultravox usando a arquitetura correta"""
|
| 46 |
+
|
| 47 |
+
def __init__(self):
|
| 48 |
+
"""Inicializa o serviço"""
|
| 49 |
+
logger.info("Inicializando Ultravox Service...")
|
| 50 |
+
|
| 51 |
+
# Verificar GPU antes de inicializar
|
| 52 |
+
if not torch.cuda.is_available():
|
| 53 |
+
logger.error("❌ GPU não disponível! Ultravox requer GPU para funcionar.")
|
| 54 |
+
logger.error("Verifique se CUDA está instalado e funcionando.")
|
| 55 |
+
raise RuntimeError("GPU não disponível. Ultravox não pode funcionar sem GPU.")
|
| 56 |
+
|
| 57 |
+
# Forçar uso da GPU com mais memória livre
|
| 58 |
+
best_gpu = 0
|
| 59 |
+
best_free = 0
|
| 60 |
+
for i in range(torch.cuda.device_count()):
|
| 61 |
+
total = torch.cuda.get_device_properties(i).total_memory / (1024**3)
|
| 62 |
+
allocated = torch.cuda.memory_allocated(i) / (1024**3)
|
| 63 |
+
free = total - allocated
|
| 64 |
+
logger.info(f"GPU {i}: {torch.cuda.get_device_name(i)} - {free:.1f}GB livre de {total:.1f}GB")
|
| 65 |
+
if free > best_free:
|
| 66 |
+
best_free = free
|
| 67 |
+
best_gpu = i
|
| 68 |
+
|
| 69 |
+
torch.cuda.set_device(best_gpu)
|
| 70 |
+
logger.info(f"✅ Usando GPU {best_gpu}: {torch.cuda.get_device_name(best_gpu)}")
|
| 71 |
+
logger.info(f" Memória livre: {best_free:.1f}GB")
|
| 72 |
+
|
| 73 |
+
if best_free < 3.0: # Ultravox 1B precisa ~3GB
|
| 74 |
+
logger.warning(f"⚠️ Pouca memória GPU disponível ({best_free:.1f}GB). Recomendado: 3GB+")
|
| 75 |
+
|
| 76 |
+
# Configuração do modelo usando Transformers Pipeline
|
| 77 |
+
self.model_config = {
|
| 78 |
+
'model_path': "fixie-ai/ultravox-v0_5-llama-3_2-1b", # Modelo v0.5 com Llama-3.2-1B
|
| 79 |
+
'device': f"cuda:{best_gpu}", # GPU específica
|
| 80 |
+
'max_new_tokens': 200,
|
| 81 |
+
'temperature': 0.7, # Temperatura para respostas mais naturais
|
| 82 |
+
'token': os.getenv('HF_TOKEN', '') # Token HuggingFace via env var
|
| 83 |
+
}
|
| 84 |
+
|
| 85 |
+
# Pipeline de transformers (API estável)
|
| 86 |
+
self.pipeline = None
|
| 87 |
+
self.conversation_states = {} # Estado por sessão
|
| 88 |
+
|
| 89 |
+
# Métricas
|
| 90 |
+
self.total_requests = 0
|
| 91 |
+
self.active_sessions = 0
|
| 92 |
+
self.total_tokens_generated = 0
|
| 93 |
+
self._start_time = time.time()
|
| 94 |
+
|
| 95 |
+
# Inicializar modelo
|
| 96 |
+
self._initialize_model()
|
| 97 |
+
|
| 98 |
+
def _initialize_model(self):
|
| 99 |
+
"""Inicializa o modelo Ultravox usando vLLM ou Transformers"""
|
| 100 |
+
try:
|
| 101 |
+
start_time = time.time()
|
| 102 |
+
|
| 103 |
+
if not VLLM_AVAILABLE:
|
| 104 |
+
logger.error("❌ vLLM NÃO está instalado! Este servidor REQUER vLLM.")
|
| 105 |
+
logger.error("Instale com: pip install vllm")
|
| 106 |
+
raise RuntimeError("vLLM é obrigatório para este servidor")
|
| 107 |
+
|
| 108 |
+
# USAR APENAS vLLM - SEM FALLBACK
|
| 109 |
+
logger.info("🚀 Carregando modelo Ultravox via vLLM (OBRIGATÓRIO)...")
|
| 110 |
+
|
| 111 |
+
# vLLM para modelos multimodais - Gemma 3 27B com quantização INT4
|
| 112 |
+
self.vllm_model = LLM(
|
| 113 |
+
model=self.model_config['model_path'],
|
| 114 |
+
trust_remote_code=True,
|
| 115 |
+
dtype="bfloat16",
|
| 116 |
+
gpu_memory_utilization=0.60, # Usar 60% da GPU para o modelo 27B quantizado
|
| 117 |
+
max_model_len=256, # Reduzir contexto para 256 tokens
|
| 118 |
+
enforce_eager=True, # Desabilitar CUDA graphs para modelos customizados
|
| 119 |
+
enable_prefix_caching=False, # Desabilitar cache de prefixo
|
| 120 |
+
)
|
| 121 |
+
# Parâmetros otimizados baseados nos testes
|
| 122 |
+
self.sampling_params = SamplingParams(
|
| 123 |
+
temperature=0.3, # Mais conservador para respostas consistentes
|
| 124 |
+
max_tokens=50, # Respostas mais concisas
|
| 125 |
+
repetition_penalty=1.1, # Evitar repeti��ões
|
| 126 |
+
stop=[".", "!", "?", "\n\n"] # Parar em pontuação natural
|
| 127 |
+
)
|
| 128 |
+
self.pipeline = None # Não usar pipeline do Transformers
|
| 129 |
+
|
| 130 |
+
load_time = time.time() - start_time
|
| 131 |
+
logger.info(f"✅ Modelo carregado em {load_time:.2f}s via vLLM")
|
| 132 |
+
logger.info("🎯 Usando vLLM para inferência acelerada!")
|
| 133 |
+
|
| 134 |
+
except Exception as e:
|
| 135 |
+
logger.error(f"Erro ao carregar modelo: {e}")
|
| 136 |
+
raise
|
| 137 |
+
|
| 138 |
+
def _get_conversation_state(self, session_id: str):
|
| 139 |
+
"""Obtém ou cria estado de conversação para sessão"""
|
| 140 |
+
if session_id not in self.conversation_states:
|
| 141 |
+
self.conversation_states[session_id] = {
|
| 142 |
+
'created_at': time.time(),
|
| 143 |
+
'turn_count': 0,
|
| 144 |
+
'conversation_history': []
|
| 145 |
+
}
|
| 146 |
+
logger.info(f"Estado de conversação criado para sessão: {session_id}")
|
| 147 |
+
|
| 148 |
+
return self.conversation_states[session_id]
|
| 149 |
+
|
| 150 |
+
def _cleanup_old_sessions(self, max_age: int = 1800): # 30 minutos
|
| 151 |
+
"""Remove sessões antigas"""
|
| 152 |
+
current_time = time.time()
|
| 153 |
+
expired_sessions = [
|
| 154 |
+
sid for sid, state in self.conversation_states.items()
|
| 155 |
+
if current_time - state['created_at'] > max_age
|
| 156 |
+
]
|
| 157 |
+
|
| 158 |
+
for sid in expired_sessions:
|
| 159 |
+
del self.conversation_states[sid]
|
| 160 |
+
logger.info(f"Sessão expirada removida: {sid}")
|
| 161 |
+
|
| 162 |
+
async def StreamingRecognize(self,
|
| 163 |
+
request_iterator,
|
| 164 |
+
context: grpc.ServicerContext) -> Iterator[speech_pb2.TranscriptToken]:
|
| 165 |
+
"""
|
| 166 |
+
Processa stream de áudio usando a arquitetura Ultravox completa
|
| 167 |
+
|
| 168 |
+
Args:
|
| 169 |
+
request_iterator: Iterator de chunks de áudio
|
| 170 |
+
context: Contexto gRPC
|
| 171 |
+
|
| 172 |
+
Yields:
|
| 173 |
+
Tokens de transcrição + resposta do LLM
|
| 174 |
+
"""
|
| 175 |
+
session_id = None
|
| 176 |
+
start_time = time.time()
|
| 177 |
+
self.total_requests += 1
|
| 178 |
+
|
| 179 |
+
try:
|
| 180 |
+
# Coletar todo o áudio primeiro (como no Gradio)
|
| 181 |
+
audio_chunks = []
|
| 182 |
+
sample_rate = 16000
|
| 183 |
+
prompt = None # Será obtido do metadata ou usado padrão
|
| 184 |
+
|
| 185 |
+
# Processar chunks de entrada
|
| 186 |
+
async for audio_chunk in request_iterator:
|
| 187 |
+
if not session_id:
|
| 188 |
+
session_id = audio_chunk.session_id or f"session_{self.total_requests}"
|
| 189 |
+
logger.info(f"Nova sessão Ultravox: {session_id}")
|
| 190 |
+
self.active_sessions += 1
|
| 191 |
+
|
| 192 |
+
# DEBUG: Log todos os campos recebidos
|
| 193 |
+
logger.info(f"DEBUG - Chunk recebido para {session_id}:")
|
| 194 |
+
logger.info(f" - audio_data: {len(audio_chunk.audio_data)} bytes")
|
| 195 |
+
logger.info(f" - sample_rate: {audio_chunk.sample_rate}")
|
| 196 |
+
logger.info(f" - is_final_chunk: {audio_chunk.is_final_chunk}")
|
| 197 |
+
|
| 198 |
+
# Obter prompt do campo system_prompt
|
| 199 |
+
if not prompt and audio_chunk.system_prompt:
|
| 200 |
+
prompt = audio_chunk.system_prompt
|
| 201 |
+
logger.info(f"✅ PROMPT DINÂMICO recebido: {prompt[:100]}...")
|
| 202 |
+
elif not audio_chunk.system_prompt:
|
| 203 |
+
logger.info(f"DEBUG - Sem system_prompt no chunk")
|
| 204 |
+
|
| 205 |
+
sample_rate = audio_chunk.sample_rate or 16000
|
| 206 |
+
|
| 207 |
+
# CRUCIAL: Converter de bytes para numpy float32 (como descoberto no Gradio)
|
| 208 |
+
audio_data = np.frombuffer(audio_chunk.audio_data, dtype=np.float32)
|
| 209 |
+
audio_chunks.append(audio_data)
|
| 210 |
+
|
| 211 |
+
# Se é chunk final, processar
|
| 212 |
+
if audio_chunk.is_final_chunk:
|
| 213 |
+
break
|
| 214 |
+
|
| 215 |
+
if not audio_chunks:
|
| 216 |
+
logger.warning(f"Nenhum áudio recebido para sessão {session_id}")
|
| 217 |
+
return
|
| 218 |
+
|
| 219 |
+
# Usar prompt padrão otimizado (formato que funciona!)
|
| 220 |
+
if not prompt:
|
| 221 |
+
# IMPORTANTE: Incluir o token <|audio|> que o Ultravox espera
|
| 222 |
+
# FALLBACK: Usar inglês simples que o modelo entende bem
|
| 223 |
+
prompt = "You are a helpful assistant. <|audio|>\nRespond in Portuguese:"
|
| 224 |
+
logger.info("Usando prompt SIMPLES em inglês com instrução para responder em português")
|
| 225 |
+
|
| 226 |
+
# Concatenar todo o áudio
|
| 227 |
+
full_audio = np.concatenate(audio_chunks)
|
| 228 |
+
logger.info(f"Áudio processado: {len(full_audio)} samples @ {sample_rate}Hz para sessão {session_id}")
|
| 229 |
+
|
| 230 |
+
# Obter estado de conversação
|
| 231 |
+
conv_state = self._get_conversation_state(session_id)
|
| 232 |
+
conv_state['turn_count'] += 1
|
| 233 |
+
|
| 234 |
+
# Processar com vLLM ou Transformers
|
| 235 |
+
backend = "vLLM" if self.vllm_model else "Transformers"
|
| 236 |
+
logger.info(f"Iniciando inferência {backend} para sessão {session_id}")
|
| 237 |
+
inference_start = time.time()
|
| 238 |
+
|
| 239 |
+
try:
|
| 240 |
+
# USAR APENAS vLLM - SEM FALLBACK
|
| 241 |
+
if not self.vllm_model:
|
| 242 |
+
raise RuntimeError("vLLM não está carregado! Este servidor REQUER vLLM.")
|
| 243 |
+
|
| 244 |
+
# Usar vLLM para inferência acelerada (v0.10+ suporta Ultravox!)
|
| 245 |
+
from vllm import SamplingParams
|
| 246 |
+
|
| 247 |
+
# Preparar entrada para vLLM com áudio
|
| 248 |
+
# Formato otimizado que funciona com Ultravox v0.5
|
| 249 |
+
# GARANTIR que o prompt tenha o token <|audio|>
|
| 250 |
+
if "<|audio|>" not in prompt:
|
| 251 |
+
# Adicionar o token se não estiver presente
|
| 252 |
+
vllm_prompt = prompt.rstrip() + " <|audio|>\nResponda em português:"
|
| 253 |
+
logger.warning(f"Token <|audio|> não encontrado no prompt, adicionando automaticamente")
|
| 254 |
+
else:
|
| 255 |
+
vllm_prompt = prompt
|
| 256 |
+
|
| 257 |
+
# 🔍 LOG DETALHADO DO PROMPT PARA DEBUG
|
| 258 |
+
logger.info(f"🔍 PROMPT COMPLETO enviado para vLLM:")
|
| 259 |
+
logger.info(f" 📝 Prompt original recebido: '{prompt[:200]}...'")
|
| 260 |
+
logger.info(f" 🎯 Prompt formatado final: '{vllm_prompt[:200]}...'")
|
| 261 |
+
logger.info(f" 🎵 Áudio shape: {full_audio.shape}, dtype: {full_audio.dtype}")
|
| 262 |
+
logger.info(f" 📊 Áudio stats: min={full_audio.min():.3f}, max={full_audio.max():.3f}")
|
| 263 |
+
logger.info("=" * 80)
|
| 264 |
+
vllm_input = {
|
| 265 |
+
"prompt": vllm_prompt,
|
| 266 |
+
"multi_modal_data": {
|
| 267 |
+
"audio": full_audio # numpy array já em 16kHz
|
| 268 |
+
}
|
| 269 |
+
}
|
| 270 |
+
|
| 271 |
+
# Fazer inferência com vLLM
|
| 272 |
+
outputs = self.vllm_model.generate(
|
| 273 |
+
prompts=[vllm_input],
|
| 274 |
+
sampling_params=self.sampling_params
|
| 275 |
+
)
|
| 276 |
+
|
| 277 |
+
inference_time = time.time() - inference_start
|
| 278 |
+
logger.info(f"⚡ Inferência vLLM concluída em {inference_time*1000:.0f}ms")
|
| 279 |
+
|
| 280 |
+
# 🔍 LOG DETALHADO DA RESPOSTA vLLM
|
| 281 |
+
logger.info(f"🔍 RESPOSTA DETALHADA do vLLM:")
|
| 282 |
+
logger.info(f" 📤 Outputs count: {len(outputs)}")
|
| 283 |
+
logger.info(f" 📤 Outputs[0].outputs count: {len(outputs[0].outputs)}")
|
| 284 |
+
|
| 285 |
+
# Extrair resposta
|
| 286 |
+
response_text = outputs[0].outputs[0].text
|
| 287 |
+
logger.info(f" 📝 Resposta RAW: '{response_text}'")
|
| 288 |
+
logger.info(f" 📏 Tamanho resposta: {len(response_text)} chars")
|
| 289 |
+
|
| 290 |
+
if not response_text:
|
| 291 |
+
response_text = "Desculpe, não consegui processar o áudio. Poderia repetir?"
|
| 292 |
+
logger.info(f" ⚠️ Resposta vazia, usando fallback")
|
| 293 |
+
else:
|
| 294 |
+
logger.info(f" ✅ Resposta válida recebida")
|
| 295 |
+
|
| 296 |
+
logger.info(f" 🎯 Resposta final: '{response_text[:100]}...'")
|
| 297 |
+
logger.info("=" * 80)
|
| 298 |
+
|
| 299 |
+
# Sem else - SEMPRE usar vLLM
|
| 300 |
+
|
| 301 |
+
# Simular streaming dividindo a resposta em tokens
|
| 302 |
+
words = response_text.split()
|
| 303 |
+
token_count = 0
|
| 304 |
+
|
| 305 |
+
for word in words:
|
| 306 |
+
# Criar token de resposta
|
| 307 |
+
token = speech_pb2.TranscriptToken()
|
| 308 |
+
token.text = word + " "
|
| 309 |
+
token.confidence = 0.95
|
| 310 |
+
token.is_final = False
|
| 311 |
+
token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 312 |
+
|
| 313 |
+
# Metadados de emoção
|
| 314 |
+
token.emotion.emotion = speech_pb2.EmotionMetadata.NEUTRAL
|
| 315 |
+
token.emotion.confidence = 0.8
|
| 316 |
+
|
| 317 |
+
# Metadados de prosódia
|
| 318 |
+
token.prosody.speech_rate = 120.0
|
| 319 |
+
token.prosody.pitch_mean = 150.0
|
| 320 |
+
token.prosody.energy = -20.0
|
| 321 |
+
token.prosody.pitch_variance = 50.0
|
| 322 |
+
|
| 323 |
+
token_count += 1
|
| 324 |
+
self.total_tokens_generated += 1
|
| 325 |
+
|
| 326 |
+
logger.debug(f"Token {token_count}: '{word}' para sessão {session_id}")
|
| 327 |
+
|
| 328 |
+
yield token
|
| 329 |
+
|
| 330 |
+
# Pequena pausa para simular streaming
|
| 331 |
+
await asyncio.sleep(0.05)
|
| 332 |
+
|
| 333 |
+
# Token final
|
| 334 |
+
final_token = speech_pb2.TranscriptToken()
|
| 335 |
+
final_token.text = "" # Token vazio indica fim
|
| 336 |
+
final_token.confidence = 1.0
|
| 337 |
+
final_token.is_final = True
|
| 338 |
+
final_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 339 |
+
|
| 340 |
+
logger.info(f"✅ Processamento completo: {token_count} tokens, {inference_time*1000:.0f}ms")
|
| 341 |
+
|
| 342 |
+
yield final_token
|
| 343 |
+
|
| 344 |
+
except Exception as model_error:
|
| 345 |
+
logger.error(f"Erro no modelo Transformers: {model_error}")
|
| 346 |
+
# Retornar erro como token
|
| 347 |
+
error_token = speech_pb2.TranscriptToken()
|
| 348 |
+
error_token.text = f"Erro no processamento: {str(model_error)}"
|
| 349 |
+
error_token.confidence = 0.0
|
| 350 |
+
error_token.is_final = True
|
| 351 |
+
error_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 352 |
+
|
| 353 |
+
yield error_token
|
| 354 |
+
|
| 355 |
+
# Limpar sessões antigas periodicamente
|
| 356 |
+
if self.total_requests % 10 == 0:
|
| 357 |
+
self._cleanup_old_sessions()
|
| 358 |
+
|
| 359 |
+
except Exception as e:
|
| 360 |
+
logger.error(f"Erro na transcrição para sessão {session_id}: {e}")
|
| 361 |
+
# Enviar token de erro
|
| 362 |
+
error_token = speech_pb2.TranscriptToken()
|
| 363 |
+
error_token.text = ""
|
| 364 |
+
error_token.confidence = 0.0
|
| 365 |
+
error_token.is_final = True
|
| 366 |
+
error_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 367 |
+
yield error_token
|
| 368 |
+
|
| 369 |
+
finally:
|
| 370 |
+
if session_id:
|
| 371 |
+
self.active_sessions = max(0, self.active_sessions - 1)
|
| 372 |
+
processing_time = time.time() - start_time
|
| 373 |
+
logger.info(f"Sessão {session_id} concluída. Latência: {processing_time*1000:.2f}ms")
|
| 374 |
+
|
| 375 |
+
async def GetMetrics(self, request: speech_pb2.Empty,
|
| 376 |
+
context: grpc.ServicerContext) -> speech_pb2.Metrics:
|
| 377 |
+
"""Retorna métricas do serviço"""
|
| 378 |
+
import psutil
|
| 379 |
+
import torch
|
| 380 |
+
|
| 381 |
+
metrics = speech_pb2.Metrics()
|
| 382 |
+
metrics.total_requests = self.total_requests
|
| 383 |
+
metrics.active_sessions = self.active_sessions
|
| 384 |
+
|
| 385 |
+
# Latência média (placeholder)
|
| 386 |
+
metrics.average_latency_ms = 500.0
|
| 387 |
+
|
| 388 |
+
# Uso de GPU (sempre GPU conforme solicitado)
|
| 389 |
+
try:
|
| 390 |
+
metrics.gpu_usage_percent = float(torch.cuda.utilization())
|
| 391 |
+
metrics.memory_usage_mb = float(torch.cuda.memory_allocated() / (1024 * 1024))
|
| 392 |
+
except:
|
| 393 |
+
metrics.gpu_usage_percent = 0.0
|
| 394 |
+
metrics.memory_usage_mb = 0.0
|
| 395 |
+
|
| 396 |
+
# Tokens por segundo (deve ser int64 conforme protobuf)
|
| 397 |
+
metrics.tokens_per_second = int(self.total_tokens_generated / max(1, time.time() - self._start_time))
|
| 398 |
+
|
| 399 |
+
return metrics
|
| 400 |
+
|
| 401 |
+
|
| 402 |
+
async def serve():
|
| 403 |
+
"""Inicia servidor gRPC"""
|
| 404 |
+
# Configurar servidor
|
| 405 |
+
server = grpc.aio.server(
|
| 406 |
+
futures.ThreadPoolExecutor(max_workers=10),
|
| 407 |
+
options=[
|
| 408 |
+
('grpc.max_send_message_length', 10 * 1024 * 1024),
|
| 409 |
+
('grpc.max_receive_message_length', 10 * 1024 * 1024),
|
| 410 |
+
('grpc.keepalive_time_ms', 30000),
|
| 411 |
+
('grpc.keepalive_timeout_ms', 10000),
|
| 412 |
+
('grpc.http2.min_time_between_pings_ms', 30000),
|
| 413 |
+
]
|
| 414 |
+
)
|
| 415 |
+
|
| 416 |
+
# Adicionar serviço
|
| 417 |
+
speech_pb2_grpc.add_SpeechServiceServicer_to_server(
|
| 418 |
+
UltravoxServicer(), server
|
| 419 |
+
)
|
| 420 |
+
|
| 421 |
+
# Configurar porta (IPv4 e IPv6)
|
| 422 |
+
port = os.getenv('ULTRAVOX_PORT', '50051')
|
| 423 |
+
server.add_insecure_port(f'0.0.0.0:{port}') # IPv4
|
| 424 |
+
server.add_insecure_port(f'[::]:{port}') # IPv6
|
| 425 |
+
|
| 426 |
+
logger.info(f"Ultravox Server iniciando na porta {port}...")
|
| 427 |
+
await server.start()
|
| 428 |
+
logger.info(f"Ultravox Server rodando na porta {port}")
|
| 429 |
+
|
| 430 |
+
try:
|
| 431 |
+
await server.wait_for_termination()
|
| 432 |
+
except KeyboardInterrupt:
|
| 433 |
+
logger.info("Parando servidor...")
|
| 434 |
+
await server.stop(grace_period=5)
|
| 435 |
+
|
| 436 |
+
|
| 437 |
+
def main():
|
| 438 |
+
"""Função principal"""
|
| 439 |
+
try:
|
| 440 |
+
asyncio.run(serve())
|
| 441 |
+
except Exception as e:
|
| 442 |
+
logger.error(f"Erro fatal: {e}")
|
| 443 |
+
sys.exit(1)
|
| 444 |
+
|
| 445 |
+
|
| 446 |
+
if __name__ == "__main__":
|
| 447 |
+
main()
|
ultravox/server_working_original.py
ADDED
|
@@ -0,0 +1,440 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Servidor Ultravox gRPC - Implementação com vLLM para aceleração
|
| 4 |
+
Usa vLLM quando disponível, fallback para Transformers
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import grpc
|
| 8 |
+
import asyncio
|
| 9 |
+
import logging
|
| 10 |
+
import numpy as np
|
| 11 |
+
import time
|
| 12 |
+
import sys
|
| 13 |
+
import os
|
| 14 |
+
import torch
|
| 15 |
+
import transformers
|
| 16 |
+
from typing import Iterator, Optional
|
| 17 |
+
from concurrent import futures
|
| 18 |
+
|
| 19 |
+
# Tentar importar vLLM
|
| 20 |
+
try:
|
| 21 |
+
from vllm import LLM, SamplingParams
|
| 22 |
+
VLLM_AVAILABLE = True
|
| 23 |
+
logger_vllm = logging.getLogger("vllm")
|
| 24 |
+
logger_vllm.info("✅ vLLM disponível - usando inferência acelerada")
|
| 25 |
+
except ImportError:
|
| 26 |
+
VLLM_AVAILABLE = False
|
| 27 |
+
logger_vllm = logging.getLogger("vllm")
|
| 28 |
+
logger_vllm.warning("⚠️ vLLM não disponível - usando Transformers padrão")
|
| 29 |
+
|
| 30 |
+
# Adicionar paths para protos
|
| 31 |
+
sys.path.append('/workspace/ultravox-pipeline/services/ultravox')
|
| 32 |
+
sys.path.append('/workspace/ultravox-pipeline/protos/generated')
|
| 33 |
+
|
| 34 |
+
import speech_pb2
|
| 35 |
+
import speech_pb2_grpc
|
| 36 |
+
|
| 37 |
+
logging.basicConfig(
|
| 38 |
+
level=logging.INFO,
|
| 39 |
+
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
|
| 40 |
+
)
|
| 41 |
+
logger = logging.getLogger(__name__)
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
class UltravoxServicer(speech_pb2_grpc.SpeechServiceServicer):
|
| 45 |
+
"""Implementação gRPC do Ultravox usando a arquitetura correta"""
|
| 46 |
+
|
| 47 |
+
def __init__(self):
|
| 48 |
+
"""Inicializa o serviço"""
|
| 49 |
+
logger.info("Inicializando Ultravox Service...")
|
| 50 |
+
|
| 51 |
+
# Verificar GPU antes de inicializar
|
| 52 |
+
if not torch.cuda.is_available():
|
| 53 |
+
logger.error("❌ GPU não disponível! Ultravox requer GPU para funcionar.")
|
| 54 |
+
logger.error("Verifique se CUDA está instalado e funcionando.")
|
| 55 |
+
raise RuntimeError("GPU não disponível. Ultravox não pode funcionar sem GPU.")
|
| 56 |
+
|
| 57 |
+
# Forçar uso da GPU com mais memória livre
|
| 58 |
+
best_gpu = 0
|
| 59 |
+
best_free = 0
|
| 60 |
+
for i in range(torch.cuda.device_count()):
|
| 61 |
+
total = torch.cuda.get_device_properties(i).total_memory / (1024**3)
|
| 62 |
+
allocated = torch.cuda.memory_allocated(i) / (1024**3)
|
| 63 |
+
free = total - allocated
|
| 64 |
+
logger.info(f"GPU {i}: {torch.cuda.get_device_name(i)} - {free:.1f}GB livre de {total:.1f}GB")
|
| 65 |
+
if free > best_free:
|
| 66 |
+
best_free = free
|
| 67 |
+
best_gpu = i
|
| 68 |
+
|
| 69 |
+
torch.cuda.set_device(best_gpu)
|
| 70 |
+
logger.info(f"✅ Usando GPU {best_gpu}: {torch.cuda.get_device_name(best_gpu)}")
|
| 71 |
+
logger.info(f" Memória livre: {best_free:.1f}GB")
|
| 72 |
+
|
| 73 |
+
if best_free < 3.0: # Ultravox 1B precisa ~3GB
|
| 74 |
+
logger.warning(f"⚠️ Pouca memória GPU disponível ({best_free:.1f}GB). Recomendado: 3GB+")
|
| 75 |
+
|
| 76 |
+
# Configuração do modelo usando Transformers Pipeline
|
| 77 |
+
self.model_config = {
|
| 78 |
+
'model_path': "fixie-ai/ultravox-v0_5-llama-3_2-1b", # Modelo v0.5 com Llama-3.2-1B (funcionando com vLLM)
|
| 79 |
+
'device': f"cuda:{best_gpu}", # GPU específica
|
| 80 |
+
'max_new_tokens': 200,
|
| 81 |
+
'temperature': 0.7, # Temperatura para respostas mais naturais
|
| 82 |
+
'token': os.getenv('HF_TOKEN', '') # Token HuggingFace via env var
|
| 83 |
+
}
|
| 84 |
+
|
| 85 |
+
# Pipeline de transformers (API estável)
|
| 86 |
+
self.pipeline = None
|
| 87 |
+
self.conversation_states = {} # Estado por sessão
|
| 88 |
+
|
| 89 |
+
# Métricas
|
| 90 |
+
self.total_requests = 0
|
| 91 |
+
self.active_sessions = 0
|
| 92 |
+
self.total_tokens_generated = 0
|
| 93 |
+
self._start_time = time.time()
|
| 94 |
+
|
| 95 |
+
# Inicializar modelo
|
| 96 |
+
self._initialize_model()
|
| 97 |
+
|
| 98 |
+
def _initialize_model(self):
|
| 99 |
+
"""Inicializa o modelo Ultravox usando vLLM ou Transformers"""
|
| 100 |
+
try:
|
| 101 |
+
start_time = time.time()
|
| 102 |
+
|
| 103 |
+
if not VLLM_AVAILABLE:
|
| 104 |
+
logger.error("❌ vLLM NÃO está instalado! Este servidor REQUER vLLM.")
|
| 105 |
+
logger.error("Instale com: pip install vllm")
|
| 106 |
+
raise RuntimeError("vLLM é obrigatório para este servidor")
|
| 107 |
+
|
| 108 |
+
# USAR APENAS vLLM - SEM FALLBACK
|
| 109 |
+
logger.info("🚀 Carregando modelo Ultravox via vLLM (OBRIGATÓRIO)...")
|
| 110 |
+
|
| 111 |
+
# vLLM para modelos multimodais
|
| 112 |
+
self.vllm_model = LLM(
|
| 113 |
+
model=self.model_config['model_path'],
|
| 114 |
+
trust_remote_code=True,
|
| 115 |
+
dtype="bfloat16",
|
| 116 |
+
gpu_memory_utilization=0.30, # Aumentar para 30% (~7.2GB de 24GB)
|
| 117 |
+
max_model_len=256, # Reduzir contexto para 256 tokens
|
| 118 |
+
enforce_eager=True, # Desabilitar CUDA graphs para modelos customizados
|
| 119 |
+
enable_prefix_caching=False, # Desabilitar cache de prefixo
|
| 120 |
+
)
|
| 121 |
+
# Parâmetros otimizados baseados nos testes
|
| 122 |
+
self.sampling_params = SamplingParams(
|
| 123 |
+
temperature=0.3, # Mais conservador para respostas consistentes
|
| 124 |
+
max_tokens=50, # Respostas mais concisas
|
| 125 |
+
repetition_penalty=1.1, # Evitar repetições
|
| 126 |
+
stop=[".", "!", "?", "\n\n"] # Parar em pontuação natural
|
| 127 |
+
)
|
| 128 |
+
self.pipeline = None # Não usar pipeline do Transformers
|
| 129 |
+
|
| 130 |
+
load_time = time.time() - start_time
|
| 131 |
+
logger.info(f"✅ Modelo carregado em {load_time:.2f}s via vLLM")
|
| 132 |
+
logger.info("🎯 Usando vLLM para inferência acelerada!")
|
| 133 |
+
|
| 134 |
+
except Exception as e:
|
| 135 |
+
logger.error(f"Erro ao carregar modelo: {e}")
|
| 136 |
+
raise
|
| 137 |
+
|
| 138 |
+
def _get_conversation_state(self, session_id: str):
|
| 139 |
+
"""Obtém ou cria estado de conversação para sessão"""
|
| 140 |
+
if session_id not in self.conversation_states:
|
| 141 |
+
self.conversation_states[session_id] = {
|
| 142 |
+
'created_at': time.time(),
|
| 143 |
+
'turn_count': 0,
|
| 144 |
+
'conversation_history': []
|
| 145 |
+
}
|
| 146 |
+
logger.info(f"Estado de conversação criado para sessão: {session_id}")
|
| 147 |
+
|
| 148 |
+
return self.conversation_states[session_id]
|
| 149 |
+
|
| 150 |
+
def _cleanup_old_sessions(self, max_age: int = 1800): # 30 minutos
|
| 151 |
+
"""Remove sessões antigas"""
|
| 152 |
+
current_time = time.time()
|
| 153 |
+
expired_sessions = [
|
| 154 |
+
sid for sid, state in self.conversation_states.items()
|
| 155 |
+
if current_time - state['created_at'] > max_age
|
| 156 |
+
]
|
| 157 |
+
|
| 158 |
+
for sid in expired_sessions:
|
| 159 |
+
del self.conversation_states[sid]
|
| 160 |
+
logger.info(f"Sessão expirada removida: {sid}")
|
| 161 |
+
|
| 162 |
+
async def StreamingRecognize(self,
|
| 163 |
+
request_iterator,
|
| 164 |
+
context: grpc.ServicerContext) -> Iterator[speech_pb2.TranscriptToken]:
|
| 165 |
+
"""
|
| 166 |
+
Processa stream de áudio usando a arquitetura Ultravox completa
|
| 167 |
+
|
| 168 |
+
Args:
|
| 169 |
+
request_iterator: Iterator de chunks de áudio
|
| 170 |
+
context: Contexto gRPC
|
| 171 |
+
|
| 172 |
+
Yields:
|
| 173 |
+
Tokens de transcrição + resposta do LLM
|
| 174 |
+
"""
|
| 175 |
+
session_id = None
|
| 176 |
+
start_time = time.time()
|
| 177 |
+
self.total_requests += 1
|
| 178 |
+
|
| 179 |
+
try:
|
| 180 |
+
# Coletar todo o áudio primeiro (como no Gradio)
|
| 181 |
+
audio_chunks = []
|
| 182 |
+
sample_rate = 16000
|
| 183 |
+
prompt = None # Será obtido do metadata ou usado padrão
|
| 184 |
+
|
| 185 |
+
# Processar chunks de entrada
|
| 186 |
+
async for audio_chunk in request_iterator:
|
| 187 |
+
if not session_id:
|
| 188 |
+
session_id = audio_chunk.session_id or f"session_{self.total_requests}"
|
| 189 |
+
logger.info(f"Nova sessão Ultravox: {session_id}")
|
| 190 |
+
self.active_sessions += 1
|
| 191 |
+
|
| 192 |
+
# DEBUG: Log todos os campos recebidos
|
| 193 |
+
logger.info(f"DEBUG - Chunk recebido para {session_id}:")
|
| 194 |
+
logger.info(f" - audio_data: {len(audio_chunk.audio_data)} bytes")
|
| 195 |
+
logger.info(f" - sample_rate: {audio_chunk.sample_rate}")
|
| 196 |
+
logger.info(f" - is_final_chunk: {audio_chunk.is_final_chunk}")
|
| 197 |
+
|
| 198 |
+
# Obter prompt do campo system_prompt
|
| 199 |
+
if not prompt and audio_chunk.system_prompt:
|
| 200 |
+
prompt = audio_chunk.system_prompt
|
| 201 |
+
logger.info(f"✅ PROMPT DINÂMICO recebido: {prompt[:100]}...")
|
| 202 |
+
elif not audio_chunk.system_prompt:
|
| 203 |
+
logger.info(f"DEBUG - Sem system_prompt no chunk")
|
| 204 |
+
|
| 205 |
+
sample_rate = audio_chunk.sample_rate or 16000
|
| 206 |
+
|
| 207 |
+
# CRUCIAL: Converter de bytes para numpy float32 (como descoberto no Gradio)
|
| 208 |
+
audio_data = np.frombuffer(audio_chunk.audio_data, dtype=np.float32)
|
| 209 |
+
audio_chunks.append(audio_data)
|
| 210 |
+
|
| 211 |
+
# Se é chunk final, processar
|
| 212 |
+
if audio_chunk.is_final_chunk:
|
| 213 |
+
break
|
| 214 |
+
|
| 215 |
+
if not audio_chunks:
|
| 216 |
+
logger.warning(f"Nenhum áudio recebido para sessão {session_id}")
|
| 217 |
+
return
|
| 218 |
+
|
| 219 |
+
# Usar prompt padrão otimizado (formato que funciona!)
|
| 220 |
+
if not prompt:
|
| 221 |
+
prompt = """Você é um assistente brasileiro útil e conversacional.
|
| 222 |
+
Responda à pergunta que ouviu em português de forma natural e direta."""
|
| 223 |
+
logger.info("Usando prompt padrão")
|
| 224 |
+
|
| 225 |
+
# Concatenar todo o áudio
|
| 226 |
+
full_audio = np.concatenate(audio_chunks)
|
| 227 |
+
logger.info(f"Áudio processado: {len(full_audio)} samples @ {sample_rate}Hz para sessão {session_id}")
|
| 228 |
+
|
| 229 |
+
# Obter estado de conversação
|
| 230 |
+
conv_state = self._get_conversation_state(session_id)
|
| 231 |
+
conv_state['turn_count'] += 1
|
| 232 |
+
|
| 233 |
+
# Processar com vLLM ou Transformers
|
| 234 |
+
backend = "vLLM" if self.vllm_model else "Transformers"
|
| 235 |
+
logger.info(f"Iniciando inferência {backend} para sessão {session_id}")
|
| 236 |
+
inference_start = time.time()
|
| 237 |
+
|
| 238 |
+
try:
|
| 239 |
+
# USAR APENAS vLLM - SEM FALLBACK
|
| 240 |
+
if not self.vllm_model:
|
| 241 |
+
raise RuntimeError("vLLM não está carregado! Este servidor REQUER vLLM.")
|
| 242 |
+
|
| 243 |
+
# Usar vLLM para inferência acelerada (v0.10+ suporta Ultravox!)
|
| 244 |
+
from vllm import SamplingParams
|
| 245 |
+
|
| 246 |
+
# Preparar entrada para vLLM com áudio
|
| 247 |
+
# Formato otimizado que funciona com Ultravox v0.5
|
| 248 |
+
# O prompt já vem formatado do cliente, usar diretamente
|
| 249 |
+
vllm_prompt = prompt
|
| 250 |
+
|
| 251 |
+
# 🔍 LOG DETALHADO DO PROMPT PARA DEBUG
|
| 252 |
+
logger.info(f"🔍 PROMPT COMPLETO enviado para vLLM:")
|
| 253 |
+
logger.info(f" 📝 Prompt original recebido: '{prompt[:200]}...'")
|
| 254 |
+
logger.info(f" 🎯 Prompt formatado final: '{vllm_prompt[:200]}...'")
|
| 255 |
+
logger.info(f" 🎵 Áudio shape: {full_audio.shape}, dtype: {full_audio.dtype}")
|
| 256 |
+
logger.info(f" 📊 Áudio stats: min={full_audio.min():.3f}, max={full_audio.max():.3f}")
|
| 257 |
+
logger.info("=" * 80)
|
| 258 |
+
vllm_input = {
|
| 259 |
+
"prompt": vllm_prompt,
|
| 260 |
+
"multi_modal_data": {
|
| 261 |
+
"audio": full_audio # numpy array já em 16kHz
|
| 262 |
+
}
|
| 263 |
+
}
|
| 264 |
+
|
| 265 |
+
# Fazer inferência com vLLM
|
| 266 |
+
outputs = self.vllm_model.generate(
|
| 267 |
+
prompts=[vllm_input],
|
| 268 |
+
sampling_params=self.sampling_params
|
| 269 |
+
)
|
| 270 |
+
|
| 271 |
+
inference_time = time.time() - inference_start
|
| 272 |
+
logger.info(f"⚡ Inferência vLLM concluída em {inference_time*1000:.0f}ms")
|
| 273 |
+
|
| 274 |
+
# 🔍 LOG DETALHADO DA RESPOSTA vLLM
|
| 275 |
+
logger.info(f"🔍 RESPOSTA DETALHADA do vLLM:")
|
| 276 |
+
logger.info(f" 📤 Outputs count: {len(outputs)}")
|
| 277 |
+
logger.info(f" 📤 Outputs[0].outputs count: {len(outputs[0].outputs)}")
|
| 278 |
+
|
| 279 |
+
# Extrair resposta
|
| 280 |
+
response_text = outputs[0].outputs[0].text
|
| 281 |
+
logger.info(f" 📝 Resposta RAW: '{response_text}'")
|
| 282 |
+
logger.info(f" 📏 Tamanho resposta: {len(response_text)} chars")
|
| 283 |
+
|
| 284 |
+
if not response_text:
|
| 285 |
+
response_text = "Desculpe, não consegui processar o áudio. Poderia repetir?"
|
| 286 |
+
logger.info(f" ⚠️ Resposta vazia, usando fallback")
|
| 287 |
+
else:
|
| 288 |
+
logger.info(f" ✅ Resposta válida recebida")
|
| 289 |
+
|
| 290 |
+
logger.info(f" 🎯 Resposta final: '{response_text[:100]}...'")
|
| 291 |
+
logger.info("=" * 80)
|
| 292 |
+
|
| 293 |
+
# Sem else - SEMPRE usar vLLM
|
| 294 |
+
|
| 295 |
+
# Simular streaming dividindo a resposta em tokens
|
| 296 |
+
words = response_text.split()
|
| 297 |
+
token_count = 0
|
| 298 |
+
|
| 299 |
+
for word in words:
|
| 300 |
+
# Criar token de resposta
|
| 301 |
+
token = speech_pb2.TranscriptToken()
|
| 302 |
+
token.text = word + " "
|
| 303 |
+
token.confidence = 0.95
|
| 304 |
+
token.is_final = False
|
| 305 |
+
token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 306 |
+
|
| 307 |
+
# Metadados de emoção
|
| 308 |
+
token.emotion.emotion = speech_pb2.EmotionMetadata.NEUTRAL
|
| 309 |
+
token.emotion.confidence = 0.8
|
| 310 |
+
|
| 311 |
+
# Metadados de prosódia
|
| 312 |
+
token.prosody.speech_rate = 120.0
|
| 313 |
+
token.prosody.pitch_mean = 150.0
|
| 314 |
+
token.prosody.energy = -20.0
|
| 315 |
+
token.prosody.pitch_variance = 50.0
|
| 316 |
+
|
| 317 |
+
token_count += 1
|
| 318 |
+
self.total_tokens_generated += 1
|
| 319 |
+
|
| 320 |
+
logger.debug(f"Token {token_count}: '{word}' para sessão {session_id}")
|
| 321 |
+
|
| 322 |
+
yield token
|
| 323 |
+
|
| 324 |
+
# Pequena pausa para simular streaming
|
| 325 |
+
await asyncio.sleep(0.05)
|
| 326 |
+
|
| 327 |
+
# Token final
|
| 328 |
+
final_token = speech_pb2.TranscriptToken()
|
| 329 |
+
final_token.text = "" # Token vazio indica fim
|
| 330 |
+
final_token.confidence = 1.0
|
| 331 |
+
final_token.is_final = True
|
| 332 |
+
final_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 333 |
+
|
| 334 |
+
logger.info(f"✅ Processamento completo: {token_count} tokens, {inference_time*1000:.0f}ms")
|
| 335 |
+
|
| 336 |
+
yield final_token
|
| 337 |
+
|
| 338 |
+
except Exception as model_error:
|
| 339 |
+
logger.error(f"Erro no modelo Transformers: {model_error}")
|
| 340 |
+
# Retornar erro como token
|
| 341 |
+
error_token = speech_pb2.TranscriptToken()
|
| 342 |
+
error_token.text = f"Erro no processamento: {str(model_error)}"
|
| 343 |
+
error_token.confidence = 0.0
|
| 344 |
+
error_token.is_final = True
|
| 345 |
+
error_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 346 |
+
|
| 347 |
+
yield error_token
|
| 348 |
+
|
| 349 |
+
# Limpar sessões antigas periodicamente
|
| 350 |
+
if self.total_requests % 10 == 0:
|
| 351 |
+
self._cleanup_old_sessions()
|
| 352 |
+
|
| 353 |
+
except Exception as e:
|
| 354 |
+
logger.error(f"Erro na transcrição para sessão {session_id}: {e}")
|
| 355 |
+
# Enviar token de erro
|
| 356 |
+
error_token = speech_pb2.TranscriptToken()
|
| 357 |
+
error_token.text = ""
|
| 358 |
+
error_token.confidence = 0.0
|
| 359 |
+
error_token.is_final = True
|
| 360 |
+
error_token.timestamp_ms = int((time.time() - start_time) * 1000)
|
| 361 |
+
yield error_token
|
| 362 |
+
|
| 363 |
+
finally:
|
| 364 |
+
if session_id:
|
| 365 |
+
self.active_sessions = max(0, self.active_sessions - 1)
|
| 366 |
+
processing_time = time.time() - start_time
|
| 367 |
+
logger.info(f"Sessão {session_id} concluída. Latência: {processing_time*1000:.2f}ms")
|
| 368 |
+
|
| 369 |
+
async def GetMetrics(self, request: speech_pb2.Empty,
|
| 370 |
+
context: grpc.ServicerContext) -> speech_pb2.Metrics:
|
| 371 |
+
"""Retorna métricas do serviço"""
|
| 372 |
+
import psutil
|
| 373 |
+
import torch
|
| 374 |
+
|
| 375 |
+
metrics = speech_pb2.Metrics()
|
| 376 |
+
metrics.total_requests = self.total_requests
|
| 377 |
+
metrics.active_sessions = self.active_sessions
|
| 378 |
+
|
| 379 |
+
# Latência média (placeholder)
|
| 380 |
+
metrics.average_latency_ms = 500.0
|
| 381 |
+
|
| 382 |
+
# Uso de GPU (sempre GPU conforme solicitado)
|
| 383 |
+
try:
|
| 384 |
+
metrics.gpu_usage_percent = float(torch.cuda.utilization())
|
| 385 |
+
metrics.memory_usage_mb = float(torch.cuda.memory_allocated() / (1024 * 1024))
|
| 386 |
+
except:
|
| 387 |
+
metrics.gpu_usage_percent = 0.0
|
| 388 |
+
metrics.memory_usage_mb = 0.0
|
| 389 |
+
|
| 390 |
+
# Tokens por segundo (deve ser int64 conforme protobuf)
|
| 391 |
+
metrics.tokens_per_second = int(self.total_tokens_generated / max(1, time.time() - self._start_time))
|
| 392 |
+
|
| 393 |
+
return metrics
|
| 394 |
+
|
| 395 |
+
|
| 396 |
+
async def serve():
|
| 397 |
+
"""Inicia servidor gRPC"""
|
| 398 |
+
# Configurar servidor
|
| 399 |
+
server = grpc.aio.server(
|
| 400 |
+
futures.ThreadPoolExecutor(max_workers=10),
|
| 401 |
+
options=[
|
| 402 |
+
('grpc.max_send_message_length', 10 * 1024 * 1024),
|
| 403 |
+
('grpc.max_receive_message_length', 10 * 1024 * 1024),
|
| 404 |
+
('grpc.keepalive_time_ms', 30000),
|
| 405 |
+
('grpc.keepalive_timeout_ms', 10000),
|
| 406 |
+
('grpc.http2.min_time_between_pings_ms', 30000),
|
| 407 |
+
]
|
| 408 |
+
)
|
| 409 |
+
|
| 410 |
+
# Adicionar serviço
|
| 411 |
+
speech_pb2_grpc.add_SpeechServiceServicer_to_server(
|
| 412 |
+
UltravoxServicer(), server
|
| 413 |
+
)
|
| 414 |
+
|
| 415 |
+
# Configurar porta
|
| 416 |
+
port = os.getenv('ULTRAVOX_PORT', '50051')
|
| 417 |
+
server.add_insecure_port(f'[::]:{port}')
|
| 418 |
+
|
| 419 |
+
logger.info(f"Ultravox Server iniciando na porta {port}...")
|
| 420 |
+
await server.start()
|
| 421 |
+
logger.info(f"Ultravox Server rodando na porta {port}")
|
| 422 |
+
|
| 423 |
+
try:
|
| 424 |
+
await server.wait_for_termination()
|
| 425 |
+
except KeyboardInterrupt:
|
| 426 |
+
logger.info("Parando servidor...")
|
| 427 |
+
await server.stop(grace_period=5)
|
| 428 |
+
|
| 429 |
+
|
| 430 |
+
def main():
|
| 431 |
+
"""Função principal"""
|
| 432 |
+
try:
|
| 433 |
+
asyncio.run(serve())
|
| 434 |
+
except Exception as e:
|
| 435 |
+
logger.error(f"Erro fatal: {e}")
|
| 436 |
+
sys.exit(1)
|
| 437 |
+
|
| 438 |
+
|
| 439 |
+
if __name__ == "__main__":
|
| 440 |
+
main()
|
ultravox/speech.proto
ADDED
|
@@ -0,0 +1,94 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
syntax = "proto3";
|
| 2 |
+
|
| 3 |
+
package speech;
|
| 4 |
+
|
| 5 |
+
service SpeechService {
|
| 6 |
+
// Streaming bidirecional para reconhecimento de fala
|
| 7 |
+
rpc StreamingRecognize(stream AudioChunk) returns (stream TranscriptToken);
|
| 8 |
+
|
| 9 |
+
// Endpoint para métricas
|
| 10 |
+
rpc GetMetrics(Empty) returns (Metrics);
|
| 11 |
+
}
|
| 12 |
+
|
| 13 |
+
// Chunk de áudio enviado pelo cliente
|
| 14 |
+
message AudioChunk {
|
| 15 |
+
bytes audio_data = 1; // PCM float32
|
| 16 |
+
int32 sample_rate = 2; // Taxa de amostragem (16000)
|
| 17 |
+
int64 timestamp_ms = 3; // Timestamp em millisegundos
|
| 18 |
+
int32 sequence_number = 4; // Número de sequência
|
| 19 |
+
bool is_final_chunk = 5; // Indica fim do áudio
|
| 20 |
+
|
| 21 |
+
// Metadados opcionais
|
| 22 |
+
float voice_activity_probability = 6; // Probabilidade de atividade de voz
|
| 23 |
+
string session_id = 7; // ID da sessão
|
| 24 |
+
string system_prompt = 8; // Prompt do sistema para contexto dinâmico
|
| 25 |
+
string user_prompt = 9; // Prompt do usuário (instrução específica)
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
// Token de transcrição retornado
|
| 29 |
+
message TranscriptToken {
|
| 30 |
+
string text = 1; // Texto transcrito
|
| 31 |
+
float confidence = 2; // Confiança da transcrição
|
| 32 |
+
bool is_final = 3; // Token final da frase
|
| 33 |
+
int64 timestamp_ms = 4; // Timestamp
|
| 34 |
+
|
| 35 |
+
// Metadados contextuais
|
| 36 |
+
EmotionMetadata emotion = 5; // Emoção detectada
|
| 37 |
+
ProsodyMetadata prosody = 6; // Prosódia detectada
|
| 38 |
+
|
| 39 |
+
// Validação e diagnóstico
|
| 40 |
+
ValidationResult validation = 7; // Resultado da validação
|
| 41 |
+
}
|
| 42 |
+
|
| 43 |
+
// Resultado da validação com erros específicos
|
| 44 |
+
message ValidationResult {
|
| 45 |
+
enum ValidationStatus {
|
| 46 |
+
VALID = 0; // Resposta válida
|
| 47 |
+
EMPTY_RESPONSE = 1; // Resposta vazia ou muito curta
|
| 48 |
+
GENERIC_ERROR = 2; // Resposta genérica de erro
|
| 49 |
+
AUDIO_QUALITY_ISSUE = 3; // Problemas de qualidade do áudio
|
| 50 |
+
PROMPT_FORMAT_ERROR = 4; // Formato de prompt inválido
|
| 51 |
+
MODEL_ERROR = 5; // Erro interno do modelo
|
| 52 |
+
RETRY_SUCCESSFUL = 6; // Retry foi bem-sucedido
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
ValidationStatus status = 1; // Status da validação
|
| 56 |
+
string error_message = 2; // Mensagem de erro específica
|
| 57 |
+
string diagnostic_info = 3; // Informações técnicas de diagnóstico
|
| 58 |
+
bool retry_attempted = 4; // Se foi tentado um retry
|
| 59 |
+
}
|
| 60 |
+
|
| 61 |
+
// Metadados de emoção
|
| 62 |
+
message EmotionMetadata {
|
| 63 |
+
enum Emotion {
|
| 64 |
+
NEUTRAL = 0;
|
| 65 |
+
HAPPY = 1;
|
| 66 |
+
SAD = 2;
|
| 67 |
+
ANGRY = 3;
|
| 68 |
+
SURPRISED = 4;
|
| 69 |
+
FEARFUL = 5;
|
| 70 |
+
}
|
| 71 |
+
Emotion emotion = 1;
|
| 72 |
+
float confidence = 2;
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
// Metadados de prosódia
|
| 76 |
+
message ProsodyMetadata {
|
| 77 |
+
float speech_rate = 1; // Palavras por minuto
|
| 78 |
+
float pitch_mean = 2; // Pitch médio em Hz
|
| 79 |
+
float energy = 3; // Energia em dB
|
| 80 |
+
float pitch_variance = 4; // Variância do pitch
|
| 81 |
+
}
|
| 82 |
+
|
| 83 |
+
// Métricas do serviço
|
| 84 |
+
message Metrics {
|
| 85 |
+
int64 total_requests = 1;
|
| 86 |
+
int64 active_sessions = 2;
|
| 87 |
+
float average_latency_ms = 3;
|
| 88 |
+
float gpu_usage_percent = 4;
|
| 89 |
+
float memory_usage_mb = 5;
|
| 90 |
+
int64 tokens_per_second = 6;
|
| 91 |
+
}
|
| 92 |
+
|
| 93 |
+
// Mensagem vazia
|
| 94 |
+
message Empty {}
|
ultravox/start_ultravox.sh
ADDED
|
@@ -0,0 +1,67 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
|
| 3 |
+
# Script para iniciar o servidor Ultravox com limpeza de processos órfãos
|
| 4 |
+
# Evita problemas de memória GPU ocupada por processos vLLM antigos
|
| 5 |
+
|
| 6 |
+
echo "🔧 Iniciando servidor Ultravox..."
|
| 7 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 8 |
+
|
| 9 |
+
# 1. Limpar processos órfãos vLLM/EngineCore
|
| 10 |
+
echo "🧹 Limpando processos órfãos..."
|
| 11 |
+
pkill -f "VLLM::EngineCore" 2>/dev/null
|
| 12 |
+
pkill -f "vllm.*engine" 2>/dev/null
|
| 13 |
+
pkill -f "multiprocessing.resource_tracker.*ultravox" 2>/dev/null
|
| 14 |
+
pkill -f "python.*server.py" 2>/dev/null
|
| 15 |
+
sleep 2
|
| 16 |
+
|
| 17 |
+
# 2. Verificar memória GPU antes de iniciar
|
| 18 |
+
echo "📊 Verificando GPU..."
|
| 19 |
+
GPU_FREE=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 20 |
+
GPU_TOTAL=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 21 |
+
|
| 22 |
+
if [ -n "$GPU_FREE" ] && [ -n "$GPU_TOTAL" ]; then
|
| 23 |
+
echo " GPU: ${GPU_FREE}MB livres de ${GPU_TOTAL}MB"
|
| 24 |
+
|
| 25 |
+
# Verificar se tem pelo menos 20GB livres
|
| 26 |
+
if [ "$GPU_FREE" -lt "20000" ]; then
|
| 27 |
+
echo "⚠️ AVISO: Menos de 20GB livres na GPU!"
|
| 28 |
+
echo " Tentando limpar mais processos..."
|
| 29 |
+
|
| 30 |
+
# Limpar mais agressivamente
|
| 31 |
+
pkill -9 -f "vllm" 2>/dev/null
|
| 32 |
+
pkill -9 -f "EngineCore" 2>/dev/null
|
| 33 |
+
sleep 3
|
| 34 |
+
|
| 35 |
+
# Verificar novamente
|
| 36 |
+
GPU_FREE=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 37 |
+
echo " GPU após limpeza: ${GPU_FREE}MB livres"
|
| 38 |
+
fi
|
| 39 |
+
fi
|
| 40 |
+
|
| 41 |
+
# 3. Verificar se a porta está livre
|
| 42 |
+
if lsof -i :50051 >/dev/null 2>&1; then
|
| 43 |
+
echo "⚠️ Porta 50051 em uso. Matando processo..."
|
| 44 |
+
kill -9 $(lsof -t -i:50051) 2>/dev/null
|
| 45 |
+
sleep 2
|
| 46 |
+
fi
|
| 47 |
+
|
| 48 |
+
# 4. Ativar ambiente virtual
|
| 49 |
+
echo "🐍 Ativando ambiente Python..."
|
| 50 |
+
cd /workspace/ultravox-pipeline/ultravox
|
| 51 |
+
source venv/bin/activate
|
| 52 |
+
|
| 53 |
+
# 5. Iniciar servidor
|
| 54 |
+
echo "🚀 Iniciando servidor Ultravox..."
|
| 55 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 56 |
+
echo " Modelo: Ultravox v0.5 Llama 3.1-8B"
|
| 57 |
+
echo " Porta: 50051"
|
| 58 |
+
echo " GPU: 90% utilization"
|
| 59 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 60 |
+
echo ""
|
| 61 |
+
echo "📝 Logs do servidor:"
|
| 62 |
+
echo ""
|
| 63 |
+
|
| 64 |
+
# Executar servidor com trap para limpeza ao sair
|
| 65 |
+
trap 'echo "🛑 Parando servidor..."; pkill -f "VLLM::EngineCore"; pkill -f "python.*server.py"' INT TERM
|
| 66 |
+
|
| 67 |
+
python server.py
|
ultravox/stop_ultravox.sh
ADDED
|
@@ -0,0 +1,60 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
|
| 3 |
+
# Script para parar o servidor Ultravox e limpar todos os processos relacionados
|
| 4 |
+
|
| 5 |
+
echo "🛑 Parando servidor Ultravox..."
|
| 6 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
| 7 |
+
|
| 8 |
+
# 1. Parar servidor principal
|
| 9 |
+
echo "📍 Parando processo principal..."
|
| 10 |
+
pkill -f "python.*server.py" 2>/dev/null
|
| 11 |
+
|
| 12 |
+
# 2. Limpar processos vLLM
|
| 13 |
+
echo "🧹 Limpando processos vLLM..."
|
| 14 |
+
pkill -f "VLLM::EngineCore" 2>/dev/null
|
| 15 |
+
pkill -f "vllm.*engine" 2>/dev/null
|
| 16 |
+
pkill -f "multiprocessing.resource_tracker.*ultravox" 2>/dev/null
|
| 17 |
+
|
| 18 |
+
# 3. Verificar porta 50051
|
| 19 |
+
echo "🔍 Verificando porta 50051..."
|
| 20 |
+
if lsof -i :50051 >/dev/null 2>&1; then
|
| 21 |
+
echo " ⚠️ Porta ainda em uso, forçando encerramento..."
|
| 22 |
+
kill -9 $(lsof -t -i:50051) 2>/dev/null
|
| 23 |
+
fi
|
| 24 |
+
|
| 25 |
+
# 4. Limpar processos órfãos mais agressivamente
|
| 26 |
+
echo "🔨 Limpeza final de processos..."
|
| 27 |
+
pkill -9 -f "VLLM::EngineCore" 2>/dev/null
|
| 28 |
+
pkill -9 -f "vllm" 2>/dev/null
|
| 29 |
+
pkill -9 -f "ultravox.*python" 2>/dev/null
|
| 30 |
+
|
| 31 |
+
# 5. Aguardar liberação
|
| 32 |
+
sleep 3
|
| 33 |
+
|
| 34 |
+
# 6. Verificar memória GPU
|
| 35 |
+
echo ""
|
| 36 |
+
echo "📊 Status da GPU após limpeza:"
|
| 37 |
+
GPU_FREE=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 38 |
+
GPU_TOTAL=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 39 |
+
GPU_USED=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits 2>/dev/null | head -1)
|
| 40 |
+
|
| 41 |
+
if [ -n "$GPU_FREE" ]; then
|
| 42 |
+
echo " ✅ GPU: ${GPU_FREE}MB livres / ${GPU_USED}MB usados / ${GPU_TOTAL}MB total"
|
| 43 |
+
else
|
| 44 |
+
echo " ❌ Não foi possível verificar GPU"
|
| 45 |
+
fi
|
| 46 |
+
|
| 47 |
+
# 7. Verificar processos restantes
|
| 48 |
+
echo ""
|
| 49 |
+
echo "🔍 Verificando processos restantes..."
|
| 50 |
+
REMAINING=$(ps aux | grep -E "vllm|ultravox|EngineCore" | grep -v grep | wc -l)
|
| 51 |
+
if [ "$REMAINING" -eq "0" ]; then
|
| 52 |
+
echo " ✅ Todos os processos foram encerrados"
|
| 53 |
+
else
|
| 54 |
+
echo " ⚠️ Ainda existem $REMAINING processos relacionados:"
|
| 55 |
+
ps aux | grep -E "vllm|ultravox|EngineCore" | grep -v grep
|
| 56 |
+
fi
|
| 57 |
+
|
| 58 |
+
echo ""
|
| 59 |
+
echo "✅ Servidor Ultravox parado!"
|
| 60 |
+
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
ultravox/test-tts.py
ADDED
|
@@ -0,0 +1,121 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Script de teste para Ultravox com TTS
|
| 4 |
+
Envia uma pergunta via áudio sintetizado e verifica a resposta
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import grpc
|
| 8 |
+
import numpy as np
|
| 9 |
+
import asyncio
|
| 10 |
+
import time
|
| 11 |
+
from gtts import gTTS
|
| 12 |
+
from pydub import AudioSegment
|
| 13 |
+
import io
|
| 14 |
+
import sys
|
| 15 |
+
import os
|
| 16 |
+
|
| 17 |
+
# Adicionar o path para os protobuffers
|
| 18 |
+
sys.path.append('/workspace/ultravox-pipeline/ultravox')
|
| 19 |
+
import speech_pb2
|
| 20 |
+
import speech_pb2_grpc
|
| 21 |
+
|
| 22 |
+
async def test_ultravox_with_tts():
|
| 23 |
+
"""Testa o Ultravox enviando áudio TTS com a pergunta 'Quanto é 2 + 2?'"""
|
| 24 |
+
|
| 25 |
+
print("🎤 Iniciando teste do Ultravox com TTS...")
|
| 26 |
+
|
| 27 |
+
# 1. Gerar áudio TTS com a pergunta
|
| 28 |
+
print("🔊 Gerando áudio TTS: 'Quanto é dois mais dois?'")
|
| 29 |
+
tts = gTTS(text="Quanto é dois mais dois?", lang='pt-br')
|
| 30 |
+
|
| 31 |
+
# Salvar em buffer de memória
|
| 32 |
+
mp3_buffer = io.BytesIO()
|
| 33 |
+
tts.write_to_fp(mp3_buffer)
|
| 34 |
+
mp3_buffer.seek(0)
|
| 35 |
+
|
| 36 |
+
# Converter MP3 para PCM 16kHz
|
| 37 |
+
audio = AudioSegment.from_mp3(mp3_buffer)
|
| 38 |
+
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
|
| 39 |
+
|
| 40 |
+
# Converter para numpy array float32
|
| 41 |
+
samples = np.array(audio.get_array_of_samples()).astype(np.float32) / 32768.0
|
| 42 |
+
|
| 43 |
+
print(f"✅ Áudio gerado: {len(samples)} samples @ 16kHz")
|
| 44 |
+
print(f" Duração: {len(samples)/16000:.2f} segundos")
|
| 45 |
+
|
| 46 |
+
# 2. Conectar ao servidor Ultravox
|
| 47 |
+
print("\n📡 Conectando ao Ultravox na porta 50051...")
|
| 48 |
+
|
| 49 |
+
try:
|
| 50 |
+
channel = grpc.aio.insecure_channel('localhost:50051')
|
| 51 |
+
stub = speech_pb2_grpc.UltravoxServiceStub(channel)
|
| 52 |
+
|
| 53 |
+
# 3. Criar request com o áudio
|
| 54 |
+
session_id = f"test_{int(time.time())}"
|
| 55 |
+
|
| 56 |
+
async def audio_generator():
|
| 57 |
+
"""Gera chunks de áudio para enviar"""
|
| 58 |
+
request = speech_pb2.AudioRequest()
|
| 59 |
+
request.session_id = session_id
|
| 60 |
+
request.audio_data = samples.tobytes()
|
| 61 |
+
request.sample_rate = 16000
|
| 62 |
+
request.is_final_chunk = True
|
| 63 |
+
request.system_prompt = "Responda em português de forma simples e direta"
|
| 64 |
+
|
| 65 |
+
print(f"📤 Enviando áudio para sessão: {session_id}")
|
| 66 |
+
yield request
|
| 67 |
+
|
| 68 |
+
# 4. Enviar e receber resposta
|
| 69 |
+
print("\n⏳ Aguardando resposta do Ultravox...")
|
| 70 |
+
start_time = time.time()
|
| 71 |
+
|
| 72 |
+
response_text = ""
|
| 73 |
+
token_count = 0
|
| 74 |
+
|
| 75 |
+
async for response in stub.TranscribeStream(audio_generator()):
|
| 76 |
+
if response.text:
|
| 77 |
+
response_text += response.text
|
| 78 |
+
token_count += 1
|
| 79 |
+
print(f" Token {token_count}: '{response.text.strip()}'")
|
| 80 |
+
|
| 81 |
+
if response.is_final:
|
| 82 |
+
break
|
| 83 |
+
|
| 84 |
+
elapsed = time.time() - start_time
|
| 85 |
+
|
| 86 |
+
# 5. Verificar resposta
|
| 87 |
+
print(f"\n📝 Resposta completa: '{response_text.strip()}'")
|
| 88 |
+
print(f"⏱️ Tempo de resposta: {elapsed:.2f}s")
|
| 89 |
+
print(f"📊 Tokens recebidos: {token_count}")
|
| 90 |
+
|
| 91 |
+
# Verificar se a resposta contém "4" ou "quatro"
|
| 92 |
+
if "4" in response_text.lower() or "quatro" in response_text.lower():
|
| 93 |
+
print("\n✅ SUCESSO! O Ultravox respondeu corretamente!")
|
| 94 |
+
else:
|
| 95 |
+
print("\n⚠️ AVISO: A resposta não contém '4' ou 'quatro'")
|
| 96 |
+
|
| 97 |
+
await channel.close()
|
| 98 |
+
|
| 99 |
+
except grpc.RpcError as e:
|
| 100 |
+
print(f"\n❌ Erro gRPC: {e.code()} - {e.details()}")
|
| 101 |
+
return False
|
| 102 |
+
except Exception as e:
|
| 103 |
+
print(f"\n❌ Erro: {e}")
|
| 104 |
+
return False
|
| 105 |
+
|
| 106 |
+
return True
|
| 107 |
+
|
| 108 |
+
if __name__ == "__main__":
|
| 109 |
+
print("=" * 60)
|
| 110 |
+
print("TESTE ULTRAVOX COM TTS")
|
| 111 |
+
print("=" * 60)
|
| 112 |
+
|
| 113 |
+
# Executar teste
|
| 114 |
+
success = asyncio.run(test_ultravox_with_tts())
|
| 115 |
+
|
| 116 |
+
if success:
|
| 117 |
+
print("\n🎉 Teste concluído com sucesso!")
|
| 118 |
+
else:
|
| 119 |
+
print("\n❌ Teste falhou!")
|
| 120 |
+
|
| 121 |
+
print("=" * 60)
|
ultravox/test_audio_coherence.py
ADDED
|
@@ -0,0 +1,193 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Script de teste para verificar coerência das respostas do Ultravox
|
| 4 |
+
Envia áudio sintético com perguntas específicas e verifica as respostas
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
import grpc
|
| 8 |
+
import numpy as np
|
| 9 |
+
import sys
|
| 10 |
+
import time
|
| 11 |
+
from pathlib import Path
|
| 12 |
+
|
| 13 |
+
# Adiciona o diretório ao path
|
| 14 |
+
sys.path.append(str(Path(__file__).parent))
|
| 15 |
+
|
| 16 |
+
import ultravox_service_pb2
|
| 17 |
+
import ultravox_service_pb2_grpc
|
| 18 |
+
|
| 19 |
+
def create_test_audio(text_prompt, duration=2.0, sample_rate=16000):
|
| 20 |
+
"""
|
| 21 |
+
Cria áudio de teste sintético simulando fala
|
| 22 |
+
Em produção, isso seria áudio real gravado
|
| 23 |
+
"""
|
| 24 |
+
# Simula padrão de fala com modulação
|
| 25 |
+
t = np.linspace(0, duration, int(sample_rate * duration))
|
| 26 |
+
|
| 27 |
+
# Frequências típicas da voz humana (100-300 Hz fundamental)
|
| 28 |
+
base_freq = 150 + 50 * np.sin(2 * np.pi * 0.5 * t) # Modulação lenta
|
| 29 |
+
|
| 30 |
+
# Gera sinal complexo simulando voz
|
| 31 |
+
audio = np.zeros_like(t)
|
| 32 |
+
|
| 33 |
+
# Adiciona harmônicos
|
| 34 |
+
for harmonic in range(1, 8):
|
| 35 |
+
freq = base_freq * harmonic
|
| 36 |
+
amplitude = 1.0 / harmonic # Harmônicos mais altos têm menor amplitude
|
| 37 |
+
audio += amplitude * np.sin(2 * np.pi * freq * t)
|
| 38 |
+
|
| 39 |
+
# Adiciona envelope de amplitude (simula palavras)
|
| 40 |
+
envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 2 * t)
|
| 41 |
+
audio *= envelope
|
| 42 |
+
|
| 43 |
+
# Normaliza para float32 entre -1 e 1
|
| 44 |
+
audio = audio / (np.max(np.abs(audio)) + 1e-10)
|
| 45 |
+
|
| 46 |
+
return audio.astype(np.float32)
|
| 47 |
+
|
| 48 |
+
def test_ultravox_coherence():
|
| 49 |
+
"""Testa a coerência das respostas do Ultravox"""
|
| 50 |
+
|
| 51 |
+
print("=" * 60)
|
| 52 |
+
print("🎯 TESTE DE COERÊNCIA DO ULTRAVOX")
|
| 53 |
+
print("=" * 60)
|
| 54 |
+
|
| 55 |
+
# Conecta ao servidor
|
| 56 |
+
try:
|
| 57 |
+
channel = grpc.insecure_channel('localhost:50051')
|
| 58 |
+
stub = ultravox_service_pb2_grpc.UltravoxServiceStub(channel)
|
| 59 |
+
print("✅ Conectado ao Ultravox em localhost:50051")
|
| 60 |
+
except Exception as e:
|
| 61 |
+
print(f"❌ Erro ao conectar: {e}")
|
| 62 |
+
return False
|
| 63 |
+
|
| 64 |
+
# Define perguntas de teste e respostas esperadas (palavras-chave)
|
| 65 |
+
test_cases = [
|
| 66 |
+
{
|
| 67 |
+
"pergunta": "Qual é o seu nome?",
|
| 68 |
+
"audio_duration": 1.5,
|
| 69 |
+
"keywords_pt": ["nome", "assistente", "sou", "chamo"],
|
| 70 |
+
"keywords_wrong": ["今天", "আজ", "weather", "time"] # Chinês, Bengali, Inglês
|
| 71 |
+
},
|
| 72 |
+
{
|
| 73 |
+
"pergunta": "Que horas são agora?",
|
| 74 |
+
"audio_duration": 1.8,
|
| 75 |
+
"keywords_pt": ["hora", "tempo", "agora", "momento"],
|
| 76 |
+
"keywords_wrong": ["名字", "নাম", "name", "call"]
|
| 77 |
+
},
|
| 78 |
+
{
|
| 79 |
+
"pergunta": "O que você fez hoje?",
|
| 80 |
+
"audio_duration": 2.0,
|
| 81 |
+
"keywords_pt": ["hoje", "fiz", "fez", "dia"],
|
| 82 |
+
"keywords_wrong": ["明天", "আগামীকাল", "tomorrow", "yesterday"]
|
| 83 |
+
}
|
| 84 |
+
]
|
| 85 |
+
|
| 86 |
+
results = []
|
| 87 |
+
session_id = f"test_{int(time.time())}"
|
| 88 |
+
|
| 89 |
+
for i, test in enumerate(test_cases, 1):
|
| 90 |
+
print(f"\n📝 Teste {i}: '{test['pergunta']}'")
|
| 91 |
+
print("-" * 40)
|
| 92 |
+
|
| 93 |
+
# Cria áudio sintético
|
| 94 |
+
audio = create_test_audio(test['pergunta'], test['audio_duration'])
|
| 95 |
+
print(f" 🎤 Áudio criado: {len(audio)} samples @ 16kHz")
|
| 96 |
+
|
| 97 |
+
# Prepara requisição
|
| 98 |
+
request = ultravox_service_pb2.ProcessRequest(
|
| 99 |
+
session_id=session_id,
|
| 100 |
+
audio_data=audio.tobytes(),
|
| 101 |
+
system_prompt="" # Deixa vazio para usar o prompt padrão
|
| 102 |
+
)
|
| 103 |
+
|
| 104 |
+
try:
|
| 105 |
+
# Envia e recebe resposta
|
| 106 |
+
response_text = ""
|
| 107 |
+
start_time = time.time()
|
| 108 |
+
|
| 109 |
+
for response in stub.ProcessAudioStream([request]):
|
| 110 |
+
if response.token:
|
| 111 |
+
response_text += response.token
|
| 112 |
+
|
| 113 |
+
latency = (time.time() - start_time) * 1000
|
| 114 |
+
|
| 115 |
+
print(f" 📝 Resposta: '{response_text}'")
|
| 116 |
+
print(f" ⏱️ Latência: {latency:.0f}ms")
|
| 117 |
+
|
| 118 |
+
# Analisa a resposta
|
| 119 |
+
response_lower = response_text.lower()
|
| 120 |
+
|
| 121 |
+
# Verifica se está em português
|
| 122 |
+
has_portuguese = any(kw in response_lower for kw in test['keywords_pt'])
|
| 123 |
+
has_wrong_lang = any(kw in response_text for kw in test['keywords_wrong'])
|
| 124 |
+
|
| 125 |
+
# Detecta idioma pela presença de caracteres específicos
|
| 126 |
+
has_chinese = any('\u4e00' <= char <= '\u9fff' for char in response_text)
|
| 127 |
+
has_bengali = any('\u0980' <= char <= '\u09ff' for char in response_text)
|
| 128 |
+
|
| 129 |
+
# Resultado do teste
|
| 130 |
+
if has_chinese:
|
| 131 |
+
status = "❌ FALHOU - Resposta em CHINÊS"
|
| 132 |
+
success = False
|
| 133 |
+
elif has_bengali:
|
| 134 |
+
status = "❌ FALHOU - Resposta em BENGALI"
|
| 135 |
+
success = False
|
| 136 |
+
elif not response_text:
|
| 137 |
+
status = "❌ FALHOU - Resposta vazia"
|
| 138 |
+
success = False
|
| 139 |
+
elif has_portuguese:
|
| 140 |
+
status = "✅ PASSOU - Resposta coerente em português"
|
| 141 |
+
success = True
|
| 142 |
+
else:
|
| 143 |
+
status = "⚠️ INCERTO - Resposta não identificada"
|
| 144 |
+
success = False
|
| 145 |
+
|
| 146 |
+
print(f" {status}")
|
| 147 |
+
|
| 148 |
+
results.append({
|
| 149 |
+
"pergunta": test['pergunta'],
|
| 150 |
+
"resposta": response_text,
|
| 151 |
+
"success": success,
|
| 152 |
+
"status": status,
|
| 153 |
+
"latency": latency
|
| 154 |
+
})
|
| 155 |
+
|
| 156 |
+
except Exception as e:
|
| 157 |
+
print(f" ❌ Erro no teste: {e}")
|
| 158 |
+
results.append({
|
| 159 |
+
"pergunta": test['pergunta'],
|
| 160 |
+
"resposta": f"ERRO: {e}",
|
| 161 |
+
"success": False,
|
| 162 |
+
"status": "❌ ERRO",
|
| 163 |
+
"latency": 0
|
| 164 |
+
})
|
| 165 |
+
|
| 166 |
+
# Resumo dos resultados
|
| 167 |
+
print("\n" + "=" * 60)
|
| 168 |
+
print("📊 RESUMO DOS TESTES")
|
| 169 |
+
print("=" * 60)
|
| 170 |
+
|
| 171 |
+
passed = sum(1 for r in results if r['success'])
|
| 172 |
+
total = len(results)
|
| 173 |
+
|
| 174 |
+
for r in results:
|
| 175 |
+
emoji = "✅" if r['success'] else "❌"
|
| 176 |
+
print(f"{emoji} '{r['pergunta']}' -> {r['status']}")
|
| 177 |
+
if r['resposta'] and not r['success']:
|
| 178 |
+
print(f" Resposta recebida: '{r['resposta'][:100]}...'")
|
| 179 |
+
|
| 180 |
+
print(f"\n📈 Taxa de sucesso: {passed}/{total} ({100*passed/total:.0f}%)")
|
| 181 |
+
|
| 182 |
+
if passed == total:
|
| 183 |
+
print("🎉 TODOS OS TESTES PASSARAM! Ultravox respondendo coerentemente em português!")
|
| 184 |
+
elif passed > 0:
|
| 185 |
+
print("⚠️ PARCIAL: Alguns testes passaram, mas ainda há problemas de idioma")
|
| 186 |
+
else:
|
| 187 |
+
print("❌ FALHA TOTAL: Nenhum teste passou - respostas em idioma incorreto")
|
| 188 |
+
|
| 189 |
+
return passed == total
|
| 190 |
+
|
| 191 |
+
if __name__ == "__main__":
|
| 192 |
+
success = test_ultravox_coherence()
|
| 193 |
+
sys.exit(0 if success else 1)
|