---
title: Kyutai STT GPU Service Moshi v4
emoji: 🎤
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
hardware: t4-small
app_port: 7860
---

Kyutai STT GPU Service Moshi v4

Official Moshi-Server Implementation - a streaming Speech-to-Text service built on the official moshi-server from Kyutai and its MessagePack streaming protocol.

Features

  • Official moshi-server with MessagePack protocol
  • Real-time streaming via /api/asr-streaming endpoint
  • Proven performance - 64 concurrent streams on L40S, 400 on H100
  • 125ms processing time for real-time transcription
  • Word-level timestamps and Voice Activity Detection
  • English and French support - kyutai/stt-1b-en_fr model

Architecture

This Space uses the official moshi-server binary instead of custom implementations:

# Build the official server binary with CUDA support
cargo install --features cuda moshi-server
# Run the STT worker with Kyutai's English/French config
moshi-server worker --config configs/config-stt-en_fr-hf.toml

WebSocket API

Official Protocol

  • Endpoint: /api/asr-streaming
  • Protocol: MessagePack (not JSON)
  • Headers: kyutai-api-key: your-key (see the connection sketch below)

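A minimal Python connection sketch for the protocol above. The websockets package, the localhost URL, and the placeholder API key are illustrative assumptions, not part of the official docs.

import asyncio
import websockets  # pip install websockets

async def connect():
    # Placeholder address: for a deployed Space this would be the Space's own
    # wss:// URL; the local port is assumed here, check your server config.
    url = "ws://localhost:8080/api/asr-streaming"
    headers = {"kyutai-api-key": "your-key"}
    # Older websockets releases name this parameter extra_headers.
    async with websockets.connect(url, additional_headers=headers) as ws:
        print("connected")
        # ... send MessagePack audio frames here (see Message Format below)

asyncio.run(connect())
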
Message Format

import msgpack

# Send audio in 80ms blocks: 1920 mono float samples at 24kHz
chunk = {"type": "Audio", "pcm": [float(x) for x in audio_data]}
msg = msgpack.packb(chunk, use_bin_type=True, use_single_float=True)

Response Types

  • "Step" messages: Voice Activity Detection
  • "Word" messages: Transcribed text with timestamps

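A hedged sketch of a receive loop for both message types. The field names (text, start_time) follow Kyutai's published example clients and are assumptions here; verify them against the server version you deploy.

import msgpack

async def receive_transcript(ws):
    # ws is an open connection such as the one from the sketch above
    async for raw in ws:
        msg = msgpack.unpackb(raw, raw=False)
        if msg["type"] == "Word":
            # Transcribed word plus its start time in seconds
            print(f'{msg["start_time"]:.2f}s  {msg["text"]}')
        elif msg["type"] == "Step":
            # Per-step voice activity information; ignored in this sketch
            pass
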
Performance

  • Model: kyutai/stt-1b-en_fr (~1B params, 0.5s delay)
  • Processing: ~125ms per audio chunk
  • Concurrency: 64 streams per L40S GPU
  • Memory: ~2.5GB VRAM required

Development

Based on Kyutai's official delayed-streams-modeling framework and the same streaming protocol used in production by Unmute.sh.

Cost Management

  • Auto-sleep: 30 minutes inactivity
  • T4 GPU: $0.40/hour when active
  • Estimated: ~$17/month for 10 hours/week of active use (10 h × 4.33 weeks × $0.40/h), plus a little extra from the 30-minute auto-sleep window; see the quick estimate below
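
A quick back-of-the-envelope check of the numbers above, using only the rate and usage figures listed in this section:

# Rough monthly cost at the rates listed above; ignores auto-sleep overhead.
hourly_rate = 0.40       # USD/hour, T4 small
hours_per_week = 10
weeks_per_month = 4.33
print(f"~${hourly_rate * hours_per_week * weeks_per_month:.2f}/month")  # ≈ $17.32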