Gnani Prisma v2.5: Indian-Language Speech-to-Text (STT)
Prisma v2.5 is Gnani's hosted speech-to-text (STT) platform built for Indian languages. It covers 10 Indian languages plus Hinglish code-switching, with inference served from Gnani's servers. No model weights are shipped: you authenticate with an API key and call the pipeline.
Gnani Prisma v2.5 is trained on over 14 million hours of Indic speech data, which Gnani describes as the largest training corpus for Indian languages in production today. The models are built for conditions that generic ASR systems handle poorly: telephony-grade audio (8 kHz, PSTN), noisy field environments, regional accents across tier 2 and rural India, and natural code-switching between Hindi and English within a single utterance.
Supported interaction patterns include REST (file-based) and real-time WebSocket streaming for STT, with auto language detection across all 10 supported languages.
Performance
Gnani Prisma v2.5 delivers 10–20% lower Word Error Rate compared to leading alternatives on Indic language benchmarks, with the gap widening on noisy audio — call center recordings, field environments, and telephony captures where background noise and channel distortion are typical.
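For readers who want to check Word Error Rate on their own recordings, here is a minimal sketch of the standard word-level WER calculation (edit distance divided by the number of reference words). The function name `word_error_rate` is illustrative and not part of any Gnani package, and the tokenization is a plain whitespace split, which may differ from the normalization used in the benchmarks quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("मुझे कल मीटिंग है", "मुझे कल meeting है"))
```

Because the comparison is exact string matching, a Latin-script rendering of a Hindi word counts as an error here. Any script or casing normalization before scoring is a choice you would need to make and report alongside the number.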
Key specs:
- STT latency: P95 < 200ms (streaming)
- Languages: 10 Indian languages + Hinglish code-mixed and Latin-script variants
- Audio input: optimized for both broadband and 8kHz telephony
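The P95 latency figure can be compared against your own measurements with a small percentile helper. This is a sketch using the nearest-rank definition; the source does not state how Gnani measures latency or which percentile method it uses, and both the helper name and the sample timings below are hypothetical.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds, for illustration only.
latencies_ms = [120, 135, 150, 110, 180, 140, 125, 130, 145, 210]
print("P50:", percentile_nearest_rank(latencies_ms, 50))
print("P95:", percentile_nearest_rank(latencies_ms, 95))
```

With only ten samples, a single slow request sets the P95, so a meaningful comparison needs a much larger sample collected under realistic network conditions.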
Single repo · Hosted inference · No weights shipped
This is the Hugging Face integration for Gnani's speech platform: Prisma v2.5 (STT) and Timbre v2.0 (TTS). All inference happens on Gnani's servers; no model weights, tokenizers, or processor files are included. You only need an API key.
Get Your API Key
- Sign up at gnani.ai
- Or email speechstack@gnani.ai
Installation
pip install gnani-vachana transformers
STT — Speech-to-Text (Gnani Prisma v2.5)
REST (file-based)
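import os
# The API key is supplied through the GNANI_API_KEY environment variable.
os.environ["GNANI_API_KEY"] = "your-api-key"
from transformers import pipeline
# trust_remote_code=True lets transformers load the custom pipeline code from the model repo.
pipe = pipeline(
    "automatic-speech-recognition",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
# Transcribe a local file as Hindi (hi-IN).
result = pipe("audio.wav", language_code="hi-IN")
print(result["text"])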
Realtime (WebSocket streaming)
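import os
# The API key is supplied through the GNANI_API_KEY environment variable.
os.environ["GNANI_API_KEY"] = "your-api-key"
from transformers import pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
# use_streaming=True selects the realtime WebSocket transport instead of REST.
result = pipe("audio.wav", language_code="hi-IN", use_streaming=True)
print(result["text"])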
Note: All STT modes (REST and realtime) require only GNANI_API_KEY.
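When several recordings need transcribing, the same call can be wrapped in a loop. The sketch below assumes only what the examples above show: the pipeline is callable with a file path and a `language_code` keyword, and returns a mapping with a `"text"` key. `transcribe_files` and `fake_pipe` are hypothetical names; `fake_pipe` is a stand-in so the snippet runs without network access.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(pipe: Callable, paths: Iterable[str], language_code: str) -> Dict[str, str]:
    """Call `pipe` once per file and collect the transcript text keyed by path."""
    transcripts = {}
    for path in paths:
        result = pipe(path, language_code=language_code)
        transcripts[path] = result["text"]
    return transcripts


def fake_pipe(path, language_code):
    """Stand-in for the real pipeline; returns a dict shaped like the examples above."""
    return {"text": f"[{language_code}] transcript of {path}"}


print(transcribe_files(fake_pipe, ["call_001.wav", "call_002.wav"], "hi-IN"))
```

Passing the pipeline in as an argument keeps the helper testable offline; in real use you would pass the `pipe` object created with `pipeline(...)` instead of `fake_pipe`.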
TTS — Text-to-Speech (Gnani Timbre v2.0)
Use tts_mode to select the transport:
| tts_mode | Transport | Latency | Best for |
|---|---|---|---|
| "rest" | HTTP POST | Higher | Batch synthesis, simple integrations |
| "sse" | SSE stream | Lower | Progressive playback, moderate latency |
| "realtime" | WebSocket | Lowest | Live agents, conversational apps |
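The choice between the three transports can be captured in a small helper. This is only a sketch of the decision described in the table; `choose_tts_mode` and its two flags are hypothetical names, and the only values taken from the source are the three `tts_mode` strings.

```python
TTS_MODES = ("rest", "sse", "realtime")


def choose_tts_mode(live_conversation: bool, progressive_playback: bool) -> str:
    """Pick a tts_mode value following the 'Best for' column of the table."""
    if live_conversation:
        return "realtime"  # WebSocket: lowest latency, for live agents
    if progressive_playback:
        return "sse"  # SSE stream: audio can start playing before synthesis ends
    return "rest"  # HTTP POST: simplest option for batch synthesis


for flags in [(True, False), (False, True), (False, False)]:
    print(flags, "->", choose_tts_mode(*flags))
```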
REST (synchronous)
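import os
# The API key is supplied through the GNANI_API_KEY environment variable.
os.environ["GNANI_API_KEY"] = "your-api-key"
from transformers import pipeline
# trust_remote_code=True lets transformers load the custom pipeline code from the model repo.
pipe = pipeline(
    "text-to-speech",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
# Synthesize Hindi text with the "Karan" voice over a single HTTP request.
result = pipe(
    "नमस्ते, आप कैसे हैं?",
    voice="Karan",
    tts_mode="rest",
    sample_rate=22050,
    container="wav",
    encoding="linear_pcm",
)
# result["audio"] holds the encoded audio bytes; write them out as a WAV file.
with open("output.wav", "wb") as f:
    f.write(result["audio"])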
Streaming (SSE)
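import os
# The API key is supplied through the GNANI_API_KEY environment variable.
os.environ["GNANI_API_KEY"] = "your-api-key"
from transformers import pipeline
pipe = pipeline(
    "text-to-speech",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
# tts_mode="sse" requests the audio over a server-sent events stream.
result = pipe(
    "Hello, how are you?",
    voice="Simran",
    tts_mode="sse",
    sample_rate=22050,
    container="wav",
    encoding="linear_pcm",
)
# result["audio"] holds the encoded audio bytes; write them out as a WAV file.
with open("output.wav", "wb") as f:
    f.write(result["audio"])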
Realtime (WebSocket)
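import os
# The API key is supplied through the GNANI_API_KEY environment variable.
os.environ["GNANI_API_KEY"] = "your-api-key"
from transformers import pipeline
pipe = pipeline(
    "text-to-speech",
    model="gnani-ai/vachana",
    trust_remote_code=True,
)
# tts_mode="realtime" uses the WebSocket transport, the lowest-latency option.
result = pipe(
    "Hello, how are you?",
    voice="Raju",
    tts_mode="realtime",
    sample_rate=22050,
    container="wav",
    encoding="linear_pcm",
)
# result["audio"] holds the encoded audio bytes; write them out as a WAV file.
with open("output.wav", "wb") as f:
    f.write(result["audio"])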
Available Voices
| Voice | Style |
|---|---|
| Karan | Male — Bold, Trustworthy |
| Simran | Female — Confident, Bright |
| Nara | Female — Gentle, Expressive |
| Riya | Female — Cheerful, Energetic |
| Viraj | Male — Commanding, Dynamic |
| Raju | Male — Grounded, Conversational |
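Since every request runs on a remote server, it can be worth validating the voice name locally before calling the pipeline. The sketch below copies the voice table into a dictionary and builds the keyword arguments used in the TTS examples above; `VOICE_STYLES`, `voices_for`, and `build_tts_kwargs` are hypothetical names, and the list would need updating if Gnani adds or renames voices.

```python
# Voice name -> (gender, style), copied from the table above.
VOICE_STYLES = {
    "Karan": ("Male", "Bold, Trustworthy"),
    "Simran": ("Female", "Confident, Bright"),
    "Nara": ("Female", "Gentle, Expressive"),
    "Riya": ("Female", "Cheerful, Energetic"),
    "Viraj": ("Male", "Commanding, Dynamic"),
    "Raju": ("Male", "Grounded, Conversational"),
}


def voices_for(gender: str) -> list:
    """List voice names of the given gender, in table order."""
    return [name for name, (g, _) in VOICE_STYLES.items() if g == gender]


def build_tts_kwargs(voice: str, tts_mode: str = "rest", sample_rate: int = 22050) -> dict:
    """Validate the voice locally and return keyword arguments for the pipeline call."""
    if voice not in VOICE_STYLES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(VOICE_STYLES)}")
    return {
        "voice": voice,
        "tts_mode": tts_mode,
        "sample_rate": sample_rate,
        "container": "wav",
        "encoding": "linear_pcm",
    }


print(voices_for("Female"))
print(build_tts_kwargs("Karan"))
```

With a real pipeline object this would be used as `pipe("Hello, how are you?", **build_tts_kwargs("Simran", tts_mode="sse"))`.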
Supported Languages
Gnani Prisma v2.5 (STT) and Gnani Timbre v2.0 (TTS) support 10+ Indian languages.
- STT languages: Supported STT Languages
- TTS languages: Supported TTS Languages
Environment Variables
| Variable | Required For | Description |
|---|---|---|
| GNANI_API_KEY | All endpoints | Your Gnani API key |
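The examples in this README assign the key inside the script for brevity. In a deployed service you would more likely set GNANI_API_KEY outside the code and fail fast if it is missing. A minimal sketch, where `require_api_key` is a hypothetical helper rather than part of the Gnani package:

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the Gnani API key from the environment, or raise if it is not set."""
    key = env.get("GNANI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GNANI_API_KEY is not set; export it before creating the pipeline"
        )
    return key


# A plain dict stands in for os.environ here so the demo runs anywhere.
print(require_api_key({"GNANI_API_KEY": "your-api-key"}))
```

Calling `require_api_key()` with no arguments before `pipeline(...)` surfaces a missing key as a clear local error instead of a failed remote request.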
Links
- Platform: gnani.ai
- Full docs: docs.gnani.ai
- Quick start: Quick Start Guide
Speech-to-Text (Gnani Prisma v2.5)
- STT REST: Transcribe audio files
- STT Realtime: WebSocket streaming
- STT Batch: Async batch transcription
Text-to-Speech (Gnani Timbre v2.0)
- TTS REST: Synchronous synthesis
- TTS Streaming: SSE streaming
- TTS Realtime: WebSocket streaming
- TTS Input Formatting: SSML & formatting
Use Case Guides
- Call Analytics Pipeline
- Podcast Transcription with Speaker Labels
- Real-Time Quality & Compliance Monitoring
Intended Use
Gnani Prisma v2.5 is built for production speech applications in Indian language contexts. Primary use cases:
- Contact center and IVR automation: optimized for telephony-grade audio (8 kHz, PSTN/VoIP), the dominant deployment environment for Indian enterprise voice
- Conversational AI and voice agents: real-time streaming STT with Hinglish code-switching support for consumer-facing bots where speakers mix Hindi and English naturally mid-sentence
- Field and mobile applications: robust to ambient noise, low-quality microphones, and regional accent variation across tier 2 and rural India
- Multilingual transcription pipelines: batch or streaming transcription for content, compliance, or analytics workflows across 10 Indian languages
Gnani Prisma v2.5 performs well on audio that typically degrades generic ASR: noisy environments, narrow-band telephony, accented regional speech, and code-mixed utterances. These are supported use cases, not edge cases.
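To see which of these conditions a given recording falls under, you can inspect its WAV header before sending it. The sketch below uses only Python's standard `wave` module; `describe_wav` and `make_silent_wav` are hypothetical helpers, and treating anything at or below 8000 Hz as narrow-band telephony audio is an assumption made for illustration, not a threshold documented by Gnani.

```python
import io
import wave


def describe_wav(source):
    """Read basic properties from a WAV file path or binary file-like object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
            "narrowband": rate <= 8000,  # assumed cut-off for telephony-grade audio
        }


def make_silent_wav(sample_rate_hz, seconds=1):
    """Build an in-memory mono 16-bit WAV of silence, used here as demo input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * sample_rate_hz * seconds)
    buffer.seek(0)
    return buffer


print(describe_wav(make_silent_wav(8000)))
print(describe_wav(make_silent_wav(16000)))
```

For a file on disk, `describe_wav("audio.wav")` works the same way. The `wave` module reads uncompressed PCM WAV only, so other containers would need a different reader.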
Out-of-Scope Use
- Languages outside the supported 10 Indian languages and Hinglish variants
- High-accuracy transcription of non-Indian English accents (use en-IN for Indian English specifically)
- Offline or on-device inference: all inference runs on Gnani's hosted infrastructure and requires an active API key and network connectivity
- Applications requiring model fine-tuning, weight access, or custom vocabulary injection at the architecture level: Gnani Prisma v2.5 is a hosted API, not an open model
- Medical, legal, or safety-critical transcription without human review — as with any ASR system, outputs should be validated before use in high-stakes decisions