Speech-to-text (STT) for voice AI
Real-time transcription that turns the caller’s speech into text the agent can act on.
Supported providers
Deepgram
Nova-3 multilingual (real-time code-switching); sub-300 ms partials.
Sarvam (Saaras v3)
India-native ASR: 11 Indian languages + Hinglish code-mixing, streaming.
ElevenLabs Scribe v2 Realtime
Streaming multilingual STT (90+ languages), ~150 ms first partials.
AWS Transcribe
Streaming Hindi + Indian-English with en/hi code-switch; Mumbai region for data residency.
Soniox
Real-time multilingual STT: strong Hindi + code-switching, token-streaming.
OpenAI Whisper
Higher accuracy, non-streaming; better for batch.
Groq (Whisper-Large)
Whisper-large running on Groq; much faster than OpenAI Whisper.
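To make the streaming versus batch distinction in the provider list concrete, here is a minimal Python sketch that records each provider's stated mode and filters for those suited to a live call. The `SttProvider` type, the `PROVIDERS` list, and `live_call_candidates` are hypothetical names used only for illustration; they are not part of any provider's SDK or of this platform's API. The list does not say whether Groq's Whisper-Large is streaming, so it is left as unknown (`None`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SttProvider:
    """Illustrative record of what the provider list states (hypothetical type)."""
    name: str
    streaming: Optional[bool]  # None = not stated in the provider list
    note: str


# Values copied from the provider list above; nothing here is measured.
PROVIDERS: List[SttProvider] = [
    SttProvider("Deepgram", True, "Nova-3 multilingual, sub-300 ms partials"),
    SttProvider("Sarvam (Saaras v3)", True, "11 Indian languages + Hinglish"),
    SttProvider("ElevenLabs Scribe v2 Realtime", True, "90+ languages, ~150 ms first partials"),
    SttProvider("AWS Transcribe", True, "Hindi + Indian-English, Mumbai region"),
    SttProvider("Soniox", True, "Hindi + code-switching, token-streaming"),
    SttProvider("OpenAI Whisper", False, "higher accuracy, better for batch"),
    SttProvider("Groq (Whisper-Large)", None, "much faster than OpenAI Whisper"),
]


def live_call_candidates(providers: List[SttProvider]) -> List[str]:
    """Return names of providers explicitly described as streaming or real-time."""
    return [p.name for p in providers if p.streaming is True]


if __name__ == "__main__":
    for name in live_call_candidates(PROVIDERS):
        print(name)
```

A live voice agent needs partial transcripts while the caller is still speaking, so the filter keeps only providers explicitly described as streaming or real-time; the non-streaming option is better reserved for transcribing recordings after the call.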
Build a voice agent with speech-to-text (STT)
Sign up free and get ₹300.00 in credit — no card required. Connect your number, pick a template, and go live in minutes.
