Cloud API, Commercial, Self-hostable

Deepgram Nova-3

Fastest streaming ASR — 75ms, 36 languages, Voice Agent API $4.50/hr, on-premise option


Latency (best case): 75ms
Latency (typical): 200ms
WER (general audio): 7.2%
Price per minute: $0.0036/min

Comparative Scores

Accuracy (WER): 8/10
Streaming latency: 10/10
Multilingual: 7/10
Sovereignty: 5/10
Price accessibility: 7/10
Streaming quality: 10/10

Architecture

Architecture: End-to-end deep learning (proprietary), streaming-native
Parameters: N/A (cloud)
Languages: 36+
Self-hostable: Yes
Streaming: Yes
WER clean audio: 5.2%
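For readers unfamiliar with the metric, WER (word error rate) is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the whitespace tokenization are illustrative assumptions, and this is not the scoring procedure behind the figures quoted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))
```

Under this definition, a 5.2% WER corresponds to roughly one word error per 19 reference words. Real benchmarks usually normalize casing and punctuation before scoring, which this sketch does not do.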

GamiWays

Phase 1 MVP — ASR streaming

Primary candidate for the Phase 1 MVP ASR. The 75ms latency is critical for the sub-2s pipeline. The on-premise option aligns with Swiss sovereignty requirements. Audiogami (Gamilab) is already in production, with Deepgram as fallback/comparison.
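To make the sub-2s argument concrete, here is a rough latency-budget sketch. Only the ASR figures (75ms best case, 200ms typical) and the 2s target come from this page; the other stage latencies are hypothetical placeholders to be replaced with measured values.

```python
BUDGET_MS = 2000  # "sub-2s pipeline" target

ASR_BEST_MS = 75      # best-case ASR latency quoted on this page
ASR_TYPICAL_MS = 200  # typical ASR latency quoted on this page

# Hypothetical placeholders, NOT measurements: replace with real numbers.
OTHER_STAGES_MS = {
    "llm_first_token": 600,
    "tts_first_audio": 300,
    "network_round_trips": 150,
}


def remaining_budget_ms(asr_ms: int) -> int:
    """Headroom left in the budget after ASR and the other stages."""
    return BUDGET_MS - asr_ms - sum(OTHER_STAGES_MS.values())


print(remaining_budget_ms(ASR_BEST_MS))
print(remaining_budget_ms(ASR_TYPICAL_MS))
```

With these placeholder values the ASR stage consumes between about 4% and 10% of the total budget, which is why the gap between best-case and typical latency matters less than the LLM and TTS stages.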

Analysis

Deepgram Nova-3 is the industry reference for real-time voice agent ASR, with 75ms P90 streaming latency, 36 languages, and built-in VAD and endpointing. The Voice Agent API ($4.50/hr) provides a complete STT+LLM+TTS pipeline over a single WebSocket, with BYO LLM (GPT-4, Claude, Gemini) and function calling for RAG integration. There is no native voice cloning: Aura-2 TTS offers 36 preset voices, and external TTS (ElevenLabs, Cartesia) can be integrated. On-premise deployment is available for Enterprise customers (partial sovereignty), and an EU endpoint is available. It is used by Tavus, Simli, and most commercial avatar platforms.
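Since RAG is reached through function calling rather than a native feature, the integration work is a function definition plus a handler on the application side. The sketch below shows that general pattern with a JSON Schema style definition; the function name `search_docs`, the toy index, and the dispatcher are all hypothetical, and the exact message format expected by the Voice Agent API should be taken from Deepgram's documentation.

```python
import json

# Hypothetical function definition in JSON Schema style.
SEARCH_DOCS_FUNCTION = {
    "name": "search_docs",
    "description": "Look up passages relevant to the user's question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search text"},
            "top_k": {"type": "integer", "minimum": 1, "default": 3},
        },
        "required": ["query"],
    },
}

# Toy stand-in for a real retrieval backend (vector store, search API, ...).
TOY_INDEX = {
    "faq-1": "refund policy for annual plans",
    "faq-2": "how to reset your password",
}


def search_docs(query: str, top_k: int = 3) -> list[str]:
    """Rank documents by the number of words shared with the query."""
    words = set(query.lower().split())
    scored = [
        (len(words & set(text.lower().split())), doc_id)
        for doc_id, text in TOY_INDEX.items()
    ]
    hits = [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]
    return hits[:top_k]


def handle_function_call(name: str, arguments_json: str) -> str:
    """Dispatch a function call requested by the LLM and return JSON."""
    if name != SEARCH_DOCS_FUNCTION["name"]:
        raise ValueError(f"unknown function: {name}")
    arguments = json.loads(arguments_json)
    return json.dumps({"results": search_docs(**arguments)})


print(handle_function_call("search_docs", '{"query": "refund policy"}'))
```

The retrieval backend, its hosting, and its data residency remain the integrator's responsibility in this pattern, which matters for the sovereignty assessment.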

Strengths

75ms P90 streaming latency
Voice Agent API $4.50/hr — BYO LLM + function calling
Built-in VAD + endpointing
36 languages
On-premise Enterprise option
EU endpoint available
LiveKit/Pipecat native integration
Speaker diarization + word timestamps

Weaknesses

Cloud-first (sovereignty limited — on-premise Enterprise only)
7.2% WER on English (not best-in-class)
French/German WER higher (~15%)
No native voice cloning — Aura-2 preset voices only
No native RAG — requires external function calling infrastructure
No open weights

STT Capabilities

Streaming: Yes

WebSocket streaming. 75ms P90 latency. Interim results with endpointing. Voice Activity Detection (VAD) built-in.
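As an illustration of how these streaming options are typically selected, the sketch below builds a WebSocket URL with query parameters. The endpoint and the parameter names (`model`, `language`, `interim_results`, `endpointing`, `vad_events`) are assumptions based on Deepgram's public streaming API and should be checked against the current documentation; the helper name is hypothetical and no connection is opened.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; verify against Deepgram's docs before use.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_stream_url(language: str = "en", endpointing_ms: int = 300) -> str:
    """Return a streaming URL requesting interim results, endpointing and VAD events."""
    params = {
        "model": "nova-3",
        "language": language,
        "interim_results": "true",      # partial transcripts while the user speaks
        "endpointing": endpointing_ms,  # silence (ms) before an utterance is finalized
        "vad_events": "true",           # speech-start notifications
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_stream_url())
```

A client would then open this URL with a WebSocket library, authenticate with its API key, and send raw audio frames. The endpointing value is a trade-off: shorter values make the agent respond sooner but risk cutting off slow speakers.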

Diarization: Yes
Custom Vocabulary: Yes
Word Timestamps: Yes
Auto Punctuation: Yes
Multilingual: Yes (36+ languages)

Pricing

Price / minute: $0.0036
Price / hour: $0.216
Free tier: $200 credit on signup

$0.0036/min (Pay-as-you-go). $0.0024/min (Growth plan). On-premise: custom pricing.
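The per-hour figure and the value of the signup credit follow directly from the per-minute rates. A small sketch of that arithmetic, using only the rates listed here (the function and constant names are illustrative):

```python
PAYG_PER_MIN = 0.0036    # Pay-as-you-go STT rate, USD per minute
GROWTH_PER_MIN = 0.0024  # Growth plan STT rate, USD per minute
SIGNUP_CREDIT = 200.0    # free credit on signup, USD


def stt_cost(minutes: float, rate_per_min: float = PAYG_PER_MIN) -> float:
    """STT cost in USD for a given number of audio minutes."""
    return round(minutes * rate_per_min, 4)


def minutes_covered(credit: float, rate_per_min: float = PAYG_PER_MIN) -> int:
    """Whole minutes of audio a credit covers at the given rate."""
    return int(credit / rate_per_min)


print(stt_cost(60))                    # one hour, Pay-as-you-go
print(stt_cost(60, GROWTH_PER_MIN))    # one hour, Growth plan
print(minutes_covered(SIGNUP_CREDIT))  # minutes covered by the signup credit
```

At the Pay-as-you-go rate the signup credit covers roughly 925 hours of STT audio. These figures cover STT only; the Voice Agent API is billed separately at $4.50/hr.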

Sovereignty & Compliance

On-premise: Yes

Self-hosted Docker deployment. Enterprise on-premise available. Partial sovereignty.

GDPR: Compliant

Data residency: US (default). EU data residency available.


Self-hosted Deployment

On-premise deployment available (enterprise). Self-hosted via Docker.

Strategic & Business Analysis

Deepgram Nova-3 — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?

Deepgram Nova-3 is the enterprise STT leader, with 54.2% lower WER on noisy audio than competitors, but its VPC-only stance (no full on-premise) limits its sovereignty appeal for regulated European deployments.

Cloud + VPC

Lock-in risk: Medium
Sovereignty fit: Medium
Open-source threat: Medium
Pricing: Commoditizing ↓↓

A. Strategic Positioning

Target customer: Developer / Enterprise — voice agents, real-time transcription

Unified STT+TTS+LLM platform with 54.2% lower WER on noisy audio vs competitors — the voice AI infrastructure backbone.

B. Competitive Moat

54.2% lower WER on noisy audio vs competitors including hyperscalers
Unified STT+TTS+LLM API — reduces integration complexity and end-to-end latency
Series C $130M (Jan 2026) — $1.3B valuation — financial strength for R&D

Vulnerability: Open-source models (Whisper, Voxtral) catching up. No full on-premise STT option (VPC only). Pricing pressure from competitors.

E. Strategic Questions for GamiWays

Sovereignty fit

EU data residency via VPC available. No full on-premise STT. Moderate sovereignty fit — better than pure cloud, worse than self-hosted.

Build vs. Buy

Buy for Phase 1 (best accuracy, unified platform). Evaluate Whisper/Voxtral self-hosted for Phase 2 sovereignty.

Lock-in risk

Unified STT+TTS platform creates integration lock-in. VPC deployment and competitive pricing reduce dependency risk.

Roadmap alignment

Good for Phase 1 voice agents. Phase 2 sovereignty requires on-premise STT — consider Whisper or Voxtral self-hosted.


Data Freshness

Updated 2 May 2026

Inworld benchmark 2026 + Koenecke et al.

Update note: Voice Agent API $4.50/hr confirmed (May 2026). Nova-3 TTFA 75ms P90 confirmed. PAYG pricing: $0.0036/min STT. BYO LLM available. On-premise Enterprise. EU endpoint available. Voice cloning: NOT native; Aura-2 TTS offers 36 preset voices, and external TTS integration is possible (ElevenLabs, Cartesia). RAG: via JSON Schema function calling (no native RAG).

Reference Sources

Deepgram Pricing (May 2026) [pricing]
Deepgram Voice Agent API Docs [docs]
Deepgram Nova-3 Models [docs]
Artificial Analysis STT Benchmark [benchmark]
Inworld STT Benchmark 2026 [benchmark]
Deepgram Self-Hosted Docs [docs]