Cloud API · Commercial

Google Speech-to-Text v2

Chirp 3 GA (May 2026): 125+ languages, 2.7% WER, MedASR, offline expansion

Latency (best case): 200ms
Latency (typical): 650ms
WER (general audio): 2.7%
Price per minute: $0.0060 (Chirp 2 rate; Chirp 3 is $0.016/min)

Comparative Scores

Accuracy (WER): 8/10
Streaming latency: 6/10
Multilingual: 10/10
Sovereignty: 2/10
Price accessibility: 6/10
Streaming quality: 7/10

Architecture

Architecture: Chirp 3 (USM, Universal Speech Model, 2B+ params, GA May 2026)
Parameters: 2B+ (USM Chirp 3)
Languages: 125+
Self-hostable: No
Streaming: Yes
WER clean audio: 0.7%
GamiWays
Multilingual: secondary option

Useful for multilingual GamiWays deployments requiring 100+ language support. EU data residency partially addresses Swiss sovereignty. Not recommended for Phase 1 MVP due to higher latency than Deepgram.

Analysis

Google Speech-to-Text v2 with Chirp 3 (GA May 2026): 2.7% WER on English (Artificial Analysis), 125+ languages, bidirectional gRPC streaming. Open-source MedASR launched in late 2025 (medical domain). Major offline expansion (April 2026). No native voice cloning (available separately through Google TTS). RAG via tool calling + Vertex AI Search. Cloud only, no on-premise.
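For readers comparing the WER figures quoted here, the metric is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration of that definition. The function name and the lowercase/whitespace normalization are assumptions made for this example; benchmark providers apply their own text normalization, so it will not reproduce published scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: simple lowercase + whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

On this definition, a 2.7% WER corresponds to roughly 27 word errors per 1,000 reference words.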

Strengths

  • 2.7% WER (Chirp 3, May 2026): best cloud accuracy
  • 125+ languages: broadest coverage
  • EU data residency available
  • Bidirectional gRPC streaming
  • Open-source MedASR (medical domain)
  • Google ecosystem integration (Dialogflow, Vertex AI)

Weaknesses

  • ~650ms typical latency (3× Deepgram)
  • Cloud only, no sovereignty
  • Complex pricing (Chirp 3 = $0.016/min vs Chirp 2 = $0.006/min)
  • No native voice cloning
  • No open weights

STT Capabilities

Streaming: Yes

Bidirectional gRPC streaming. 200ms best-case latency (about 650ms typical). Interim results available.
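In a bidirectional stream the client sends a configuration message first and audio chunks afterwards, while transcripts flow back on the same connection. The sketch below models only the client-side request sequence with plain dictionaries, so it runs without credentials or the client library. The function name, the field names (modelled on the v2 streaming request shape), the `chirp_3` model identifier, and the 8192-byte chunk size are all assumptions for illustration and should be checked against the Google Cloud docs.

```python
from typing import Iterator, Sequence


def build_stream_requests(
    recognizer: str,
    audio: bytes,
    chunk_bytes: int = 8192,                   # assumed chunk size
    language_codes: Sequence[str] = ("en-US",),
    model: str = "chirp_3",                    # assumed model identifier
) -> Iterator[dict]:
    """Yield a config-only message, then one message per audio chunk."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")

    # First message: recognizer and streaming configuration, no audio.
    yield {
        "recognizer": recognizer,
        "streaming_config": {
            "config": {"language_codes": list(language_codes), "model": model},
            # Ask for partial (interim) hypotheses while audio is arriving.
            "streaming_features": {"interim_results": True},
        },
    }

    # Following messages: raw audio only, in order.
    for offset in range(0, len(audio), chunk_bytes):
        yield {"audio": audio[offset:offset + chunk_bytes]}
```

In a real integration this generator would be replaced by the equivalent request objects from the official client library; smaller chunks generally reduce perceived latency at the cost of more messages.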

Diarization: Yes
Custom Vocabulary: Yes
Word Timestamps: Yes
Auto Punctuation: Yes
Multilingual: Yes

125+ languages

Pricing

Price / minute: $0.0060 (Chirp 2 rate)
Price / hour: $0.360 (Chirp 2 rate)
Free tier: 60 minutes/month

$0.016/min (Chirp 3, 0–500k min). $0.006/min (Chirp 2). $0.004/min (standard). Free: 60 min/month.
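Because the headline price and the Chirp 3 price differ, a quick estimate helps when budgeting. The sketch below uses only the rates listed above. It assumes the 60 free minutes are deducted once per month regardless of model, and it refuses Chirp 3 volumes above 500k minutes because no rate for that tier is given here. The function and constant names are hypothetical.

```python
RATES_PER_MIN = {"chirp_3": 0.016, "chirp_2": 0.006, "standard": 0.004}
FREE_MINUTES_PER_MONTH = 60           # assumption: applies to every model
CHIRP_3_TIER_LIMIT_MIN = 500_000      # only the 0-500k tier rate is listed


def monthly_cost(minutes: float, model: str = "chirp_3") -> float:
    """Estimate one month's bill in USD for the given audio minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if model not in RATES_PER_MIN:
        raise ValueError(f"unknown model: {model}")
    if model == "chirp_3" and minutes > CHIRP_3_TIER_LIMIT_MIN:
        raise ValueError("no listed rate above 500k minutes for Chirp 3")

    # Deduct the free tier, never going below zero billable minutes.
    billable = max(0.0, minutes - FREE_MINUTES_PER_MONTH)
    return round(billable * RATES_PER_MIN[model], 2)
```

Under these assumptions, 10,000 minutes in a month come to about $159 on Chirp 3 versus about $60 on Chirp 2.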

Sovereignty & Compliance

On-premise: No

GCP cloud only.

GDPR: Compliant

Data residency: EU region available (Belgium, Netherlands).
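Pinning requests to an EU region is typically done by pointing the client at a regional endpoint and using a recognizer path in the same location. The sketch below only builds those two strings. The region IDs are the standard GCP names for Belgium and the Netherlands; the endpoint pattern and helper names are assumptions to verify against the Google Cloud docs, and whether Chirp 3 is served from each region is not confirmed here.

```python
# Standard GCP region IDs for the two EU locations mentioned above.
EU_REGIONS = {"Belgium": "europe-west1", "Netherlands": "europe-west4"}


def regional_endpoint(location: str) -> str:
    """Build the regional API hostname (assumed pattern)."""
    return f"{location}-speech.googleapis.com"


def recognizer_path(project_id: str, location: str, recognizer_id: str = "_") -> str:
    """Build a recognizer resource name; "_" stands for the default recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


if __name__ == "__main__":
    location = EU_REGIONS["Netherlands"]
    print(regional_endpoint(location))
    print(recognizer_path("my-project", location))  # "my-project" is a placeholder
```

Keeping the endpoint and the recognizer location in one variable avoids the common mistake of sending an EU recognizer path to the global endpoint.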

Strategic & Business Analysis

Google Speech-to-Text v2 — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?

Google's Chirp 3 offers top multilingual accuracy at global scale with extensive compliance certifications, but its cloud-only stance and deep Google Cloud lock-in make it a Phase 1 tool, not a Phase 2 sovereignty choice.

Cloud SaaS only
Lock-in risk: High
Sovereignty fit: Low
Open-source threat: High
Pricing: Falling ↓

A. Strategic Positioning

Target customer: Enterprise — multilingual, global scale, Google Cloud ecosystem

Chirp 3 model with top multilingual accuracy at global scale, plus deep Google Cloud integration for enterprise workflows.

B. Competitive Moat

  • Chirp 3: top multilingual accuracy across 100+ languages at global scale
  • Deep Google Cloud ecosystem integration — Vertex AI, Gemini Enterprise
  • Extensive compliance: SOC2, HIPAA, GDPR, ISO 27001, FedRAMP

Vulnerability: Vendor lock-in risk with Google Cloud. Open-source models catching up. No on-premise option outside specific partnerships.

E. Strategic Questions for GamiWays

Sovereignty fit

EU continental boundary available but cloud-only. Google Cloud dependency creates sovereignty risk for Swiss/EU regulated deployments.

Build vs. Buy

Buy for Phase 1 multilingual requirements. For Phase 2 sovereignty, switch to Whisper/Voxtral self-hosted to eliminate Google dependency.

Lock-in risk

Deep Google Cloud ecosystem integration creates strong lock-in. Switching costs are high if Vertex AI or Gemini are also used.

Roadmap alignment

Good for Phase 1 multilingual transcription. Incompatible with Phase 2 sovereignty requirements without major architectural changes.

Data Freshness

Updated 3 May 2026

Artificial Analysis STT (May 2026) + Google Cloud docs

Update note: Chirp 3 GA May 2026: 2.7% WER on English (Artificial Analysis), 125+ languages. Price $0.016/min (Chirp 3). Open-source MedASR launched in late 2025. Major offline expansion in April 2026. Voice cloning: NOT native. RAG: via tool calling + Vertex AI Search.