Cloud API, Commercial

Google Speech-to-Text v2

Chirp 3 GA (May 2026): 125 languages, 2.7% WER, MedASR, offline expansion

Latency (best case): 200ms

Latency (typical): 650ms

WER (general audio): 2.7%

Price per minute: $0.0060/min (Chirp 2 rate; Chirp 3 is $0.016/min)

Comparative Scores

Accuracy (WER): 8/10

Streaming latency: 6/10

Multilingual: 10/10

Sovereignty: 2/10

Price accessibility: 6/10

Streaming quality: 7/10

Architecture

Architecture: Chirp 3 (USM, Universal Speech Model, 2B+ params, GA May 2026)

Parameters: 2B+ (USM Chirp 3)

Languages: 125+

Self-hostable: No

Streaming: Yes

WER clean audio: 0.7%

GamiWays

Multilingual: secondary option

Useful for multilingual GamiWays deployments requiring 100+ language support. EU data residency partially addresses Swiss sovereignty. Not recommended for Phase 1 MVP due to higher latency than Deepgram.

Analysis

Google Speech-to-Text v2 with Chirp 3 (GA May 2026): 2.7% English WER (Artificial Analysis), 125+ languages, bidirectional gRPC streaming. The open-source MedASR model (medical domain) launched in late 2025. Major offline expansion in April 2026. No native voice cloning (available only through the separate Google TTS). RAG via tool calling + Vertex AI Search. Cloud only, no on-premise.
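For readers unfamiliar with the WER figures quoted here, the sketch below shows the standard definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. It is an illustration only. The function name and the simple lowercase/whitespace normalization are assumptions, and benchmark suites such as Artificial Analysis may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

On this definition, a 2.7% WER corresponds to roughly 27 word errors per 1,000 reference words.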

Strengths

2.7% WER (Chirp 3, May 2026): best cloud accuracy
125+ languages: widest coverage
EU data residency available
Bidirectional gRPC streaming
Open-source MedASR (medical domain)
Google ecosystem integration (Dialogflow, Vertex AI)

Weaknesses

~650ms typical latency (3× Deepgram)
Cloud only, no sovereignty
Complex pricing (Chirp 3 = $0.016/min vs Chirp 2 = $0.006/min)
No native voice cloning
No open weights

STT Capabilities

Streaming: Yes

Bidirectional gRPC streaming. 200ms best-case latency (about 650ms typical). Interim results available.
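In a bidirectional stream, the client sends audio in small pieces while the server returns interim results. The library-free sketch below shows only the client-side chunking step. The 16 kHz sample rate, 16-bit mono PCM format, 100 ms chunk size, and the `audio_chunks` name are illustrative assumptions, not values taken from this page, and the gRPC call itself is deliberately left out.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed: 16 kHz mono audio
BYTES_PER_SAMPLE = 2     # assumed: 16-bit linear PCM
CHUNK_MS = 100           # assumed chunk duration


def audio_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of raw PCM audio."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        # The final slice may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    sizes = [len(chunk) for chunk in audio_chunks(one_second_of_silence)]
    print(len(sizes), sizes[0])
```

Smaller chunks let interim results arrive sooner at the cost of more messages per second, so the chunk duration is one of the knobs that affects the perceived latency.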

Diarization: Yes

Custom Vocabulary: Yes

Word Timestamps: Yes

Auto Punctuation: Yes

Multilingual: Yes

125+ languages

Pricing

Price / minute: $0.0060 (Chirp 2 rate)

Price / hour: $0.360 (Chirp 2 rate)

Free tier: 60 minutes/month

$0.016/min (Chirp 3, 0–500k min). $0.006/min (Chirp 2). $0.004/min (standard). Free: 60 min/month.
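To make the gap between the three rates concrete, here is a minimal cost sketch using only the figures quoted above. It assumes the 60 free minutes are deducted before billing regardless of model, which this page does not confirm. Because the Chirp 3 rate is only quoted for the 0–500k minute tier, the sketch refuses to estimate beyond it. The names `RATES_PER_MIN` and `monthly_cost` are hypothetical.

```python
RATES_PER_MIN = {"chirp_3": 0.016, "chirp_2": 0.006, "standard": 0.004}  # USD, from the line above
FREE_MINUTES_PER_MONTH = 60   # assumed to apply to every model
CHIRP_3_TIER_LIMIT = 500_000  # the quoted Chirp 3 rate covers 0-500k min only


def monthly_cost(minutes: float, model: str) -> float:
    """Estimate the monthly bill in USD for a given model and usage."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if model not in RATES_PER_MIN:
        raise ValueError(f"unknown model: {model}")
    if model == "chirp_3" and minutes > CHIRP_3_TIER_LIMIT:
        # The rate above 500k minutes is not given in the source.
        raise ValueError("Chirp 3 rate beyond 500k min is not listed")
    billable = max(0.0, minutes - FREE_MINUTES_PER_MONTH)
    return round(billable * RATES_PER_MIN[model], 2)


if __name__ == "__main__":
    for model in RATES_PER_MIN:
        print(model, monthly_cost(10_060, model))
```

For 10,060 minutes in a month (10,000 billable under the stated assumption), this gives $160 on Chirp 3, $60 on Chirp 2, and $40 on the standard model, which is why the headline $0.006/min figure understates the cost of the Chirp 3 accuracy quoted elsewhere on this page.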

Sovereignty & Compliance

On-premise: No

GCP cloud only.

GDPR: Compliant

Data residency: EU region available (Belgium, Netherlands).

Strategic & Business Analysis

Google Speech-to-Text v2 — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?

Google Chirp 3 offers top multilingual accuracy at global scale with extensive compliance certifications, but its cloud-only stance and deep Google Cloud lock-in make it a Phase 1 tool, not a Phase 2 sovereignty choice.

Cloud SaaS only

Lock-in risk: High

Sovereignty fit: Low

Open-source threat: High

Pricing: Falling ↓

A. Strategic Positioning

Target customer: Enterprise — multilingual, global scale, Google Cloud ecosystem

Chirp 3 model with top multilingual accuracy at global scale, plus deep Google Cloud integration for enterprise workflows.

B. Competitive Moat

Chirp 3: top multilingual accuracy across 100+ languages at global scale
Deep Google Cloud ecosystem integration — Vertex AI, Gemini Enterprise
Extensive compliance: SOC2, HIPAA, GDPR, ISO 27001, FedRAMP

Vulnerability: Vendor lock-in risk with Google Cloud. Open-source models catching up. No on-premise option outside specific partnerships.

E. Strategic Questions for GamiWays

Sovereignty fit

EU continental boundary available but cloud-only. Google Cloud dependency creates sovereignty risk for Swiss/EU regulated deployments.

Build vs. Buy

Buy for Phase 1 multilingual requirements. For Phase 2 sovereignty, switch to Whisper/Voxtral self-hosted to eliminate Google dependency.

Lock-in risk

Deep Google Cloud ecosystem integration creates strong lock-in. Switching costs are high if Vertex AI or Gemini are also used.

Roadmap alignment

Good for Phase 1 multilingual transcription. Incompatible with Phase 2 sovereignty requirements without major architectural changes.

Data Freshness

Updated 3 May 2026

Artificial Analysis STT May 2026 + Google Cloud docs

Update note: Chirp 3 GA May 2026: 2.7% English WER (Artificial Analysis), 125+ languages. Price $0.016/min (Chirp 3). Open-source MedASR launched in late 2025. Major offline expansion in April 2026. Voice cloning: not native. RAG: via tool calling + Vertex AI Search.

Reference Sources

Google STT Pricing (pricing)
Google Chirp 3 GA, May 2026 (news)
Google STT Docs (docs)
Artificial Analysis STT Benchmarks (benchmark)
Google STT Release Notes (docs)