Research Preview

TTS-2 has been in research preview since May 5, 2026. TTS-1.5 Max and Mini remain available for production use. TTS-2 features (Voice Direction, Conversational Awareness) may evolve before general availability (GA).

Cloud API · #1 Artificial Analysis · Commercial (training framework open-sourced)

Inworld TTS-2 + Realtime API

#1 on Artificial Analysis Arena. TTS-2 (research preview): Voice Direction, Conversational Awareness, 100+ languages

TTFA (best case): 130 ms
TTFA (typical): 250 ms
Price per million chars: $35
ELO score: 1160

Comparative Scores

Voice quality: 10/10
Latency: 8/10
Voice cloning: 9/10
Expressiveness: 10/10
Sovereignty: 6/10
Price accessibility: 7/10
Multilingual: 10/10

Architecture

Architecture: SpeechLM (streaming-native, quantization-aware); TTS-2 rebuilt for realtime conversation
Parameters: N/A (cloud)
Languages: 100+
Self-hostable: Yes (on-premise, Enterprise plan)
Streaming: Yes
GamiWays
Phase 1 MVP: quality + complete conversational pipeline + 100+ languages

Top candidate for the GamiWays pipeline. TTS-2 Conversational Awareness directly addresses the challenge of emotionally coherent multi-turn avatars. Voice Direction enables per-scene delivery control without re-recording. Crosslingual support for 100+ languages covers all target markets. Viseme timestamps are directly usable for avatar lip-sync. On-premise deployment on the Enterprise plan aligns with sovereignty requirements.

Analysis

Inworld TTS-2 (research preview, May 2026) is a new-generation voice model built for realtime conversation. TTS-1.5 already ranks #1 on the Artificial Analysis Speech Arena (ahead of Google and ElevenLabs). TTS-2 adds four capabilities: Voice Direction (natural language delivery instructions), Conversational Awareness (the model hears prior audio turns), Crosslingual (one voice identity across 100+ languages, with mid-utterance switching), and Advanced Voice Design (a voice created from a prose description). Pricing: $0.035/min On-Demand (same as TTS-1.5 Max). Full platform: TTS + STT (inworld-stt-1) + Realtime API (WebSocket S2S, tool calling) + LLM Router (220+ models). On-premise deployment is available on Enterprise. Integrations: Cloudflare, DeepInfra, LiveKit, Stream, VoiceRun.

Strengths

  • TTS-1.5 ranks #1 on the Artificial Analysis Speech Arena (ahead of Google, ElevenLabs)
  • TTS-2 Voice Direction: natural language delivery instructions
  • TTS-2 Conversational Awareness: hears prior audio turns
  • TTS-2: 100+ languages, mid-sentence switching, a single voice identity
  • TTS-2 Advanced Voice Design: a voice from a prose description
  • Complete platform: TTS + STT + Realtime S2S + LLM Router (220+ models)
  • Sub-200ms median TTFA (TTS alone)
  • Viseme timestamps for avatar lip-sync
  • On-premise on Enterprise (GDPR, HIPAA, ZDR, SOC 2 Type II)

Weaknesses

  • TTS-2 is in research preview (not yet GA)
  • On-premise only on Enterprise (custom pricing)
  • TTS-2 On-Demand price is $0.035/min (1.4× the $0.025/min of TTS-1.5 Mini)
  • The LLM Router model list changes frequently

Voice Capabilities

Voice Cloning: Yes

Instant voice cloning (free). Professional voice cloning (Growth/Enterprise add-on). Advanced Voice Design: create a voice from a prose description — no reference audio needed. Up to 5 custom voices (On-Demand), 100 (Creator), 1,000 (Developer), 3,000 (Growth). ElevenLabs migration tool available.

Emotion Control: Yes

TTS-2 Voice Direction: natural language delivery instructions inline (e.g. [speak sadly, as if something bad just happened]). Inline non-verbals: [laugh], [sigh], [breathe], [clear_throat], [cough]. Conversational Awareness: model hears prior audio turns — tone, pacing, emotional state carry forward automatically. Word/char/phoneme/viseme timestamps.
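The inline markup described above can be sketched as plain string composition. This is a minimal illustration that only builds the input text; it does not call any Inworld API. The helper names `direct` and `non_verbal` are hypothetical, and the assumption that a direction is written as a bracketed prefix is taken from the single example in the text.

```python
from typing import Optional

# Non-verbal tags exactly as listed in the text above.
NON_VERBALS = {"laugh", "sigh", "breathe", "clear_throat", "cough"}


def non_verbal(name: str) -> str:
    """Return a bracketed non-verbal tag, rejecting names not listed above."""
    if name not in NON_VERBALS:
        raise ValueError(f"unknown non-verbal tag: {name!r}")
    return f"[{name}]"


def direct(text: str, direction: Optional[str] = None) -> str:
    """Prefix text with a bracketed natural-language delivery instruction.

    Assumption: directions go in square brackets before the text they
    apply to, as in the example quoted in the source.
    """
    if direction is None:
        return text
    if "[" in direction or "]" in direction:
        raise ValueError("direction must not contain brackets")
    return f"[{direction.strip()}] {text}"


line = direct(
    f"{non_verbal('sigh')} I really thought we had more time.",
    "speak sadly, as if something bad just happened",
)
print(line)
```

Validating tags before sending text keeps typos such as `[laughs]` from being read aloud or silently ignored, whichever the service does with unknown markup.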

Streaming: Yes

Streaming-native via WebSocket and REST. Sub-200ms median TTFA for TTS alone. Realtime API: full-duplex WebSocket speech-to-speech with turn detection, tool calling mid-session, provider-agnostic LLM routing (220+ models). Backchannel fillers stream in parallel during LLM reasoning. Partial responses reach user before full sentence is composed.
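To check a TTFA figure such as the sub-200ms median quoted above against your own network path, one approach is to time the arrival of the first non-empty audio chunk. The sketch below is provider-agnostic and assumes only that the client exposes audio as an iterable of byte chunks. `fake_stream` is a stand-in generator, not an Inworld client.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttfa(chunks: Iterable[bytes]) -> Tuple[Optional[float], int]:
    """Return (seconds until first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    ttfa: Optional[float] = None
    total = 0
    for chunk in chunks:
        if ttfa is None and chunk:
            ttfa = time.perf_counter() - start
        total += len(chunk)
    return ttfa, total


def fake_stream(first_delay_s: float = 0.05,
                n_chunks: int = 3,
                chunk_size: int = 320) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(first_delay_s)  # simulated network and synthesis delay
    for _ in range(n_chunks):
        yield b"\x00" * chunk_size


ttfa, total_bytes = measure_ttfa(fake_stream())
print(f"TTFA: {ttfa * 1000:.0f} ms, bytes: {total_bytes}")
```

Because the generator only starts running when iteration begins, the timer covers the full wait for the first chunk. With a real client, start the timer just before the request is sent.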

Lip-sync Data: Yes

Word, character, phoneme, and viseme-level timestamps. Unity/Unreal SDKs with lipsync templates.
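As a rough illustration of how viseme-level timestamps could drive an avatar, the sketch below samples a list of timed viseme events into one mouth shape per animation frame. The `VisemeEvent` structure, the viseme labels (`PP`, `AA`, `OH`), and the `sil` rest pose are all assumptions made for the example. The actual response schema is not described in this document.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VisemeEvent:
    viseme: str     # hypothetical label for a mouth shape
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds (exclusive)


def visemes_to_frames(events: Sequence[VisemeEvent],
                      fps: int = 30,
                      rest: str = "sil") -> List[str]:
    """Sample the active viseme at the start time of each frame."""
    if not events:
        return []
    duration = max(e.end_s for e in events)
    frames: List[str] = []
    for i in range(math.ceil(duration * fps)):
        t = i / fps
        active = next((e.viseme for e in events if e.start_s <= t < e.end_s), rest)
        frames.append(active)
    return frames


events = [
    VisemeEvent("PP", 0.0, 0.1),
    VisemeEvent("AA", 0.1, 0.3),
    VisemeEvent("OH", 0.4, 0.5),  # the gap from 0.3 to 0.4 falls back to rest
]
frames = visemes_to_frames(events, fps=10)
print(frames)
```

A production rig would usually blend between shapes rather than switch abruptly, and the Unity/Unreal lipsync templates mentioned above may already handle that.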

Pricing

Price / 1M chars: $35
Price / minute: $0.035
Free tier: up to 40 min TTS included on On-Demand (free)

TTS-2 & TTS-1.5 Max: $35/1M chars ($0.035/min) On-Demand. TTS-1.5 Mini: $25/1M chars ($0.025/min). Developer plan: TTS-2 at $30/1M. Growth plan: TTS-2 at $25/1M. Enterprise: as low as $10/1M (TTS-2 & Max), $5/1M (Mini). Free tier: up to 40 min TTS included.

Plan | TTS-2 & 1.5 Max | TTS-1.5 Mini | Included
On-Demand (free) | $35/1M | $25/1M | 40 min TTS
Developer | $30/1M | $20/1M | $20/mo
Creator | $30/1M | $20/1M | $50/mo
Growth | $25/1M | $15/1M | $200/mo
Enterprise | from $10/1M | from $5/1M | On-premise, ZDR, BAA

On-Demand rates in $/1M characters. Monthly plans include credits. Enterprise: custom pricing, on-premise, ZDR, BAA, EU/India data residency.
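The per-character rates above lend themselves to a quick cost estimate. The sketch below hard-codes the listed rates and omits Enterprise, since those prices are only given as "from" values. It also derives the speaking rate implied by quoting $35/1M chars and $0.035/min side by side. The function and dictionary names are hypothetical, and monthly fees or included credits are not modelled.

```python
# Rates in USD per 1M characters, copied from the pricing figures above.
RATES_USD_PER_1M = {
    "On-Demand": {"tts2_max": 35.0, "mini": 25.0},
    "Developer": {"tts2_max": 30.0, "mini": 20.0},
    "Creator":   {"tts2_max": 30.0, "mini": 20.0},
    "Growth":    {"tts2_max": 25.0, "mini": 15.0},
}


def cost_usd(chars: int, plan: str, model: str = "tts2_max") -> float:
    """Usage cost for `chars` characters, ignoring monthly fees and credits."""
    return chars * RATES_USD_PER_1M[plan][model] / 1_000_000


def implied_chars_per_minute(usd_per_1m: float, usd_per_min: float) -> float:
    """Characters per minute implied by two equivalent price quotes."""
    return usd_per_min / usd_per_1m * 1_000_000


growth_cost = cost_usd(2_500_000, "Growth")
chars_per_min = implied_chars_per_minute(35.0, 0.035)
print(f"2.5M chars on Growth: ${growth_cost:.2f}")
print(f"Implied speaking rate: {chars_per_min:.0f} chars/min")
```

The implied figure of about 1,000 characters per minute means that per-minute costs will differ from the quoted rate for voices or languages that speak noticeably faster or slower.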

Sovereignty & Compliance

On-premise: Yes

On-premise deployment is available on the Enterprise plan, with EU and India data residency. Compliance coverage: SOC 2 Type II, GDPR, HIPAA, ZDR, BAA.

GDPR: Compliant

Data residency: US, EU, India

Certifications & Compliance

SOC 2 Type II, GDPR, HIPAA, ZDR, BAA

HIPAA, ZDR (Zero Data Retention) and BAA available on Growth and Enterprise plans. Enterprise on-premise: EU + India data residency.

Strategic & Business Analysis

Inworld TTS-2 + Realtime API — Strategic Positioning

Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?

Inworld TTS-2 (research preview, May 2026): rebuilt for realtime conversation. Voice Direction lets you steer delivery like a director. Conversational Awareness makes the voice emotionally coherent across turns. 100+ languages, one voice identity. #1 on Artificial Analysis Speech Arena.

Cloud + On-premise
Lock-in risk: Medium
Sovereignty fit: Medium
Open-source threat: Medium
Pricing: Stable

A. Strategic Positioning

Target customer: Enterprise / Developer — regulated industries, gaming, real-time agents

TTS-2 (research preview, May 2026): builds on the #1-ranked TTS-1.5 and is rebuilt for realtime conversation. Voice Direction, Conversational Awareness, 100+ languages, Advanced Voice Design. On-premise on Enterprise.

B. Competitive Moat

  • TTS-1.5 ranks #1 on the Artificial Analysis Speech Arena (ahead of Google and ElevenLabs)
  • TTS-2 Voice Direction: natural language delivery instructions inline (unique in market)
  • TTS-2 Conversational Awareness: model hears prior audio turns, tone and pacing carry forward
  • TTS-2: 100+ languages, one voice identity, mid-utterance language switching
  • Full platform: TTS + STT + Realtime API (WebSocket S2S) + LLM Router (220+ models)

Vulnerability: TTS-2 is still in research preview (not GA). On-premise is restricted to Enterprise (custom pricing). Open-source models (Chatterbox, Kokoro) are closing the quality gap.

E. Strategic Questions for GamiWays

Sovereignty fit

On-premise restricted to Enterprise plan (custom pricing). EU + India data residency available. ZDR, HIPAA, BAA on Growth/Enterprise. Strong compliance posture but on-premise not accessible on standard plans.

Build vs. Buy

Strong buy for Phase 1 MVP (best quality, Voice Direction for avatar delivery control, full pipeline). Phase 2 sovereignty requires the Enterprise plan for on-premise deployment.

Lock-in risk

Proprietary models and full-stack platform create some lock-in. Drop-in OpenAI Realtime API compatibility reduces switching cost. Competitive pricing reduces dependency risk.
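One common way to keep switching costs low, whatever the vendor, is to hide the TTS call behind a thin interface owned by the application. The sketch below is a generic pattern rather than anything Inworld provides. `TTSProvider`, `TTSRouter`, and `FakeProvider` are hypothetical names, and the fake provider returns placeholder bytes instead of audio.

```python
from typing import Dict, Iterator, Protocol


class TTSProvider(Protocol):
    """Minimal interface the application depends on (hypothetical)."""

    def synthesize(self, text: str, voice: str) -> Iterator[bytes]:
        ...


class FakeProvider:
    """Stand-in provider that yields labelled bytes instead of audio."""

    def __init__(self, name: str) -> None:
        self.name = name

    def synthesize(self, text: str, voice: str) -> Iterator[bytes]:
        yield f"{self.name}:{voice}:{text}".encode("utf-8")


class TTSRouter:
    """Routes synthesis to the active provider so it can be swapped via config."""

    def __init__(self, providers: Dict[str, TTSProvider], active: str) -> None:
        if active not in providers:
            raise KeyError(active)
        self._providers = providers
        self._active = active

    def switch(self, name: str) -> None:
        if name not in self._providers:
            raise KeyError(name)
        self._active = name

    def speak(self, text: str, voice: str) -> bytes:
        return b"".join(self._providers[self._active].synthesize(text, voice))


router = TTSRouter(
    {"primary": FakeProvider("primary"), "fallback": FakeProvider("fallback")},
    active="primary",
)
first = router.speak("Hello", "narrator")
router.switch("fallback")
second = router.speak("Hello", "narrator")
print(first, second)
```

Features without an equivalent elsewhere, such as Voice Direction markup, would still need per-provider handling behind this interface, so the abstraction reduces lock-in rather than removing it.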

Roadmap alignment

Excellent alignment for GamiWays: TTS-2 Conversational Awareness addresses multi-turn avatar coherence. Voice Direction enables per-scene delivery without re-recording. Support for 100+ languages covers all target markets.

Data Freshness

Updated 6 May 2026

Artificial Analysis Speech Arena (TTS 1.5 #1, May 2026). TTS-2 research preview announced May 5, 2026.

Update note: TTS-2 research preview launched May 5, 2026. 4 new capabilities: Voice Direction, Conversational Awareness, Crosslingual (100+ langs), Advanced Voice Design. Pricing: $0.035/min On-Demand (same as TTS 1.5 Max). TTS 1.5 Max/Mini still available. On-premise Enterprise only.