TTS-2 has been in research preview since May 5, 2026. TTS-1.5 Max and Mini remain available for production use. TTS-2 features (Voice Direction, Conversational Awareness) may evolve before general availability (GA).
Inworld TTS-2 + Realtime API
#1 on Artificial Analysis Speech Arena. TTS-2 (research preview): Voice Direction, Conversational Awareness, 100+ languages
Comparative Scores
Architecture
Top candidate for GamiWays pipeline. TTS-2 Conversational Awareness directly addresses the challenge of emotionally coherent multi-turn avatars. Voice Direction enables per-scene delivery control without re-recording. 100+ crosslingual support covers all target markets. Viseme timestamps directly usable for avatar lip-sync. On-premise on Enterprise aligns with sovereignty requirements.
Analysis
Inworld TTS-2 (research preview, May 2026) is a new-generation voice model built for realtime conversation. TTS-1.5 already ranks #1 on Artificial Analysis Speech Arena (ahead of Google and ElevenLabs). TTS-2 adds four capabilities: Voice Direction (natural-language delivery instructions), Conversational Awareness (the model hears prior audio turns), Crosslingual (one voice identity across 100+ languages, with mid-utterance switching), and Advanced Voice Design (a voice from a prose description). Pricing: $0.035/min On-Demand (same as TTS-1.5 Max). Full platform: TTS + STT (inworld-stt-1) + Realtime API (WebSocket S2S, tool calling) + LLM Router (220+ models). On-premise on Enterprise. Integrations: Cloudflare, DeepInfra, LiveKit, Stream, VoiceRun.
Strengths
- TTS-1.5 ranks #1 on Artificial Analysis Speech Arena (ahead of Google, ElevenLabs)
- TTS-2 Voice Direction: natural-language delivery instructions
- TTS-2 Conversational Awareness: the model hears prior audio turns
- TTS-2 Crosslingual: 100+ languages, mid-sentence switching, a single voice identity
- TTS-2 Advanced Voice Design: a voice from a prose description
- Full platform: TTS + STT + Realtime S2S + LLM Router (220+ models)
- Sub-200ms median TTFA (TTS alone)
- Viseme timestamps for avatar lip-sync
- On-premise on Enterprise (GDPR, HIPAA, ZDR, SOC2 Type II)
Weaknesses
- TTS-2 is in research preview (not yet GA)
- On-premise only on Enterprise (custom pricing)
- TTS-2 On-Demand price is $0.035/min (1.4× the TTS-1.5 Mini rate of $0.025/min)
- LLM Router model list changes frequently
Voice Capabilities
Instant voice cloning (free). Professional voice cloning (Growth/Enterprise add-on). Advanced Voice Design: create a voice from a prose description — no reference audio needed. Up to 5 custom voices (On-Demand), 100 (Creator), 1,000 (Developer), 3,000 (Growth). ElevenLabs migration tool available.
TTS-2 Voice Direction: natural-language delivery instructions inline (e.g. [speak sadly, as if something bad just happened]). Inline non-verbals: [laugh], [sigh], [breathe], [clear_throat], [cough]. Conversational Awareness: the model hears prior audio turns, so tone, pacing, and emotional state carry forward automatically. Word/char/phoneme/viseme timestamps.
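As an illustration of the inline markup described above, the following is a minimal sketch that only builds the text string; it makes no API call. The helper names `direct` and `non_verbal` are hypothetical, and the bracket syntax is taken from the examples in this section rather than from a confirmed specification.

```python
from typing import Optional

# Non-verbal tags listed in this section; anything else is rejected.
NON_VERBALS = {"laugh", "sigh", "breathe", "clear_throat", "cough"}


def direct(text: str, direction: Optional[str] = None) -> str:
    """Prefix text with a bracketed natural-language delivery instruction."""
    if direction is None:
        return text
    if "[" in direction or "]" in direction:
        raise ValueError("direction must not contain brackets")
    return f"[{direction.strip()}] {text}"


def non_verbal(name: str) -> str:
    """Return an inline non-verbal tag such as [sigh]."""
    if name not in NON_VERBALS:
        raise ValueError(f"unknown non-verbal tag: {name}")
    return f"[{name}]"


line = direct(
    f"{non_verbal('sigh')} I did not expect that.",
    "speak sadly, as if something bad just happened",
)
print(line)
```

Validating tags before sending text keeps a typo such as `[sight]` from being read aloud or silently ignored, whichever the model would do with an unknown tag.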
Streaming-native via WebSocket and REST. Sub-200ms median TTFA for TTS alone. Realtime API: full-duplex WebSocket speech-to-speech with turn detection, tool calling mid-session, provider-agnostic LLM routing (220+ models). Backchannel fillers stream in parallel during LLM reasoning. Partial responses reach the user before the full sentence is composed.
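To check a median time-to-first-audio (TTFA) figure against your own network conditions, a sketch along these lines may help. It times the gap between opening a stream and receiving the first non-empty audio chunk. `fake_stream` is a stand-in that sleeps 10 ms; it is not the vendor's client, and a real measurement would replace it with an actual streaming call.

```python
import statistics
import time
from typing import Callable, Iterator


def time_to_first_audio(open_stream: Callable[[], Iterator[bytes]]) -> float:
    """Seconds from request start until the first non-empty audio chunk."""
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk:
            return time.perf_counter() - start
    raise RuntimeError("stream ended without audio")


def median_ttfa_ms(open_stream: Callable[[], Iterator[bytes]], runs: int = 5) -> float:
    """Median TTFA in milliseconds over several independent runs."""
    samples = [time_to_first_audio(open_stream) * 1000 for _ in range(runs)]
    return statistics.median(samples)


def fake_stream() -> Iterator[bytes]:
    """Stand-in for a streaming TTS call: 10 ms delay, then two chunks."""
    time.sleep(0.01)
    yield b"\x00" * 320
    yield b"\x00" * 320


print(f"median TTFA: {median_ttfa_ms(fake_stream):.1f} ms")
```

The median is used rather than the mean because a single slow connection setup would otherwise dominate a small sample.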
Word, character, phoneme, and viseme-level timestamps. Unity/Unreal SDKs with lipsync templates.
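For avatar lip-sync, viseme timestamps typically need to be resampled to the animation frame rate. The sketch below assumes each event carries a label plus start and end times in milliseconds; the `VisemeEvent` shape, the labels, and the `sil` rest pose are hypothetical and would need to be adapted to the actual timestamp payload.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VisemeEvent:
    viseme: str    # viseme label; naming here is hypothetical
    start_ms: int  # inclusive start, in milliseconds
    end_ms: int    # exclusive end, in milliseconds


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def visemes_to_frames(events: List[VisemeEvent], fps: int, rest: str = "sil") -> List[str]:
    """Pick the viseme active at the start of each frame; gaps use `rest`."""
    if not events:
        return []
    total = _ceil_div(max(e.end_ms for e in events) * fps, 1000)
    frames = [rest] * total
    for e in events:
        first = _ceil_div(e.start_ms * fps, 1000)
        last = _ceil_div(e.end_ms * fps, 1000)
        for i in range(first, last):
            frames[i] = e.viseme
    return frames


events = [
    VisemeEvent("p", 0, 100),
    VisemeEvent("aa", 100, 250),
    VisemeEvent("m", 300, 400),  # 250-300 ms is a gap with no viseme
]
frames = visemes_to_frames(events, fps=20)
print(frames)
```

Integer milliseconds are used so that frame boundaries are computed exactly; a production rig would usually blend between adjacent visemes rather than switch abruptly.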
Pricing
TTS-2 & TTS-1.5 Max: $35/1M chars ($0.035/min) On-Demand. TTS-1.5 Mini: $25/1M chars ($0.025/min). Developer plan: TTS-2 at $30/1M. Growth plan: TTS-2 at $25/1M. Enterprise: as low as $10/1M (TTS-2 & Max), $5/1M (Mini). Free tier: up to 40 min TTS included.
| Plan | TTS-2 & 1.5 Max | TTS-1.5 Mini | Included |
|---|---|---|---|
| On-Demand (free) | $35/1M | $25/1M | 40 min TTS |
| Developer | $30/1M | $20/1M | $20/mo |
| Creator | $30/1M | $20/1M | $50/mo |
| Growth | $25/1M | $15/1M | $200/mo |
| Enterprise | from $10/1M | from $5/1M | On-premise, ZDR, BAA |
On-Demand rates in $/1M characters. Monthly plans include credits. Enterprise: custom pricing, on-premise, ZDR, BAA, EU/India data residency.
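The per-character and per-minute figures above can be reconciled with a small calculation. This sketch assumes roughly 1,000 characters per minute of speech, which is what $35/1M chars ≈ $0.035/min implies. It ignores included credits and any monthly plan fee, and the names `RATES_PER_MILLION` and `cost_usd` are illustrative only.

```python
from decimal import Decimal

# USD per 1M characters, copied from the pricing table in this section.
RATES_PER_MILLION = {
    "On-Demand": {"tts2_or_max": Decimal(35), "mini": Decimal(25)},
    "Developer": {"tts2_or_max": Decimal(30), "mini": Decimal(20)},
    "Creator": {"tts2_or_max": Decimal(30), "mini": Decimal(20)},
    "Growth": {"tts2_or_max": Decimal(25), "mini": Decimal(15)},
}

# Assumption: ~1,000 characters per spoken minute ($35/1M chars = $0.035/min).
CHARS_PER_MINUTE = 1000


def cost_usd(plan: str, model: str, minutes: int) -> Decimal:
    """Usage cost for a number of spoken minutes, excluding credits and fees."""
    chars = minutes * CHARS_PER_MINUTE
    return RATES_PER_MILLION[plan][model] * chars / 1_000_000


for plan in RATES_PER_MILLION:
    print(f"{plan}: ${cost_usd(plan, 'tts2_or_max', 600):.2f} for 600 min of TTS-2")

# Price ratio between TTS-2 and TTS-1.5 Mini at On-Demand rates.
ratio = cost_usd("On-Demand", "tts2_or_max", 1) / cost_usd("On-Demand", "mini", 1)
print(f"TTS-2 vs Mini: {ratio}x")
```

`Decimal` is used instead of floats so that the per-minute amounts match the table exactly.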
Sovereignty & Compliance
On-premise deployment available on Enterprise plan. EU + India data residency. SOC2 Type II, GDPR, and HIPAA compliance; ZDR and BAA available.
Data residency: US, EU, India
Certifications & Compliance
HIPAA, ZDR (Zero Data Retention) and BAA available on Growth and Enterprise plans. Enterprise on-premise: EU + India data residency.
Inworld TTS-2 + Realtime API — Strategic Positioning
Beyond technical specs: where does this tool sit in the ecosystem, what are the risks and strategic implications for GamiWays?
Inworld TTS-2 (research preview, May 2026): rebuilt for realtime conversation. Voice Direction lets you steer delivery like a director. Conversational Awareness makes the voice emotionally coherent across turns. 100+ languages, one voice identity. #1 on Artificial Analysis Speech Arena.
A. Strategic Positioning
Target customer: Enterprise / Developer — regulated industries, gaming, real-time agents
TTS-2 (research preview, May 2026): rebuilt for realtime conversation on top of the #1 Elo-ranked TTS-1.5 (Artificial Analysis Speech Arena). Voice Direction, Conversational Awareness, 100+ languages, Advanced Voice Design. On-premise on Enterprise.
B. Competitive Moat
- TTS-1.5 ranks #1 on Artificial Analysis Speech Arena (ahead of Google and ElevenLabs)
- TTS-2 Voice Direction: natural-language delivery instructions inline (unique in the market)
- TTS-2 Conversational Awareness: the model hears prior audio turns, so tone and pacing carry forward
- TTS-2 Crosslingual: 100+ languages, one voice identity, mid-utterance language switching
- Full platform: TTS + STT + Realtime API (WebSocket S2S) + LLM Router (220+ models)
Vulnerability: TTS-2 still in research preview (not GA). On-premise restricted to Enterprise (custom pricing). Open-source models (Chatterbox, Kokoro) closing the quality gap.
E. Strategic Questions for GamiWays
Sovereignty fit
On-premise restricted to Enterprise plan (custom pricing). EU + India data residency available. ZDR, HIPAA, BAA on Growth/Enterprise. Strong compliance posture but on-premise not accessible on standard plans.
Build vs. Buy
Strong buy for Phase 1 MVP (best quality, Voice Direction for avatar delivery control, full pipeline). For Phase 2 sovereignty, requires Enterprise plan for on-premise.
Lock-in risk
Proprietary models and full-stack platform create some lock-in. Drop-in OpenAI Realtime API compatibility reduces switching cost. Competitive pricing reduces dependency risk.
Roadmap alignment
Excellent alignment for GamiWays: TTS-2 Conversational Awareness addresses multi-turn avatar coherence. Voice Direction enables per-scene delivery without re-recording. 100+ languages covers all target markets.
Data Freshness
Artificial Analysis Speech Arena (TTS 1.5 #1, May 2026). TTS-2 research preview announced May 5, 2026.
Update note: TTS-2 research preview launched May 5, 2026. Four new capabilities: Voice Direction, Conversational Awareness, Crosslingual (100+ languages), Advanced Voice Design. Pricing: $0.035/min On-Demand (same as TTS-1.5 Max). TTS-1.5 Max/Mini still available. On-premise is Enterprise only.