Architecture & Tech Stack

The 4-layer architecture of the GamiWays Core engine — the real stack extracted from the gami-digidouble-core repo. Every technology choice is justified by latency, sovereignty, and maintainability constraints.

A
4-Layer Architecture
01
API Layer · Fastify · HTTP / WebSocket

Single entry point of the engine. Exposes REST and WebSocket endpoints. Fastify is chosen for its native performance (3× faster than Express), TypeScript-first plugin ecosystem, and native SSE streaming support — critical for real-time LLM responses.
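The SSE wire format the API layer streams can be sketched with a pure framing helper. This is an illustrative sketch, not the repo's actual handlers; the function names (`sseFrame`, `streamTokens`) are assumptions:

```typescript
// Frame one LLM token chunk as a Server-Sent Events message.
// Each SSE message is an "event:" line plus a "data:" line, terminated by a blank line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// A token stream becomes a sequence of frames, closed by a "done" event.
// In a Fastify handler, each frame would be written to the reply stream as it arrives.
function* streamTokens(tokens: string[]): Generator<string> {
  for (const t of tokens) yield sseFrame("token", { delta: t });
  yield sseFrame("done", {});
}
```

Because SSE is plain HTTP, these frames pass through CDNs and proxies that drop WebSocket upgrades — which is why the stack keeps SSE as a fallback.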

02
Application Layer · Use Cases · Orchestration

Business use case layer: StartSession, SendMessage, ResumeSession, SwitchAvatar. Each use case orchestrates domain services without knowing infrastructure details. This separation guarantees testability and long-term maintainability.
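A use case of this kind can be sketched as a class that depends only on ports (interfaces). The port names and method signatures below are illustrative assumptions, not the repo's actual interfaces:

```typescript
// Hypothetical ports — the domain-facing contracts the use case depends on.
interface MemoryPort { recall(sessionId: string): Promise<string[]> }
interface LlmPort { complete(prompt: string): Promise<string> }

// SendMessage orchestrates through ports only: it never imports Redis,
// PostgreSQL, or an LLM SDK directly, so it can be unit-tested with stubs.
class SendMessage {
  constructor(private memory: MemoryPort, private llm: LlmPort) {}

  async execute(sessionId: string, text: string): Promise<string> {
    const context = await this.memory.recall(sessionId);
    const prompt = [...context, `user: ${text}`].join("\n");
    return this.llm.complete(prompt);
  }
}
```

Testability follows directly: swapping both ports for in-memory stubs exercises the orchestration logic without any infrastructure running.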

03
Domain Layer · Business Logic · Engine Core

The engine core: Avatar (AI persona), Game Master (async director), Memory System v3 (3 layers), Context Manager (3-dimension assembly), Knowledge Pipeline (RAG). No infrastructure dependency — the domain is pure and portable.
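The Context Manager's 3-dimension assembly can be sketched as a pure function, which illustrates the domain's portability: no infrastructure imports at all. The dimension names and the budget policy here are assumptions for illustration, not the repo's actual code:

```typescript
// Three assumed context dimensions: persona, memory, knowledge.
interface ContextDimensions {
  persona: string;
  memory: string[];
  knowledge: string[];
}

// Assemble a prompt under a character budget with a deterministic priority:
// persona first, then memory, then knowledge; drop the rest once the budget is spent.
function assembleContext(dims: ContextDimensions, maxChars: number): string {
  const parts = [dims.persona, ...dims.memory, ...dims.knowledge];
  const kept: string[] = [];
  let used = 0;
  for (const p of parts) {
    if (used + p.length > maxChars) break;
    kept.push(p);
    used += p.length;
  }
  return kept.join("\n");
}
```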

04
Infrastructure Layer · PostgreSQL · Redis · Langfuse · Persistence & Observability

Infrastructure adapters: PostgreSQL + pgvector for relational persistence and RAG embeddings, Redis for session cache and working memory, Langfuse self-hosted for LLM observability (traces, costs, latencies). All adapters are swappable via interfaces.
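"Swappable via interfaces" can be made concrete with a small sketch: a hypothetical cache port that both a Redis adapter and an in-memory test double implement, so the domain never knows which one is running. Port and class names are assumptions:

```typescript
// Hypothetical port: the contract the domain sees for session caching.
interface SessionCachePort {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory adapter — a drop-in stand-in for the Redis adapter in tests.
// The real Redis adapter would implement the exact same interface.
class InMemorySessionCache implements SessionCachePort {
  private store = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }

  async set(key: string, value: string, _ttlSeconds: number): Promise<void> {
    this.store.set(key, value); // TTL ignored in this test double
  }
}
```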

[Interactive diagram — R&D pipeline from user voice/text input to avatar experience]

- Available — Gamilab: sovereign ASR + STT, Swiss-hosted, HITL optional, ~300ms target
- R&D — Axis 1: Memory (3-layer architecture)
- R&D — Axis 2a: Expressive TTS (personalized prosody)
- R&D — Axis 2b ⚠: Avatar generation (behavioral fidelity), <500ms target, the critical bottleneck
- R&D — Axis 3: Orchestration (deterministic-organic), the architecture challenge
- Internal — Memoways: Node Editor (conversation graph) and configurable player (pedagogical mode / narrative mode)
- Experience: <2s end-to-end target

Target latency budget: ASR+STT <300ms · Orchestration <200ms · SLM+LLM <500ms · TTS <200ms · Avatar (R&D) <500ms · Streaming <300ms = <2s total.
All values are R&D targets — end-to-end benchmarks planned spring 2026.
B
Memory System v3 — 3 Layers

Memory System v3 is one of the engine's core innovations. It solves the token explosion problem in long sessions by distributing memory across 3 levels with deterministic selection policies.
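A deterministic selection policy can be sketched as a fixed budget split across the three layers, so the same session always yields the same context size regardless of length. The ratios below are illustrative assumptions, not Memory System v3's actual policy:

```typescript
// Token budget per memory layer (illustrative split; the real policy may differ).
interface MemoryBudget {
  working: number;  // L1 — recent turns (Redis)
  episodic: number; // L2 — session summaries (PostgreSQL)
  longTerm: number; // L3 — cross-session user facts (pgvector)
}

// Deterministic: fixed integer ratios, and the remainder goes to L3 so the
// three slices always sum exactly to the total budget.
function splitBudget(total: number): MemoryBudget {
  const working = Math.floor(total / 2);        // 50%
  const episodic = Math.floor((total * 3) / 10); // 30%
  return { working, episodic, longTerm: total - working - episodic };
}
```

Whatever the exact ratios, the point is that selection is a policy, not an accumulation: long sessions no longer inflate the prompt, which is how the token explosion problem is contained.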

L1 · Working Memory
Redis · Per conversation (ephemeral)

Active context window: recent messages, GM state, available tokens. Sub-ms access.

L2 · Episodic Persistence
PostgreSQL · Per session (durable)

Conversation summaries, extracted facts, scenario progression. Hydrated at session start.

L3 · Long-term User Facts
PostgreSQL + pgvector · Cross-sessions (permanent)

Persistent user profile: preferences, learning history, biographical facts. Semantic retrieval via pgvector.
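Semantic retrieval over L3 might look like the following sketch. The table and column names are assumptions, not the actual schema; pgvector's `<=>` operator orders rows by cosine distance:

```typescript
// Hypothetical query a PostgreSQL adapter could issue: rank a user's stored
// facts by cosine distance between their embeddings and the query embedding.
const RETRIEVE_FACTS_SQL = `
  SELECT fact_text
  FROM user_facts
  WHERE user_id = $1
  ORDER BY embedding <=> $2::vector
  LIMIT $3`;

// The similarity behind that ranking, as a local reference implementation:
// cosine similarity of two embedding vectors (1 = identical direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Keeping embeddings in the same PostgreSQL instance as the relational data is what makes the "single DB, no separate vector service" choice possible.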

C
Tech Stack — Choices & Rationale

Every technology choice is driven by concrete constraints: <2s latency, data sovereignty, no vendor lock-in, and long-term maintainability by a small team.

Runtime & Monorepo
Node.js LTS + TypeScript strict: End-to-end strict typing, mature LLM ecosystem, native streaming
pnpm + Turborepo: Multi-package monorepo with incremental build cache — essential for separating core, back-office and future SDKs
API
Fastify: 3× faster than Express, TypeScript-first plugins, native SSE for LLM streaming
WebSocket + SSE fallback: WebSocket for real-time streaming, SSE as CDN/proxy-compatible fallback
Persistence
PostgreSQL + pgvector: Relational DB for sessions/conversations + vector extension for RAG embeddings — single DB, no separate vector service
Redis: Session cache and working memory — sub-millisecond access for hot conversation data
LLM
OpenAI / Anthropic / Mistral: Internal abstraction (ObservedLlmAdapter) — no vendor lock-in, switch with one config line
Langfuse (self-hosted): Full LLM observability — traces, per-request costs, latencies, quality evaluation — with full data sovereignty
Deployment
Docker Compose: Full reproducible stack in one command — PostgreSQL, Redis, Langfuse, API core
Next.js (back-office): Administration and runtime inspection interface — separated from core to decouple deployment cycles
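The provider abstraction in the stack above can be sketched as a decorator: any provider adapter is wrapped so that latency (and, in the real system, Langfuse traces and costs) is recorded transparently. This is a hedged sketch; ObservedLlmAdapter's actual signature in gami-digidouble-core is not shown here:

```typescript
// The shape every provider adapter (OpenAI, Anthropic, Mistral) would share.
interface LlmAdapter {
  complete(prompt: string): Promise<string>;
}

// Illustrative observability decorator: wraps an inner adapter, measures the
// call, and reports it — without the caller knowing which provider ran.
function observe(inner: LlmAdapter, record: (ms: number) => void): LlmAdapter {
  return {
    async complete(prompt: string): Promise<string> {
      const t0 = Date.now();
      const out = await inner.complete(prompt);
      record(Date.now() - t0); // in the real engine: emit a Langfuse trace
      return out;
    },
  };
}
```

Switching providers then really is a config concern: only the inner adapter changes, while observability and the calling code stay identical.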
D
Latency Budget — Cognitive Thresholds

The <2s constraint structures every choice

Latency is not just a technical problem — it is an experience problem. Beyond 2 seconds, users lose their train of thought and the avatar stops being a presence. The goal: <2s end-to-end, first sound within 500ms. This is why Fastify, Redis and SSE streaming are non-negotiable.
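The per-stage targets from the latency budget sum exactly to the 2-second envelope, which a simple constant and check can make explicit. A minimal sketch, using the R&D target values from the budget (stage key names are illustrative):

```typescript
// Per-stage R&D latency targets, in milliseconds.
const LATENCY_BUDGET_MS = {
  asrStt: 300,
  orchestration: 200,
  slmLlm: 500,
  tts: 200,
  avatar: 500,
  streaming: 300,
} as const;

// The stages must fit the <2s end-to-end envelope exactly:
// any stage that slips forces another stage to give time back.
function totalBudgetMs(): number {
  return Object.values(LATENCY_BUDGET_MS).reduce((sum, ms) => sum + ms, 0);
}
```

Framing the budget this way makes the trade-offs zero-sum and explicit: the avatar-generation axis cannot exceed its 500ms slice without stealing from TTS or orchestration.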

| Threshold | Qualification | UX Impact | Status |
| --- | --- | --- | --- |
| <500ms | Perceptive fluidity | Perceptive fluidity threshold: the user perceives a slight delay but the interaction remains natural. Target for TTS first audio. | ✓ Target |
| 1s | Acceptable | Conversational comfort threshold: beyond this, users start anticipating the wait. Target for TTFB (first video frame). | ✓ Target |
| 2s | Natural limit | Conversational naturalness threshold (Nielsen 1993): beyond this, conversation becomes a series of waits. GamiWays TTFR target. | R&D goal |
| 6–12s | Engagement break | Current prototype latency (HeyGem OS): the user loses the thread and the avatar stops being a presence. This is the problem to solve. | Current problem |