Architecture & Tech Stack

The 4-layer architecture of the GamiWays Core engine — the real stack extracted from the gami-digidouble-core repo. Every technology choice is justified by latency, sovereignty, and maintainability constraints.

A
4-Layer Architecture
01
API Layer · Fastify · HTTP / WebSocket

Single entry point of the engine. Exposes REST and WebSocket endpoints. Fastify is chosen for its native performance (3× faster than Express), TypeScript-first plugin ecosystem, and native SSE streaming support — critical for real-time LLM responses.
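The SSE wire format the API layer streams can be sketched with a pure framing helper. This is an illustrative sketch, not the repo's actual handlers; the function names (`sseFrame`, `streamTokens`) are assumptions:

```typescript
// Frame one LLM token chunk as a Server-Sent Events message.
// Each SSE message is an "event:" line plus a "data:" line, terminated by a blank line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// A token stream becomes a sequence of frames, closed by a "done" event.
// In a Fastify handler, each frame would be written to the reply stream as it arrives.
function* streamTokens(tokens: string[]): Generator<string> {
  for (const t of tokens) yield sseFrame("token", { delta: t });
  yield sseFrame("done", {});
}
```

Because SSE is plain HTTP, these frames pass through CDNs and proxies that drop WebSocket upgrades — which is why the stack keeps SSE as a fallback.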

02
Application Layer · Use Cases · Orchestration

Business use case layer: StartSession, SendMessage, ResumeSession, SwitchAvatar. Each use case orchestrates domain services without knowing infrastructure details. This separation guarantees testability and long-term maintainability.
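A use case of this kind can be sketched as a class that depends only on ports (interfaces). The port names and method signatures below are illustrative assumptions, not the repo's actual interfaces:

```typescript
// Hypothetical ports — the domain-facing contracts the use case depends on.
interface MemoryPort { recall(sessionId: string): Promise<string[]> }
interface LlmPort { complete(prompt: string): Promise<string> }

// SendMessage orchestrates through ports only: it never imports Redis,
// PostgreSQL, or an LLM SDK directly, so it can be unit-tested with stubs.
class SendMessage {
  constructor(private memory: MemoryPort, private llm: LlmPort) {}

  async execute(sessionId: string, text: string): Promise<string> {
    const context = await this.memory.recall(sessionId);
    const prompt = [...context, `user: ${text}`].join("\n");
    return this.llm.complete(prompt);
  }
}
```

Testability follows directly: swapping both ports for in-memory stubs exercises the orchestration logic without any infrastructure running.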

03
Domain Layer · Business Logic · Engine Core

The engine core: Avatar (AI persona), Game Master (async director), Memory System v3 (3 layers), Context Manager (3-dimension assembly), Knowledge Pipeline (RAG). No infrastructure dependency — the domain is pure and portable.
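The Context Manager's 3-dimension assembly can be sketched as a pure function, which illustrates the domain's portability: no infrastructure imports at all. The dimension names and the budget policy here are assumptions for illustration, not the repo's actual code:

```typescript
// Three assumed context dimensions: persona, memory, knowledge.
interface ContextDimensions {
  persona: string;
  memory: string[];
  knowledge: string[];
}

// Assemble a prompt under a character budget with a deterministic priority:
// persona first, then memory, then knowledge; drop the rest once the budget is spent.
function assembleContext(dims: ContextDimensions, maxChars: number): string {
  const parts = [dims.persona, ...dims.memory, ...dims.knowledge];
  const kept: string[] = [];
  let used = 0;
  for (const p of parts) {
    if (used + p.length > maxChars) break;
    kept.push(p);
    used += p.length;
  }
  return kept.join("\n");
}
```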

04
Infrastructure Layer · PostgreSQL · Redis · Langfuse · Persistence & Observability

Infrastructure adapters: PostgreSQL + pgvector for relational persistence and RAG embeddings, Redis for session cache and working memory, Langfuse self-hosted for LLM observability (traces, costs, latencies). All adapters are swappable via interfaces.
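"Swappable via interfaces" can be made concrete with a small sketch: a hypothetical cache port that both a Redis adapter and an in-memory test double implement, so the domain never knows which one is running. Port and class names are assumptions:

```typescript
// Hypothetical port: the contract the domain sees for session caching.
interface SessionCachePort {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory adapter — a drop-in stand-in for the Redis adapter in tests.
// The real Redis adapter would implement the exact same interface.
class InMemorySessionCache implements SessionCachePort {
  private store = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }

  async set(key: string, value: string, _ttlSeconds: number): Promise<void> {
    this.store.set(key, value); // TTL ignored in this test double
  }
}
```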

[Interactive diagram — R&D pipeline from user voice/text input to avatar experience]

- Available — Gamilab: sovereign ASR + STT, Swiss-hosted, HITL optional, ~300ms target
- R&D — Axis 1: Memory (3-layer architecture)
- R&D — Axis 2a: Expressive TTS (personalized prosody)
- R&D — Axis 2b ⚠: Avatar generation (behavioral fidelity), <500ms target, the critical bottleneck
- R&D — Axis 3: Orchestration (deterministic-organic), the architecture challenge
- Internal — Memoways: Node Editor (conversation graph) and configurable player (pedagogical mode / narrative mode)
- Experience: <2s end-to-end target

Target latency budget: ASR+STT <300ms · Orchestration <200ms · SLM+LLM <500ms · TTS <200ms · Avatar (R&D) <500ms · Streaming <300ms = <2s total.
All values are R&D targets — end-to-end benchmarks planned spring 2026.
B
Memory System v3 — 3 Layers

Memory System v3 is one of the engine's core innovations. It solves the token explosion problem in long sessions by distributing memory across 3 levels with deterministic selection policies.
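A deterministic selection policy can be sketched as a fixed budget split across the three layers, so the same session always yields the same context size regardless of length. The ratios below are illustrative assumptions, not Memory System v3's actual policy:

```typescript
// Token budget per memory layer (illustrative split; the real policy may differ).
interface MemoryBudget {
  working: number;  // L1 — recent turns (Redis)
  episodic: number; // L2 — session summaries (PostgreSQL)
  longTerm: number; // L3 — cross-session user facts (pgvector)
}

// Deterministic: fixed integer ratios, and the remainder goes to L3 so the
// three slices always sum exactly to the total budget.
function splitBudget(total: number): MemoryBudget {
  const working = Math.floor(total / 2);        // 50%
  const episodic = Math.floor((total * 3) / 10); // 30%
  return { working, episodic, longTerm: total - working - episodic };
}
```

Whatever the exact ratios, the point is that selection is a policy, not an accumulation: long sessions no longer inflate the prompt, which is how the token explosion problem is contained.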

L1 · Working Memory
Redis · Per conversation (ephemeral)

Active context window: recent messages, GM state, available tokens. Sub-ms access.

L2 · Episodic Persistence
PostgreSQL · Per session (durable)

Conversation summaries, extracted facts, scenario progression. Hydrated at session start.

L3 · Long-term User Facts
PostgreSQL + pgvector · Cross-sessions (permanent)

Persistent user profile: preferences, learning history, biographical facts. Semantic retrieval via pgvector.
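Semantic retrieval over L3 might look like the following sketch. The table and column names are assumptions, not the actual schema; pgvector's `<=>` operator orders rows by cosine distance:

```typescript
// Hypothetical query a PostgreSQL adapter could issue: rank a user's stored
// facts by cosine distance between their embeddings and the query embedding.
const RETRIEVE_FACTS_SQL = `
  SELECT fact_text
  FROM user_facts
  WHERE user_id = $1
  ORDER BY embedding <=> $2::vector
  LIMIT $3`;

// The similarity behind that ranking, as a local reference implementation:
// cosine similarity of two embedding vectors (1 = identical direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Keeping embeddings in the same PostgreSQL instance as the relational data is what makes the "single DB, no separate vector service" choice possible.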

C
Tech Stack — Choices & Rationale

Every technology choice is driven by concrete constraints: <2s latency, data sovereignty, no vendor lock-in, and long-term maintainability by a small team.

Runtime & Monorepo
Node.js LTS + TypeScript strict: End-to-end strict typing, mature LLM ecosystem, native streaming
pnpm + Turborepo: Multi-package monorepo with incremental build cache — essential for separating core, back-office and future SDKs
API
Fastify: 3× faster than Express, TypeScript-first plugins, native SSE for LLM streaming
WebSocket + SSE fallback: WebSocket for real-time streaming, SSE as CDN/proxy-compatible fallback
Persistence
PostgreSQL + pgvector: Relational DB for sessions/conversations + vector extension for RAG embeddings — single DB, no separate vector service
Redis: Session cache and working memory — sub-millisecond access for hot conversation data
LLM
OpenAI / Anthropic / Mistral: Internal abstraction (ObservedLlmAdapter) — no vendor lock-in, switch with one config line
Langfuse (self-hosted): Full LLM observability — traces, per-request costs, latencies, quality evaluation — with full data sovereignty
Deployment
Docker Compose: Full reproducible stack in one command — PostgreSQL, Redis, Langfuse, API core
Next.js (back-office): Administration and runtime inspection interface — separated from core to decouple deployment cycles
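The provider abstraction in the stack above can be sketched as a decorator: any provider adapter is wrapped so that latency (and, in the real system, Langfuse traces and costs) is recorded transparently. This is a hedged sketch; ObservedLlmAdapter's actual signature in gami-digidouble-core is not shown here:

```typescript
// The shape every provider adapter (OpenAI, Anthropic, Mistral) would share.
interface LlmAdapter {
  complete(prompt: string): Promise<string>;
}

// Illustrative observability decorator: wraps an inner adapter, measures the
// call, and reports it — without the caller knowing which provider ran.
function observe(inner: LlmAdapter, record: (ms: number) => void): LlmAdapter {
  return {
    async complete(prompt: string): Promise<string> {
      const t0 = Date.now();
      const out = await inner.complete(prompt);
      record(Date.now() - t0); // in the real engine: emit a Langfuse trace
      return out;
    },
  };
}
```

Switching providers then really is a config concern: only the inner adapter changes, while observability and the calling code stay identical.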
D
Latency Budget — Cognitive Thresholds

The <2s constraint structures every choice

Latency is not just a technical problem — it is an experience problem. Beyond 2 seconds, users lose their train of thought and the avatar stops being a presence. The goal: <2s end-to-end, first sound within 500ms. This is why Fastify, Redis and SSE streaming are non-negotiable.
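The per-stage targets from the latency budget sum exactly to the 2-second envelope, which a simple constant and check can make explicit. A minimal sketch, using the R&D target values from the budget (stage key names are illustrative):

```typescript
// Per-stage R&D latency targets, in milliseconds.
const LATENCY_BUDGET_MS = {
  asrStt: 300,
  orchestration: 200,
  slmLlm: 500,
  tts: 200,
  avatar: 500,
  streaming: 300,
} as const;

// The stages must fit the <2s end-to-end envelope exactly:
// any stage that slips forces another stage to give time back.
function totalBudgetMs(): number {
  return Object.values(LATENCY_BUDGET_MS).reduce((sum, ms) => sum + ms, 0);
}
```

Framing the budget this way makes the trade-offs zero-sum and explicit: the avatar-generation axis cannot exceed its 500ms slice without stealing from TTS or orchestration.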

| Threshold | Qualification | UX Impact | Status |
| --- | --- | --- | --- |
| <500ms | Perceptive fluidity | Perceptive fluidity threshold: the user perceives a slight delay but the interaction remains natural. Target for TTS first audio. | ✓ Target |
| 1s | Acceptable | Conversational comfort threshold: beyond this, users start anticipating the wait. Target for TTFB (first video frame). | ✓ Target |
| 2s | Natural limit | Conversational naturalness threshold (Nielsen 1993): beyond this, conversation becomes a series of waits. GamiWays TTFR target. | R&D goal |
| 6–12s | Engagement break | Current prototype latency (HeyGem OS): the user loses the thread and the avatar stops being a presence. This is the problem to solve. | Current problem |