Architecture

Layered request flow, agent runtime, retrieval backends, and shared infrastructure

Live in production

Multi-tenant

Multi-cloud

One streaming endpoint. Layered enforcement. Pluggable retrieval.

Every request to /v1/run passes through five enforcement layers before any retrieval or generation work happens. The agent runtime then routes retrieval to pgvector, Bedrock KB, or Vertex AI Search, selected per corpus from a single config field. Stateful infrastructure (Postgres, Redis, ARQ workers) is shared across the platform, with envelope-encrypted secrets and a tamper-evident audit log.

Request authorization pipeline

Six-step decision order for every document action. Order is non-negotiable: tenant boundary first, default-deny last.

1. Tenant boundary
Non-bypassable tenant scoping. The principal's tenant_id is checked against every resource's tenant_id.
↳ TENANT_MISMATCH: request rejected before any policy runs.
2. Kill switches / maintenance gates
Per-feature kill switches and maintenance windows take precedence over RBAC and ABAC.
↳ FEATURE_DISABLED or MAINTENANCE_MODE: short-circuit denial.
3. RBAC role gate
Endpoint-level role check. The role matrix in the README pins the minimum role per route.
↳ ROLE_INSUFFICIENT: denied with the required role surfaced.
4. Document ACL evaluation
Explicit grants (read / write / admin) for the principal. Expired grants are ignored. The document owner is automatically admin.
↳ ACL_DENIED: no matching grant, or only expired grants found.
5. ABAC policy evaluation
Deny policies are evaluated first, then allow policies. Higher priority wins. Conditions are evaluated against the principal, resource labels, and request context.
↳ POLICY_DENIED: surfaced with the matching policy name.
6. Default deny
If no allow policy matched, the configurable default is applied (default deny in production).
↳ DEFAULT_DENY: terminal step; a flag controls the behavior.
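
The six steps above can be sketched as a single function that returns on the first denial. This is a minimal illustration, not the platform's implementation: the data classes, the `authorize` signature, and the `ALLOWED` / `DEFAULT_ALLOW` codes are hypothetical, and two points are assumptions: grant levels are treated as ordered (admin implies write implies read), and "higher priority wins" is read as sorting matching policies by priority with deny winning ties.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, NamedTuple, Optional

ROLE_RANK = {"reader": 0, "editor": 1, "admin": 2}
# Assumption: grant levels are ordered, so admin implies write implies read.
GRANT_RANK = {"read": 0, "write": 1, "admin": 2}


@dataclass
class Principal:
    id: str
    tenant_id: str
    role: str


@dataclass
class Grant:
    principal_id: str
    level: str
    expires_at: Optional[datetime] = None


@dataclass
class Document:
    id: str
    tenant_id: str
    owner_id: str
    grants: List[Grant] = field(default_factory=list)


@dataclass
class Policy:
    name: str
    effect: str  # "deny" or "allow"
    priority: int
    matches: Callable[[Principal, "Document", str], bool]  # hypothetical predicate


class Decision(NamedTuple):
    allowed: bool
    code: str
    detail: Optional[str] = None


def authorize(principal, doc, action, *, required_role, policies=(),
              feature_disabled=False, maintenance=False,
              default_deny=True, now=None) -> Decision:
    now = now or datetime.now(timezone.utc)
    # 1. Tenant boundary: checked before anything else.
    if principal.tenant_id != doc.tenant_id:
        return Decision(False, "TENANT_MISMATCH")
    # 2. Kill switches and maintenance gates outrank RBAC and ABAC.
    if maintenance:
        return Decision(False, "MAINTENANCE_MODE")
    if feature_disabled:
        return Decision(False, "FEATURE_DISABLED")
    # 3. RBAC role gate; the required role is surfaced in the denial.
    if ROLE_RANK[principal.role] < ROLE_RANK[required_role]:
        return Decision(False, "ROLE_INSUFFICIENT", required_role)
    # 4. Document ACL: the owner is automatically admin; expired grants are ignored.
    if principal.id != doc.owner_id:
        live = [g for g in doc.grants
                if g.principal_id == principal.id
                and (g.expires_at is None or g.expires_at > now)]
        if not any(GRANT_RANK[g.level] >= GRANT_RANK[action] for g in live):
            return Decision(False, "ACL_DENIED")
    # 5. ABAC: highest priority first, deny before allow at equal priority.
    matched = [p for p in policies if p.matches(principal, doc, action)]
    matched.sort(key=lambda p: (-p.priority, p.effect != "deny"))
    if matched:
        top = matched[0]
        if top.effect == "deny":
            return Decision(False, "POLICY_DENIED", top.name)
        return Decision(True, "ALLOWED", top.name)
    # 6. No policy matched: apply the configurable default.
    if default_deny:
        return Decision(False, "DEFAULT_DENY")
    return Decision(True, "DEFAULT_ALLOW")
```

Because each step returns immediately, a later step never sees a request that an earlier step rejected, which is what makes the order significant.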

Edge & Auth

Every request passes through a layered gate before any retrieval or generation work happens.

Rate limit

Redis · token-bucket

Per-key + per-tenant dual enforcement with stable 429 schema, retry hints, and audit events for throttling and degraded mode.

Flag: RATE_LIMIT_ENABLED
Per route-class policies
Stable 429 schema with throttling headers
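
As an illustration of dual enforcement, the sketch below keeps one token bucket per API key and one per tenant, and admits a request only when both have a token. It is an in-process stand-in for the Redis-backed state; the class names, the shape of the returned dict, and the shared refill rate are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Bucket:
    capacity: float
    refill_per_sec: float
    tokens: float
    updated_at: float

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now


class DualRateLimiter:
    """Hypothetical limiter: a request must pass both the key and tenant bucket."""

    def __init__(self, key_capacity: float, tenant_capacity: float, refill_per_sec: float):
        self.key_capacity = key_capacity
        self.tenant_capacity = tenant_capacity
        self.refill_per_sec = refill_per_sec
        self._buckets = {}

    def _bucket(self, scope: str, ident: str, capacity: float, now: float) -> Bucket:
        slot = (scope, ident)
        if slot not in self._buckets:
            self._buckets[slot] = Bucket(capacity, self.refill_per_sec, capacity, now)
        return self._buckets[slot]

    def check(self, api_key_id: str, tenant_id: str, now: float) -> dict:
        key_bucket = self._bucket("key", api_key_id, self.key_capacity, now)
        tenant_bucket = self._bucket("tenant", tenant_id, self.tenant_capacity, now)
        key_bucket.refill(now)
        tenant_bucket.refill(now)
        # Consume from both buckets only when both can pay.
        if key_bucket.tokens >= 1 and tenant_bucket.tokens >= 1:
            key_bucket.tokens -= 1
            tenant_bucket.tokens -= 1
            return {"allowed": True, "status": 200}
        # Retry hint: time until every exhausted bucket holds one token again.
        short = [b for b in (key_bucket, tenant_bucket) if b.tokens < 1]
        wait = max((1 - b.tokens) / b.refill_per_sec for b in short)
        return {"allowed": False, "status": 429, "retry_after_seconds": math.ceil(wait)}
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would perform the refill-and-consume step atomically in Redis.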

Auth

API key + SSO/SCIM

Hashed API key storage; OIDC SSO with PKCE + state/nonce replay protection; SCIM 2.0 token-authenticated provisioning.

Flags: AUTH_ENABLED, SSO_ENABLED, SCIM_ENABLED
Optional dev bypass via AUTH_DEV_BYPASS
Inactive key denial: AUTH_INACTIVE_KEY
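
A minimal sketch of hashed key storage follows. The page states only that keys are stored hashed; the choice of SHA-256, the record shape, and the `AUTH_INVALID_KEY` code are assumptions for illustration, while `AUTH_INACTIVE_KEY` is the code named above.

```python
import hashlib
import hmac
import secrets


def hash_api_key(raw_key: str) -> str:
    # Assumption: a plain SHA-256 digest; the real scheme is not specified.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key():
    """Return (raw_key, key_hash). Only the hash would be persisted."""
    raw_key = secrets.token_urlsafe(32)
    return raw_key, hash_api_key(raw_key)


def verify_api_key(presented: str, record: dict):
    """Check a presented key against a stored record (hypothetical shape)."""
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(hash_api_key(presented), record["key_hash"]):
        return (False, "AUTH_INVALID_KEY")  # hypothetical code
    if not record["active"]:
        return (False, "AUTH_INACTIVE_KEY")
    return (True, None)
```

The raw key is shown to the caller once at creation; a database leak then exposes only digests.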

RBAC

reader / editor / admin

Role gate on every protected endpoint. Admin role does not bypass document ACLs unless AUTHZ_ADMIN_BYPASS_DOCUMENT_ACL=true.

Tenant binding from authenticated principal
Role matrix in README (per endpoint)

ABAC

policy engine

Priority-aware deny-first / allow-then policies with simulation API and DSL conditions (eq, time_between, var).

Flag: AUTHZ_ABAC_ENABLED
Wildcards gated by AUTHZ_ALLOW_WILDCARDS
Simulation endpoint: POST /v1/admin/authz/policies/{id}/simulate
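
To make the condition operators concrete, here is a small evaluator for eq, time_between, and var. The dict-based condition syntax, the dotted variable paths, and the fail-closed handling of missing variables are all assumptions; the platform's actual DSL grammar is not shown on this page.

```python
from datetime import time


def resolve(operand, scope):
    """Resolve {"var": "a.b.c"} against a nested dict; literals pass through."""
    if isinstance(operand, dict) and "var" in operand:
        value = scope
        for part in operand["var"].split("."):
            if not isinstance(value, dict) or part not in value:
                raise LookupError(operand["var"])
            value = value[part]
        return value
    return operand


def _evaluate(cond, scope):
    op = cond["op"]
    if op == "eq":
        return resolve(cond["left"], scope) == resolve(cond["right"], scope)
    if op == "time_between":
        current = resolve(cond["value"], scope)
        start = time.fromisoformat(cond["start"])
        end = time.fromisoformat(cond["end"])
        if start <= end:
            return start <= current <= end
        # Window wraps past midnight, for example 22:00 to 06:00.
        return current >= start or current <= end
    raise ValueError(f"unsupported operator: {op}")


def evaluate(cond, scope) -> bool:
    # Assumption: a condition that references a missing variable does not match.
    try:
        return _evaluate(cond, scope)
    except LookupError:
        return False
```

A usage example under the same assumptions:

```python
scope = {
    "principal": {"department": "finance"},
    "resource": {"labels": {"dept": "finance"}},
    "request": {"time": time(10, 30)},
}
same_dept = {"op": "eq",
             "left": {"var": "principal.department"},
             "right": {"var": "resource.labels.dept"}}
office_hours = {"op": "time_between", "value": {"var": "request.time"},
                "start": "09:00", "end": "17:00"}
print(evaluate(same_dept, scope), evaluate(office_hours, scope))
```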

Doc ACL

creator-owner default

Per-document grants (read / write / admin); expired grants are ignored. The default-deny posture is configurable.

Flag: AUTHZ_DEFAULT_DENY
Document creators receive owner ACL on create
Grant API: POST /v1/admin/authz/documents/{id}/permissions

Agent Runtime

Stateful LangGraph agent serving the public /v1/run endpoint with progressive SSE streaming.

LangGraph agent

/v1/run · SSE

State graph orchestrates retrieval → generation → optional TTS. Every event carries the request_id for tracing and a reconnect-safe sequence number.

Streaming progressive tokens with minimal buffering
Quality guardrails emit SSE events
Optional /audio.ready events when TTS_PROVIDER=openai
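
One way reconnect-safe sequence numbers can work is to emit each sequence number as the SSE `id:` field, so a reconnecting client can send it back in the standard Last-Event-ID header and resume from the next frame. The sketch below assumes that approach; the `token` event name, the payload fields, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(event: str, data: dict, seq: int) -> str:
    """Serialize one Server-Sent Events frame; a blank line ends the frame."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"id: {seq}\nevent: {event}\ndata: {payload}\n\n"


def stream_tokens(request_id: str, tokens: Iterable[str],
                  last_event_id: Optional[int] = None) -> Iterator[str]:
    """Yield one frame per token, skipping frames the client already received."""
    start = 0 if last_event_id is None else last_event_id + 1
    for seq, token in enumerate(tokens):
        if seq < start:
            continue
        yield format_sse("token", {"request_id": request_id, "seq": seq, "text": token}, seq)
```

Yielding one frame per token, rather than batching, is what keeps buffering minimal on the streaming path.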

Retrieval router

per-corpus provider

Each corpus picks its retrieval provider in corpora.provider_config_json. Empty config normalizes to local pgvector.

Provider: local_pgvector | aws_bedrock_kb | gcp_vertex
Default top_k_default: 5
Cloud creds at runtime; tests run mock-only
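
The normalization rule can be sketched as follows. The provider names, the per-provider fields, and the top_k_default of 5 come from this page; the `provider` JSON key, the function name, and the error handling are assumptions.

```python
import json

DEFAULT_TOP_K = 5

# Fields each provider needs, as listed for the backends on this page.
REQUIRED_FIELDS = {
    "local_pgvector": (),
    "aws_bedrock_kb": ("knowledge_base_id", "region"),
    "gcp_vertex": ("project", "location", "resource_id"),
}


def normalize_provider_config(raw):
    """Parse a corpus's provider_config_json; empty config means local pgvector."""
    config = json.loads(raw) if raw and raw.strip() else {}
    if not config:
        return {"provider": "local_pgvector", "top_k_default": DEFAULT_TOP_K}
    provider = config.get("provider", "local_pgvector")  # assumed key name
    if provider not in REQUIRED_FIELDS:
        raise ValueError(f"unknown retrieval provider: {provider}")
    missing = [name for name in REQUIRED_FIELDS[provider] if not config.get(name)]
    if missing:
        raise ValueError(f"{provider} config is missing: {', '.join(missing)}")
    result = dict(config)
    result["provider"] = provider
    result.setdefault("top_k_default", DEFAULT_TOP_K)
    return result
```

Validating at normalization time surfaces a misconfigured corpus as a config error before any cloud call is attempted.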

Generator

Gemini · Vertex AI

Pluggable LLM provider with deterministic fake-LLM path for tests. Timeout + cancellation handling for streaming paths.

Flag: LLM_PROVIDER
Deterministic fake provider for hermetic tests

Retrieval Backends

Three production backends, each isolated behind the router with mock-tested adapters.

PostgreSQL + pgvector

local default

Cosine retrieval with similarity scoring, deterministic ordering, embedding-dimension invariants, race-safe upserts.

Alembic migrations enforce pgvector extension
Top-k default: 5
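
The ranking properties named above can be shown in plain Python, without a database: a dimension check before scoring, cosine similarity as the score, and a tie-break on chunk id so equal scores always come back in the same order. The function names and the choice of chunk id as tie-breaker are assumptions; in pgvector itself, cosine ordering is expressed with the `<=>` cosine distance operator.

```python
import math


def cosine_similarity(a, b) -> float:
    # Embedding-dimension invariant: refuse to compare mismatched vectors.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=5):
    """Rank (chunk_id, embedding) pairs by similarity, ties broken by chunk_id."""
    scored = [(cosine_similarity(query, emb), chunk_id) for chunk_id, emb in chunks]
    # Descending score, then ascending id, so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(chunk_id, score) for score, chunk_id in scored[:k]]
```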

AWS Bedrock KB

managed

Knowledge Base id + region selectable per corpus. Bedrock adapter mock-tested in CI without live creds.

Provider: aws_bedrock_kb
Per-corpus knowledge_base_id + region

GCP Vertex AI Search

Discovery Engine

Discovery Engine datastores wired through the same router. Vertex adapter mock-tested with config-error mapping.

Provider: gcp_vertex
Per-corpus project + location + resource_id

Infrastructure

Stateful and durable building blocks shared across the platform.

PostgreSQL

system of record

Tenants, corpora, documents, chunks, audit log, compliance evidence, encrypted blobs, retention runs — all here. Alembic-migrated.

audit_events table for tamper-evident audit
Persisted evidence artifact paths under var/evidence
Tenant guard + RLS posture checks
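
The page does not say how the audit log is made tamper-evident. A common technique is a hash chain, in which each row stores a digest of its own content plus the previous row's digest, so editing any row invalidates every later one. The sketch below assumes that technique; the row layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first row


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous row."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    row = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(row)
    return row


def verify_chain(log: list):
    """Return the index of the first inconsistent row, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, row in enumerate(log):
        if row["prev_hash"] != prev_hash or row["hash"] != _digest(prev_hash, row["event"]):
            return index
        prev_hash = row["hash"]
    return None
```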

Redis

rate limits · idempotency · ARQ

Token-bucket rate limit state, idempotency conflict detection, and ARQ job queues for async ingestion + notifications.

Idempotency-Key contract on writes
ARQ workers: ingestion + notification delivery
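
A sketch of what an Idempotency-Key contract typically involves: the first write under a key runs the handler and stores the response, a retry with the same key and the same body replays it, and reuse of the key with a different body is reported as a conflict. This in-memory version stands in for the Redis state; the class names and the tenant-scoped keying are assumptions.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request body."""


class IdempotencyStore:
    def __init__(self):
        self._entries = {}

    def execute(self, tenant_id: str, key: str, body: dict, handler):
        # Fingerprint the body so a replay can be told apart from a conflict.
        fingerprint = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        slot = (tenant_id, key)  # assumption: keys are scoped per tenant
        if slot in self._entries:
            seen_fingerprint, response = self._entries[slot]
            if seen_fingerprint != fingerprint:
                raise IdempotencyConflict(f"key {key!r} reused with a different body")
            return response  # replay: the handler does not run again
        response = handler(body)
        self._entries[slot] = (fingerprint, response)
        return response
```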

ARQ Workers

ingestion · notifications · evaluator

Three durable worker pools — ingestion (chunk + embed + persist), notification delivery, operability evaluator.

Notification jobs/attempts as durable source of truth
Evaluator with distributed locking + heartbeats
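
Locking with heartbeats is often built as a lease: a worker holds the lock only until a deadline and must keep extending it, so a crashed worker loses the lock once its lease lapses. The in-memory class below illustrates that idea only; its name, its methods, and the lease semantics are assumptions, not the evaluator's actual mechanism.

```python
class LeaseLock:
    """Hypothetical single-holder lease that expires without heartbeats."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, worker_id: str, now: float) -> bool:
        held_by_other = (
            self.owner is not None
            and self.owner != worker_id
            and now < self.expires_at
        )
        if held_by_other:
            return False
        self.owner = worker_id
        self.expires_at = now + self.ttl
        return True

    def heartbeat(self, worker_id: str, now: float) -> bool:
        # Only the current holder can extend, and only before the lease lapses.
        if self.owner == worker_id and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False
```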

KMS / Keyring

envelope encryption

Tenant key registry + encrypted blob store. Pluggable KMS providers; resumable re-encryption jobs with telemetry.

Flags: CRYPTO_ENABLED, CRYPTO_PROVIDER
Admin: /v1/admin/keyring + /v1/admin/keys
KEYRING_MASTER_KEY_REQUIRED for required-only mode