Architecture

Layered request flow, agent runtime, retrieval backends, and shared infrastructure

Live in production
Multi-tenant
Multi-cloud
One streaming endpoint. Layered enforcement. Pluggable retrieval.
Every request to /v1/run passes through five enforcement layers before any retrieval or generation work happens. The agent runtime then routes retrieval across pgvector, Bedrock KB, or Vertex AI Search — picked per-corpus from a single config field. Stateful infrastructure (Postgres, Redis, ARQ workers) is shared across the platform with envelope-encrypted secrets and a tamper-evident audit log.
Request authorization pipeline
Six-step decision order for every document action. Order is non-negotiable: tenant boundary first, default-deny last.
  1. Tenant boundary

    Non-bypassable tenant scoping. The principal's tenant_id is checked against every resource's tenant_id.

    TENANT_MISMATCH — request rejected before any policy runs.

  2. Kill switches / maintenance gates

    Per-feature kill switches and maintenance windows take precedence over RBAC and ABAC.

    FEATURE_DISABLED or MAINTENANCE_MODE — short-circuit denial.

  3. RBAC role gate

    Endpoint-level role check. The role matrix in the README pins the minimum role for each route.

    ROLE_INSUFFICIENT — denied with the required role surfaced.

  4. Document ACL evaluation

    Explicit grants (read / write / admin) for the principal. Expired grants are ignored. Owner = automatic admin.

    ACL_DENIED — no matching grant or only expired grants found.

  5. ABAC policy evaluation

    Deny policies are evaluated first, then allow policies. Within each group, the higher-priority policy wins. Conditions are evaluated against the principal, resource labels, and request context.

    POLICY_DENIED — surfaced with the matching policy name.

  6. Default deny

    If no allow policy matched, the configurable default is applied (default deny in production).

    DEFAULT_DENY — terminal step; the AUTHZ_DEFAULT_DENY flag controls behavior.
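
The decision order above can be sketched as a single function that returns on the first failing step. This is a minimal illustration, not the platform's implementation: the Request fields, the reader < editor < admin ranking, and the ALLOWED / DEFAULT_ALLOW codes are assumptions made for the example; the denial codes are the ones listed in the steps.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed role ordering; the real role matrix lives in the README.
ROLE_RANK = {"reader": 0, "editor": 1, "admin": 2}


@dataclass
class Request:
    principal_tenant: str
    resource_tenant: str
    role: str
    required_role: str
    feature_enabled: bool = True
    maintenance: bool = False
    acl_granted: bool = False            # outcome of the document ACL lookup
    policy_effect: Optional[str] = None  # "allow", "deny", or None (no match)


def authorize(req: Request, default_deny: bool = True):
    """Return (allowed, reason_code), stopping at the first step that decides."""
    if req.principal_tenant != req.resource_tenant:         # 1. tenant boundary
        return False, "TENANT_MISMATCH"
    if not req.feature_enabled:                             # 2. kill switch
        return False, "FEATURE_DISABLED"
    if req.maintenance:                                     # 2. maintenance gate
        return False, "MAINTENANCE_MODE"
    if ROLE_RANK[req.role] < ROLE_RANK[req.required_role]:  # 3. RBAC
        return False, "ROLE_INSUFFICIENT"
    if not req.acl_granted:                                 # 4. document ACL
        return False, "ACL_DENIED"
    if req.policy_effect == "deny":                         # 5. ABAC, deny first
        return False, "POLICY_DENIED"
    if req.policy_effect == "allow":                        # 5. ABAC, then allow
        return True, "ALLOWED"                              # hypothetical code
    if default_deny:                                        # 6. configurable default
        return False, "DEFAULT_DENY"
    return True, "DEFAULT_ALLOW"                            # hypothetical code
```

Because the tenant check runs first, an admin in another tenant is rejected with TENANT_MISMATCH before any role or policy logic is reached.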

Edge & Auth
Every request passes through a layered gate before any retrieval or generation work happens.

Rate limit

Redis · token-bucket

Per-key + per-tenant dual enforcement with stable 429 schema, retry hints, and audit events for throttling and degraded mode.

  • Flag: RATE_LIMIT_ENABLED
  • Per route-class policies
  • Stable 429 schema with throttling headers
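
As a rough illustration of dual token-bucket enforcement, the sketch below keeps one bucket per API key and one per tenant and admits a request only when both have a token. It is an in-memory stand-in for the Redis-backed state; the class, the function, and the response fields are hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```python
import math


class TokenBucket:
    """In-memory token bucket; `now` is a caller-supplied clock in seconds."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = now

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now

    def retry_after(self, now: float) -> float:
        """Seconds until one token is available (0.0 if one is available now)."""
        self._refill(now)
        if self.tokens >= 1:
            return 0.0
        return (1 - self.tokens) / self.refill_per_sec


def check(key_bucket: TokenBucket, tenant_bucket: TokenBucket, now: float) -> dict:
    """Consume one token from both buckets, or from neither."""
    wait = max(key_bucket.retry_after(now), tenant_bucket.retry_after(now))
    if wait > 0:
        # Hypothetical response shape; the real 429 schema is not shown here.
        return {"status": 429, "retry_after_s": math.ceil(wait)}
    key_bucket.tokens -= 1
    tenant_bucket.tokens -= 1
    return {"status": 200}
```

Checking both buckets before consuming from either avoids charging the tenant bucket for a request that the per-key limit rejects.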

Auth

API key + SSO/SCIM

Hashed API key storage; OIDC SSO with PKCE + state/nonce replay protection; SCIM 2.0 token-authenticated provisioning.

  • Flags: AUTH_ENABLED, SSO_ENABLED, SCIM_ENABLED
  • Optional dev bypass via AUTH_DEV_BYPASS
  • Inactive key denial: AUTH_INACTIVE_KEY
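
Hashed API key storage might look like the sketch below: only a digest is persisted, and lookups compare digests in constant time. Plain SHA-256 is an assumption that is reasonable only for high-entropy random keys. The record layout and the AUTH_INVALID_KEY code are hypothetical; AUTH_INACTIVE_KEY is the denial code named above.

```python
import hashlib
import hmac
import secrets


def hash_key(key: str) -> str:
    """Digest stored in place of the plaintext key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key():
    """Return (plaintext_key, stored_hash); the plaintext is shown to the user once."""
    key = secrets.token_urlsafe(32)
    return key, hash_key(key)


def authenticate(presented: str, record: dict):
    """Check a presented key against a stored record; return (ok, error_code)."""
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(hash_key(presented), record["key_hash"]):
        return False, "AUTH_INVALID_KEY"  # hypothetical code
    if not record["active"]:
        return False, "AUTH_INACTIVE_KEY"
    return True, None
```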

RBAC

reader / editor / admin

Role gate on every protected endpoint. Admin role does not bypass document ACLs unless AUTHZ_ADMIN_BYPASS_DOCUMENT_ACL=true.

  • Tenant binding from authenticated principal
  • Role matrix in README (per endpoint)

ABAC

policy engine

Priority-aware deny-first / allow-then policies with simulation API and DSL conditions (eq, time_between, var).

  • Flag: AUTHZ_ABAC_ENABLED
  • Wildcards gated by AUTHZ_ALLOW_WILDCARDS
  • Simulation endpoint: POST /v1/admin/authz/policies/{id}/simulate
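
To make the policy model concrete, here is one possible reading of deny-first evaluation with the eq, time_between, and var operators. The JSON-like condition shape, the context layout, and the tie-breaking rule (any matching deny beats any matching allow, and priority picks the winner within an effect) are assumptions for illustration, not the engine's actual DSL.

```python
def resolve(term, ctx):
    """Resolve {"var": "a.b"} against the context; return literals unchanged."""
    if isinstance(term, dict) and "var" in term:
        value = ctx
        for part in term["var"].split("."):
            value = value[part]
        return value
    return term


def holds(cond: dict, ctx: dict) -> bool:
    """Evaluate a single condition against the request context."""
    if "eq" in cond:
        left, right = cond["eq"]
        return resolve(left, ctx) == resolve(right, ctx)
    if "time_between" in cond:
        start, end = cond["time_between"]
        # Zero-padded "HH:MM" strings compare correctly as text.
        return start <= ctx["request"]["time"] <= end
    raise ValueError(f"unknown operator in condition: {cond!r}")


def evaluate(policies: list, ctx: dict):
    """Return (effect, policy_name), or (None, None) when nothing matches."""
    matched = [p for p in policies if all(holds(c, ctx) for c in p["conditions"])]
    for effect in ("deny", "allow"):  # deny is considered before allow
        candidates = [p for p in matched if p["effect"] == effect]
        if candidates:
            winner = max(candidates, key=lambda p: p["priority"])
            return effect, winner["name"]
    return None, None
```

A (None, None) result corresponds to falling through to the configurable default step of the authorization pipeline.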

Doc ACL

creator-owner default

Per-document grants (read / write / admin); expired grants are ignored. The default-deny posture is configurable.

  • Flag: AUTHZ_DEFAULT_DENY
  • Document creators receive owner ACL on create
  • Grant API: POST /v1/admin/authz/documents/{id}/permissions
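
A document ACL check along these lines could be sketched as follows. The document and grant dictionaries are hypothetical shapes, and treating admin as implying write and write as implying read is an assumption made for the example.

```python
from datetime import datetime, timezone

# Assumed hierarchy: admin implies write, write implies read.
LEVEL = {"read": 0, "write": 1, "admin": 2}


def acl_allows(doc: dict, principal_id: str, needed: str, now: datetime) -> bool:
    """True if the principal holds an unexpired grant at or above `needed`."""
    if doc["owner_id"] == principal_id:
        return True  # the owner is treated as an automatic admin
    for grant in doc["grants"]:
        if grant["principal_id"] != principal_id:
            continue
        expires_at = grant.get("expires_at")
        if expires_at is not None and expires_at <= now:
            continue  # expired grants are ignored rather than treated as denies
        if LEVEL[grant["permission"]] >= LEVEL[needed]:
            return True
    return False
```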
Agent Runtime
Stateful LangGraph agent serving the public /v1/run endpoint with progressive SSE streaming.

LangGraph agent

/v1/run · SSE

A state graph orchestrates retrieval → generation → optional TTS. Every event carries a request_id for tracing, and sequence numbers make reconnects safe.

  • Progressive token streaming with minimal buffering
  • Quality guardrails emit SSE events
  • Optional /audio.ready events when TTS_PROVIDER=openai
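
The sketch below shows how sequence-numbered SSE frames can support reconnects: each frame carries its sequence number in the standard id field, and a client that reconnects with the last id it saw receives only later events. The event names and payload shape are hypothetical; the id / event / data framing is standard Server-Sent Events.

```python
import json


def sse_frame(seq: int, event: str, payload: dict, request_id: str) -> str:
    """Serialize one event; compact JSON keeps the data field on a single line."""
    data = json.dumps({"request_id": request_id, **payload}, separators=(",", ":"))
    return f"id: {seq}\nevent: {event}\ndata: {data}\n\n"


def stream(events, request_id: str, last_event_id: int = 0):
    """Yield frames whose sequence number is greater than last_event_id."""
    for seq, (event, payload) in enumerate(events, start=1):
        if seq > last_event_id:
            yield sse_frame(seq, event, payload, request_id)
```

A client that saw events 1 and 2 before a disconnect would resume with last_event_id=2 and receive only event 3.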

Retrieval router

per-corpus provider

Each corpus picks its retrieval provider in corpora.provider_config_json. Empty config normalizes to local pgvector.

  • Provider: local_pgvector | aws_bedrock_kb | gcp_vertex
  • Default top_k_default: 5
  • Cloud credentials are resolved at runtime; tests run against mocks only
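
Normalizing a corpus's provider configuration might look roughly like this. The provider names, top_k_default, and the per-provider field names come from this page; placing a "provider" key inside the JSON document and raising ValueError on bad input are assumptions for the sketch.

```python
import json

# Required fields per provider, taken from the backend descriptions on this page.
REQUIRED = {
    "local_pgvector": (),
    "aws_bedrock_kb": ("knowledge_base_id", "region"),
    "gcp_vertex": ("project", "location", "resource_id"),
}


def normalize_provider_config(raw):
    """Parse provider_config_json; empty or missing config means local pgvector."""
    config = json.loads(raw) if raw and raw.strip() else {}
    provider = config.get("provider", "local_pgvector")
    if provider not in REQUIRED:
        raise ValueError(f"unknown provider: {provider}")
    missing = [name for name in REQUIRED[provider] if not config.get(name)]
    if missing:
        raise ValueError(f"{provider} config is missing: {', '.join(missing)}")
    # Explicit values in the config override the default top_k of 5.
    return {"top_k_default": 5, **config, "provider": provider}
```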

Generator

Gemini · Vertex AI

Pluggable LLM provider with deterministic fake-LLM path for tests. Timeout + cancellation handling for streaming paths.

  • Flag: LLM_PROVIDER
  • Deterministic fake provider for hermetic tests
Retrieval Backends
Three production backends, each isolated behind the router with mock-tested adapters.

PostgreSQL + pgvector

local default

Cosine retrieval with similarity scoring, deterministic ordering, embedding-dimension invariants, race-safe upserts.

  • Alembic migrations enforce pgvector extension
  • Top-k default: 5
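
For intuition, cosine retrieval with deterministic ordering can be sketched in plain Python: score every chunk, then sort by descending similarity with the chunk id as a tie-breaker so equal scores always come back in the same order. In the database, the equivalent would typically be an ORDER BY on pgvector's cosine-distance operator (<=>) plus a stable secondary key; the chunk dictionaries here are a hypothetical shape.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        # Mirrors the embedding-dimension invariant: mismatches are errors.
        raise ValueError(f"embedding dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k: int = 5):
    """Return up to k (score, chunk_id) pairs, best first, ties broken by id."""
    scored = [(cosine_similarity(query, c["embedding"]), c["id"]) for c in chunks]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]
```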

AWS Bedrock KB

managed

Knowledge Base id + region selectable per corpus. Bedrock adapter mock-tested in CI without live creds.

  • Provider: aws_bedrock_kb
  • Per-corpus knowledge_base_id + region

GCP Vertex AI Search

Discovery Engine

Discovery Engine datastores wired through the same router. Vertex adapter mock-tested with config-error mapping.

  • Provider: gcp_vertex
  • Per-corpus project + location + resource_id
Infrastructure
Stateful and durable building blocks shared across the platform.

PostgreSQL

system of record

Tenants, corpora, documents, chunks, audit log, compliance evidence, encrypted blobs, retention runs — all here. Alembic-migrated.

  • audit_events table for tamper-evident audit
  • Persisted evidence artifact paths under var/evidence
  • Tenant guard + RLS posture checks
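
One common way to make an audit table tamper-evident is a hash chain, where each row stores a hash over its own content plus the previous row's hash. Whether audit_events uses this exact scheme is not stated here, so the sketch below is an assumption; the row layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first row


def event_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous hash together with a canonical encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the last row."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev, "hash": event_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered row breaks it."""
    prev = GENESIS
    for row in log:
        if row["prev_hash"] != prev or row["hash"] != event_hash(prev, row["event"]):
            return False
        prev = row["hash"]
    return True
```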

Redis

rate limits · idempotency · ARQ

Token-bucket rate limit state, idempotency conflict detection, and ARQ job queues for async ingestion + notifications.

  • Idempotency-Key contract on writes
  • ARQ workers: ingestion + notification delivery
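
Idempotency conflict detection can be illustrated with an in-memory store: the first request for a key runs the handler and records a fingerprint of the payload; a replay with the same payload returns the stored response, and a replay with a different payload is a conflict. In Redis this would typically be backed by an atomic SET with NX and an expiry. The class, status codes, and IDEMPOTENCY_CONFLICT error name are hypothetical.

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory stand-in for Redis-backed Idempotency-Key tracking."""

    def __init__(self):
        self._entries = {}  # key -> (payload fingerprint, stored response)

    @staticmethod
    def _fingerprint(payload: dict) -> str:
        body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def execute(self, key: str, payload: dict, handler):
        """Run handler once per key; replay or reject repeated requests."""
        fingerprint = self._fingerprint(payload)
        entry = self._entries.get(key)
        if entry is None:
            response = handler(payload)
            self._entries[key] = (fingerprint, response)
            return 200, response
        stored_fingerprint, stored_response = entry
        if stored_fingerprint != fingerprint:
            # Same key, different body: refuse rather than guess intent.
            return 409, {"error": "IDEMPOTENCY_CONFLICT"}
        return 200, stored_response
```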

ARQ Workers

ingestion · notifications · evaluator

Three durable worker pools — ingestion (chunk + embed + persist), notification delivery, operability evaluator.

  • Notification jobs/attempts as durable source of truth
  • Evaluator with distributed locking + heartbeats

KMS / Keyring

envelope encryption

Tenant key registry + encrypted blob store. Pluggable KMS providers; resumable re-encryption jobs with telemetry.

  • Flags: CRYPTO_ENABLED, CRYPTO_PROVIDER
  • Admin: /v1/admin/keyring + /v1/admin/keys
  • KEYRING_MASTER_KEY_REQUIRED for required-only mode