Architecture
Layered request flow, agent runtime, retrieval backends, and shared infrastructure
/v1/run passes through five enforcement layers before any retrieval or generation work happens. The agent runtime then routes retrieval across pgvector, Bedrock KB, or Vertex AI Search, picked per corpus from a single config field. Stateful infrastructure (Postgres, Redis, ARQ workers) is shared across the platform with envelope-encrypted secrets and a tamper-evident audit log.
- 1
Tenant boundary
Non-bypassable tenant scoping. The principal's tenant_id is checked against every resource's tenant_id.
↳ TENANT_MISMATCH — request rejected before any policy runs.
- 2
Kill switches / maintenance gates
Per-feature kill switches and maintenance windows take precedence over RBAC and ABAC.
↳ FEATURE_DISABLED or MAINTENANCE_MODE — short-circuit denial.
- 3
RBAC role gate
Endpoint-level role check. The role matrix in README pins minimum role per route.
↳ ROLE_INSUFFICIENT — denied with the required role surfaced.
- 4
Document ACL evaluation
Explicit grants (read / write / admin) for the principal. Expired grants are ignored. Owner = automatic admin.
↳ ACL_DENIED — no matching grant or only expired grants found.
- 5
ABAC policy evaluation
Deny-first, then allow. Higher priority wins. Conditions are evaluated against the principal, resource labels, and request context.
↳ POLICY_DENIED — surfaced with the matching policy name.
- 6
Default deny
If no allow policy matched, the configurable default is applied (default deny in production).
↳ DEFAULT_DENY — terminal step; flag controls behavior.
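The six steps above can be read as a short-circuit chain: the first failing step decides the denial code. A minimal sketch of that ordering, assuming a hypothetical `Request` shape, a reader < editor < admin role ranking, and pre-computed ACL and ABAC results (none of these names come from the platform's code):

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of the three roles; the real role matrix lives in the README.
ROLE_RANK = {"reader": 0, "editor": 1, "admin": 2}


@dataclass
class Request:
    # Hypothetical flattened view of one authorization check.
    principal_tenant: str
    resource_tenant: str
    role: str
    required_role: str
    feature_enabled: bool = True
    maintenance: bool = False
    acl_granted: bool = True
    abac_decision: Optional[str] = None  # "deny", "allow", or None (no match)
    default_deny: bool = True


def authorize(req: Request) -> str:
    """Return "ALLOW" or the first denial code, checking steps 1-6 in order."""
    if req.principal_tenant != req.resource_tenant:          # 1. tenant boundary
        return "TENANT_MISMATCH"
    if not req.feature_enabled:                              # 2. kill switch
        return "FEATURE_DISABLED"
    if req.maintenance:                                      # 2. maintenance gate
        return "MAINTENANCE_MODE"
    if ROLE_RANK[req.role] < ROLE_RANK[req.required_role]:   # 3. RBAC
        return "ROLE_INSUFFICIENT"
    if not req.acl_granted:                                  # 4. document ACL
        return "ACL_DENIED"
    if req.abac_decision == "deny":                          # 5. ABAC
        return "POLICY_DENIED"
    if req.abac_decision == "allow":
        return "ALLOW"
    return "DEFAULT_DENY" if req.default_deny else "ALLOW"   # 6. default


print(authorize(Request("t1", "t2", "admin", "reader")))
```

Because the tenant check runs first, even an admin principal from another tenant is rejected before any role or policy logic is consulted.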
Rate limit
Redis · token-bucket
Per-key + per-tenant dual enforcement with stable 429 schema, retry hints, and audit events for throttling and degraded mode.
- Flag: RATE_LIMIT_ENABLED
- Per route-class policies
- Stable 429 schema with throttling headers
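To illustrate dual enforcement, here is an in-memory token-bucket sketch. It assumes one bucket per API key and one per tenant, and it consumes a token only when both have capacity. The platform keeps this state in Redis; the class, the response dict, and the `retry_after` field are illustrative, not the actual 429 schema.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float
    refill_per_sec: float
    tokens: float
    updated_at: float

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now

    def retry_after(self) -> float:
        # Seconds until at least one token is available.
        return max(0.0, (1.0 - self.tokens) / self.refill_per_sec)


def check(key_bucket: TokenBucket, tenant_bucket: TokenBucket, now: float) -> dict:
    """Allow the request only if both buckets hold a token."""
    buckets = (key_bucket, tenant_bucket)
    for bucket in buckets:
        bucket.refill(now)
    blocked = [b for b in buckets if b.tokens < 1.0]
    if blocked:
        # Hypothetical response shape; the hint is the longest wait needed.
        return {"status": 429, "retry_after": max(b.retry_after() for b in blocked)}
    for bucket in buckets:
        bucket.tokens -= 1.0
    return {"status": 200}


key = TokenBucket(capacity=2, refill_per_sec=1.0, tokens=2.0, updated_at=0.0)
tenant = TokenBucket(capacity=3, refill_per_sec=1.0, tokens=3.0, updated_at=0.0)
results = [check(key, tenant, now=0.0) for _ in range(3)]
print(results[-1])
```

Checking both buckets before consuming from either avoids charging the tenant for a request that the per-key limit rejects.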
Auth
API key + SSO/SCIM
Hashed API key storage; OIDC SSO with PKCE + state/nonce replay protection; SCIM 2.0 token-authenticated provisioning.
- Flags: AUTH_ENABLED, SSO_ENABLED, SCIM_ENABLED
- Optional dev bypass via AUTH_DEV_BYPASS
- Inactive key denial: AUTH_INACTIVE_KEY
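Hashed key storage means only a digest of each API key is persisted, and lookups compare digests. The sketch below assumes SHA-256 digests and a hypothetical record layout; the hashing scheme the platform actually uses is not stated here. `AUTH_INACTIVE_KEY` is the denial code listed above, while `unknown_key` is a placeholder.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def hash_key(raw_key: str) -> str:
    # Assumption: plain SHA-256 of the key; a keyed hash or KDF is equally plausible.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def authenticate(raw_key: str, records: list) -> Tuple[Optional[str], Optional[str]]:
    """Return (tenant_id, None) on success or (None, error_code) on failure."""
    digest = hash_key(raw_key)
    for record in records:
        # Constant-time comparison avoids leaking digest prefixes via timing.
        if hmac.compare_digest(record["key_hash"], digest):
            if not record["active"]:
                return None, "AUTH_INACTIVE_KEY"
            return record["tenant_id"], None
    return None, "unknown_key"  # placeholder code, not from the platform


records = [
    {"key_hash": hash_key("k-live"), "active": True, "tenant_id": "t1"},
    {"key_hash": hash_key("k-old"), "active": False, "tenant_id": "t1"},
]
print(authenticate("k-old", records))
```

The tenant returned here is what later layers would bind to the principal, so the raw key never needs to be stored or logged.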
RBAC
reader / editor / admin
Role gate on every protected endpoint. Admin role does not bypass document ACLs unless AUTHZ_ADMIN_BYPASS_DOCUMENT_ACL=true.
- Tenant binding from authenticated principal
- Role matrix in README (per endpoint)
ABAC
policy engine
Priority-aware deny-first / allow-then policies with simulation API and DSL conditions (eq, time_between, var).
- Flag: AUTHZ_ABAC_ENABLED
- Wildcards gated by AUTHZ_ALLOW_WILDCARDS
- Simulation endpoint: POST /v1/admin/authz/policies/{id}/simulate
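A small evaluator makes the deny-first, priority-aware behavior concrete. This sketch assumes a hypothetical JSON-like condition shape for `eq`, `time_between`, and `var`, zero-padded `HH:MM` time strings, and one reading of the precedence rule: any matching deny beats every allow, and priority selects which policy name is surfaced. The real DSL syntax may differ.

```python
from dataclasses import dataclass, field


def resolve(operand, ctx):
    """Resolve {"var": "a.b"} against the context; anything else is a literal."""
    if isinstance(operand, dict) and "var" in operand:
        value = ctx
        for part in operand["var"].split("."):
            value = value[part]
        return value
    return operand


def evaluate(cond: dict, ctx: dict) -> bool:
    if "eq" in cond:
        left, right = cond["eq"]
        return resolve(left, ctx) == resolve(right, ctx)
    if "time_between" in cond:
        value, start, end = (resolve(x, ctx) for x in cond["time_between"])
        return start <= value <= end  # zero-padded "HH:MM" strings sort correctly
    raise ValueError(f"unsupported condition: {cond}")


@dataclass
class Policy:
    name: str
    effect: str  # "deny" or "allow"
    priority: int
    conditions: list = field(default_factory=list)


def decide(policies: list, ctx: dict):
    """Return (effect, policy_name), or (None, None) when nothing matches."""
    matched = [p for p in policies if all(evaluate(c, ctx) for c in p.conditions)]
    for effect in ("deny", "allow"):  # deny is considered first
        candidates = [p for p in matched if p.effect == effect]
        if candidates:
            winner = max(candidates, key=lambda p: p.priority)
            return effect, winner.name
    return None, None


policies = [
    Policy("allow-editors", "allow", 10, [{"eq": [{"var": "principal.role"}, "editor"]}]),
    Policy("deny-off-hours", "deny", 5,
           [{"time_between": [{"var": "request.time"}, "18:00", "23:59"]}]),
]
ctx = {"principal": {"role": "editor"}, "request": {"time": "19:00"}}
print(decide(policies, ctx))
```

A `(None, None)` result corresponds to the case where no policy matched and the configurable default applies.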
Doc ACL
creator-owner default
Per-document grants (read / write / admin); expired grants are ignored. Default-deny posture is configurable.
- Flag: AUTHZ_DEFAULT_DENY
- Document creators receive owner ACL on create
- Grant API: POST /v1/admin/authz/documents/{id}/permissions
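The grant rules can be sketched as a single lookup. This example assumes permissions are hierarchical (admin implies write, write implies read) and uses a hypothetical `Grant` record; the owner shortcut and the skipping of expired grants follow the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumed hierarchy: a higher level satisfies any lower requirement.
LEVEL = {"read": 0, "write": 1, "admin": 2}


@dataclass
class Grant:
    principal_id: str
    permission: str
    expires_at: Optional[datetime] = None  # None means the grant never expires


def has_access(principal_id: str, owner_id: str, grants: list,
               needed: str, now: datetime) -> bool:
    if principal_id == owner_id:
        return True  # owner is treated as automatic admin
    for grant in grants:
        if grant.principal_id != principal_id:
            continue
        if grant.expires_at is not None and grant.expires_at <= now:
            continue  # expired grants are ignored
        if LEVEL[grant.permission] >= LEVEL[needed]:
            return True
    return False


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
grants = [
    Grant("u1", "write", expires_at=datetime(2023, 12, 31, tzinfo=timezone.utc)),
    Grant("u2", "read"),
]
print(has_access("u1", "owner", grants, "read", now))
```

A `False` result here maps to ACL_DENIED: either no grant matched or only expired grants were found.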
LangGraph agent
/v1/run · SSE
State graph orchestrates retrieval → generation → optional TTS. Every event carries a request_id for tracing and a reconnect-safe sequence number.
- Streaming progressive tokens with minimal buffering
- Quality guardrails emit SSE events
- Optional /audio.ready events when TTS_PROVIDER=openai
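One way to get reconnect-safe sequence numbers over SSE is to put the sequence in the standard `id:` field and resume from the client's `Last-Event-ID`. The sketch below assumes that approach; the event names, payload shape, and replay-from-a-list behavior are illustrative rather than the runtime's actual event contract.

```python
import json
from typing import Iterator, List, Tuple


def format_sse(seq: int, event: str, payload: dict) -> str:
    # Standard SSE framing: id, event, and data fields, then a blank line.
    return f"id: {seq}\nevent: {event}\ndata: {json.dumps(payload)}\n\n"


def stream(request_id: str, events: List[Tuple[str, dict]],
           last_event_id: int = 0) -> Iterator[str]:
    """Yield frames with sequence numbers above last_event_id."""
    for seq, (name, payload) in enumerate(events, start=1):
        if seq <= last_event_id:
            continue  # the client already received this frame
        yield format_sse(seq, name, {"request_id": request_id, **payload})


events = [("token", {"text": "Hel"}), ("token", {"text": "lo"}), ("done", {})]
# Simulate a client reconnecting after it saw event 1.
frames = list(stream("req-1", events, last_event_id=1))
print(frames[0], end="")
```

Because every payload repeats the request_id, frames from a resumed stream can be correlated with the original request in logs.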
Retrieval router
per-corpus provider
Each corpus picks its retrieval provider in corpora.provider_config_json. Empty config normalizes to local pgvector.
- Provider: local_pgvector | aws_bedrock_kb | gcp_vertex
- Default top_k_default: 5
- Cloud creds at runtime; tests run mock-only
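The normalization rule can be sketched as a small function over the raw JSON. The provider names and per-provider fields are the ones listed on this page; the exact key layout inside provider_config_json, the `provider` key name, and the error messages are assumptions.

```python
import json
from typing import Optional

# Fields each provider needs, per the backend descriptions on this page.
REQUIRED_FIELDS = {
    "local_pgvector": (),
    "aws_bedrock_kb": ("knowledge_base_id", "region"),
    "gcp_vertex": ("project", "location", "resource_id"),
}


def normalize_provider_config(raw: Optional[str]) -> dict:
    """Parse provider_config_json; empty input falls back to local pgvector."""
    config = json.loads(raw) if raw and raw.strip() else {}
    if not config:
        return {"provider": "local_pgvector", "top_k_default": 5}
    provider = config.get("provider", "local_pgvector")
    if provider not in REQUIRED_FIELDS:
        raise ValueError(f"unknown provider: {provider}")
    missing = [name for name in REQUIRED_FIELDS[provider] if not config.get(name)]
    if missing:
        raise ValueError(f"{provider} config missing: {', '.join(missing)}")
    # Fill the default top_k unless the corpus overrides it.
    return {"top_k_default": 5, **config, "provider": provider}


print(normalize_provider_config(""))
```

Validating required fields at normalization time surfaces a misconfigured corpus as a config error before any cloud call is attempted.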
Generator
Gemini · Vertex AI
Pluggable LLM provider with deterministic fake-LLM path for tests. Timeout + cancellation handling for streaming paths.
- Flag: LLM_PROVIDER
- Deterministic fake provider for hermetic tests
PostgreSQL + pgvector
local default
Cosine retrieval with similarity scoring, deterministic ordering, embedding-dimension invariants, race-safe upserts.
- Alembic migrations enforce pgvector extension
- Top-k default: 5
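Deterministic ordering and the dimension invariant are easy to see in a pure-Python version of cosine top-k. This is a sketch of the idea only: the real backend runs the search inside Postgres with pgvector, and the tie-break on chunk id is an assumed rule.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        # Embedding-dimension invariant: never compare mismatched vectors.
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 5) -> List[Tuple[float, str]]:
    """Return (score, chunk_id) pairs, best first, ties broken by chunk id."""
    scored = [(cosine(query, vec), chunk_id) for chunk_id, vec in chunks]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


chunks = [("b", [1.0, 0.0]), ("a", [1.0, 0.0]), ("c", [0.0, 1.0])]
print(top_k([1.0, 0.0], chunks, k=2))
```

Without an explicit tie-break, equal scores could come back in storage order, which makes tests and cached results flaky.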
AWS Bedrock KB
managed
Knowledge Base id + region selectable per corpus. Bedrock adapter mock-tested in CI without live creds.
- Provider: aws_bedrock_kb
- Per-corpus knowledge_base_id + region
GCP Vertex AI Search
Discovery Engine
Discovery Engine datastores wired through the same router. Vertex adapter mock-tested with config-error mapping.
- Provider: gcp_vertex
- Per-corpus project + location + resource_id
PostgreSQL
system of record
Tenants, corpora, documents, chunks, audit log, compliance evidence, encrypted blobs, and retention runs all live here. Alembic-migrated.
- audit_events table for tamper-evident audit
- Persisted artifact paths under var/evidence
- Tenant guard + RLS posture checks
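A common way to make an audit table tamper-evident is a hash chain, where each row stores a digest of its own content plus the previous row's digest. Whether audit_events uses exactly this scheme is an assumption; the sketch below only shows how such a chain detects edits.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) keeps the digest stable across runs.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, {"action": "document.read", "tenant_id": "t1"})
append_event(log, {"action": "policy.update", "tenant_id": "t1"})
print(verify(log))
```

Editing any stored event changes its digest, so verification fails from that entry onward unless every later hash is also rewritten.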
Redis
rate limits · idempotency · ARQ
Token-bucket rate limit state, idempotency conflict detection, and ARQ job queues for async ingestion + notifications.
- Idempotency-Key contract on writes
- ARQ workers: ingestion + notification delivery
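Idempotency conflict detection usually means remembering a fingerprint of the first request seen for each Idempotency-Key. The sketch below uses an in-memory dict in place of Redis; the outcome names (`new`, `replay`, `conflict`) and the SHA-256 body fingerprint are assumptions for illustration.

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory stand-in for the Redis-backed store."""

    def __init__(self) -> None:
        self._fingerprints = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        canonical = json.dumps(body, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def begin(self, key: str, body: dict) -> str:
        """Classify a write: first use, safe replay, or conflicting reuse."""
        fingerprint = self._fingerprint(body)
        stored = self._fingerprints.get(key)
        if stored is None:
            self._fingerprints[key] = fingerprint
            return "new"
        # Same key with a different body is a conflict, not a retry.
        return "replay" if stored == fingerprint else "conflict"


store = IdempotencyStore()
first = store.begin("key-1", {"doc": "a", "title": "Spec"})
retry = store.begin("key-1", {"title": "Spec", "doc": "a"})
clash = store.begin("key-1", {"doc": "b"})
print(first, retry, clash)
```

Sorting keys before hashing means a retried request with reordered JSON fields still counts as a replay rather than a conflict.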
ARQ Workers
ingestion · notifications · evaluator
Three durable worker pools: ingestion (chunk + embed + persist), notification delivery, and operability evaluator.
- Notification jobs/attempts as durable source of truth
- Evaluator with distributed locking + heartbeats
KMS / Keyring
envelope encryption
Tenant key registry + encrypted blob store. Pluggable KMS providers; resumable re-encryption jobs with telemetry.
- Flags: CRYPTO_ENABLED, CRYPTO_PROVIDER
- Admin: /v1/admin/keyring + /v1/admin/keys
- KEYRING_MASTER_KEY_REQUIRED for required-only mode