1. Overview & Architecture
What is AgentID?
AgentID is a security, compliance, and observability System of Record for AI systems. It is not just a prompt filter. It is a control plane plus runtime enforcement layer that sits between your application and the LLM execution path to enforce policy, capture telemetry, and generate audit evidence.
Get Started
1) Create your system
- Sign in at https://app.getagentid.com.
- Create an AI system.
- Copy `AGENTID_API_KEY` and `AGENTID_SYSTEM_ID`.
2) Install SDK
Node.js / TypeScript

```bash
npm install agentid-sdk
```

Vercel AI SDK wrapper

```bash
npm install ai agentid-vercel-sdk @ai-sdk/openai
```

Python

```bash
pip install agentid-sdk
```
3) Configure environment variables
```bash
export AGENTID_API_KEY="sk_live_..."
export AGENTID_SYSTEM_ID="00000000-0000-0000-0000-000000000000"
export OPENAI_API_KEY="sk-proj-..."
```
4) Basic runtime flow
Run `guard()` before model execution and `log()` after completion:

```
App Input -> guard() -> allow/block -> LLM call -> log()
```
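The flow above can be sketched as follows. The `agentid` client here is a local stand-in, not the real SDK: actual method names, signatures, and response fields come from the SDK guides linked below.

```typescript
// Sketch of the guard() -> LLM call -> log() lifecycle with a stubbed client.
type Verdict = { allowed: boolean; reason?: string };

// Hypothetical stand-in for the AgentID SDK client (illustrative only).
const agentid = {
  guard(input: { prompt: string }): Verdict {
    // The real call hits /guard; this stub blocks one obvious injection marker.
    return input.prompt.includes("ignore previous instructions")
      ? { allowed: false, reason: "prompt_injection" }
      : { allowed: true };
  },
  log(event: { prompt: string; completion?: string; blocked: boolean }): void {
    // The real call persists the post-execution lifecycle row.
  },
};

function runTurn(prompt: string, callModel: (p: string) => string): string {
  const verdict = agentid.guard({ prompt }); // 1. guard() before model execution
  if (!verdict.allowed) {
    agentid.log({ prompt, blocked: true });  // 3. log() the blocked lifecycle
    return `[blocked: ${verdict.reason}]`;
  }
  const completion = callModel(prompt);      // 2. LLM call only after an allow
  agentid.log({ prompt, completion, blocked: false });
  return completion;
}
```

The key ordering rule is that the model call sits strictly between the verdict and the lifecycle log.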
Use these guides for full examples and wrappers:
- Read the full Node.js / TypeScript SDK Guide
- Read the full Python SDK Guide
- Read the Vercel AI SDK Wrapper Guide
Current Production Topology
The public runtime path is now split into a thin edge-facing route and a dedicated guard engine:
```
Client -> Vercel /api/v1/guard -> Fly guard engine -> verdict
                    |
                    +-> Vercel background side effects
                        (ai_events, system_metrics, async Tier-2 forensic audit)
```
This is intentional.
- Vercel remains the public API surface and owns auth, routing, rollout flags, and background persistence.
- Fly.io runs the hot guard engine with warm in-memory runtime, WASM matcher, and preflight blockers.
- Supabase remains the system-of-record for configuration and event storage.
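A minimal sketch of that split hot path, using stand-in functions for the Fly guard engine and the background pipeline (none of these names are the real internals): the edge route awaits only the verdict, while persistence and the Tier-2 audit are fired without being awaited.

```typescript
// Thin edge route: await the verdict, fire side effects in the background.
type GuardVerdict = { allowed: boolean };

const backgroundWrites: string[] = [];

async function flyGuardEngine(prompt: string): Promise<GuardVerdict> {
  // Stand-in for the warm Fly runtime (WASM matcher, preflight blockers).
  return { allowed: !/drop\s+table/i.test(prompt) };
}

async function persistSideEffects(prompt: string, allowed: boolean): Promise<void> {
  // Stand-in for ai_events / system_metrics writes and the async Tier-2 audit.
  backgroundWrites.push(`${allowed ? "allow" : "block"}:${prompt.length}`);
}

async function edgeGuardRoute(prompt: string): Promise<GuardVerdict> {
  const verdict = await flyGuardEngine(prompt);     // the only awaited hop
  void persistSideEffects(prompt, verdict.allowed); // fire-and-forget
  return verdict;                                   // client gets the verdict now
}
```

The design choice illustrated here is that response latency is bounded by the guard engine alone, never by persistence.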
Shadow Mode vs Blocking Mode
AgentID has two distinct runtime behaviors:
Blocking mode
- `/guard` is called synchronously.
- The client waits for the verdict before the LLM call.
- This is the production enforcement path for prompt injection, DB access, code execution, and PII leakage.
Shadow mode
- The public route can return an immediate `allowed` response while still marking `shadow_mode=true`.
- The upstream guard call, metadata enrichment, DB logging, and async Tier-2 forensic audit continue in the background.
- This is the recommended path when you want a zero-latency audit layer without affecting end-user UX.
Operationally, zero-latency behavior is strictly conditional on `shadow_mode=true`. Blocking systems still use the synchronous path.
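The branch between the two modes can be sketched like this. Field names (`allowed`, `shadow_mode`) follow the doc; the evaluation logic and background queue are local stand-ins.

```typescript
// Shadow mode returns immediately; blocking mode waits for the real verdict.
type GuardResponse = { allowed: boolean; shadow_mode: boolean };

const backgroundAudit: string[] = [];

function evaluateGuard(prompt: string): boolean {
  return !prompt.includes("rm -rf"); // stand-in deterministic check
}

function handleGuard(prompt: string, shadowMode: boolean): GuardResponse {
  if (shadowMode) {
    // Shadow mode: immediate allow; the real evaluation, DB logging, and
    // Tier-2 forensic audit continue in the background.
    queueMicrotask(() =>
      backgroundAudit.push(evaluateGuard(prompt) ? "allow" : "flag"),
    );
    return { allowed: true, shadow_mode: true };
  }
  // Blocking mode: the caller waits for the real verdict before the LLM call.
  return { allowed: evaluateGuard(prompt), shadow_mode: false };
}
```

Note that in shadow mode a prompt that would have been blocked is still flagged in the audit trail, just not stopped.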
The Two-Plane Architecture
To guarantee zero-trust security without bottlenecking your application, AgentID operates on a strict two-plane design:
- Control Plane: Dashboard and backend logic where you define system configurations, API keys, RBAC, guardrail policies, and compliance workflows.
- Data Plane: Runtime endpoints (`/guard`, `/agent/config`, `/ingest`, `/ingest/finalize`) that process live traffic, enforce deterministic pre-execution checks, and trigger async deep-scan audit jobs.
Control Plane vs Data Plane Artifacts
AgentID policy detection is split into authoring and runtime execution:
- Control-plane authoring tables: `pattern_catalog`, `pattern_overrides`, `pattern_exceptions`
- Compiled artifact storage: `policy_pack_artifacts`
- System pointer: `ai_systems.policy_pack_version` and `ai_systems.policy_pack_updated_at`
Runtime reads a precompiled policy pack instead of building regex/trie structures per request.
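The compile-once/match-many split can be sketched as follows. The shapes are illustrative: the real artifact is a compiled Rust/WASM structure stored in `policy_pack_artifacts`, not a RegExp array.

```typescript
// Control plane compiles once; data plane only runs precompiled matchers.
type PolicyPack = { version: number; matchers: { id: string; re: RegExp }[] };

// Control plane: build the artifact once at publish time.
function compilePolicyPack(
  catalog: { id: string; pattern: string }[],
  version: number,
): PolicyPack {
  return {
    version,
    matchers: catalog.map((p) => ({ id: p.id, re: new RegExp(p.pattern, "i") })),
  };
}

// Data plane: per-request evaluation with no regex/trie construction.
function matchPolicyPack(pack: PolicyPack, prompt: string): string[] {
  return pack.matchers.filter((m) => m.re.test(prompt)).map((m) => m.id);
}

const pack = compilePolicyPack(
  [
    { id: "db_access", pattern: "drop\\s+table" },
    { id: "prompt_injection", pattern: "ignore (all )?previous instructions" },
  ],
  7,
);
```

The win is that pattern-compilation cost is paid when a pack version is published, not on every request.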
Runtime Cache Layers
The hot path uses layered caches:
- L1 in-memory cache: per-process, fastest path
- L2 cache: optional cross-instance cache
- DB fallback: authoritative source for systems, API keys, and policy-pack metadata
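The lookup order can be sketched with Maps standing in for the real L1 process cache, the optional L2 cross-instance cache, and the authoritative DB. Names and shapes here are illustrative.

```typescript
// Layered cache lookup: L1 first, then L2, then DB, filling layers on a miss.
const lookups: string[] = [];
const l1 = new Map<string, string>(); // per-process, fastest path
const l2 = new Map<string, string>(); // optional cross-instance cache
const db = new Map<string, string>([["system:abc", "policy_pack_v7"]]); // authoritative

function cachedGet(key: string): string | undefined {
  if (l1.has(key)) {
    lookups.push("l1");
    return l1.get(key);
  }
  if (l2.has(key)) {
    lookups.push("l2");
    const v = l2.get(key)!;
    l1.set(key, v); // promote to L1
    return v;
  }
  lookups.push("db"); // authoritative fallback
  const v = db.get(key);
  if (v !== undefined) {
    l2.set(key, v); // fill both layers for later requests
    l1.set(key, v);
  }
  return v;
}
```

The first read pays the DB cost; every later read in the same process is an L1 hit.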
Production warmup now primes:
- public `agent/config` lookup
- public `guard` auth/config preflight
- direct Fly allow path
- direct Fly blocking path
This is why the first request after deploy is now much closer to steady-state than earlier versions.
Dual-Phase Evaluation Model
Phase 1: Synchronous Enforcement Fast Path
When your app calls guard(), the payload is evaluated by the deterministic policy engine:
- prompt-injection blockers
- DB-access detection
- code-execution detection
- PII leakage detection
- compiled Rust/WASM policy-pack matching
- local toxicity fast path
- synchronous local ML prompt/code classifiers for semantic prompt injection and code-risk variants
This path is optimized for low latency and returns an allow/block verdict before model execution.
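A toy version of this fast path, assuming illustrative check names and trivially simple detection logic: the real blockers are compiled matchers and local ML classifiers, not these regexes.

```typescript
// Phase-1 sketch: deterministic checks run in order; the first hit blocks
// before any model call.
type Check = { name: string; hit: (p: string) => boolean };

const checks: Check[] = [
  { name: "prompt_injection", hit: (p) => /ignore (all )?previous instructions/i.test(p) },
  { name: "db_access", hit: (p) => /\b(drop|truncate)\s+table\b/i.test(p) },
  { name: "code_execution", hit: (p) => /\b(eval|exec)\s*\(/.test(p) },
  { name: "pii_leakage", hit: (p) => /\b\d{3}-\d{2}-\d{4}\b/.test(p) }, // SSN-shaped
];

function phase1Guard(prompt: string): { allowed: boolean; blocked_by?: string } {
  for (const c of checks) {
    if (c.hit(prompt)) return { allowed: false, blocked_by: c.name };
  }
  return { allowed: true };
}
```

Because every check is deterministic and local, the verdict is available before the model is ever invoked.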
Phase 2: Async Tier-2 Forensic Audit
After the guard event has been persisted, AgentID can queue an asynchronous forensic audit. This layer is domain-aware, uses the system's onboarding context, and enriches the stored event with:
- `ai_clean_summary`
- `ai_intent`
- `ai_threat_analysis`
- `ai_attack_sophistication`
- `ai_detected_signals`
- `evaluation_metadata.forensic_audit`
This Tier-2 audit is designed for auditor review and ISO 42001 evidence quality. It does not own the hot-path allow/block decision.
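The enrichment shape can be sketched like this: a persisted event row gains the `ai_*` fields after the async audit completes. The field names follow the doc; the analysis values here are placeholders, not real model output.

```typescript
// Tier-2 enrichment sketch: runs after persistence, never owns allow/block.
type GuardEvent = {
  id: string;
  prompt: string;
  ai_clean_summary?: string;
  ai_intent?: string;
  ai_threat_analysis?: string;
  ai_attack_sophistication?: string;
  ai_detected_signals?: string[];
  evaluation_metadata?: { forensic_audit?: { completed_at: string } };
};

async function tier2ForensicAudit(event: GuardEvent): Promise<GuardEvent> {
  // Placeholder for the domain-aware model call that uses onboarding context.
  return {
    ...event,
    ai_clean_summary: `User asked: ${event.prompt.slice(0, 60)}`,
    ai_intent: "benign_query",
    ai_threat_analysis: "no escalation indicators",
    ai_attack_sophistication: "none",
    ai_detected_signals: [],
    evaluation_metadata: {
      ...event.evaluation_metadata,
      forensic_audit: { completed_at: new Date().toISOString() },
    },
  };
}
```

The audit only ever adds fields to an already-persisted row, which is why its latency never appears on the hot path.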
SDK Execution Model
Official SDKs default to backend-first enforcement:
- `/guard` is the authority for prompt injection, DB access, code execution, and PII leakage.
- Optional local preflight exists only for explicit fail-close or `clientFastFail` use cases.
- `log()` persists the post-execution lifecycle row.
- `/ingest/finalize` can attach `sdk_ingest_ms` after the primary ingest write.
- Multimodal prompts are supported in the SDK/wrapper path by extracting text parts for security scanning while passing image/audio/file attachments through to the provider unchanged.
- Attachment presence and normalized attachment media types are persisted into event metadata so auditors can see that a file traveled with the prompt even when the binary payload itself was not OCR- or vision-scanned.
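The multimodal split described above can be sketched as follows. The part shape is an assumption loosely modeled on common multimodal message formats, not the SDK's actual types.

```typescript
// Extract text for scanning; record attachment media types for event metadata;
// pass all parts through to the provider unchanged.
type Part =
  | { type: "text"; text: string }
  | { type: "image" | "audio" | "file"; mediaType: string; data: Uint8Array };

function splitForScanning(parts: Part[]) {
  const textChunks: string[] = [];
  const attachmentMediaTypes: string[] = [];
  for (const p of parts) {
    if (p.type === "text") textChunks.push(p.text); // scanned
    else attachmentMediaTypes.push(p.mediaType);    // recorded, not scanned
  }
  return {
    scanText: textChunks.join("\n"), // fed to the security scanners
    attachmentMediaTypes,            // persisted into event metadata
    hasAttachments: attachmentMediaTypes.length > 0,
    passthroughParts: parts,         // sent to the provider unchanged
  };
}
```

This preserves the audit property the doc describes: even when a binary attachment is not OCR- or vision-scanned, its presence and media type still land in the event record.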
Important scope rule:
- Automatic protection/logging applies only to the SDK-wrapped runtime surfaces.
- For OpenAI wrappers today, that means `chat.completions.create`.
- For Vercel AI SDK applications, use `agentid-vercel-sdk` so `generateText()`/`streamText()` stay unchanged while still running the same AgentID lifecycle.
- If your application uses `responses.create`, Assistants, or a custom provider path, call `guard()` and `log()` explicitly unless you have a dedicated integration wrapper.
See the dedicated SDK and wrapper guides listed above for full integration examples.
Latency Semantics
Activity latency reflects synchronous processing, not async audit wall time.
- `processing_time_ms`: synchronous request runtime shown in Activity
- `ai_audit_duration_ms`: async post-processing duration
- SDK metadata may also include: `sdk_config_fetch_ms`, `sdk_local_scan_ms`, `sdk_guard_ms`, `sdk_ingest_ms`
The first request after deploy or version switch can still be slower than steady state, but current production warmup is designed to keep that cold-start tax bounded instead of multi-second on every allow/block path.
Warmup and Cron Strategy
AgentID production can optionally call the internal warm route on a schedule:
/api/internal/guard/warm
That route is intended to keep:
- Vercel auth/config resolution warm
- Fly guard runtime warm
- representative allow and block branches hot
It is an operational stabilizer, not a substitute for correct cache behavior.