9. Runtime Verification Runbook

Use this runbook after policy/config changes, runtime refactors, or infrastructure moves.

1) Prerequisites

Set the following environment variables:

  • AGENTID_API_KEY
  • AGENTID_SYSTEM_ID
  • the local API base URL when testing locally, for example http://127.0.0.1:3000/api/v1

For production verification you also need:

  • a blocking-profile system
  • an observe-profile system
  • the matching API keys for each system

Optional async forensic audit overrides:

  • AGENTID_ASYNC_AI_AUDIT_MODEL
  • AZURE_OPENAI_ASYNC_AUDIT_DEPLOYMENT_NAME
  • AGENTID_ASYNC_AI_AUDIT_PROMPT_VERSION

2) Bootstrap First Active Policy Pack

node scripts/qa/bootstrap-policy-pack-and-verify.mjs --base-url=http://127.0.0.1:3000/api/v1 --system-id=<SYSTEM_UUID>

Pass criteria:

  • policy_pack_artifacts contains an active artifact for the system
  • ai_systems.policy_pack_version > 0
  • the verification event shows:
    • metadata_policy_pack_fallback: false
    • metadata_policy_pack_version > 0
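The event-level pass criteria above can be checked mechanically. A minimal sketch, assuming the verification event is available as a flat dict keyed exactly by the metadata field names listed:

```python
def policy_pack_bootstrap_ok(event: dict) -> bool:
    """True when the event shows a real (non-fallback) active policy pack."""
    return (
        event.get("metadata_policy_pack_fallback") is False
        and event.get("metadata_policy_pack_version", 0) > 0
    )

# Illustrative shapes only; real events carry many more fields.
good = {"metadata_policy_pack_fallback": False, "metadata_policy_pack_version": 3}
bad = {"metadata_policy_pack_fallback": True, "metadata_policy_pack_version": 0}
```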

3) Validate Local Guard + Ingest Lifecycle

powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-SkipBenchmark

Pass criteria:

  • GET /api/v1/agent/config returns 200
  • guard and ingest lifecycle tests pass
  • policy matrix outcome matches current dashboard toggles

4) Validate Production Matrix

The primary production regression command is:

npm run qa:guard-prod-matrix

This exercises both:

  • blocking profile
  • observe profile

Pass criteria:

  • blocking profile passes 22/22
  • observe profile passes 22/22
  • no unexpected 401, 436, or 503 responses
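If you script the matrix gate in CI, the criteria reduce to a small predicate. A sketch under an assumed result shape (the real script's output format may differ):

```python
UNEXPECTED_STATUSES = {401, 436, 503}

def matrix_passes(results: dict) -> bool:
    """Assumed shape: {"blocking": {"passed": 22, "total": 22, "statuses": [...]},
    "observe": {...}} -- one entry per profile."""
    for profile in ("blocking", "observe"):
        run = results[profile]
        if run["passed"] != run["total"] or run["total"] != 22:
            return False
        if UNEXPECTED_STATUSES & set(run["statuses"]):
            return False
    return True
```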

Recommended interpretation:

  • steady-state allow path should stay in low hundreds of milliseconds
  • observe path should be close to allow-path latency
  • if the first blocking request spikes, inspect fallback headers/logs before blaming the matcher

5) Warm Production Runtime

The internal warm endpoint is:

https://app.getagentid.com/api/internal/guard/warm

Use it to prewarm:

  • public auth/config lookup
  • public preflight guard path
  • direct Fly allow path
  • direct Fly blocking path

Manual warm call

$headers = @{ Authorization = "Bearer $env:CRON_SECRET" }
Invoke-RestMethod -Method Get -Uri "https://app.getagentid.com/api/internal/guard/warm" -Headers $headers

Cron guidance

  • Vercel Cron should target the path /api/internal/guard/warm
  • external cron services should call the full URL and send Authorization: Bearer <CRON_SECRET>
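For Vercel Cron, a vercel.json entry targeting that path might look like the following sketch (the schedule is an assumption; pick an interval matching your cold-start window):

```json
{
  "crons": [
    { "path": "/api/internal/guard/warm", "schedule": "*/10 * * * *" }
  ]
}
```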

6) Verify Zero-Latency Shadow Mode

Shadow mode should return immediately while still persisting a background audit trail.

Expected response headers:

  • x-agentid-zero-latency-shadow: 1
  • x-agentid-upstream: deferred_shadow

Expected behavior:

  • client receives allowed: true
  • shadow_mode: true
  • matching ai_events row appears later with:
    • guard_upstream_source = "fly"
    • shadow_mode = true
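The header and body checks above can be bundled into one assertion helper. A minimal sketch, assuming headers and the parsed JSON body are available as dicts:

```python
def is_zero_latency_shadow(headers: dict, body: dict) -> bool:
    """Verifies the documented shadow-mode headers and response flags."""
    return (
        headers.get("x-agentid-zero-latency-shadow") == "1"
        and headers.get("x-agentid-upstream") == "deferred_shadow"
        and body.get("allowed") is True
        and body.get("shadow_mode") is True
    )
```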

7) Validate Labeling + Async Tier-2 Forensic Audit

powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-ai-label-audit-check.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-Model gpt-4o

Then in Activity:

  • verify expected labels on each test case (Injection, PII/Data Leak, DB Access, Code Exec)
  • wait 10-30s and refresh for async forensic audit completion
  • confirm the detail panel contains the expected auditor fields when AI analysis is enabled:
    • ai_clean_summary
    • ai_intent
    • ai_threat_analysis
    • ai_attack_sophistication
    • ai_detected_signals
    • evaluation_metadata.forensic_audit
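Completion of the async audit can be polled for with a simple field check. A sketch, assuming the event detail is a dict with the auditor fields at top level as listed:

```python
REQUIRED_AUDITOR_FIELDS = (
    "ai_clean_summary",
    "ai_intent",
    "ai_threat_analysis",
    "ai_attack_sophistication",
    "ai_detected_signals",
)

def forensic_audit_complete(event: dict) -> bool:
    """True once every auditor field is present and non-empty and the
    forensic_audit block exists under evaluation_metadata."""
    if not all(event.get(field) for field in REQUIRED_AUDITOR_FIELDS):
        return False
    return "forensic_audit" in event.get("evaluation_metadata", {})
```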

Operational interpretation:

  • synchronous labels still come from the guard hot path
  • async forensic audit can refine generic labels and add secondary signals
  • concrete synchronous hard-block classes such as DB Access, Code Exec, and PII/Data Leak should remain authoritative

8) Troubleshooting Checklist

Supabase Auth CPU spike / repeated 522 on /auth/v1/token?grant_type=refresh_token

Typical signal:

  • Supabase API Gateway shows many POST /auth/v1/token?grant_type=refresh_token
  • caller is Vercel Edge Functions
  • x_client_info is supabase-ssr/... createServerClient
  • origin times are tens of seconds and end in 522

What to verify:

  1. Current deploy includes the middleware auth-refresh stabilization logic.
  2. AGENTID_MIDDLEWARE_SESSION_TIMEOUT_MS stays short (~1200ms by default).
  3. AGENTID_AUTH_REFRESH_LEEWAY_SECONDS is set so refresh only happens near expiry.
  4. AGENTID_AUTH_REFRESH_BACKOFF_SECONDS is enabled so failed refreshes are throttled.
  5. After deploy, middleware metrics show backoff/clearing instead of unbounded refresh attempts:
    • auth_refresh_attempt
    • auth_refresh_degraded
    • auth_refresh_backoff_active
    • auth_refresh_cleared_stale_cookies
  6. Supabase API Gateway refresh-token 522 volume drops sharply within a few minutes of deploy.
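The leeway and backoff settings in steps 3-4 combine into one refresh decision. A hedged sketch of that logic (variable names are illustrative, not the middleware's actual identifiers):

```python
def should_refresh(now_s: float, expires_at_s: float,
                   leeway_s: float, backoff_until_s: float) -> bool:
    """Refresh only near token expiry, and never while a failure backoff is active."""
    if now_s < backoff_until_s:
        return False  # a recent refresh failed; stay throttled
    return now_s >= expires_at_s - leeway_s  # inside the leeway window
```

This is why a correctly configured deploy shows auth_refresh_backoff_active instead of an unbounded stream of auth_refresh_attempt metrics.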

policy_pack_fallback=true

  1. Confirm active artifact exists for the system
  2. Confirm ai_systems.policy_pack_version is non-zero
  3. Rebuild the pack
  4. Re-run bootstrap verification

No log row after the client used AgentID

This is the most common integration misunderstanding.

Verify these points in order:

  1. guard() alone only creates a preflight row. It does not create the final complete lifecycle row.
  2. Dashboard activity/graphs/cost need /ingest (directly or through SDK wrapper) after the model response.
  3. JS/Python wrapOpenAI() currently instruments chat.completions.create, not arbitrary OpenAI surfaces such as responses.create.
  4. If the app uses a custom OpenAI helper or background worker, confirm that helper actually calls agent.log() or a supported wrapper path.
  5. If the request path returns to the client before telemetry is awaited, verify the runtime keeps the background task alive long enough for /ingest to complete.
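The lifecycle in points 1-2 can be illustrated with a stub. Everything below is hypothetical scaffolding for illustration; only the guard/log split comes from the checklist, not the real SDK's signatures:

```python
class FakeAgentID:
    """Hypothetical stand-in for the SDK client, tracking which rows get created."""
    def __init__(self):
        self.events = []

    def guard(self, prompt):
        self.events.append("preflight")  # guard() creates only the preflight row
        return {"allowed": True}

    def log(self, prompt, response):
        self.events.append("complete")   # ingest creates the complete lifecycle row

agent = FakeAgentID()
if agent.guard("user prompt")["allowed"]:
    response = "model output"            # stand-in for the actual model call
    agent.log("user prompt", response)   # skip this and no complete row ever appears
```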

9) Benchmark Hot Path

npm run bench:policy-pack-hotpath

This benchmark measures:

  • normalization time
  • trie prefilter time
  • regex evaluation time
  • total detection hot-path time

Target:

  • hot_path_total_ms.p95 < 30
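If you post-process raw benchmark samples yourself, a nearest-rank p95 is enough to gate on the target. A sketch with made-up sample values (the benchmark's own percentile method may differ slightly):

```python
import math

def p95(samples: list) -> float:
    """Nearest-rank 95th percentile."""
    xs = sorted(samples)
    return xs[max(0, math.ceil(0.95 * len(xs)) - 1)]

hot_path_total_ms = [4.1, 5.0, 6.2, 7.8, 28.0]  # illustrative numbers only
assert p95(hot_path_total_ms) < 30  # the section's target
```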

10) Latency SLO Interpretation

Use these two local benchmark profiles when you want detailed diagnostics:

powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-Warmup 4 -Iterations 30 -Parallel 1 -RequestTimeoutSec 15
powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-Warmup 4 -Iterations 40 -Parallel 3 -RequestTimeoutSec 15

Guidance:

  • treat Parallel=1 plus DB telemetry as the primary user-facing SLO
  • treat burst runs as pressure validation, not the main latency contract
  • after region moves or proxy changes, rerun both profiles and compare p50/p95 deltas
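Comparing p50/p95 deltas between a pre-move and post-move run can be scripted. A rough sketch using a naive index-based percentile (not the diagnostic script's exact method; sample lists are assumed latency readings in ms):

```python
def percentile(samples: list, q: float) -> float:
    xs = sorted(samples)
    return xs[min(len(xs) - 1, int(q * len(xs)))]

def compare_runs(before: list, after: list) -> dict:
    """p50/p95 deltas in ms between two diagnostic runs; positive = regression."""
    return {name: percentile(after, q) - percentile(before, q)
            for name, q in (("p50", 0.5), ("p95", 0.95))}
```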