9. Runtime Verification Runbook
Use this runbook after policy/config changes, runtime refactors, or infrastructure moves.
1) Prerequisites
Set:
- AGENTID_API_KEY
- AGENTID_SYSTEM_ID
- the local API base when testing locally, for example http://127.0.0.1:3000/api/v1
For production verification you also need:
- a blocking-profile system
- an observe-profile system
- the matching API keys for each system
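If you script these steps, it can help to fail fast on missing configuration before any request is sent. A minimal sketch, assuming the environment is passed as a plain map and the base URL as a separate string; checkPrerequisites and the URL pattern are hypothetical helpers, not part of the QA scripts:

```typescript
// Hypothetical pre-flight helper; not part of the AgentID QA scripts.
const REQUIRED_VARS = ["AGENTID_API_KEY", "AGENTID_SYSTEM_ID"];

// Loose shape check: http(s)://host[:port][/prefix]/api/v1, optional trailing slash.
const BASE_URL_PATTERN = /^https?:\/\/[^\s/]+(\/\S*)?\/api\/v1\/?$/;

function checkPrerequisites(
  env: Record<string, string | undefined>,
  baseUrl: string,
): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    // Treat unset and blank values the same way.
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }
  if (!BASE_URL_PATTERN.test(baseUrl)) {
    problems.push(`base URL does not look like .../api/v1: ${baseUrl}`);
  }
  return problems;
}
```

For example, checkPrerequisites(process.env, "http://127.0.0.1:3000/api/v1") returns an empty array when both variables are set.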
2) Bootstrap First Active Policy Pack
node scripts/qa/bootstrap-policy-pack-and-verify.mjs --base-url=http://127.0.0.1:3000/api/v1 --system-id=<SYSTEM_UUID>
Pass criteria:
- policy_pack_artifacts contains an active artifact for the system
- ai_systems.policy_pack_version > 0
- the verification event shows:
  - metadata_policy_pack_fallback: false
  - metadata_policy_pack_version > 0
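These pass criteria reduce to four checks that can be evaluated once the rows and event metadata are in hand. This sketch assumes you have already queried them into plain objects. The metadata field names come from the criteria above. The artifact shape (system_id, status, and the value "active") is an assumption about the schema:

```typescript
// Assumed shapes; only the field names in the pass criteria come from the runbook.
interface PolicyPackArtifact {
  system_id: string;
  status: string; // assumption: active artifacts carry status "active"
}

interface BootstrapEvidence {
  systemId: string;
  artifacts: PolicyPackArtifact[];
  policyPackVersion: number; // ai_systems.policy_pack_version
  eventMetadata: {
    metadata_policy_pack_fallback: boolean;
    metadata_policy_pack_version: number;
  };
}

function bootstrapFailures(e: BootstrapEvidence): string[] {
  const failures: string[] = [];
  const hasActive = e.artifacts.some(
    (a) => a.system_id === e.systemId && a.status === "active",
  );
  if (!hasActive) failures.push("no active policy_pack_artifacts row for the system");
  if (!(e.policyPackVersion > 0)) failures.push("ai_systems.policy_pack_version is not > 0");
  if (e.eventMetadata.metadata_policy_pack_fallback !== false) {
    failures.push("verification event used the fallback pack");
  }
  if (!(e.eventMetadata.metadata_policy_pack_version > 0)) {
    failures.push("verification event metadata_policy_pack_version is not > 0");
  }
  return failures;
}
```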
3) Validate Local Guard + Ingest Lifecycle
powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-SkipBenchmark
Pass criteria:
- GET /api/v1/agent/config returns 200
- guard and ingest lifecycle tests pass
- policy matrix outcome matches current dashboard toggles
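Comparing the matrix against the dashboard can be scripted once both sides are exported as maps. The sketch below is hypothetical. It assumes each dashboard toggle is keyed by a policy category and that an enabled toggle should produce a block on a blocking-profile system. That rule may not hold for every policy type:

```typescript
// Hypothetical comparison; category keys and the "enabled => blocked" rule are assumptions.
type Outcome = "blocked" | "allowed";

function matrixMismatches(
  toggles: Record<string, boolean>,
  observed: Record<string, Outcome>,
): string[] {
  const mismatches: string[] = [];
  for (const [category, enabled] of Object.entries(toggles)) {
    const expected: Outcome = enabled ? "blocked" : "allowed";
    const actual = observed[category];
    if (actual === undefined) {
      mismatches.push(`${category}: no result in matrix run`);
    } else if (actual !== expected) {
      mismatches.push(`${category}: expected ${expected}, got ${actual}`);
    }
  }
  return mismatches;
}
```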
4) Validate Production Matrix
The primary production regression command is:
npm run qa:guard-prod-matrix
This exercises both:
- blocking profile
- observe profile
Pass criteria:
- blocking 22/22
- observe 22/22
- no unexpected 401, 436, or 503 responses
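The pass criteria can be checked mechanically from a run summary. The ProfileResult shape below is hypothetical, since npm run qa:guard-prod-matrix may report results differently. The sketch treats any 401, 436, or 503 as unexpected. If a test case deliberately expects one of those statuses, exclude it before calling the function:

```typescript
// Hypothetical summary shape; the runbook only fixes the pass thresholds.
interface ProfileResult {
  profile: "blocking" | "observe";
  passed: number;
  total: number;
  statusCodes: number[]; // HTTP status of every request in the run
}

const UNEXPECTED_STATUSES = new Set([401, 436, 503]);
const EXPECTED_CASES = 22;

function matrixVerdict(results: ProfileResult[]): string[] {
  const failures: string[] = [];
  for (const profile of ["blocking", "observe"] as const) {
    const r = results.find((x) => x.profile === profile);
    if (!r) {
      failures.push(`${profile}: missing from run`);
      continue;
    }
    if (r.total !== EXPECTED_CASES || r.passed !== EXPECTED_CASES) {
      failures.push(`${profile}: ${r.passed}/${r.total}, expected 22/22`);
    }
    const bad = r.statusCodes.filter((s) => UNEXPECTED_STATUSES.has(s));
    if (bad.length > 0) {
      failures.push(`${profile}: unexpected statuses ${Array.from(new Set(bad)).join(",")}`);
    }
  }
  return failures;
}
```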
Recommended interpretation:
- steady-state allow-path latency should stay in the low hundreds of milliseconds
- observe path should be close to allow-path latency
- if the first blocking request spikes, inspect fallback headers/logs before blaming the matcher
5) Warm Production Runtime
The internal warm endpoint is:
https://app.getagentid.com/api/internal/guard/warm
Use it to prewarm:
- public auth/config lookup
- public preflight guard path
- direct Fly allow path
- direct Fly blocking path
Manual warm call
$headers = @{ Authorization = "Bearer $env:CRON_SECRET" }
Invoke-RestMethod -Method Get -Uri "https://app.getagentid.com/api/internal/guard/warm" -Headers $headers
Cron guidance
- Vercel Cron should target the path /api/internal/guard/warm
- external cron services should call the full URL and send Authorization: Bearer <CRON_SECRET>
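For an external cron service written in TypeScript, the request mirrors the PowerShell call above: a GET to the full warm URL with a bearer token. A minimal sketch, assuming a fetch-compatible function is passed in and that any 2xx status counts as success. FetchLike, warmRequest, and warm are illustrative names:

```typescript
// Minimal fetch-like type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number }>;

const WARM_URL = "https://app.getagentid.com/api/internal/guard/warm";

// Builds the same request as the manual PowerShell call.
function warmRequest(cronSecret: string) {
  if (!cronSecret) throw new Error("CRON_SECRET is empty");
  return {
    url: WARM_URL,
    init: { method: "GET", headers: { Authorization: `Bearer ${cronSecret}` } },
  };
}

// Assumption: treat any 2xx response as a successful warm.
async function warm(fetchFn: FetchLike, cronSecret: string): Promise<boolean> {
  const { url, init } = warmRequest(cronSecret);
  const res = await fetchFn(url, init);
  return res.status >= 200 && res.status < 300;
}
```

In Node 18+ this can be called as warm(fetch, process.env.CRON_SECRET ?? ""), which rejects when the secret is empty. Injecting the fetch function keeps the sketch testable without network access.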
6) Verify Zero-Latency Shadow Mode
Shadow mode should return immediately while still persisting a background audit trail.
Expected response headers:
- x-agentid-zero-latency-shadow: 1
- x-agentid-upstream: deferred_shadow
Expected behavior:
- the client receives allowed: true and shadow_mode: true
- a matching ai_events row appears later with guard_upstream_source = "fly" and shadow_mode = true
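Both halves of the shadow-mode check, the immediate response and the later ai_events row, can be expressed as simple predicates. This sketch assumes response header names are lower-cased and that you locate the matching ai_events row yourself. How a response is correlated with its row is not covered here:

```typescript
interface ShadowResponse {
  headers: Record<string, string>; // assumption: header names lower-cased
  body: { allowed?: boolean; shadow_mode?: boolean };
}

interface AiEventRow {
  guard_upstream_source?: string;
  shadow_mode?: boolean;
}

// Checks the immediate response: both headers plus the body flags.
function shadowResponseProblems(r: ShadowResponse): string[] {
  const problems: string[] = [];
  if (r.headers["x-agentid-zero-latency-shadow"] !== "1") {
    problems.push("missing x-agentid-zero-latency-shadow: 1");
  }
  if (r.headers["x-agentid-upstream"] !== "deferred_shadow") {
    problems.push("x-agentid-upstream is not deferred_shadow");
  }
  if (r.body.allowed !== true) problems.push("body.allowed is not true");
  if (r.body.shadow_mode !== true) problems.push("body.shadow_mode is not true");
  return problems;
}

// Checks the background audit row once it has been written.
function shadowRowLanded(row: AiEventRow | undefined): boolean {
  return row !== undefined && row.guard_upstream_source === "fly" && row.shadow_mode === true;
}
```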
7) Validate Labeling + Async AI Audit
powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-ai-label-audit-check.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-Model gpt-4o-mini
Then in Activity:
- verify the expected labels on each test case (Injection, PII/Data Leak, DB Access, Code Exec)
- wait 10-30s and refresh for async AI audit completion
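Because the AI audit is asynchronous, a scripted check should re-read labels a few times rather than fail on the first read. A sketch under stated assumptions: fetchObserved is a hypothetical function you supply that returns labels per test case id, and sleep is injected. Four attempts 10 s apart roughly cover the 10-30s window above:

```typescript
type Label = "Injection" | "PII/Data Leak" | "DB Access" | "Code Exec";

// Returns one message per test case whose expected label is absent.
function labelMismatches(
  expected: Record<string, Label>,
  observed: Record<string, string[]>,
): string[] {
  const out: string[] = [];
  for (const [caseId, label] of Object.entries(expected)) {
    const labels = observed[caseId] ?? [];
    if (!labels.includes(label)) {
      out.push(`${caseId}: expected ${label}, saw [${labels.join(", ")}]`);
    }
  }
  return out;
}

// Re-reads until every case has its label or attempts run out.
async function waitForLabels(
  fetchObserved: () => Promise<Record<string, string[]>>, // hypothetical data source
  expected: Record<string, Label>,
  sleep: (ms: number) => Promise<void>,
  attempts = 4,
  delayMs = 10_000,
): Promise<string[]> {
  let mismatches: string[] = [];
  for (let i = 0; i < attempts; i++) {
    mismatches = labelMismatches(expected, await fetchObserved());
    if (mismatches.length === 0) break;
    if (i < attempts - 1) await sleep(delayMs);
  }
  return mismatches;
}
```

In Node, pass (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)) as sleep.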
8) Troubleshooting Checklist
Supabase Auth CPU spike / repeated 522 on /auth/v1/token?grant_type=refresh_token
Typical signal:
- Supabase API Gateway shows many POST /auth/v1/token?grant_type=refresh_token requests
- caller is Vercel Edge Functions
- x_client_info is supabase-ssr/... (createServerClient)
- origin times are tens of seconds and end in 522
What to verify:
- The current deploy includes the middleware auth-refresh stabilization logic.
- AGENTID_MIDDLEWARE_SESSION_TIMEOUT_MS stays short (~1200 ms by default).
- AGENTID_AUTH_REFRESH_LEEWAY_SECONDS is set so refresh only happens near expiry.
- AGENTID_AUTH_REFRESH_BACKOFF_SECONDS is enabled so failed refreshes are throttled.
- After deploy, middleware metrics show backoff/clearing instead of unbounded refresh attempts:
  - auth_refresh_attempt
  - auth_refresh_degraded
  - auth_refresh_backoff_active
  - auth_refresh_cleared_stale_cookies
- Supabase API Gateway refresh-token 522 volume drops sharply within a few minutes of deploy.
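The leeway and backoff settings combine into a simple decision per request. Skip refresh while the token is far from expiry. Skip it while a recent failure is still inside the backoff window. Otherwise, refresh. The sketch below is a hypothetical illustration of that intent, not the AgentID middleware; it only borrows the two setting names to describe its config fields:

```typescript
// Hypothetical illustration of leeway + backoff; not the AgentID middleware.
interface RefreshState {
  expiresAtSec: number; // access-token expiry, epoch seconds
  lastFailureAtSec?: number; // time of the last failed refresh, if any
}

interface RefreshConfig {
  leewaySeconds: number; // plays the role of AGENTID_AUTH_REFRESH_LEEWAY_SECONDS
  backoffSeconds: number; // plays the role of AGENTID_AUTH_REFRESH_BACKOFF_SECONDS
}

type RefreshDecision = "skip_not_near_expiry" | "skip_backoff_active" | "refresh";

function decideRefresh(nowSec: number, state: RefreshState, cfg: RefreshConfig): RefreshDecision {
  // Far from expiry: keep the current session and make no refresh call.
  if (state.expiresAtSec - nowSec > cfg.leewaySeconds) return "skip_not_near_expiry";
  // A recent refresh failed: throttle instead of retrying on every request.
  if (state.lastFailureAtSec !== undefined && nowSec - state.lastFailureAtSec < cfg.backoffSeconds) {
    return "skip_backoff_active";
  }
  // Inside the leeway window (or already expired) and not backing off.
  return "refresh";
}
```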
policy_pack_fallback=true
- Confirm an active artifact exists for the system
- Confirm ai_systems.policy_pack_version is non-zero
- Rebuild the pack
- Re-run bootstrap verification
No log row after the client used AgentID
This is the most common integration misunderstanding.
Verify these points in order:
- guard() alone only creates a preflight row. It does not create the final complete lifecycle row.
- Dashboard activity, graphs, and cost need /ingest (directly or through an SDK wrapper) after the model response.
- The JS/Python wrapOpenAI() wrappers currently instrument chat.completions.create, not arbitrary OpenAI surfaces such as responses.create.
- If the app uses a custom OpenAI helper or background worker, confirm that the helper actually calls agent.log() or a supported wrapper path.
- If the request path returns to the client before telemetry is awaited, verify that the runtime keeps the background task alive long enough for /ingest to complete.
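The order that produces a complete lifecycle row is guard, then the model call, then log, with the log awaited before the request finishes. A sketch of that flow, assuming a hypothetical AgentLike interface. The method names guard and log follow this runbook, but the argument and return shapes (including the usage token fields) are assumptions, not the SDK's actual signatures:

```typescript
// Hypothetical shapes: method names follow the runbook, signatures are assumed.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface AgentLike {
  guard(input: { prompt: string }): Promise<{ allowed: boolean }>;
  log(event: { prompt: string; output: string; model: string; usage?: Usage }): Promise<void>;
}

type ModelCall = (prompt: string) => Promise<{ text: string; model: string; usage?: Usage }>;

async function handleRequest(agent: AgentLike, callModel: ModelCall, prompt: string): Promise<string> {
  // Preflight only: on its own this yields a preflight row, not a complete one.
  const verdict = await agent.guard({ prompt });
  if (!verdict.allowed) return "blocked";

  const res = await callModel(prompt);

  // Log after the model response, passing provider usage so cost can be computed.
  // Awaiting here keeps the request alive until the telemetry call has finished.
  await agent.log({ prompt, output: res.text, model: res.model, usage: res.usage });
  return res.text;
}
```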
Activity exists but cost, token, or ROI graphs are empty
This usually means the integration guarded/logged something, but did not log a spend-bearing completion row with provider usage.
Verify in Activity detail or the stored ai_events row:
- event_type is complete.
- lifecycle_status is completed.
- model_version is the real provider model id, not "Not applicable".
- input_tokens and output_tokens are non-null.
- cost_usd is non-null for models present in model_pricing.
- metadata.model_used=true and metadata.spend_bearing=true on the LLM row.
- For ROI, the AI system has human_hourly_rate and human_time_per_task_min configured in Settings or onboarding.
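When inspecting stored rows by hand gets repetitive, the row-level items of this checklist can be encoded as a function. The AiEvent shape is an assumption that only covers the listed fields. modelHasPricing stands in for a lookup against model_pricing, which you would perform separately. The ROI settings live on the AI system, not the row, so they are not checked here:

```typescript
// Field names come from the checklist; the row shape is otherwise assumed.
interface AiEvent {
  event_type?: string;
  lifecycle_status?: string;
  model_version?: string | null;
  input_tokens?: number | null;
  output_tokens?: number | null;
  cost_usd?: number | null;
  metadata?: { model_used?: boolean; spend_bearing?: boolean };
}

function spendRowProblems(e: AiEvent, modelHasPricing: boolean): string[] {
  const p: string[] = [];
  if (e.event_type !== "complete") p.push("event_type is not complete");
  if (e.lifecycle_status !== "completed") p.push("lifecycle_status is not completed");
  if (!e.model_version || e.model_version === "Not applicable") {
    p.push("model_version is not a real provider model id");
  }
  // == null catches both null and undefined.
  if (e.input_tokens == null || e.output_tokens == null) p.push("token counts are null");
  if (modelHasPricing && e.cost_usd == null) p.push("cost_usd is null for a priced model");
  if (e.metadata?.model_used !== true) p.push("metadata.model_used is not true");
  if (e.metadata?.spend_bearing !== true) p.push("metadata.spend_bearing is not true");
  return p;
}
```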
Integration fixes:
- Vercel AI SDK: pass the withAgentId(...)-wrapped model to the real generateText()/streamText() call and consume the wrapped result/stream.
- Node/OpenAI: call secured.chat.completions.create(...), not the raw OpenAI client.
- Manual provider: pass usage or tokens from the provider response to agent.log(...) or /ingest.
- Custom model: add pricing, or use a normalized model id already present in the pricing catalog.
9) Benchmark Hot Path
npm run bench:policy-pack-hotpath
This benchmark measures:
- normalization time
- trie prefilter time
- regex evaluation time
- total detection hot-path time
Target:
hot_path_total_ms.p95 < 30
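The target is a 95th-percentile bound on total hot-path time. To check it from raw per-iteration timings, a nearest-rank percentile is enough. This is one common percentile definition; the benchmark script may interpolate differently, so values can differ slightly on small samples:

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

const HOT_PATH_P95_TARGET_MS = 30; // target from this runbook: p95 < 30

function meetsHotPathTarget(totalMs: number[]): boolean {
  return percentile(totalMs, 95) < HOT_PATH_P95_TARGET_MS;
}
```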
10) Latency SLO Interpretation
Use these two local benchmark profiles when you want detailed diagnostics:
powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-Warmup 4 -Iterations 30 -Parallel 1 -RequestTimeoutSec 15
powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
-BaseUrl http://127.0.0.1:3000/api/v1 `
-ApiKey $env:AGENTID_API_KEY `
-SystemId $env:AGENTID_SYSTEM_ID `
-Warmup 4 -Iterations 40 -Parallel 3 -RequestTimeoutSec 15
Guidance:
- treat Parallel=1 plus DB telemetry as the primary user-facing SLO
- treat burst runs as pressure validation, not the main latency contract
- after region moves or proxy changes, rerun both profiles and compare p50/p95 deltas
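Comparing each profile's new p50/p95 against its previous run is a simple subtraction. A sketch, assuming you have recorded both summaries in milliseconds. It reports absolute and relative deltas and leaves the acceptable threshold to you:

```typescript
interface LatencySummary {
  p50: number; // milliseconds
  p95: number; // milliseconds
}

interface LatencyDelta {
  metric: "p50" | "p95";
  beforeMs: number;
  afterMs: number;
  deltaMs: number; // positive means slower after the change
  deltaPct: number; // relative to the "before" run; NaN if before was 0
}

function compareRuns(before: LatencySummary, after: LatencySummary): LatencyDelta[] {
  const metrics: Array<"p50" | "p95"> = ["p50", "p95"];
  return metrics.map((metric) => {
    const b = before[metric];
    const a = after[metric];
    return {
      metric,
      beforeMs: b,
      afterMs: a,
      deltaMs: a - b,
      deltaPct: b === 0 ? NaN : ((a - b) * 100) / b,
    };
  });
}
```

Run it separately for the Parallel=1 and Parallel=3 profiles. Per the guidance above, a p95 regression in the Parallel=1 profile matters more than one in the burst profile.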