9. Runtime Verification Runbook

Use this runbook after policy/config changes, runtime refactors, or infrastructure moves.

1) Prerequisites

Set the following before running any step:

AGENTID_API_KEY
AGENTID_SYSTEM_ID
local API base when testing locally, for example http://127.0.0.1:3000/api/v1

For production verification you also need:

a blocking-profile system
an observe-profile system
the matching API keys for each system
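A small preflight check can catch a missing variable before any script runs. The sketch below is illustrative only: the two variable names come from this runbook, while the helper name `missing_env` is hypothetical and not part of the repository.

```python
import os

# Variable names taken from the prerequisites in this runbook.
REQUIRED_VARS = ("AGENTID_API_KEY", "AGENTID_SYSTEM_ID")


def missing_env(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]
```

Calling `missing_env()` with no arguments inspects the current process environment; an empty list means both variables are present.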

2) Bootstrap First Active Policy Pack

node scripts/qa/bootstrap-policy-pack-and-verify.mjs --base-url=http://127.0.0.1:3000/api/v1 --system-id=<SYSTEM_UUID>

Pass criteria:

policy_pack_artifacts contains an active artifact for the system
ai_systems.policy_pack_version > 0
the verification event shows:
- metadata_policy_pack_fallback: false
- metadata_policy_pack_version > 0
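The pass criteria above can be expressed as a single check. This is a minimal sketch that assumes you have already fetched the ai_systems row and the verification event as plain dictionaries; the function name and the input shapes are assumptions, only the field names come from this runbook.

```python
def check_bootstrap(has_active_artifact, system_row, event):
    """Return a list of failed criteria; an empty list means the step passed.

    has_active_artifact: whether policy_pack_artifacts holds an active
    artifact for the system (how you query that is up to you).
    system_row / event: hypothetical dict views of the stored rows.
    """
    failures = []
    if not has_active_artifact:
        failures.append("no active artifact in policy_pack_artifacts")
    if not (system_row.get("policy_pack_version") or 0) > 0:
        failures.append("ai_systems.policy_pack_version is not > 0")
    if event.get("metadata_policy_pack_fallback") is not False:
        failures.append("metadata_policy_pack_fallback is not false")
    if not (event.get("metadata_policy_pack_version") or 0) > 0:
        failures.append("metadata_policy_pack_version is not > 0")
    return failures
```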

3) Validate Local Guard + Ingest Lifecycle

powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
  -BaseUrl http://127.0.0.1:3000/api/v1 `
  -ApiKey $env:AGENTID_API_KEY `
  -SystemId $env:AGENTID_SYSTEM_ID `
  -SkipBenchmark

Pass criteria:

GET /api/v1/agent/config returns 200
guard and ingest lifecycle tests pass
policy matrix outcome matches current dashboard toggles

4) Validate Production Matrix

The primary production regression command is:

npm run qa:guard-prod-matrix

This exercises both:

blocking profile
observe profile

Pass criteria:

blocking 22/22
observe 22/22
no unexpected 401, 436, or 503
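If you post-process the matrix output yourself, the pass criteria reduce to a few comparisons. The sketch below assumes a hypothetical result shape (one list of case dictionaries per profile, each with `passed`, `status`, and `expected_status`); the real script's output format is not documented here and may differ.

```python
EXPECTED_CASES = 22                 # from the pass criteria above
SUSPECT_STATUSES = {401, 436, 503}  # statuses that must not appear unexpectedly


def evaluate_matrix(results):
    """Return a list of problems; an empty list means both profiles passed."""
    problems = []
    for profile in ("blocking", "observe"):
        cases = results.get(profile, [])
        passed = sum(1 for case in cases if case.get("passed"))
        if passed != EXPECTED_CASES or len(cases) != EXPECTED_CASES:
            problems.append(
                f"{profile} {passed}/{len(cases)}, "
                f"expected {EXPECTED_CASES}/{EXPECTED_CASES}"
            )
        for case in cases:
            status = case.get("status")
            # A suspect status is acceptable only when the case expects it.
            if status in SUSPECT_STATUSES and status != case.get("expected_status"):
                problems.append(f"{profile}: unexpected {status} in {case.get('name')}")
    return problems
```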

Recommended interpretation:

steady-state allow path should stay in the low hundreds of milliseconds
observe path should be close to allow-path latency
if the first blocking request spikes, inspect fallback headers/logs before blaming the matcher

5) Warm Production Runtime

The internal warm endpoint is:

https://app.getagentid.com/api/internal/guard/warm

Use it to prewarm:

public auth/config lookup
public preflight guard path
direct Fly allow path
direct Fly blocking path

Manual warm call

$headers = @{ Authorization = "Bearer $env:CRON_SECRET" }
Invoke-RestMethod -Method Get -Uri "https://app.getagentid.com/api/internal/guard/warm" -Headers $headers

Cron guidance

Vercel Cron should target the path /api/internal/guard/warm
external cron services should call the full URL and send Authorization: Bearer <CRON_SECRET>
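For an external cron service written in Python, the same call can be built with the standard library. This is a sketch, not the project's own tooling: the URL and header format come from this section, and the helper name `build_warm_request` is hypothetical.

```python
import urllib.request

WARM_URL = "https://app.getagentid.com/api/internal/guard/warm"


def build_warm_request(cron_secret, url=WARM_URL):
    """Build (but do not send) the authenticated GET request for the warm endpoint."""
    if not cron_secret:
        raise ValueError("CRON_SECRET is empty")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {cron_secret}"},
        method="GET",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=30)`; the timeout value is an arbitrary choice for illustration.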

6) Verify Zero-Latency Shadow Mode

Shadow mode should return immediately while still persisting a background audit trail.

Expected response headers:

x-agentid-zero-latency-shadow: 1
x-agentid-upstream: deferred_shadow

Expected behavior:

client receives allowed: true and shadow_mode: true
matching ai_events row appears later with:
- guard_upstream_source = "fly"
- shadow_mode = true
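The immediate part of this check (headers and response body) is easy to automate. The sketch below assumes you already hold the response headers and the parsed JSON body as dictionaries; the function name is hypothetical, and the later ai_events row still has to be verified separately once it appears.

```python
# Header names and values from the expectations above.
EXPECTED_HEADERS = {
    "x-agentid-zero-latency-shadow": "1",
    "x-agentid-upstream": "deferred_shadow",
}


def check_shadow_response(headers, body):
    """Return a list of mismatches; an empty list means the response looks right."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {key.lower(): str(value).strip() for key, value in headers.items()}
    failures = [
        f"{name}: expected {want!r}, got {normalized.get(name)!r}"
        for name, want in EXPECTED_HEADERS.items()
        if normalized.get(name) != want
    ]
    for field in ("allowed", "shadow_mode"):
        if body.get(field) is not True:
            failures.append(f"{field} is not true")
    return failures
```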

7) Validate Labeling + Async AI Audit

powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-ai-label-audit-check.ps1 `
  -BaseUrl http://127.0.0.1:3000/api/v1 `
  -ApiKey $env:AGENTID_API_KEY `
  -SystemId $env:AGENTID_SYSTEM_ID `
  -Model gpt-4o-mini

Then in Activity:

verify expected labels on each test case (Injection, PII/Data Leak, DB Access, Code Exec)
wait 10-30s and refresh for async AI audit completion

8) Troubleshooting Checklist

Supabase Auth CPU spike / repeated `522` on `/auth/v1/token?grant_type=refresh_token`

Typical signal:

Supabase API Gateway shows many POST /auth/v1/token?grant_type=refresh_token
caller is Vercel Edge Functions
x_client_info is supabase-ssr/... createServerClient
origin times are tens of seconds and end in 522

What to verify:

Current deploy includes the middleware auth-refresh stabilization logic.
AGENTID_MIDDLEWARE_SESSION_TIMEOUT_MS stays short (~1200ms by default).
AGENTID_AUTH_REFRESH_LEEWAY_SECONDS is set so refresh only happens near expiry.
AGENTID_AUTH_REFRESH_BACKOFF_SECONDS is enabled so failed refreshes are throttled.
After deploy, middleware metrics show backoff/clearing instead of unbounded refresh attempts:
- auth_refresh_attempt
- auth_refresh_degraded
- auth_refresh_backoff_active
- auth_refresh_cleared_stale_cookies
Supabase API Gateway refresh-token 522 volume drops sharply within a few minutes of deploy.
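To make the intent of the leeway and backoff settings concrete, here is a simplified decision function. It is not the actual middleware code; it only illustrates the behavior the two settings are described as producing, and the function name, parameters, and use of epoch seconds are all assumptions.

```python
def should_refresh(now, expires_at, leeway_s, last_failure_at=None, backoff_s=0):
    """Decide whether a session refresh should be attempted.

    now, expires_at, last_failure_at: timestamps in seconds.
    leeway_s: refresh only when the token expires within this window.
    backoff_s: after a failed refresh, wait this long before retrying.
    """
    # Throttle: a recent failure suppresses new attempts until backoff elapses.
    if last_failure_at is not None and now - last_failure_at < backoff_s:
        return False
    # Leeway: refresh only when the token is close to (or past) expiry.
    return expires_at - now <= leeway_s
```

With a 60-second leeway and a 30-second backoff, a token expiring in 50 seconds is refreshed unless a refresh failed within the last 30 seconds, which is what keeps a degraded auth backend from being hit on every request.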

`policy_pack_fallback=true`

Confirm an active artifact exists for the system
Confirm ai_systems.policy_pack_version is non-zero
Rebuild the pack
Re-run bootstrap verification

No log row after the client used AgentID

This is the most common integration misunderstanding.

Verify these points in order:

guard() alone only creates a preflight row. It does not create the final complete lifecycle row.
Dashboard activity, graphs, and cost need a call to /ingest (directly or through an SDK wrapper) after the model response.
JS/Python wrapOpenAI() currently instruments chat.completions.create, not arbitrary OpenAI surfaces such as responses.create.
If the app uses a custom OpenAI helper or background worker, confirm that helper actually calls agent.log() or a supported wrapper path.
If the request path returns to the client before telemetry is awaited, verify the runtime keeps the background task alive long enough for /ingest to complete.
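The last point is the easiest to miss, so here is a generic asyncio illustration of it. Nothing below uses the AgentID SDK: `send_ingest` is a hypothetical stand-in for whatever posts to /ingest, and the payload shape is invented for the example. The pattern shown is keeping a reference to the background task and awaiting it before the runtime shuts down.

```python
import asyncio


async def send_ingest(event, sink):
    """Hypothetical stand-in for the telemetry call; appends instead of POSTing."""
    await asyncio.sleep(0)  # simulate yielding to the network
    sink.append(event)


async def handle_request(prompt, sink, pending):
    """Reply to the client first, but keep the telemetry task tracked."""
    output = f"echo: {prompt}"
    task = asyncio.create_task(send_ingest({"event_type": "complete", "output": output}, sink))
    pending.add(task)                        # hold a reference so it is not lost
    task.add_done_callback(pending.discard)  # drop it once finished
    return output


async def demo(prompt):
    sink, pending = [], set()
    reply = await handle_request(prompt, sink, pending)
    # Without this drain step the process could exit before telemetry lands.
    await asyncio.gather(*pending)
    return reply, sink
```

On serverless platforms the equivalent is whatever mechanism the platform offers for extending the invocation until background work finishes; check that it is actually wired up for the telemetry call.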

Activity exists but cost, token, or ROI graphs are empty

This usually means the integration guarded/logged something, but did not log a spend-bearing completion row with provider usage.

Verify in Activity detail or the stored ai_events row:

event_type is complete.
lifecycle_status is completed.
model_version is the real provider model id, not Not applicable.
input_tokens and output_tokens are non-null.
cost_usd is non-null for models present in model_pricing.
metadata.model_used=true and metadata.spend_bearing=true on the LLM row.
For ROI, the AI system has human_hourly_rate and human_time_per_task_min configured in Settings or onboarding.
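The row-level checks above can be scripted against an exported ai_events row. This sketch assumes the row is available as a dictionary with `metadata` as a nested dictionary; the function name is hypothetical, and the ROI settings are left out because they live on the AI system rather than the event row.

```python
def spend_row_problems(row):
    """Return the reasons a row would not count as a spend-bearing completion."""
    problems = []
    if row.get("event_type") != "complete":
        problems.append("event_type is not complete")
    if row.get("lifecycle_status") != "completed":
        problems.append("lifecycle_status is not completed")
    if row.get("model_version") in (None, "", "Not applicable"):
        problems.append("model_version is not a real provider model id")
    for field in ("input_tokens", "output_tokens"):
        if row.get(field) is None:
            problems.append(f"{field} is null")
    if row.get("cost_usd") is None:
        # Expected only when the model is missing from model_pricing.
        problems.append("cost_usd is null (check model_pricing)")
    metadata = row.get("metadata") or {}
    for flag in ("model_used", "spend_bearing"):
        if metadata.get(flag) is not True:
            problems.append(f"metadata.{flag} is not true")
    return problems
```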

Integration fixes:

Vercel AI SDK: pass the withAgentId(...) wrapped model to the real generateText() / streamText() call and consume the wrapped result/stream.
Node/OpenAI: call secured.chat.completions.create(...), not the raw OpenAI client.
Manual provider: pass usage or tokens to agent.log(...) / /ingest from the provider response.
Custom model: add pricing or use a normalized model id already present in the pricing catalog.

9) Benchmark Hot Path

npm run bench:policy-pack-hotpath

This benchmark measures:

normalization time
trie prefilter time
regex evaluation time
total detection hot-path time

Target:

hot_path_total_ms.p95 < 30
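If you want to recompute the target from raw samples, a nearest-rank percentile is enough. Note the assumption: this runbook does not say which percentile method the benchmark uses, so small differences from the reported hot_path_total_ms.p95 are possible. The function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 1..100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_hot_path_target(samples_ms, target_ms=30.0):
    """True when the p95 of the samples is strictly below the target."""
    return percentile(samples_ms, 95) < target_ms
```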

10) Latency SLO Interpretation

Use these two local benchmark profiles when you want detailed diagnostics:

powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
  -BaseUrl http://127.0.0.1:3000/api/v1 `
  -ApiKey $env:AGENTID_API_KEY `
  -SystemId $env:AGENTID_SYSTEM_ID `
  -Warmup 4 -Iterations 30 -Parallel 1 -RequestTimeoutSec 15

powershell -ExecutionPolicy Bypass -File .\scripts\qa\run-guard-diagnostic.ps1 `
  -BaseUrl http://127.0.0.1:3000/api/v1 `
  -ApiKey $env:AGENTID_API_KEY `
  -SystemId $env:AGENTID_SYSTEM_ID `
  -Warmup 4 -Iterations 40 -Parallel 3 -RequestTimeoutSec 15

Guidance:

treat Parallel=1 plus DB telemetry as the primary user-facing SLO
treat burst runs as pressure validation, not the main latency contract
after region moves or proxy changes, rerun both profiles and compare p50/p95 deltas
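Comparing runs is simpler with the deltas computed explicitly. The sketch below assumes each run has been summarized as a dictionary of millisecond values keyed by `p50` and `p95`; that shape and the function name are assumptions, not the diagnostic script's output format.

```python
def latency_deltas(before, after, keys=("p50", "p95")):
    """Return absolute and relative change per percentile between two runs."""
    deltas = {}
    for key in keys:
        base, new = before[key], after[key]
        deltas[key] = {
            "delta_ms": new - base,
            # Relative change is undefined when the baseline is zero.
            "delta_pct": (new - base) * 100 / base if base else None,
        }
    return deltas
```

Run it once for the Parallel=1 profile and once for the burst profile, so a regression in the primary SLO is not hidden by noise from the pressure run.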
