Docs / Reference · Detectors

Detectors

Seventeen structural detectors run automatically on every completed run. All thresholds configurable. Shadow mode lets you evaluate new detectors against real traffic before they page anyone.

What each detector catches

Seventeen structural detectors run automatically on every completed run — 16 Tier 1 detectors plus prompt injection signal detection. All thresholds are configurable via detectors.yml — no code changes, no rebuild.

DetectorTriggerSeverity
PROMPT_INJECTION_SIGNALInput matches known injection / jailbreak patternsCRIT
TOOL_LOOPSame tool called ≥3× in a 5-tool-call windowHIGH
TOOL_THRASHINGAgent alternates between exactly two toolsHIGH
LLM_TRUNCATION_LOOPfinish_reason=length fires ≥2 timesHIGH
RETRY_STORMSame tool fails 3+ times in a row without recoveryHIGH
EMPTY_LLM_RESPONSEZero-length output with finish_reason=stopHIGH
CASCADING_TOOL_FAILURE3+ consecutive failures across 2+ distinct toolsHIGH
SLOW_STEPStep duration exceeds 2× P75 baseline ¹ or static fallback (tool >15s, LLM >30s)MED/HIGH
CONTEXT_BLOATPrompt tokens grow beyond 2× P75 baseline ¹ or static fallback (3× from first to last call)MED
GOAL_ABANDONMENTTool use stops, then ≥4 consecutive LLM calls with no exitMED
REASONING_STALLLLM:tool-call ratio exceeds 2× P75 baseline ¹ or static fallback (≥4×)MED/HIGH
STEP_COUNT_INFLATIONRun used >2× P75 step count for this agent ¹MED
FIRST_STEP_FAILUREError or empty output at step ≤2MED
RAG_EMPTY_RETRIEVALRetrieval returned 0 results or relevance <0.3, agent answered anywayMED
TOOL_AVOIDANCEFinal answer without calling available toolsMED
COST_SPIKETotal token consumption exceeds 3× P75 baseline ¹ or static fallback (>50,000 tokens)MED
SESSION_LATENCYTotal wall-clock run duration exceeds 3× P75 baseline ¹ or static fallback (>5 min)MED
¹ Six detectors use per-agent learned baselines: STEP_COUNT_INFLATION, SLOW_STEP, CONTEXT_BLOAT, REASONING_STALL, COST_SPIKE, and SESSION_LATENCY. P75 is computed from the last 50 successfully completed runs for the same agent_id + agent_version. Each detector falls back to its static threshold until at least 20 such runs exist, then switches to the adaptive baseline automatically. Tune the multiplier per agent with inflation_factor in detectors.yml.

Tuning thresholds

Edit detectors.yml in the repo root.

default:
  tool_loop:
    threshold: 2        # fire if same tool called ≥N times in window
  context_bloat:
    growth_factor: 4.0  # last/first prompt token ratio to trigger

web-research:
  tool_loop:
    threshold: 5        # search agents legitimately repeat queries

Named sections match agent_id and inherit from default, overriding only what you specify. Restart the detector to apply:

docker compose restart detector

Shadow mode

Every signal is stored with a shadow flag. The alerts worker only delivers signals where shadow = false.

All 17 built-in detectors ship live. Custom detectors start in shadow mode — signals stored and visible in the dashboard, but no Slack or webhook alert fires — until you add them to LIVE_DETECTORS:

# services/detector/detector_svc/db.py
LIVE_DETECTORS: set[str] = {
    "TOOL_LOOP",
    "YOUR_NEW_DETECTOR",   # promote once precision > 80%
    …
}

This lets you validate a new detector against real traffic before it pages anyone.

Shadow signals in the dashboard

The Alerts page surfaces shadow signals in a dedicated section below the live alert groups. Dashed border, reduced opacity, SHADOW badge. The section only appears when at least one shadow signal exists.

curl "http://localhost:8002/v1/agents/my-agent/signals?include_shadow=true" \
  -H "Authorization: Bearer dt_dev_test"

How detection works

  1. Detector worker polls Postgres every 5 seconds for completed or stalled runs
  2. Fetches all events for that run from events
  3. Replays them into a RunState — tool calls, LLM calls, retrievals, durations
  4. Runs all 16 Tier 1 detectors against the state (PROMPT_INJECTION_SIGNAL is extracted from the run.started payload — detected by the SDK on raw input before hashing)
  5. Writes any triggered FailureSignal rows
  6. Marks the run processed in processed_runs

Detection adds zero latency to the agent — it runs entirely after the run completes.