Documentation

Run agents. Know when they break.

Dunetrace is runtime reliability for AI agents. 29 structural detectors, deterministic explanations, Slack alerts in seconds. These pages cover everything from a two-minute install to the database schema.

Start here

Quick start

Up and running in two minutes

Clone, docker compose up, instrument your agent, open the dashboard. Runs locally with no API key.

Architecture

How the pipeline works

Five services, one Postgres, one static dashboard. SDK → ingest → detector → explain → alerts. Failure modes included.

Integrate your agent

LangChain / LangGraph

One callback, zero agent changes

DunetraceCallbackHandler plugs into the LangChain callback system and translates every event automatically.

CrewAI

Global hooks, one wrapper

DunetraceCrewCallback registers global LLM and tool hooks. Wrap crew.kickoff() with dt.run() to group all events under one run.

AutoGen

Wrap the model client once

DunetraceAutoGenObserver wraps a multi-agent AutoGen conversation. observer.wrap_client() instruments every LLM call automatically.

Haystack

DunetraceHaystackTracer implements the Haystack Tracer protocol. One enable_tracing() call covers every pipeline run — LLMs, retrievers, and tool invocations.

LlamaIndex

Trace LlamaIndex agents

Capture queries, tools, and LLM calls from LlamaIndex workflows.

PydanticAI

Type-safe agents, traced

Instrument PydanticAI agents and their tools with the Dunetrace SDK.

OpenAI Agents

OpenAI Agents SDK

Capture runs, tool calls, and handoffs from the OpenAI Agents SDK.

smolagents

Hugging Face smolagents

Trace code and tool-calling agents built with smolagents.

Vercel AI

Vercel AI SDK

Instrument TypeScript agents on the Vercel AI SDK (Node 22+).

LiteLLM

LiteLLM proxy

Auto-instrument any model routed through a LiteLLM proxy.

Dify

Dify workflows

Monitor Dify agents and workflows with the Dunetrace SDK.

Hermes

Hermes agents

Instrument Hermes agent runs with the Dunetrace SDK.

Custom Python agent

Decorator, middleware, or manual

Six paths: @dt.trace/@dt.tool, @dt.agent(), ASGI, WSGI, manual dt.run(), or OTel receiver.

TypeScript agent

npm package with background buffering

npm install dunetrace. Call autoInstrument() once to track every OpenAI/Anthropic client and outbound fetch, or wrap individually with dt.wrapOpenAI() and dt.tool(). Same detectors and alerts as Python.

Langdock

Zero-code OTel monitoring

Point Langdock's "Tracing cloud URL" at the Dunetrace ingest service. 29 structural detectors activate immediately with no code changes.

Langfuse

Pull evaluation results in

Root-cause analysis is native and needs no Langfuse credentials. Connect Langfuse separately to pull its own evaluation results into the same dashboard.

OpenTelemetry

Export and receive, both SDKs

Export agent runs as OTLP spans to Datadog, Grafana, or Honeycomb, or ingest existing gen_ai.* traces so the detectors run with no code change.

All integrations

FastAPI, Flask, OTel, Loki

OpenLLMetry, Grafana Loki, Tempo, Honeycomb, Datadog. Side-by-side setup for each.

Detectors

29 structural detectors

What each one catches, its threshold, how to tune detectors.yml, and shadow-mode evaluation.

Operate it

Dashboard

Mission control at :3000

Overview, Runs, Alerts, Analytics, Heatmap, Agents, Compare, Detectors. Auto-refreshes every 15s.

Alerts

Slack, webhook, weekly digest

Rate context, HMAC signatures, at-least-once delivery, and the Monday 9am UTC digest.
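On the receiving side, an HMAC signature is usually checked by recomputing it over the raw request body and comparing in constant time. The sketch below assumes HMAC-SHA256 with a hex digest and a shared secret; the actual algorithm, encoding, and header name used by Dunetrace are not stated on this page, so treat `sign` and `verify` as illustrative helpers only.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, not confirmed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, received_signature)
```

Verifying against the raw bytes, before any JSON parsing, avoids mismatches caused by re-serialization.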

MCP server

Query your agents from your editor

Ask Claude Code or Cursor about agent health, failure patterns, and run timelines using the Dunetrace MCP server tools.

Guardrails & reference

Policies

Runtime guardrails

Stop, switch model, inject a prompt, or cap a run mid-execution. The engine behind runtime prevention.

Approvals

Human-in-the-loop

Gate consequential tool calls until a human approves in Slack or the dashboard. Fail-closed.
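Fail-closed means that anything other than an explicit approval blocks the call. A minimal sketch of that idea, where `gated_call` and `get_decision` are hypothetical names and not part of the Dunetrace API:

```python
from typing import Callable, Optional


def gated_call(tool: Callable[[], str],
               get_decision: Callable[[], Optional[bool]]) -> str:
    """Run `tool` only on an explicit approval.

    `get_decision` is a hypothetical lookup returning True (approved),
    False (rejected), or None (no answer yet, or timed out).
    """
    try:
        decision = get_decision()
    except Exception:
        decision = None  # approval backend unreachable: treat as no answer
    if decision is not True:
        return "blocked"  # fail-closed: anything but approval blocks the call
    return tool()
```

A fail-open gate would run the tool whenever the approval lookup errors out; the `is not True` check is what makes this one fail-closed.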

Semantic evaluation

Tier 2, LLM-based judgment

Hallucination, task completion, and cross-turn frustration, sampled post-completion via DeepEval.

Voice pack

Voice agent detectors

Nine detectors for real-time voice agents: STT confidence, silence, turn-taking, TTS, VAD.

State machine

Events into RunState

How paired events reconstruct a run into the state the detectors read.
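As a rough illustration of pairing, start and finish events can be matched by an id and folded into a state object. Only the name RunState comes from this page; the fields, event types, and the `apply` function below are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    # Completed (name, duration) steps, in completion order.
    steps: list = field(default_factory=list)
    # Started-but-unfinished steps, keyed by a hypothetical step id.
    open_steps: dict = field(default_factory=dict)


def apply(state: RunState, event: dict) -> RunState:
    """Fold one event into the state; event fields are illustrative."""
    if event["type"] == "tool.started":
        state.open_steps[event["id"]] = event
    elif event["type"] == "tool.finished":
        start = state.open_steps.pop(event["id"], None)
        if start is not None:  # ignore a finish event with no matching start
            state.steps.append((start["name"], event["ts"] - start["ts"]))
    return state
```

Steps left in `open_steps` at the end of a run are calls that started but never finished, which is the kind of structural signal a detector could read.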

Operations

Retention & controls

Storage growth, retention, and the manual service controls for ingest and detection.

Platform

Pillar 3

Semantic Evaluation

LLM-based judgment for hallucination, task completion, and cross-turn frustration — post-hoc, sampling-based, opt-in.

Pillar 4 · The differentiator

Runtime Prevention

Policies that stop, redirect, or downgrade a run while it's happening — the one thing no tracer can do.

Compare

Dunetrace vs Langfuse

Different problems, and how to use both together.

Compare

Dunetrace vs LangSmith

LangChain-native tracing and eval vs framework-agnostic runtime prevention.

Compare

Dunetrace vs Braintrust

Eval-first scoring vs in-path structural detection.

Compare

Dunetrace vs Helicone

LLM-call-level proxy observability vs agent-run-level structural detection.

Compare

Dunetrace vs Arize

ML/LLM observability and drift monitoring vs real-time agent failure prevention.

FAQ

How is Dunetrace different from LangSmith or Langfuse?

LangSmith and Langfuse are passive tracing tools that record what happened so you can inspect it later. Dunetrace is active: it watches the structural pattern of every run and fires a Slack alert within 15 seconds of completion. You don't have to open a dashboard to find out that something broke. Think of Dunetrace as your alerting layer and Langfuse as your deep-inspection layer. They work well together: get the alert from Dunetrace, then drill into the full trace in Langfuse for root cause analysis.

How much overhead does the SDK add?

Less than 500µs per agent run. Events are buffered and shipped from a background thread, so your agent is never blocked waiting on Dunetrace. The SDK patches OpenAI, Anthropic, and httpx globally so there is nothing to add to each call site. If the ingest service goes down, events are held in the local buffer (up to 10,000) and your agent keeps running unaffected.
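The non-blocking pattern described here can be sketched with a queue and a daemon thread. This is a generic illustration, not the SDK's implementation; `BackgroundBuffer`, `record`, and `flush` are hypothetical names.

```python
import queue
import threading


class BackgroundBuffer:
    """Hypothetical buffer: record() never blocks; a worker thread ships events."""

    def __init__(self, send, maxsize=1000):
        self._q = queue.Queue(maxsize=maxsize)
        self._send = send
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event) -> bool:
        try:
            self._q.put_nowait(event)  # returns immediately
            return True
        except queue.Full:
            return False  # drop rather than block the agent

    def _drain(self):
        while True:
            event = self._q.get()
            try:
                self._send(event)
            except Exception:
                pass  # a failing sink must not crash the worker (event is dropped here)
            finally:
                self._q.task_done()

    def flush(self):
        self._q.join()  # wait until every queued event was handed to send()
```

The agent only ever pays for `put_nowait`; all network time is spent on the worker thread.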

What frameworks and languages does Dunetrace support?

Python and TypeScript/Node. In Python: first-class support for LangChain, LangGraph, CrewAI, AutoGen, and Haystack. Any other framework works via the @dt.agent() decorator or a context manager. FastAPI, Flask, and other ASGI/WSGI apps are supported via middleware. The TypeScript SDK auto-instruments OpenAI and Anthropic with autoInstrument(), or wraps individual clients with dt.wrapOpenAI() / dt.wrapAnthropic(). Zero-code instrumentation is available for any OpenTelemetry-compatible pipeline (Langdock, Dify, etc.).

Can I tune the detector thresholds?

Yes. Edit detectors.yml in the repo root — no code changes or rebuilds needed. You can set per-agent overrides too. For example, a web-research agent that legitimately repeats queries can have a higher tool-loop threshold than your other agents. Apply changes with docker compose restart detector.
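To make the override idea concrete, here is a sketch of resolving a per-agent threshold over a default and applying it to a simple repeated-call check. The key names, the `CONFIG` shape, and the loop rule itself are assumptions for illustration; consult the detectors page for the real detectors.yml schema.

```python
# Hypothetical shape of a parsed detectors.yml; key names are illustrative.
CONFIG = {
    "defaults": {"tool_loop": {"threshold": 3}},
    "agents": {"web-research": {"tool_loop": {"threshold": 8}}},
}


def threshold_for(config: dict, agent: str, detector: str) -> int:
    """Per-agent value if present, otherwise the default."""
    override = config.get("agents", {}).get(agent, {}).get(detector, {})
    return override.get("threshold", config["defaults"][detector]["threshold"])


def tool_loop_fired(tool_calls: list, threshold: int) -> bool:
    """True if an identical call repeats at least `threshold` times in a row."""
    streak = 0
    previous = object()  # sentinel that equals no real call
    for call in tool_calls:
        streak = streak + 1 if call == previous else 1
        previous = call
        if streak >= threshold:
            return True
    return False
```

With these values, four identical searches in a row would fire for an agent on the default threshold but not for the web-research agent.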

What happens if my ingest service goes down?

The SDK buffers up to 10,000 events locally and your agent runs unaffected. Once the service recovers, buffered events are shipped automatically. The detector worker processes all queued runs on restart — so signals and alerts are delayed, not lost.
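The buffer-then-ship behavior can be sketched as a bounded queue that only removes an event after a successful send. `OutageBuffer` is a hypothetical class, and what happens once the limit is reached (here, the oldest events fall off) is an assumption, since this page does not say.

```python
from collections import deque


class OutageBuffer:
    """Hypothetical bounded buffer that keeps events until a send succeeds."""

    def __init__(self, send, limit=10_000):
        self._send = send
        self._pending = deque(maxlen=limit)  # oldest events fall off when full

    def record(self, event):
        self._pending.append(event)

    def flush(self) -> int:
        """Ship pending events in order; stop at the first failure."""
        shipped = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # service still down: keep the rest for the next flush
            self._pending.popleft()
            shipped += 1
        return shipped
```

Because an event is removed only after `send` returns, a crash between the two steps can resend it, which is the usual trade-off behind at-least-once delivery.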

How do I set up Slack alerts?

Add SLACK_WEBHOOK_URL to your .env file, then restart the alerts worker: docker compose restart alerts. Optionally set SLACK_CHANNEL and SLACK_MIN_SEVERITY to filter noise. Get a webhook URL from api.slack.com/messaging/webhooks. Generic webhooks (PagerDuty, Linear, custom) are also supported — set WEBHOOK_URL instead.
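Before restarting the alerts worker, it can help to confirm the webhook URL itself works. Slack incoming webhooks accept a JSON body with a `text` field; the sketch below builds such a request with the standard library. The helper name `build_slack_request` is hypothetical and the URL in any example call is a placeholder.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` with your real SLACK_WEBHOOK_URL should post the message to the configured channel.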

Something missing?

Open an issue on GitHub or email the team. Docs PRs welcome.
