PandaProbe

What is PandaProbe?

PandaProbe is an open-source agent engineering platform for debugging and improving AI agents. It provides tracing, evaluation runs, metrics, and live monitoring across the full agent development lifecycle.

The platform focuses on making agent behavior observable: it captures an agent run step-by-step, including chains, agents, LLM calls, and tool calls, along with model parameters, token usage, and metadata. This supports both initial debugging (“first run”) and ongoing improvements (“continuous improvement”).
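To make the span structure described above concrete, here is a minimal sketch of a trace as a tree of spans. The Span class, its field names, and the example values are assumptions made for illustration. They are not PandaProbe's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """One step of an agent run (hypothetical shape, not PandaProbe's schema)."""
    name: str
    kind: str  # one of "chain", "agent", "llm", "tool"
    input_tokens: int = 0
    output_tokens: int = 0
    metadata: Dict[str, str] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)


def total_tokens(span: Span) -> int:
    """Sum token usage over a span and all of its descendants."""
    own = span.input_tokens + span.output_tokens
    return own + sum(total_tokens(child) for child in span.children)


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, in depth-first order."""
    lines = [f"{'  ' * depth}{span.kind}:{span.name}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# A made-up run: one chain containing one agent that makes two LLM calls
# and one tool call. Model name and parameters are placeholders.
trace = Span("support_flow", "chain", children=[
    Span("planner", "agent", children=[
        Span("plan_call", "llm", input_tokens=120, output_tokens=30,
             metadata={"model": "example-model", "temperature": "0.2"}),
        Span("search_docs", "tool"),
        Span("answer_call", "llm", input_tokens=200, output_tokens=50),
    ]),
])

print("\n".join(render(trace)))
print("total tokens:", total_tokens(trace))
```

Keeping token counts on the individual LLM spans and summing them upward is one way to show per-step and per-run usage from the same data.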

Key Features

Automatic tracing via instrumentation: A single instrument() call traces your full agent run, helping you capture spans for chains, agents, LLMs, and tools.
Framework and provider compatibility: Works with agent frameworks such as LangGraph, LangChain, and CrewAI, and integrates with any LLM provider, so you can keep your existing stack.
Detailed span and usage visibility: Lets you see model types, parameters, token usage, and key metadata, with spans that reflect the structure of an agent run.
Evals and metrics: Adds evaluation runs and metrics alongside tracing to support debugging and continuous improvement.
Live monitoring and developer tooling: Designed for monitoring agent behavior while you develop and refine agent workflows.
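As an illustration of how eval metrics can support continuous improvement, the sketch below compares the scores of two evaluation runs and flags regressions. The metric names, the scores, and the compare_runs helper are all hypothetical. The source does not describe PandaProbe's eval API, so this uses plain Python only.

```python
from typing import Dict, List


def compare_runs(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Return the metrics whose score dropped by more than `tolerance`.

    Assumes every metric is "higher is better".
    """
    regressions = []
    for name, base_score in baseline.items():
        new_score = candidate.get(name)
        if new_score is None:
            continue  # metric was not measured in the candidate run
        if base_score - new_score > tolerance:
            regressions.append(name)
    return sorted(regressions)


# Made-up scores from before and after a prompt change.
baseline = {"task_success": 0.90, "tool_accuracy": 0.80, "groundedness": 0.75}
candidate = {"task_success": 0.92, "tool_accuracy": 0.70, "groundedness": 0.74}

regressed = compare_runs(baseline, candidate)
print("regressed metrics:", regressed)
```

The tolerance keeps small score fluctuations from being reported as regressions. A suitable value depends on how noisy your evals are.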

How to Use PandaProbe

Get started using the provided docs and installation instructions.
Initialize tracing once at startup before creating agents. For example, create an adapter instance, then call adapter.instrument().
Run your agent normally. After instrumentation, PandaProbe captures the steps of your run (chains/agents/LLMs/tools) as spans.
Review traces, evals, and metrics to identify issues and iterate on your agent’s behavior.

Example pattern shown on the site:

Create a framework/provider adapter (e.g., GoogleADKAdapter) with session/user identifiers and tags.
Call instrument() once at startup.
Proceed with agent runner usage; the runner becomes fully traced.
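The pattern above can be sketched as follows. PandaProbe's real import path and constructor signature are not given here, so this block defines a hypothetical stand-in class, SketchAdapter, rather than importing the SDK. The parameter names session_id, user_id, and tags, and the guard against repeated calls, are assumptions of this sketch.

```python
from typing import List, Optional


class SketchAdapter:
    """Hypothetical stand-in for a PandaProbe adapter such as GoogleADKAdapter."""

    _instrumented = False  # process-wide flag so instrumentation happens once

    def __init__(self, session_id: str, user_id: str,
                 tags: Optional[List[str]] = None):
        self.session_id = session_id
        self.user_id = user_id
        self.tags = list(tags or [])

    def instrument(self) -> bool:
        """Enable tracing; return True only on the first call."""
        if SketchAdapter._instrumented:
            return False  # already instrumented, nothing to do
        SketchAdapter._instrumented = True
        return True


def main() -> List[bool]:
    # 1. Create the adapter with identifiers and tags (placeholder values).
    adapter = SketchAdapter(session_id="session-123", user_id="user-456",
                            tags=["production"])
    # 2. Instrument once at startup, before any agent is created.
    first = adapter.instrument()
    # A second call is a no-op in this sketch.
    second = adapter.instrument()
    # 3. Create the agent and its runner here, then run it as usual.
    return [first, second]


results = main()
print(results)
```

In a real project, replace SketchAdapter with the adapter class named in the PandaProbe documentation for your framework.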

Use Cases

Debugging an agent run end-to-end: Trace a complete execution to see how chains, agent steps, LLM calls, and tool invocations relate, including token usage and key metadata.
Verifying behavior after changes: Use eval runs and metrics to compare agent behavior across iterations while you adjust prompts, tool logic, or model configuration.
Instrumenting a specific agent framework integration: Use the Python SDK and provided adapters to add tracing to agent runners in frameworks such as LangGraph, LangChain, or CrewAI.
Monitoring production-like runs: Tag runs (e.g., with a production tag) and use live monitoring to track agent activity and diagnose issues as they appear.
Custom instrumentation: When built-in adapters don’t cover your setup, use PandaProbe’s support for custom instrumentation in the Python SDK.

FAQ

Is PandaProbe open source?
Yes. PandaProbe is available under the Apache 2.0 license, and the site states you can self-host the core features for free without limitations.
Can I use tracing without the evaluation/metrics components?
The site describes tracing alongside evals and metrics, but it does not explicitly state whether tracing can be used on its own. Check the documentation for the supported configurations.
What deployment options are available?
PandaProbe is available as PandaProbe Cloud, hosted by PandaProbe, or as a self-hosted deployment that you run yourself. The site also mentions a hybrid hosting option.
Which frameworks does it support?
The page lists integrations for LangGraph, LangChain, CrewAI, and several agent SDKs (including Google ADK, Claude Agent SDK, OpenAI Agents SDK, and Gemini).
How do I get started?
The site recommends beginning with setup via the documentation, then calling instrument() once at startup before creating agents so traces are captured during runs.

Alternatives

Agent observability and tracing platforms: Alternatives in the same category typically focus on end-to-end trace capture for LLM calls and tool execution. Differences usually come down to how they integrate with agent frameworks and whether they also provide eval/metrics workflows.
LLM/AI monitoring solutions: Some tools emphasize monitoring prompts, latency, and token usage for production LLM applications. They may be less structured around agent spans (chains/agents/tools) unless explicitly built for agent workflows.
Evaluation frameworks and test harnesses for LLM agents: These focus on measuring outputs and regressions rather than providing detailed runtime tracing. You may need separate tracing tooling to connect evaluations back to specific agent steps.
OpenTelemetry-based tracing for custom stacks: If you already use OpenTelemetry, an alternative approach is to instrument your agent runtime directly. This can offer flexibility but may require more engineering compared to dedicated agent engineering adapters.

Alternatives

AakarDev AI

Arduino VENTUNO Q

Devin

BenchSpan

open-codex-computer-use

PromptScout