PandaProbe
PandaProbe is an open source agent engineering platform for tracing, evals, metrics, and live monitoring to debug and improve AI agents.
What is PandaProbe?
PandaProbe is an open source agent engineering platform designed to help you debug and improve AI agents. It provides tracing, evaluation runs, metrics, and live monitoring across the full agent development lifecycle.
The platform focuses on making agent behavior observable: it captures an agent run step-by-step, including chains, agents, LLM calls, and tool calls, along with model parameters, token usage, and metadata. This supports both initial debugging (“first run”) and ongoing improvements (“continuous improvement”).
Key Features
- Automatic tracing via instrumentation: A single instrument() call traces your full agent run, helping you capture spans for chains, agents, LLMs, and tools.
- Framework and provider compatibility: Works with top agent frameworks and integrates with any LLM provider (so you can use your existing stack).
- Detailed span and usage visibility: Lets you see model types, parameters, token usage, and key metadata, with spans that reflect the structure of an agent run.
- Evals and metrics: Adds evaluation runs and metrics alongside tracing to support debugging and continuous improvement.
- Live monitoring and developer tooling: Designed for monitoring agent behavior while you develop and refine agent workflows.
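To make the idea of spans that mirror the structure of an agent run more concrete, here is a minimal sketch of a nested span tree with token usage rolled up from child spans. The Span class, its field names, and the sample values are assumptions made for illustration; they are not PandaProbe's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One step of an agent run (illustrative only, not PandaProbe's schema)."""
    name: str
    kind: str  # "chain", "agent", "llm", or "tool"
    input_tokens: int = 0
    output_tokens: int = 0
    metadata: dict = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Sum token usage over this span and all of its descendants."""
        own = self.input_tokens + self.output_tokens
        return own + sum(child.total_tokens() for child in self.children)


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, parents before children."""
    lines = [f"{'  ' * depth}{span.kind}:{span.name} tokens={span.total_tokens()}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# A hypothetical run: one chain containing one agent with two LLM calls and a tool call.
run = Span("support_flow", "chain", children=[
    Span("planner", "agent", children=[
        Span("plan_call", "llm", input_tokens=120, output_tokens=30,
             metadata={"model": "example-model", "temperature": 0.2}),
        Span("search_docs", "tool"),
        Span("answer_call", "llm", input_tokens=200, output_tokens=50,
             metadata={"model": "example-model", "temperature": 0.2}),
    ]),
])

print("\n".join(render(run)))
```

Keeping token counts on the leaf spans and summing upward means each parent reports the cost of everything beneath it, which is one plausible way a trace view could show where tokens are spent.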
How to Use PandaProbe
- Get started using the provided docs and installation instructions.
- Initialize tracing once at startup before creating agents. For example, create an adapter instance, then call adapter.instrument().
- Run your agent normally. After instrumentation, PandaProbe captures the steps of your run (chains/agents/LLMs/tools) as spans.
- Review traces, evals, and metrics to identify issues and iterate on your agent’s behavior.
Example pattern shown on the site:
- Create a framework/provider adapter (e.g., GoogleADKAdapter) with session/user identifiers and tags.
- Call instrument() once at startup.
- Proceed with agent runner usage; the runner becomes fully traced.
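The pattern above (create an adapter, instrument once, then run as usual) can be sketched with a self-contained stand-in. DemoAdapter, DemoRunner, and RECORDED_SPANS below are hypothetical names invented for this example and do not use the PandaProbe SDK; the real adapter's constructor arguments and behavior should be taken from the official docs. The sketch only shows one way an instrument() call could wrap a runner's methods so that later runs are recorded without changing the calling code.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink for recorded spans (a real platform would export them).
RECORDED_SPANS: List[Dict[str, Any]] = []


class DemoRunner:
    """A toy agent runner with one LLM-like step and one tool-like step."""

    def call_llm(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def call_tool(self, text: str) -> str:
        return text.upper()

    def run(self, prompt: str) -> str:
        return self.call_tool(self.call_llm(prompt))


class DemoAdapter:
    """Stand-in adapter; NOT the PandaProbe API."""

    def __init__(self, session_id: str, user_id: str, tags: List[str]) -> None:
        self.session_id = session_id
        self.user_id = user_id
        self.tags = tags

    def _wrap(self, kind: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record the span when the step finishes, even if it raised.
                RECORDED_SPANS.append({
                    "name": fn.__name__,
                    "kind": kind,
                    "session_id": self.session_id,
                    "user_id": self.user_id,
                    "tags": self.tags,
                    "duration_s": time.perf_counter() - start,
                })
        wrapper._demo_traced = True  # marker used to avoid double wrapping
        return wrapper

    def instrument(self) -> None:
        """Patch DemoRunner's methods in place; repeated calls are no-ops."""
        for method_name, kind in (("run", "agent"), ("call_llm", "llm"), ("call_tool", "tool")):
            original = getattr(DemoRunner, method_name)
            if not getattr(original, "_demo_traced", False):
                setattr(DemoRunner, method_name, self._wrap(kind, original))


# Instrument once at startup, before creating the runner.
adapter = DemoAdapter(session_id="session-1", user_id="user-1", tags=["production"])
adapter.instrument()
adapter.instrument()  # safe: already wrapped

result = DemoRunner().run("hello")
print(result, [span["kind"] for span in RECORDED_SPANS])
```

Because spans are recorded when each step completes, the inner steps appear before the enclosing run in this toy sink. Instrumenting before the runner is created matches the advice in the steps above, although in this sketch the class-level patch would also affect existing instances.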
Use Cases
- Debugging an agent run end-to-end: Trace a complete execution to see how chains, agent steps, LLM calls, and tool invocations relate, including token usage and key metadata.
- Verifying behavior after changes: Use eval runs and metrics to compare agent behavior across iterations while you adjust prompts, tool logic, or model configuration.
- Instrumenting a specific agent framework integration: Use the Python SDK and provided adapters to add tracing to agent runners in frameworks such as LangGraph, LangChain, or CrewAI.
- Monitoring production-like runs: Tag runs (e.g., with a production tag) and use live monitoring to track agent activity and diagnose issues as they appear.
- Custom instrumentation: When built-in adapters don’t cover your setup, use PandaProbe’s support for custom instrumentation in the Python SDK.
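For the "verifying behavior after changes" use case, the following is a minimal sketch of comparing two agent iterations on the same eval cases. The two agents, the exact-match metric, and the case data are all made up for illustration and say nothing about how PandaProbe defines or runs evals. The point is that an aggregate metric can stay flat while individual cases regress, so a per-case comparison is useful.

```python
from typing import Callable, Dict, List, Tuple

EvalCase = Tuple[str, str]  # (input, expected output)

CASES: List[EvalCase] = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
]


def agent_v1(question: str) -> str:
    """Hypothetical first iteration: wrong casing on one answer."""
    return {"2+2": "4", "capital of France": "paris", "3*3": "9"}[question]


def agent_v2(question: str) -> str:
    """Hypothetical second iteration: fixes the casing, breaks arithmetic."""
    return {"2+2": "4", "capital of France": "Paris", "3*3": "6"}[question]


def run_eval(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Score each case by exact match against the expected output."""
    return {question: agent(question).strip() == expected for question, expected in cases}


def accuracy(results: Dict[str, bool]) -> float:
    return sum(results.values()) / len(results)


def changed(before: Dict[str, bool], after: Dict[str, bool], was: bool) -> List[str]:
    """Cases whose pass/fail status flipped, starting from status `was`."""
    return [q for q in before if before[q] is was and after[q] is not was]


before = run_eval(agent_v1, CASES)
after = run_eval(agent_v2, CASES)
regressions = changed(before, after, was=True)   # passed before, fail now
fixes = changed(before, after, was=False)        # failed before, pass now

print(f"accuracy v1={accuracy(before):.2f} v2={accuracy(after):.2f}")
print("regressions:", regressions, "fixes:", fixes)
```

Here both iterations score the same accuracy, yet one case was fixed and another regressed; linking such per-case results back to the trace of the failing run is the kind of workflow that tracing plus evals is meant to support.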
FAQ
- Is PandaProbe open source?
  Yes. PandaProbe is available under the Apache 2.0 license, and the site states you can self-host the core features for free without limitations.
- Can I use tracing without the evaluation/metrics components?
  The site describes tracing alongside evals and metrics, but it does not explicitly state whether you can use only tracing. Check the documentation or FAQ section for the supported configuration.
- What deployment options are available?
  PandaProbe offers PandaProbe Cloud (PandaProbe hosts) and self-hosting (you host). It also mentions alternative hosting options such as hybrid & self-hosted.
- Which frameworks does it support?
  The page lists integrations for LangGraph, LangChain, CrewAI, and several agent SDKs (including Google ADK, Claude Agent SDK, OpenAI Agents SDK, and Gemini).
- How do I get started?
  The site recommends beginning with setup via the documentation, then calling instrument() once at startup before creating agents so traces are captured during runs.
Alternatives
- Agent observability and tracing platforms: Alternatives in the same category typically focus on end-to-end trace capture for LLM calls and tool execution. Differences usually come down to how they integrate with agent frameworks and whether they also provide eval/metrics workflows.
- LLM/AI monitoring solutions: Some tools emphasize monitoring prompts, latency, and token usage for production LLM applications. They may be less structured around agent spans (chains/agents/tools) unless explicitly built for agent workflows.
- Evaluation frameworks and test harnesses for LLM agents: These focus on measuring outputs and regressions rather than providing detailed runtime tracing. You may need separate tracing tooling to connect evaluations back to specific agent steps.
- OpenTelemetry-based tracing for custom stacks: If you already use OpenTelemetry, an alternative approach is to instrument your agent runtime directly. This can offer flexibility but may require more engineering compared to dedicated agent engineering adapters.