ReasoningBank

ReasoningBank is an agent memory framework that learns from both successes and failures, helping deployed agents improve at test time on web browsing and software engineering tasks.

What is ReasoningBank?

ReasoningBank is a novel agent memory framework that helps deployed agents learn from both successful and failed experiences. It is designed for long-running agents that need to improve over time rather than treat each task as an isolated attempt.

The framework stores structured memories that capture generalizable reasoning strategies instead of only recording full action traces. Those memories are retrieved before action, updated after the agent finishes a task, and used to support test-time self-evolution in agentic workflows.

Key Features

  • Structured memory items: Each memory includes a title, a short description, and distilled content, making stored experience easier to reuse than a raw trajectory.
  • Retrieval before action: The agent queries ReasoningBank before acting so relevant past strategies can shape the next attempt.
  • Extraction from both success and failure: The framework turns successful runs into reusable tactics and failed runs into cautionary lessons and counterfactual signals.
  • Closed-loop retrieval, extraction, and consolidation: ReasoningBank is built as a continuous memory workflow that updates after each interaction.
  • Self-judgement with an LLM-as-a-judge: The system can assess trajectories and extract insights even when the judgement is not perfectly accurate.
  • Memory-aware test-time scaling: ReasoningBank can use multiple exploration trajectories to distill stronger memories from inference-time search and self-contrast.
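The structured memory item described in the first feature can be pictured as a small record. The sketch below is only an illustration: the class name MemoryItem, the render helper, and the example text are assumptions, not part of any published ReasoningBank API. Only the three fields (title, description, content) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """Hypothetical record: one structured memory with the three fields named above."""
    title: str
    description: str
    content: str

    def render(self) -> str:
        """Format the item as plain text that could be placed in an agent prompt."""
        return f"{self.title}\n{self.description}\n{self.content}"


# Illustrative item; the wording is invented for this example.
item = MemoryItem(
    title="Check pagination before concluding a search failed",
    description="Results may continue on later pages.",
    content="If the target is not on the first page, look for a next-page control before giving up.",
)
print(item.render())
```

Keeping the item small and immutable reflects the idea that a memory is a distilled lesson rather than a raw trajectory.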

How to Use ReasoningBank

A typical workflow starts by attaching ReasoningBank to an agent that performs tasks such as web browsing or software engineering. Before each action, the agent retrieves relevant memories from the bank and uses them as context.
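As a rough illustration of retrieval before action, the sketch below ranks memories by word overlap with the task and prepends the matches to the agent's context. The overlap scoring is a simple stand-in chosen for this example, since the text above does not say how ReasoningBank ranks memories. The function names (tokenize, retrieve, build_context) and the dictionary layout are assumptions as well.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(bank, query, k=2):
    """Return up to k memories that share at least one word with the query, best first."""
    query_tokens = tokenize(query)
    scored = []
    for index, memory in enumerate(bank):
        memory_tokens = tokenize(memory["title"] + " " + memory["description"])
        overlap = len(query_tokens & memory_tokens)
        if overlap > 0:
            # Negative overlap so that sorting ascending puts the best match first.
            scored.append((-overlap, index, memory))
    scored.sort(key=lambda entry: entry[:2])
    return [memory for _, _, memory in scored[:k]]


def build_context(task, memories):
    """Combine the task and retrieved memories into one prompt-style string."""
    lines = [f"Task: {task}"]
    for memory in memories:
        lines.append(f"Memory: {memory['title']}: {memory['content']}")
    return "\n".join(lines)


# Toy bank with invented entries.
bank = [
    {"title": "Use site search for product lookup",
     "description": "Search beats menu navigation for finding a product",
     "content": "Type the product name into the search box first."},
    {"title": "Run tests before editing code",
     "description": "Establish failing test baseline first",
     "content": "Reproduce the bug with a test, then patch."},
]
task = "Find the price of a product on the shop site"
hits = retrieve(bank, task)
context = build_context(task, hits)
print(context)
```

A real deployment would more likely use embedding similarity than word overlap, but the control flow is the same: query the bank first, then act with the retrieved memories in context.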

After the task, the agent evaluates the trajectory, extracts useful strategies or failure reflections, and appends them as new structured memories. Over time, this creates a repository of general lessons that the agent can reuse on later tasks.
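The post-task step can be sketched as a small judge, extract, append loop. In the sketch below, judge and extract are deterministic placeholders for what would be LLM calls in practice. Their rules, the function names, and the memory wording are all assumptions made for illustration.

```python
def judge(trajectory):
    """Placeholder for an LLM judge: treat a run as successful if it ends with 'submit'."""
    return bool(trajectory) and trajectory[-1] == "submit"


def extract(task, trajectory, success):
    """Placeholder for LLM extraction: build one structured memory from the run."""
    if success:
        return [{
            "title": f"Strategy for: {task}",
            "description": "Worked on a past attempt",
            "content": "Steps that worked: " + " -> ".join(trajectory),
        }]
    last_step = trajectory[-1] if trajectory else "no action"
    return [{
        "title": f"Pitfall in: {task}",
        "description": "Failed on a past attempt",
        "content": "Avoid ending at: " + last_step,
    }]


def finish_task(bank, task, trajectory):
    """Judge the finished trajectory, extract memories, and append them to the bank."""
    success = judge(trajectory)
    bank.extend(extract(task, trajectory, success))
    return success


bank = []
first = finish_task(bank, "checkout", ["open cart", "submit"])
second = finish_task(bank, "checkout", ["open cart", "reload", "reload"])
for memory in bank:
    print(memory["title"], "|", memory["content"])
```

Both the successful and the failed run add an item, so the bank accumulates tactics and cautionary lessons side by side.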

Use Cases

  • Web browsing agents: Use past browsing experiences to avoid repeated navigation mistakes and to reuse effective search or page interaction strategies.
  • Software engineering agents: Capture lessons from codebase exploration, debugging, and task completion so the agent can work more effectively across repeated assignments.
  • Persistent task automation: Support agents that run continuously and need to improve as they encounter new workflows and edge cases.
  • Inference-time exploration: Distill multiple candidate trajectories into memories when using test-time scaling methods.
  • Failure analysis for agents: Turn unsuccessful attempts into guardrails, such as avoiding traps that caused loops or missed steps.
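For the inference-time exploration and failure analysis use cases, one way to picture self-contrast is to compare several judged trajectories for the same task. The sketch below is a toy set-difference stand-in, not the method used by ReasoningBank: steps shared by every successful run and absent from failed runs become candidate tactics, and steps seen only in failed runs become cautions. The function name self_contrast and the input format are assumptions.

```python
def self_contrast(trajectories):
    """Contrast judged runs of one task.

    trajectories: list of (steps, success) pairs, where steps is a list of strings.
    Returns candidate tactics and cautions as sorted lists.
    """
    successes = [set(steps) for steps, ok in trajectories if ok]
    failures = [set(steps) for steps, ok in trajectories if not ok]
    seen_in_success = set().union(*successes)
    seen_in_failure = set().union(*failures)
    # set.intersection needs at least one set, so guard the empty case.
    common_to_success = set.intersection(*successes) if successes else set()
    return {
        "tactics": sorted(common_to_success - seen_in_failure),
        "cautions": sorted(seen_in_failure - seen_in_success),
    }


# Three invented exploration runs for the same task.
runs = [
    (["search", "filter by date", "open result"], True),
    (["search", "open result", "filter by date"], True),
    (["search", "click ad", "open result"], False),
]
lessons = self_contrast(runs)
print(lessons)
```

In a real system an LLM would write the distilled memory text. The set arithmetic here only shows why multiple trajectories give a stronger signal than a single run.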

FAQ

What kind of memory does ReasoningBank store? It stores structured memories that summarize reasoning strategies, decision rationales, and operational insights, rather than only keeping full action logs.

Does it learn only from successful runs? No. A central part of ReasoningBank is that it also analyzes failed experiences and turns them into preventive lessons.

Does the system require perfect self-evaluation? No. The framework is described as robust even when the LLM-based judgement is not perfectly accurate.

What tasks was it evaluated on? It is reported to have been evaluated on web browsing and software engineering benchmarks.

Is ReasoningBank a standalone model? No. It is described as an agent memory framework that works with an agent during test time.

Alternatives

  • Trajectory memory systems: These store detailed action histories, which can preserve more raw context but may not distill higher-level strategies as directly.
  • Workflow memory systems focused on successful runs: These summarize only successful workflows, which can be simpler but may miss learning signals from failures.
  • General agent memory layers: Broader memory systems for agents may emphasize retrieval of past interactions, but not necessarily structured reasoning extraction from both success and failure.
  • No-memory agent setups: Agents without persistent memory are simpler to implement but will not accumulate reusable lessons across tasks.