ReasoningBank
ReasoningBank is an agent memory framework that learns from successes and failures, helping deployed agents improve at test time for web browsing and software engineering.
What is ReasoningBank?
ReasoningBank is an agent memory framework that lets deployed agents learn from both successful and failed experiences. It is designed for long-running agents that need to improve over time rather than treat each task as an isolated attempt.
The framework stores structured memories that capture generalizable reasoning strategies instead of only recording full action traces. Those memories are retrieved before action, updated after the agent finishes a task, and used to support test-time self-evolution in agentic workflows.
Key Features
- Structured memory items: Each memory includes a title, a short description, and distilled content, making stored experience easier to reuse than a raw trajectory.
- Retrieval before action: The agent queries ReasoningBank before acting so relevant past strategies can shape the next attempt.
- Extraction from both success and failure: The framework turns successful runs into reusable tactics and failed runs into cautionary lessons and counterfactual signals.
- Closed-loop retrieval, extraction, and consolidation: ReasoningBank is built as a continuous memory workflow that updates after each interaction.
- Self-judgement with an LLM-as-a-judge: The system can assess trajectories and extract insights even when the judgement is not perfectly accurate.
- Memory-aware test-time scaling: ReasoningBank can use multiple exploration trajectories to distill stronger memories from inference-time search and self-contrast.
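As a rough illustration of the first two features, a structured memory item and a retrieval step might look like the sketch below. The three fields (title, description, content) come from the feature list. Everything else is an assumption made for illustration: the names `MemoryItem` and `retrieve` are hypothetical, and the token-overlap score is a simple stand-in, since the retrieval method is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """One structured memory with the three fields named in the feature list."""
    title: str
    description: str
    content: str


def _tokens(text: str) -> set[str]:
    # Lowercased whitespace tokens; an assumed stand-in for a real similarity model.
    return set(text.lower().split())


def retrieve(bank: list[MemoryItem], query: str, k: int = 1) -> list[MemoryItem]:
    """Return the k items whose title and description overlap the query most."""
    q = _tokens(query)

    def score(item: MemoryItem) -> float:
        m = _tokens(item.title + " " + item.description)
        union = q | m
        # Jaccard overlap between query tokens and memory tokens.
        return len(q & m) / len(union) if union else 0.0

    return sorted(bank, key=score, reverse=True)[:k]


bank = [
    MemoryItem(
        title="Check pagination before concluding",
        description="search results may continue on later pages",
        content="Scroll or open the next page before deciding an item is absent.",
    ),
    MemoryItem(
        title="Reproduce the bug first",
        description="run the failing test before editing code",
        content="Confirm the failure locally so the fix can be verified.",
    ),
]

# The browsing-related memory ranks first for a browsing-style query.
top = retrieve(bank, "find an item in search results pages")
```

The retrieved item's `content` would then be placed in the agent's context before it acts. A production setup would likely swap the token overlap for embedding similarity.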
How to Use ReasoningBank
A typical workflow starts by attaching ReasoningBank to an agent that performs tasks such as web browsing or software engineering. Before each action, the agent retrieves relevant memories from the bank and uses them as context.
After the task, the agent evaluates the trajectory, extracts useful strategies or failure reflections, and appends them as new structured memories. Over time, this creates a repository of general lessons that the agent can reuse on later tasks.
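The workflow above can be sketched as a small loop: retrieve, act, judge, extract, append. This is a minimal sketch under stated assumptions, not the framework's actual interface. `ReasoningLoop` and the four callables are hypothetical names, and the toy functions stand in for what would be LLM-backed components in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemoryItem:
    title: str
    description: str
    content: str


@dataclass
class ReasoningLoop:
    # All four callables are hypothetical stand-ins for LLM-backed components.
    retrieve: Callable[[List[MemoryItem], str], List[MemoryItem]]
    act: Callable[[str, List[MemoryItem]], str]
    judge: Callable[[str, str], bool]
    extract: Callable[[str, str, bool], List[MemoryItem]]
    bank: List[MemoryItem] = field(default_factory=list)

    def run(self, task: str) -> bool:
        memories = self.retrieve(self.bank, task)             # retrieve before acting
        trajectory = self.act(task, memories)                 # attempt the task
        success = self.judge(task, trajectory)                # self-judge the outcome
        new_items = self.extract(task, trajectory, success)   # distill lessons
        self.bank.extend(new_items)                           # append to the bank
        return success


def retrieve_all(bank, task):
    return list(bank)


def toy_act(task, memories):
    # Toy behaviour: the agent only succeeds once it has a lesson to lean on.
    return "done" if memories else "stuck in a loop"


def toy_judge(task, trajectory):
    return trajectory == "done"


def toy_extract(task, trajectory, success):
    if success:
        return [MemoryItem("Tactic", f"worked for: {task}", trajectory)]
    return [MemoryItem("Caution", f"failed on: {task}", f"Avoid: {trajectory}")]


loop = ReasoningLoop(retrieve_all, toy_act, toy_judge, toy_extract)
first = loop.run("book a flight")    # no memories yet, so the toy agent fails
second = loop.run("book a flight")   # the failure lesson is now retrievable
```

Note that the failed first run still adds a memory, which is the point of learning from failures as well as successes.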
Use Cases
- Web browsing agents: Use past browsing experiences to avoid repeated navigation mistakes and to reuse effective search or page interaction strategies.
- Software engineering agents: Capture lessons from codebase exploration, debugging, and task completion so the agent can work more effectively across repeated assignments.
- Persistent task automation: Support agents that run continuously and need to improve as they encounter new workflows and edge cases.
- Inference-time exploration: Distill multiple candidate trajectories into memories when using test-time scaling methods.
- Failure analysis for agents: Turn unsuccessful attempts into guardrails, such as avoiding traps that caused loops or missed steps.
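For the inference-time exploration and failure analysis cases, one way to picture self-contrast is to compare several attempts at the same task and separate what only the successful ones did from what only the failed ones did. The sketch below is a toy illustration, not the framework's method: it assumes a trajectory can be reduced to a list of action labels and uses plain set differences where a real system would presumably rely on an LLM. `self_contrast` and the action names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Trajectory = Sequence[str]  # assumption: a trajectory as a list of action labels


def self_contrast(
    labelled: List[Tuple[Trajectory, bool]],
) -> Tuple[Set[str], Set[str]]:
    """Return (actions seen only in successes, actions seen only in failures)."""
    in_success: Set[str] = set()
    in_failure: Set[str] = set()
    for steps, succeeded in labelled:
        (in_success if succeeded else in_failure).update(steps)
    return in_success - in_failure, in_failure - in_success


# Three hypothetical rollouts of the same browsing task, already judged.
rollouts = [
    (["open_search", "apply_filter", "open_result"], True),
    (["open_search", "open_result"], False),
    (["open_search", "apply_filter", "next_page", "open_result"], True),
]
reuse, avoid = self_contrast(rollouts)
```

Here `reuse` holds the candidate tactics and `avoid` the candidate guardrails; each could then be written up as a structured memory.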
FAQ
What kind of memory does ReasoningBank store? It stores structured memories that summarize reasoning strategies, decision rationales, and operational insights, rather than only keeping full action logs.
Does it learn only from successful runs? No. A central part of ReasoningBank is that it also analyzes failed experiences and turns them into preventive lessons.
Does the system require perfect self-evaluation? No. The framework is described as robust even when the LLM-based judgement is not perfectly accurate.
What tasks was it evaluated on? It is reported to have been evaluated on web browsing and software engineering benchmarks.
Is ReasoningBank a standalone model? No. It is described as an agent memory framework that works with an agent during test time.
Alternatives
- Trajectory memory systems: These store detailed action histories, which can preserve more raw context but may not distill higher-level strategies as directly.
- Workflow memory systems focused on successful runs: These summarize only successful workflows, which can be simpler but may miss learning signals from failures.
- General agent memory layers: Broader memory systems for agents may emphasize retrieval of past interactions, but not necessarily structured reasoning extraction from both success and failure.
- No-memory agent setups: Agents without persistent memory are simpler to implement but will not accumulate reusable lessons across tasks.