Exploring Agentic Memory beyond Reasoning and Tool-Use
Traditional memory mechanisms in AI agent systems typically fall into two categories. The first is reasoning-based memory[1], where the model actively summarizes memory segments after each conversational turn. Conceptually, this mirrors the reasoning process: information is reconsidered, recomposed, and stored as a summary. While intuitive, repeated summarization is computationally costly, and critical details often degrade over successive turns.
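As a rough illustration of this pattern (not our implementation), a summarization-based memory loop reduces to re-generating the summary on every turn; `call_model` below is a hypothetical stand-in for any LLM API:

```python
def call_model(prompt: str) -> str:
    # Stand-in for a real chat-completion call; returns a dummy string
    # so the sketch runs end to end.
    return "(updated summary)"

def update_summary(summary: str, turn: str) -> str:
    prompt = (
        f"Current memory summary:\n{summary}\n\n"
        f"New conversation turn:\n{turn}\n\n"
        "Rewrite the summary so it still covers everything important."
    )
    return call_model(prompt)  # one extra model call on every turn

summary = ""
for turn in ["user: I'm planning a trip to Kyoto", "assistant: Noted."]:
    summary = update_summary(summary, turn)  # cost grows with dialogue length
```

Each rewrite is a lossy bottleneck: anything the model omits from one summary is unavailable to all later turns.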
The second approach is tool-use-based memory[2,3]. Here, memory is stored in external databases. When recall is needed, the model queries this storage and retrieves relevant interactions. Although easy to integrate, this often leads to fragmented understanding, as the retrieval and reintegration process can strip away crucial nuance and context.
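As a sketch under stated assumptions, this pattern reduces to a write/retrieve interface. The toy keyword scorer below stands in for the embedding search a production system would use, and `MemoryStore` is an illustrative name, not any particular library's API:

```python
class MemoryStore:
    """Toy external memory: append interactions, retrieve by query."""

    def __init__(self):
        self.records: list[str] = []

    def write(self, interaction: str) -> None:
        self.records.append(interaction)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Toy relevance: count words shared with the query.
        terms = set(query.lower().split())
        scored = sorted(
            self.records,
            key=lambda r: len(terms & set(r.lower().split())),
            reverse=True,
        )
        return scored[:k]

store = MemoryStore()
store.write("user prefers metric units")
store.write("user is planning a trip to Kyoto")
print(store.retrieve("what units does the user like?"))
```

The failure mode is visible even in the toy: each hit returns as an isolated fragment, detached from the conversation that gave it meaning.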
We have developed a fundamentally different approach: instead of treating memory as a separate storage task, we view the entire trajectory as the memory itself, managed through a continuous process of intelligent forgetting. Our method operates in three steps, Mask, Allocate, and Refill (a minimal code sketch follows the list below).
- Mask: We select chunks of the agent’s trajectory and mask them out, creating space for re-processing.
- Allocate: We assign a token budget to the masked chunk based on its estimated importance. High-value segments receive larger budgets to preserve detail, while less critical chunks are compressed or discarded.
- Refill: Each masked chunk is regenerated under its assigned constraint, producing a compressed representation that fits the allocated budget.
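A rough sketch of one cycle, where `importance` and `refill` are hypothetical placeholders for the scorer and the FIM-style regeneration described below, not our trained components:

```python
def importance(chunk: str) -> float:
    # Placeholder scorer in [0, 1]; a real system would estimate this.
    return min(1.0, len(chunk) / 100)

def refill(chunk: str, budget: int) -> str:
    # Placeholder for regeneration under a token budget; plain
    # truncation keeps the sketch self-contained and runnable.
    return chunk[:budget]

def compress_trajectory(chunks: list[str], total_budget: int) -> list[str]:
    # Mask: here every chunk is treated as a candidate for
    # re-processing; the real system selects which chunks to mask.
    scores = [importance(c) for c in chunks]
    norm = sum(scores) or 1.0
    compressed = []
    for chunk, score in zip(chunks, scores):
        # Allocate: share of the context budget proportional to importance.
        budget = int(total_budget * score / norm)
        if budget == 0:
            continue  # less critical chunks are discarded outright
        # Refill: regenerate the chunk within its assigned budget.
        compressed.append(refill(chunk, budget))
    return compressed
```

Because each chunk's budget and refill depend only on that chunk's score, the per-chunk decisions can run in parallel.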
This cycle lets the system make independent (i.i.d.) decisions about what to prune while strictly adhering to the context budget. Although currently implemented via autoregressive fill-in-the-middle (FIM), the process is inspired by how humans forget wisely: instinctively discarding irrelevant details (like a billboard passed while driving) while retaining meaningful experiences. We therefore term this paradigm Memory Diffusion. It equips agents to dynamically refine their context window while keeping the cost of memory maintenance proportional to trajectory length.
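For concreteness, one way the FIM refill could be framed, assuming sentinel tokens in the style of code-infilling models; `<PRE>`, `<SUF>`, and `<MID>` are illustrative, not the tokens our system uses:

```python
def refill_with_fim(generate, prefix: str, suffix: str, budget: int) -> str:
    # `generate` is any completion function taking a prompt and an output
    # cap; the token budget is enforced at decode time via max_tokens.
    prompt = f"<PRE>{prefix}<SUF>{suffix}<MID>"
    return generate(prompt, max_tokens=budget)

# Toy usage with a dummy generator so the sketch runs:
compressed = refill_with_fim(
    lambda prompt, max_tokens: "(compressed chunk)",
    prefix="...trajectory before the masked chunk...",
    suffix="...trajectory after the masked chunk...",
    budget=64,
)
```

Conditioning on both sides of the mask is what lets the regenerated chunk stay consistent with the rest of the trajectory.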
This introduces a new class of sequence augmentation. Standard autoregressive modeling focuses on two primary operations:
- Reasoning: Generating internal thought traces (e.g., within `<think>` tags) to guide output.
- Execution: Appending observation outputs following tool invocations.
We propose a third trajectory operation: actively modifying the sequence to optimize context for future autoregression. By treating "forgetting" as a recurring, parallelizable operation throughout the trajectory, we achieved the best performance in training-free comparisons.
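To make the taxonomy concrete, here is a toy rendering of all three operations acting on a token sequence; the names and types are illustrative only:

```python
from enum import Enum

class Op(Enum):
    REASON = "reason"    # append an internal <think>...</think> trace
    EXECUTE = "execute"  # append a tool observation
    FORGET = "forget"    # rewrite a span of the existing sequence

def apply(op: Op, seq: list[str], payload: str,
          span: slice | None = None) -> list[str]:
    if op is Op.FORGET:
        assert span is not None, "forgetting targets an existing span"
        # Unlike the other two, forgetting edits the sequence in place.
        return seq[: span.start] + [payload] + seq[span.stop :]
    return seq + [payload]  # reasoning and execution only ever append
```

Reasoning and execution grow the sequence monotonically; forgetting is the only operation that edits what is already there.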

Through intensive engineering, we achieved state-of-the-art (SOTA) results on the LoCoMo benchmark[4] with 93% accuracy (excluding adversarial cases).
"Human thought naively feels a bit more like autoregression but it's hard to say that there aren't more diffusion-like components in some latent space of thought." — Andrej Karpathy
Looking ahead, we view Diffusion Language Models (DLMs)[5,6] as the ideal architectural fit for this paradigm. The bidirectional denoising and masking mechanisms native to DLMs align perfectly with our Mask–Allocate–Refill view of memory. We are currently training diffusion-based language models inside a full RL loop, making diffusion a model-native memory mechanism for more grounded and efficient agents.
We also release a live demo of this algorithm as an early research preview. Note that the visualization may not exactly reflect the underlying algorithm.
References
[1] MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent (Yu et al., 2025)
[2] MemGPT: Towards LLMs as Operating Systems (Packer et al., 2023)
[3] Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory (Chhikara et al., 2025)
[4] Evaluating Very Long-Term Conversational Memory of LLM Agents (Maharana et al., 2024)
[5] Diffusion-LM Improves Controllable Text Generation (Li et al., 2022)
[6] Planning with Diffusion for Flexible Behavior Synthesis (Janner et al., 2022)
Author
Mind Lab
Core Contributors
Alex Yin, Rio Yang, Pony Ma, Andrew Chen
Team
Kaijie Chen, Andrew Chen, Songlin Jiang, Yuhua Jiang, Xiang Lei, Guanming Liu, Qihan Liu, Yiwen Lu, Pony Ma, Alex Yin, Rio Yang and Mindverse Team
Acknowledgement
Special thanks to Yizhou Zheng for their valuable feedback on this blog.
Names are listed alphabetically within team and acknowledgement.
Citation
Please cite this work using the BibTeX citation:
@misc{alex2025exploring,
  author       = {Alex Yin and Rio Yang and Pony Ma and Andrew Chen and {Mind Lab}},
  title        = {Exploring Agentic Memory Beyond Reasoning and Tool-Use},
  year         = {2025},
  howpublished = {Mind Lab: A Lab for Experiential Intelligence},
  note         = {https://macaron.im/mindlab/research/exploring-agentic-memory-beyond-reasoning-and-tool-use}
}