MindClaw: Fine-Tuning OpenClaw for Personalized Long-Term Memory

OpenClaw helped make the agent paradigm legible to a much broader audience [1]. It made it natural to think of AI not as a one-shot response engine, but as a system that can act, accumulate traces, and evolve through use.

At Mind Lab, we have been pursuing two closely related questions. The first is how to build AI systems that improve from real experience rather than remaining frozen after deployment. The second is how to move from context engineering to context learning, so that useful context does not remain external scaffolding forever [2]. MindClaw sits at the intersection of those two threads.


Where Prompt-Space Growth Starts to Break

A large part of OpenClaw's current growth still happens in prompt space. Skills are generated and stored in context. Memory is recovered by retrieving text back into context. This works unusually well in the cold-start phase. The system can feel adaptive, personal, and stronger than its base model would suggest.

But a system that grows mainly by accumulating more skills and more memory in prompt space also accumulates more noise.

Two failure modes appear repeatedly.

The first is skill drift. The system collects many skills that do not actually work. Some are directionally correct but operationally vague. Some are redundant. Some are never triggered. Some only look plausible because they were generated by stronger models.

The second is retrieval dependence. The model does not truly own those skills. Whether it can find the right one at the right moment depends on retrieval quality, prompt layout, context budget, and transient model state. This is not parametric memory. It is memory by re-insertion.
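
As a concrete illustration, memory by re-insertion can be sketched as a retrieval step that runs on every turn. Everything here (`Skill`, `overlap_score`, `build_prompt`, the toy lexical scorer) is hypothetical, not OpenClaw's actual mechanism:

```python
# Minimal sketch of "memory by re-insertion": skills live as external text
# and must be retrieved back into the prompt on every turn.

from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    text: str

def overlap_score(query: str, skill: Skill) -> float:
    """Toy lexical similarity standing in for a real embedding retriever."""
    q, s = set(query.lower().split()), set(skill.text.lower().split())
    return len(q & s) / max(len(q), 1)

def build_prompt(query: str, library: list[Skill], k: int = 2) -> str:
    # A skill only influences behavior if it survives this ranking step:
    # retrieval quality, k, and context budget all gate the "memory".
    top = sorted(library, key=lambda s: overlap_score(query, s), reverse=True)[:k]
    skills_block = "\n".join(f"[skill:{s.name}] {s.text}" for s in top)
    return f"{skills_block}\n\nUser: {query}"

library = [
    Skill("weekly-report", "summarize the weekly report with bullet points"),
    Skill("code-review", "review the pull request and flag risky diffs"),
]
print(build_prompt("please summarize this weekly report", library, k=1))
```

If the scorer misranks, the budget `k` is too small, or the prompt layout shifts, the "memory" silently disappears, which is exactly the dependence described above.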

This is the same concern behind our earlier argument for context learning [2]. Context can improve the current trajectory. But if those gains are never internalized into parameters, they do not become durable capability. They remain external support, and the burden of carrying them forward keeps increasing.


Why Personal OpenClaw Needs Parametric Memory

We believe a personal agent eventually needs parametric memory.

If an agent really becomes adapted to your work, it should not need to reconstruct your recurring preferences, workflows, and decision habits from scratch on every important trajectory. Retrieval still matters, but retrieval should be supplemental. It should not carry the full burden of long-term personalization.

This is why we see LoRA RL as central to Personal OpenClaw. LoRA makes parameter updates light enough to run continuously in real systems rather than only in expensive offline retraining cycles [6].

Real usage produces the signals that matter most: task traces, repeated corrections, successful routines, failure modes, and timing patterns that only appear in deployment. If all of that remains external text, the system's long-term behavior remains bottlenecked by retrieval. If those signals can instead be written into parameters through LoRA RL, the system starts to change in a more lasting way. It becomes increasingly shaped by the work it actually does.
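
To make the contrast concrete, here is a minimal numpy sketch of a LoRA-style update: the base weight `W` stays frozen while a low-rank delta `B @ A` absorbs the training signal. The dimensions, learning rate, and toy regression loss are illustrative, not MindClaw's training setup:

```python
# LoRA-style update sketch: the base weight W is frozen, and experience is
# written into a low-rank delta (B @ A). Illustration only.

import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                       # model dim, LoRA rank (r << d)
W = rng.normal(size=(d, d))       # frozen base weight
W0 = W.copy()                     # kept to verify W never changes
A = rng.normal(size=(r, d))       # LoRA down-projection (trainable)
B = np.zeros((d, r))              # LoRA up-projection (trainable, zero-init)

def forward(x):
    return W @ x + B @ (A @ x)    # base path + low-rank learned delta

x = rng.normal(size=d)
target = rng.normal(size=d)       # stand-in for a supervision/reward signal

res0 = float(np.linalg.norm(forward(x) - target))
for _ in range(500):              # toy gradient steps on A and B only
    err = forward(x) - target
    grad_B = np.outer(err, A @ x)   # dL/dB for L = 0.5 * ||err||^2
    grad_A = np.outer(B.T @ err, x) # dL/dA
    B -= 0.01 * grad_B
    A -= 0.01 * grad_A
res1 = float(np.linalg.norm(forward(x) - target))
print(res0, "->", res1)
```

After the loop, the changed behavior lives in `A` and `B`; nothing needs to be retrieved back into the prompt for it to persist.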

A simple way to frame the distinction is this:

  • Context engineering helps the model do better on this turn.
  • Context learning helps the model become better on future turns.

The first improves local performance. The second creates accumulation.

MindClaw as an Online Learning System

MindClaw is our current online implementation of that idea. The aim is straightforward: ensure that OpenClaw-style experience accumulation amounts to more than an ever-growing context window.

MindClaw has two layers.

The agent layer

At the agent layer, MindClaw uses MetaClaw [3].

MetaClaw provides a practical learning loop around a deployed agent: online conversation collection, skill injection, post-failure skill evolution, and RL sample collection. It is the agent-facing layer that connects everyday usage with a structured learning loop.
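
The shape of that loop can be sketched as follows. All names here (`run_turn`, `skill_library`, `rl_samples`, the stub agent) are hypothetical stand-ins, not MetaClaw's real interfaces:

```python
# Illustrative shape of the agent-layer loop: conversation collection,
# skill injection, post-failure skill evolution, RL sample collection.

conversations = []      # online conversation collection
skill_library = {"summarize": "use bullet points and cite sources"}
rl_samples = []         # (prompt, response, reward) triples for later LoRA RL

def run_turn(user_msg: str, agent) -> None:
    skills = "\n".join(skill_library.values())          # skill injection
    prompt = f"{skills}\n\nUser: {user_msg}"
    response, success = agent(prompt)
    conversations.append((user_msg, response))          # collection
    rl_samples.append((prompt, response, 1.0 if success else 0.0))
    if not success:                                     # post-failure evolution
        skill_library["summarize"] += " (revised after failure)"

# A stub agent that fails once, then succeeds, to exercise both branches.
calls = iter([("too long, no bullets", False), ("- point one", True)])
stub_agent = lambda prompt: next(calls)

run_turn("summarize this doc", stub_agent)
run_turn("summarize this doc", stub_agent)
print(len(rl_samples), skill_library["summarize"])
```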

The infrastructure layer

At the infrastructure layer, MindClaw uses MinT [4].

Once the goal shifts from "put more skill into context" to "internalize skill into parameters," the bottleneck is no longer just the agent framework. It becomes an infrastructure problem. You need a system that can turn live experience into stable LoRA RL updates. You need to manage training state, connect rollout and optimization, recover from failures, and keep the loop usable under real product constraints.

That is the problem MinT was built to address. When we introduced MinT, our argument was simple: many teams already possess the most valuable asset, which is real experience. They have product traces, domain workflows, recurring user behavior, and failure modes that only appear in practice. What they often lack is a practical way to turn that experience into steady model improvement. Without that infrastructure, experience remains a log. It does not become intelligence.

In MindClaw, MinT is not merely a backend that happens to run training jobs. It is the infrastructure that makes the parameter-update loop practical over time.
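
A toy version of those infrastructure concerns, connecting rollout to optimization with durable state and failure recovery, might look like this. The file name, loop structure, and stubs are all illustrative; MinT's actual implementation is more involved [4]:

```python
# Sketch of the infrastructure loop: rollout -> optimization, with training
# state persisted after every step so the loop can resume after a crash.

import json
import pathlib

STATE = pathlib.Path("trainer_state.json")  # hypothetical checkpoint file

def load_state():
    if STATE.exists():
        return json.loads(STATE.read_text())
    return {"step": 0, "adapter_version": 0}

def save_state(state):
    STATE.write_text(json.dumps(state))     # durable state survives crashes

def training_loop(rollout_fn, optimize_fn, max_steps=3):
    state = load_state()                    # resume from the last checkpoint
    while state["step"] < max_steps:
        try:
            samples = rollout_fn(state["adapter_version"])  # rollout side
            state["adapter_version"] = optimize_fn(samples) # optimization side
        except RuntimeError:
            continue                        # transient failure: retry the step
        state["step"] += 1
        save_state(state)                   # checkpoint after every step
    return state

STATE.unlink(missing_ok=True)               # start the demo from a clean state
rollouts = lambda v: [f"trace@adapter{v}"]  # stub rollout collector
optimize = lambda samples: len(samples)     # stub: returns new adapter version
final = training_loop(rollouts, optimize)
print(final)
```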

What We Are Trying to Close

The gap we care about is simple.

The system can keep generating more skills, but the model may still fail to learn how to use them reliably.

If that gap remains open, the system drifts toward a kind of pseudo-evolution. The skill library gets larger. Prompts get longer. Retrieval gets more complex. But the agent does not become proportionally more stable. It becomes more dependent on finding the right external text at the right moment.

This is why we care less about producing more skills and more about increasing the probability that useful skills are actually learned, actually triggered, and gradually internalized. SkillRL is a useful reference point here: skills should not be treated as isolated text artifacts, but as part of the learning loop [5].

Combined with a meta-learning perspective, the picture becomes clearer:

  • At the fast timescale, the system can continue generating and refining skills.
  • At the slower timescale, high-value skills should be written into parameters through LoRA RL.
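
A toy rendering of the two timescales: skill statistics update every episode (fast), while reliably useful skills are periodically promoted to parametric memory, standing in here for a LoRA RL pass over their trajectories (slow). The thresholds and names are illustrative:

```python
# Two-timescale sketch: per-episode skill tracking (fast) plus periodic
# consolidation of high-value skills into "parametric" memory (slow).

skill_stats = {}          # skill -> (uses, successes), fast timescale
parametric = set()        # skills considered internalized, slow timescale

def record(skill: str, success: bool) -> None:
    uses, wins = skill_stats.get(skill, (0, 0))
    skill_stats[skill] = (uses + 1, wins + int(success))

def consolidate(min_uses: int = 3, min_rate: float = 0.6) -> None:
    # Slow timescale: promote reliably useful skills, standing in for a
    # LoRA RL training pass over the trajectories where they fired.
    for skill, (uses, wins) in skill_stats.items():
        if uses >= min_uses and wins / uses >= min_rate:
            parametric.add(skill)

for episode in range(9):                   # fast timescale: every episode
    record("weekly-report", success=True)
    record("flaky-skill", success=(episode % 3 == 0))
    if (episode + 1) % 3 == 0:             # slow timescale: every 3 episodes
        consolidate()

print(parametric)   # only the reliably successful skill is promoted
```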

For users, the difference is concrete. If skills only exist in context, the experience is often that the system sometimes remembers to use them. If useful skills are gradually written into parameters, the experience becomes different: the system becomes more consistent, more personal, and less dependent on reconstructing the same capability from prompt space again and again.

Why LoRA RL Matters

The practical reason is economics.

If long-term memory requires full-parameter training, this path does not become a product. If long-term memory depends only on retrieval and context stuffing, internal noise keeps growing. LoRA RL offers a middle path: lower training cost, faster updates, and a feasible continuous improvement loop, while the result lives in parameters rather than only in external text [6].
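
A back-of-envelope calculation shows why. With illustrative transformer dimensions (not MindClaw's actual configuration), rank-16 LoRA on the four attention projections trains well under 1% of the weights it wraps:

```python
# Back-of-envelope numbers for the cost argument, using generic transformer
# figures chosen for illustration.

d_model, n_layers, rank = 4096, 32, 16
# LoRA on the four attention projections (q, k, v, o), each d_model x d_model:
lora_per_matrix = 2 * rank * d_model          # A: r x d, plus B: d x r
lora_total = 4 * n_layers * lora_per_matrix
full_attn = 4 * n_layers * d_model * d_model  # the frozen weights LoRA wraps

print(f"LoRA params:      {lora_total:,}")
print(f"wrapped weights:  {full_attn:,}")
print(f"trainable share:  {lora_total / full_attn:.4%}")
```

At this ratio, keeping and updating one adapter per user becomes operationally plausible in a way that full-parameter checkpoints never are.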

This is why we see LoRA RL as more than a cheaper post-training method. It changes the feasibility boundary of long-term personalization. Only when parameter updates become operationally light enough does Personal OpenClaw become more than a one-time configuration trick.

From this perspective, MindClaw is not mainly about adding more skills. It is about making high-value skills become parametric memory.

From Generally Useful Agents to Personal Ones

MindClaw is our current answer to this direction. The point is not simply that we built another OpenClaw-style system. The point is that a personal agent cannot remain forever at the context layer. It has to gradually acquire parametric memory, adapt more stably to your work, and accumulate experience in a form that compounds over time.

In our view:

  • MetaClaw provides the agent-layer learning loop.
  • MinT provides the RL infrastructure that makes parameter-level adaptation practical over time.
  • LoRA RL provides the bridge that turns recurring experience into durable capability.

We will continue sharing what we learn from this system, especially around which skills should remain external, which should be internalized, and how personal memory can become a stable capability rather than an accumulating source of prompt noise.

References

[1] OpenClaw: Your own personal AI assistant (Peter et al., 2026)

[2] From Context Engineering to Context Learning (Ma et al., 2026)

[3] MetaClaw (Aiming Lab et al., 2026)

[4] MinT: RL Infrastructure for Experiential Intelligence (Lu et al., 2026)

[5] SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning (Xia et al., 2026)

[6] LoRA Without Regret (Schulman et al., 2025)

Author

Mind Lab

Core Contributors

Lucian Li, Qihan Liu, Song Cao, Ruijian Ye, Andrew Chen, Pony Ma

Team

Andrew Chen, Kaijie Chen, Song Cao, Nolan Ho, Songlin Jiang, Fancy Kong, Jingdi Lei, Xiang Lei, Lucian Li, Qihan Liu, Tianchen Li, Yiwen Lu, Pony Ma, Wenbin Wang, Alex Yin, Rio Yang, Ruijian Ye, Di Zhang, Conley Zhao, Congjie Zheng and Mindverse Team

Names are listed alphabetically within each team.

Citation

Please cite this work using the BibTeX citation:

@misc{li2026mindclaw,
  author = {Lucian Li and Qihan Liu and Song Cao and Ruijian Ye and Andrew Chen and Pony Ma and {Mind Lab}},
  title = {MindClaw: Fine-Tuning OpenClaw for Personalized Long-Term Memory},
  year = {2026},
  howpublished = {Mind Lab: A Lab for Experiential Intelligence},
  note = {https://macaron.im/mindlab/research/mindclaw-fine-tuning-openclaw-for-personalized-long-term-memory}
}

Mind Lab © 2025 · contact@mindlab.ltd