Introducing Mind Lab: Building AI that Learns from Real Experience

Why we started Mind Lab

Over the last decade, progress in AI has been driven by scale. We made models bigger, datasets larger, and training runs longer. It worked.

Today, we have powerful open-source models with trillions of parameters that can write code, summarize documents, and pass standardized exams. For many tasks, you can simply plug in a pretrained model and get something surprisingly good.

But as we deploy these systems into real products, a new bottleneck has become obvious: there is a growing gap between what models know and how they grow. Most models are still trained once, offline, on a static dataset. After that, they’re essentially frozen. They don’t learn from their own usage, they repeat the same mistakes, and they fail to adapt as their users and environments evolve.

Mind Lab exists to close that gap.

We focus on systems that keep improving from real-world experience — not just from ever-larger pretraining corpora. We call this focus Experiential Intelligence: the study and engineering of AI systems whose primary source of improvement is the ongoing stream of interactions with the world.

From Static “Brains” to Adaptive “Minds”

Pretraining is an incredible way to build what we call a brain: a compact, static model of patterns in data. In products today, that’s often where the story ends. Every interaction is a one-way street. The model sees the world, reacts, and then immediately forgets.

This “brain-only” setup is powerful, but it has a very specific shape. All of the learning happens before deployment, in one giant batch. After that, the system is essentially a fixed function. It can simulate many behaviors, but it does not change its own behavior based on what it experiences with you.

A mind, as we use the term, is different. A mind is not just a repository of knowledge; it is a process that maintains and updates a view of the world, of itself, and of what “better” means over time. It treats each interaction not just as a request to satisfy, but as evidence that can refine how it will act in the future.

Concretely, we think a system starts to deserve the name mind when it has at least four things:

  • Internal models of the world and of itself

    A mind doesn’t see each input as an isolated prompt. It maintains latent state about what environment it is in, what the user is trying to do, what tools are available, and what its own strengths and limitations are. These internal models let it form expectations, detect surprises, and reason about cause and effect rather than just local token patterns.

  • Values that define what “better” means

    A mind has a value function: a notion of which outcomes are preferable, given limited time, compute, and risk. In a product, that might mean prioritizing task success and user satisfaction over verbosity or novelty. Without such values, a system can generate plausible answers but cannot consistently choose good actions when there are trade-offs, uncertainty, or delayed consequences.

  • Mechanisms for adaptation through experience

    A mind doesn’t freeze after pretraining. As it interacts with the world, it updates its internal models and its policy. If a particular strategy keeps failing for a given workflow, it gradually stops using that strategy there. If a user consistently corrects a certain type of mistake, the system adjusts how it handles similar situations in the future. The key is that past episodes leave a trace that shapes future behavior.

  • A social interface: metacognition in a human environment

    Minds don’t reason in a vacuum. They operate among users and other agents, and they know it. That means having some awareness of their own uncertainty (“I’m not confident about this answer”), exposing that uncertainty in the interaction (“I can propose two options with different trade-offs”), and adapting to norms and preferences over time. This social layer is what allows an AI system to become a reliable collaborator rather than just a silent function call.

When you put these ingredients together, the behavior of the system changes in kind, not just in degree. A brain-like model will apologize for the same mistake again and again; a mind-like system will reorganize its own expectations so that the mistake becomes less likely. A brain treats your corrections as one-off events; a mind treats them as training signals. Brains give you a snapshot of capability. Minds define a trajectory: how the system will grow with you as it accumulates experience.

Designing for minds instead of brains is not just about writing a different loss function. It forces us to rethink the whole loop: how products surface meaningful feedback, how infrastructure turns that feedback into updates, and how algorithms keep the system both adaptive and safe as it learns. That end-to-end challenge is the core of what we work on at Mind Lab.

How We Build Minds in Practice

Research–Product Co‑Design

At Mind Lab, research and product share the same loop. The systems we study are the systems that serve real users, and the data that drives our experiments comes from actual usage rather than synthetic scripts.

When we design a feature, we think at the same time about the user experience, the feedback signals the agent will see, and the mechanism by which those signals will influence future behavior. A feature is interesting to us if it helps people and also produces clear, interpretable observations about what the agent did and how well it worked.

In practice, we:

  • instrument interactions so that task outcomes, user edits, and preferences can be turned into training and evaluation data;
  • maintain pipelines that transform raw logs into structured episodes suitable for reinforcement learning and related methods;
  • integrate policy updates into our normal deployment process, with safety checks and monitoring.

This co‑design makes it possible to study learning dynamics directly in the environments where the agents are expected to operate.
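To make the first of those bullets concrete, here is a minimal sketch, in Python, of turning raw interaction logs into episodes with an outcome-based reward. The Step and Episode structures, the event fields, and the edit-penalty reward are illustrative assumptions for this post, not our production schema.

    from dataclasses import dataclass
    from typing import Any, Dict, List, Optional

    @dataclass
    class Step:
        # One model action inside a session, plus what the user did with it.
        prompt: str
        model_output: str
        user_edit: Optional[str]  # None if the user accepted the output as-is

    @dataclass
    class Episode:
        # A whole task attempt, grouped from raw logs by session id.
        session_id: str
        steps: List[Step]
        task_completed: bool

    def episode_reward(episode: Episode) -> float:
        # Illustrative reward: task success, discounted by how often the user had to edit.
        if not episode.task_completed:
            return 0.0
        edited = sum(1 for s in episode.steps if s.user_edit is not None)
        return 1.0 - 0.5 * edited / max(len(episode.steps), 1)

    def logs_to_episodes(raw_events: List[Dict[str, Any]]) -> List[Episode]:
        # Group raw log events by session and turn them into training episodes.
        by_session: Dict[str, List[Dict[str, Any]]] = {}
        for event in raw_events:
            by_session.setdefault(event["session_id"], []).append(event)
        episodes: List[Episode] = []
        for sid, events in by_session.items():
            steps = [
                Step(e["prompt"], e["output"], e.get("edit"))
                for e in events
                if e["type"] == "model_step"
            ]
            done = any(e["type"] == "task_outcome" and e.get("success") for e in events)
            episodes.append(Episode(sid, steps, done))
        return episodes

Episodes in roughly this shape are what the pipelines in the second bullet consume, and they also feed the evaluation work described below.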

Phase I: Scaling Agentic RL

Our current phase of work is organized into three connected tracks:

  1. Efficient Real‑World Learning Infrastructure

    We are building and operating infrastructure that makes “learning in the wild” practical: training large models, managing data and experiment pipelines, and handling the scheduling and orchestration needed for continuous RL and fine‑tuning on live signals instead of one‑off runs.

  2. Algorithm Research for Generalization

    We are developing and implementing methods that let RL agents truly adapt, including mechanisms for refining world and self models, more effective use of value functions for sample‑efficient learning, and deeper integration of memory as part of the learning loop rather than a separate store.

  3. Online Learning & Evaluation

    We are running experiments on learning in live products while keeping behavior stable. This means designing evaluation protocols and safeguards so that models can grow from new experience without catastrophic forgetting or unexpected regressions when research meets real deployment (a minimal sketch of one such safeguard follows this list).
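As a concrete illustration of the third track, here is a minimal sketch of a promotion gate, assuming a hypothetical evaluate-then-deploy flow and an illustrative regression threshold: an updated policy ships only if it does not regress on a frozen evaluation suite.

    from typing import Dict

    def should_promote(candidate: Dict[str, float],
                       baseline: Dict[str, float],
                       max_regression: float = 0.02) -> bool:
        # Promote the updated policy only if no held-out task regresses
        # by more than max_regression relative to the current baseline.
        for task, base_score in baseline.items():
            if base_score - candidate.get(task, 0.0) > max_regression:
                return False  # unacceptable regression on this task
        return True

    # Hypothetical usage, assuming evaluate() runs a frozen suite that is never trained on:
    #   baseline  = evaluate(current_policy, frozen_suite)
    #   candidate = evaluate(updated_policy, frozen_suite)
    #   if should_promote(candidate, baseline):
    #       deploy(updated_policy)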

A Small Example: Learning HTML Aesthetics

One simple illustration of this approach comes from a front‑end layout generation experiment. Large language models can already produce HTML and CSS, but the visual quality of the results is uneven: spacing, alignment, and hierarchy often feel wrong to human designers.

A natural approach is to collect a batch of human preference data, train a reward model once, and then optimize the generator against that fixed score. In practice, we found that this offline setup encourages reward hacking: as the policy learns to exploit quirks of the static reward, its Elo rating against fresh human comparisons declines. The agent becomes good at pleasing the proxy, not at producing layouts people genuinely prefer.

To address this, we moved to a streaming reward model. Using our live infrastructure, we continuously update the reward model on fresh, on‑policy feedback from the agent’s latest outputs and real usage, and train the policy against this evolving signal. In our internal evaluations, the policy trained with the streaming reward model shows a rising Elo score, while the one trained on a fixed reward model steadily loses ground. The task is narrow, but it makes the point we care about: when the reward is kept online and tied to live feedback, optimization improves alignment with human preference instead of working against it.
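To show the shape of that streaming loop, here is a minimal sketch in Python. The callables and the single-round structure are illustrative assumptions; they stand in for our actual training stack rather than reproduce it.

    from typing import Callable, List, Tuple

    Preference = Tuple[str, str]  # (preferred_html, rejected_html)

    def streaming_rl_round(
        generate: Callable[[str], str],                                 # current policy: prompt -> HTML
        collect_preferences: Callable[[List[str]], List[Preference]],   # fresh comparisons on recent outputs
        update_reward_model: Callable[[List[Preference]], None],        # fit reward model on the new pairs
        score: Callable[[str], float],                                  # reward model: HTML -> scalar reward
        update_policy: Callable[[List[Tuple[str, float]]], None],       # RL step on (output, reward) pairs
        prompts: List[str],
    ) -> None:
        # One round of the streaming setup. Each round, the reward model is refreshed
        # on feedback about the policy's latest outputs before the policy is updated,
        # which is what keeps the proxy from drifting away from human preference.
        outputs = [generate(p) for p in prompts]
        update_reward_model(collect_preferences(outputs))
        update_policy([(o, score(o)) for o in outputs])

    # The offline baseline is the same loop with update_reward_model() called once,
    # up front, and never again -- the setup that drifted toward reward hacking.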

Looking Ahead

We started Mind Lab from a simple observation: powerful models are no longer the bottleneck; the real challenge is helping them grow from experience. Our work is about turning real products into places where that growth can happen safely and systematically.

In the coming months, we’ll share more about the infrastructure we are building, the algorithms we are testing in live settings, and what we are learning from these deployments. If you are interested in agents that improve through use, or you are building products where this kind of learning matters, we’d be happy to connect.

Welcome to the era of experiential intelligence. Welcome to Mind Lab.

Author

Mind Lab

Core Contributors

Pony Ma, Rio Yang, Qihan Liu, Kaijie Chen, Andrew Chen

Team

Kaijie Chen, Andrew Chen, Songlin Jiang, Yuhua Jiang, Xiang Lei, Guanming Liu, Qihan Liu, Yiwen Lu, Pony Ma, Alex Yin, Rio Yang and Mindverse Team

Acknowledgement

Special thanks to Gao Huang, Hao Sun, Yang Yue, Shunyu Yao, Qichen Zhao for their valuable feedback on this blog.

Names are listed alphabetically within team and acknowledgement.

Citation

Please cite this work using the following BibTeX entry:

@misc{pony2025exploring,
  author       = {Pony Ma and Rio Yang and Qihan Liu and Kaijie Chen and Andrew Chen and {Mind Lab}},
  title        = {Building AI that Learns from Real Experience},
  year         = {2025},
  howpublished = {Mind Lab: A Lab for Experiential Intelligence},
  note         = {https://macaron.im/mindlab/blog/building-ai-that-learns-from-real-experience}
}

Mind Lab © 2025 · contact@mindlab.ltd