MinT: RL Infrastructure for Experiential Intelligence

Today we are launching the Mind Lab Toolkit (MinT), a managed service for post-training and reinforcement learning on a wide range of open models, from Qwen3-0.6B to the trillion-parameter Kimi-K2.

We believe the most important breakthroughs in AI will come from reinforcement learning, but only if we remove the friction of infrastructure. With MinT, users can focus on high-level data, training, and sampling configurations while the API abstracts away the underlying infrastructure complexity (hardware, communication, scaling, acceleration, and so on).

Originally developed for our internal R&D, we are now opening this toolkit to the public to catalyze the development of experiential intelligence both within our lab and across the global research and innovation community.

Many teams already have the asset that matters most: proprietary experience, including product traces, domain workflows, expert preferences, and the failures that only show up in real usage. What most teams do not have is a practical way to turn that experience into models that improve steadily in the scenarios they care about. Large-scale reinforcement learning is still out of reach for many companies and research groups because of infrastructure complexity.

MinT closes that gap. It makes serious reinforcement learning affordable to try, straightforward to run, and easy to repeat. When you can close the loop, your data stops being static storage and becomes a compounding advantage.

Why we built MinT

We believe the next stage of AI will be decided by learning from real experience: not just learning once on idealized corpora, but learning from real tasks, real users, and long-horizon goals.

This kind of learning has been out of reach for most teams: it requires distributed infrastructure that supports strong base models and can collect experience, schedule training, manage model state, evaluate updates, and keep everything reproducible. Without that, reinforcement learning stays confined to small experiments or to a handful of organizations with specialized infrastructure.

MinT is our attempt to make this practical. It is a reusable foundation for Experiential Intelligence, where real world complexity becomes a training resource rather than an engineering tax.

What MinT is

MinT is an abstraction layer that decouples training logic from infrastructure complexity.

With MinT, you define the loop at the level that matters:

  1. What to train: choose the model you want to adapt.
  2. What data to learn from: provide datasets, environments, or experience sources.
  3. How to learn: configure your post training or reinforcement learning method.
  4. How to evaluate: specify the metrics and success signals that reflect your objective.
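
As a rough illustration, the sketch below expresses these four choices in Python. Every name in it (the job-spec keys, the model string, the dataset path, the method and metric names) is a hypothetical placeholder rather than the actual MinT interface.

```python
# Hypothetical sketch of the four choices above as a single job spec;
# all keys and values are placeholders, not the actual MinT schema.
job = {
    # 1. What to train: the base model to adapt.
    "base_model": "Qwen/Qwen3-8B",
    # 2. What data to learn from: a dataset, environment, or experience source.
    "experience": {"dataset": "my-team/product-traces"},
    # 3. How to learn: the post-training / RL method and its hyperparameters.
    "method": {"algorithm": "grpo", "learning_rate": 1e-5, "group_size": 8},
    # 4. How to evaluate: success signals that reflect the objective.
    "evaluation": {"metrics": ["task_success_rate"], "every_n_steps": 100},
}
```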

MinT takes care of the rest. It schedules compute, runs distributed jobs, manages model state, and handles failure recovery on the backend. Your team does not need to find GPUs or operate clusters to run training loops.

MinT is also fully compatible with the Tinker API. If you already have code written against that interface, migration is designed to be frictionless.
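
For code that already targets the Tinker interface, the intent is that only the client construction changes. The sketch below is an assumption about what that could look like; the `mint` package name and its client surface are placeholders, not documented MinT API.

```python
# Before (Tinker):
#   import tinker
#   service = tinker.ServiceClient()
#
# After (MinT) -- hypothetical, assuming a drop-in client with the same surface:
import mint  # assumed package name, not the documented import path

service = mint.ServiceClient()  # same Tinker-compatible methods from here on
trainer = service.create_lora_training_client(base_model="Qwen/Qwen3-8B", rank=32)
# ...the remainder of a script written against the Tinker interface runs as before.
```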

What you get with MinT

MinT is built to help teams start running reinforcement learning on large models quickly and keep iterating.

You get a managed training service through a Python SDK, with low-level primitives that let you express common post-training workflows. You can validate your setup at low cost, then scale up when you are ready.
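
As a sketch of what that can look like, the minimal loop below validates on a small model before scaling up. The primitive names (`forward_backward`, `optim_step`) follow the Tinker-style surface MinT targets, but the exact signatures, the `mint` package, and the dataset loader are assumptions, not the documented API.

```python
# Minimal training-loop sketch with low-level primitives; names follow the
# Tinker-style surface MinT targets, and signatures here are assumptions.
import mint  # assumed SDK entry point

service = mint.ServiceClient()
# Start with a small base model to validate the loop cheaply; swap in a larger
# model (or a different family) once metrics look sane.
trainer = service.create_lora_training_client(base_model="Qwen/Qwen3-0.6B", rank=16)

batches = mint.load_dataset("my-team/sft-data").batches(batch_size=64)  # placeholder loader

for step, batch in enumerate(batches):
    trainer.forward_backward(batch, loss_fn="cross_entropy")  # accumulate gradients
    trainer.optim_step()                                       # apply the optimizer update
    if step % 100 == 0:
        print(f"completed step {step}")
```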

Out of the box, MinT provides:

  • A low-barrier path to your first closed loop

    You can follow the quickstart to run a complete minimal loop that trains and then refines a model.

  • A direct bridge from training to inference

    After training, you can materialize weights and create sampling clients from the result, so evaluation and iteration stay tightly connected (see the sketch after this list).

  • Built in support for modern open model families

    MinT supports the Qwen3, DeepSeek V3, and Kimi K2 families, as well as a growing set of multimodal and robotics-oriented models such as Qwen3-VL and π0.

  • Compatibility and migration tooling

    MinT is API compatible with Tinker, and we also provide a migration Claude Code Skill to help convert existing training code from verl, TRL, OpenRLHF, or custom PyTorch loops into MinT interfaces.
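
The training-to-inference bridge mentioned above might look like the following. The method names (`save_weights_for_sampler`, `create_sampling_client`, `sample`) follow the Tinker-style surface MinT targets, and the exact signatures and return types are assumptions for illustration only.

```python
# Hypothetical handoff from training to sampling; names follow the Tinker-style
# surface MinT targets and may differ in the actual SDK.
import mint  # assumed SDK entry point

service = mint.ServiceClient()
trainer = service.create_lora_training_client(base_model="Qwen/Qwen3-8B", rank=32)
# ...training steps as in the earlier sketch...

checkpoint = trainer.save_weights_for_sampler(name="exp-001")        # materialize weights
sampler = service.create_sampling_client(model_path=checkpoint.path)  # assumed attribute

# A quick evaluation pass keeps iteration tied directly to the latest weights.
completion = sampler.sample(
    prompt="Summarize the last customer interaction in one sentence.",
    max_tokens=128,
    temperature=0.7,
)
print(completion)
```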

Who is already using MinT

MinT is already powering work across research labs and startups:

  • Leap Lab, Tsinghua University: using MinT to explore whether reinforcement learning can push beyond the knowledge boundaries of base models.
  • RoPL Lab, SJTU and SII: investigating how reinforcement learning can enhance embodied decision-making foundation models and decision world models.
  • EigenAI: exploring the use of MinT and Data Agent synthetic data to conduct agentic RL training on 1T models.
  • Maschine Robot: using MinT to support their brain–computer interface agent, enabling affective conversational interaction.
  • Mindical Health: employing MinT for RL-based post-training of medical coding models, significantly improving accuracy and successfully deploying the solution in dozens of top-tier hospitals.

A note on scale and cost

MinT grew out of our own experience building and iterating on Macaron, the world’s first personal agent, loved by millions of users. Keeping an agent product improving over time led us to build the solid infrastructure that MinT now exposes as a service.

As part of this effort, Mind Lab completed what we believe is the first end-to-end LoRA-based reinforcement learning run on trillion-parameter models. We are sharing that result because it is a concrete proof point for two things. First, the platform is built against real constraints. Second, the economics of large-scale learning can be changed through parameter-efficient adaptation, which is why MinT can make reinforcement learning practical for far more features, products, and teams than traditional full-parameter training.
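
To make the second point concrete, here is a back-of-envelope calculation: with LoRA, a weight matrix of shape d_out × d_in gains only r·(d_in + d_out) trainable parameters, so the trainable fraction of a trillion-parameter model can be tiny. The layer dimensions below are illustrative assumptions, not the actual configuration of any specific model.

```python
# Back-of-envelope: trainable parameters under LoRA vs. full fine-tuning.
# All model dimensions below are illustrative assumptions, not a real config.
hidden = 7168       # assumed hidden size
n_layers = 61       # assumed number of layers
rank = 32           # LoRA rank

# Suppose we adapt the four attention projections (q, k, v, o) in every layer,
# each treated here as roughly a hidden x hidden matrix.
lora_params = n_layers * 4 * rank * (hidden + hidden)
full_params = 1_000_000_000_000  # the full trillion-parameter model

print(f"LoRA trainable params: {lora_params / 1e6:.1f}M")
print(f"Fraction of full model: {lora_params / full_params:.4%}")
# With these illustrative numbers, LoRA trains on the order of 0.01% of the
# weights, which is what keeps optimizer state and gradient memory tractable.
```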

For more technical details, schedule a demo to onboard MinT via the link below or by emailing contact@mindlab.ltd.

MinT is just the beginning

MinT is our first step toward a broader toolkit for learning from real experience. We will share more best practices for product-driven learning loops, including templates for data collection, evaluation, and deployment.

We will also build more "out of the box" workflows so teams can run reinforcement learning loops across more environments and model families with less setup. We will continue expanding our research agenda with MinT, especially on how parameter efficient fine-tuning can support broader forms of learning, including memory, agents and reward learning.

Get started with MinT and join us in turning real product experience into the driving force for ever-evolving intelligence.

Author

Mind Lab

Core Contributors

Yiwen Lu, Xiang Lei, Yushen Li, Songlin Jiang, Qihan Liu, Kaijie Chen, Andrew Chen, Pony Ma

Team

Kaijie Chen, Andrew Chen, Songlin Jiang, Yuhua Jiang, Xiang Lei, Guanming Liu, Qihan Liu, Scott Liu, Yiwen Lu, Pony Ma, Alex Yin, Rio Yang and Mindverse Team

Acknowledgement

Special thanks to our early access partners for their valuable feedback on MinT.

Names are listed alphabetically within team and acknowledgement.

Citation

Please cite this work using the following BibTeX entry:

@misc{yiwen2026announcing,
  author       = {Yiwen Lu and Xiang Lei and Yushen Li and Songlin Jiang and Qihan Liu and Kaijie Chen and Andrew Chen and Pony Ma and {Mind Lab}},
  title        = {MinT: RL Infrastructure for Experiential Intelligence},
  year         = {2026},
  howpublished = {Mind Lab: A Lab for Experiential Intelligence},
  note         = {https://macaron.im/mindlab/research/mint-rl-infrastructure-for-experiential-intelligence}
}
