From Static Models to Adaptive Agents: Innovations in Tinker and Mind Lab

Author: Boxu Li

In the evolving landscape of artificial intelligence, where pretraining at extreme scales has yielded formidable static capabilities, the frontier now shifts from building ever-larger static models to creating agentic systems – AI agents that can reason deeply, use tools, see and remember, and continuously learn from experience[1].

Thinking Machines Lab's Tinker platform, with its recent general availability announcement on December 12, 2025, represents a pivotal infrastructural leap, democratizing access to fine-tuning and multimodal extension of trillion-parameter models. Concurrently, Mind Lab, the research division of Macaron AI, articulates a philosophical and technical framework for "experiential intelligence," wherein models transition from frozen repositories of knowledge to dynamic processes that refine themselves via real-world feedback. This convergence offers profound opportunities for the co-design of research and product, closing the loop between algorithmic innovation and deployed adaptation.

Key Innovations in Tinker's Updates

  • Thinking Machines Lab's Tinker platform achieves general availability, supporting fine-tuning of Moonshot AI's trillion-parameter Kimi K2 Thinking MoE model, OpenAI-compatible inference, and multimodal inputs via Alibaba's Qwen3-VL series.
  • These enable efficient customization of frontier reasoning and vision-language models, with demonstrations showing superior few-shot performance in image classification.
  • Mind Lab (Macaron AI's research arm) advances scalable LoRA-based RL on similar trillion-scale MoE models, emphasizing experiential adaptation.

In this post, we’ll dive into Tinker’s new Kimi K2 reasoning model, OpenAI-compatible interface, and Qwen3-VL vision models, then explore Mind Lab’s philosophy of experiential intelligence, their trillion-parameter reinforcement learning (RL) breakthroughs, memory diffusion approach, and the strategic implications for building the next generation of AI systems.

Tinker’s Latest Innovations: Reasoning, Tools, and Vision

Tinker is an AI training platform designed to let researchers fine-tune and deploy cutting-edge models without worrying about infrastructure[2][3]. In December 2025, Tinker announced several major updates that bolster the reasoning capabilities, tool use, and vision understanding of AI models[4]:

  • Kimi K2 Thinking Model: Users can now fine-tune Kimi K2 Thinking, a colossal 1-trillion-parameter model and the largest in Tinker’s lineup[5]. Kimi K2 is a Mixture-of-Experts (MoE) transformer designed for lengthy chain-of-thought reasoning and agentic tool use[6]. Despite its scale, only a subset (~32B) of its parameters is active at a time, allowing it to achieve state-of-the-art reasoning performance while keeping inference efficient[7]. This open model – described as “open agentic intelligence” – rivals or surpasses many closed models on complex reasoning benchmarks[7]. By supporting Kimi K2 on Tinker, Thinking Machines enables researchers to leverage an advanced reasoning engine for tasks that demand multi-step logic, planning, or external tool calls. Importantly, Tinker fine-tunes such models using LoRA (Low-Rank Adaptation), training small adapter matrices instead of updating all trillion weights[8]. This approach significantly reduces the memory and compute needed for customization. In fact, internal studies found that with the right setup, LoRA can match the learning performance of full fine-tuning while using far fewer resources[9]. In practice, that means users can adapt a giant model like Kimi K2 to new tasks or domains without prohibitive cost – a crucial step for more efficient reasoning workflows.
  • OpenAI API-Compatible Inference: To accelerate research–product integration, Tinker introduced an inference interface that is compatible with OpenAI’s API for completions[10]. Essentially, one can query a Tinker-hosted model using the same API calls that OpenAI’s platform uses, by specifying a model path with a special tinker:// URI. For example, developers can call the Tinker model’s completion API with an OpenAI-like syntax (model, prompt, max_tokens, etc.) and get results as if they were calling openai.Completion.create[10] (a minimal sketch of such a call appears after this list). This plug-and-play compatibility means any tooling or application built around the OpenAI API can seamlessly integrate Tinker’s models[10]. It lowers friction for adopting advanced open models in real products: you could fine-tune Kimi K2 on Tinker, then drop it into an existing chain-of-thought agent or chatbot framework with minimal code changes. Moreover, Tinker’s API scaffolding even allows sampling from a model while it’s still training[10] – enabling interactive evaluation or tool-augmented training loops where a model can be tested and used in parallel with its fine-tuning process. This update supports more efficient agent development workflows, letting researchers continuously integrate and test model improvements in realistic settings.
  • Qwen3-VL Vision–Language Models: Another major addition to Tinker is support for multimodal vision-language models. The platform added two vision-enabled models, Qwen3-VL-30B and Qwen3-VL-235B, which can accept image inputs alongside text[11]. These models (30 billion and 235 billion parameters respectively, both MoE architectures) are instruction-tuned to follow prompts that include images, e.g. answering questions about a diagram or interpreting a photo. With simple API calls, users can now feed an image (as an ImageChunk) interleaved with text into the model and get a language response[12]. This unlocks a variety of vision-informed applications – from analyzing screenshots and charts to multimodal assistants that see and talk. Notably, Qwen3-VL models were designed with data-efficient vision capabilities in mind. To illustrate this, Thinking Machines fine-tuned the 235B Qwen3-VL model on a few classic image classification tasks (Caltech101, Oxford Flowers, etc.), using LoRA adapters for efficiency[13]. They compared its performance to a strong vision-only baseline (DINOv2 ViT model with a classifier head), across varying amounts of training data per class[14].
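To make the second bullet concrete, here is a minimal sketch of what an OpenAI-compatible call to a Tinker-served model could look like, using the official openai Python client. The base URL and the tinker:// model path are hypothetical placeholders rather than values from the announcement, and the current client.completions.create method stands in for the legacy openai.Completion.create call mentioned above.

```python
# Minimal sketch: querying a Tinker-hosted model through the OpenAI-compatible
# completions interface. The base_url and the tinker:// model path below are
# hypothetical placeholders; consult Tinker's documentation for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://tinker.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_TINKER_API_KEY",
)

response = client.completions.create(
    model="tinker://your-org/kimi-k2-thinking-lora",  # hypothetical fine-tuned model path
    prompt="Walk through the reasoning: why does ice float on water?",
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].text)
```

Because the request shape matches OpenAI's, the same snippet works against either backend by changing only the base URL and model name – which is what makes drop-in product integration possible.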

Figure: Comparison of fine-tuned Qwen3-VL-235B (vision-language model) vs. DINOv2 (vision-only baseline) on image classification tasks with limited labeled examples. Qwen3-VL achieves higher accuracy, especially in the low-data regime (far left), thanks to its language-informed visual understanding.[15]

Even with only one example per class, the 235B Qwen3-VL model attained reasonable accuracy, significantly outperforming DINOv2 in this extreme low-data regime[15]. As the number of examples increased, both models improved, but Qwen3-VL retained an edge, demonstrating stronger few-shot generalization[16]. The advantage comes from the model’s built-in language and world knowledge – for instance, Qwen3-VL already has a concept of what a “sunflower” or “golden retriever” looks like or is described as, by virtue of its multimodal pretraining[16]. This means it can recognize or categorize novel images with minimal new examples. In practical terms, Tinker’s users can achieve high accuracy on vision tasks with very small datasets by leveraging these large vision-language models. This data-efficient vision capability is crucial for real-world scenarios where labeled data is scarce. It also hints at the power of tool-augmented reasoning: a model that “sees” can leverage both visual cues and linguistic context, making it a more versatile agent (for example, reading a diagram and explaining it, or using an image as part of a reasoning chain). Overall, the addition of Qwen3-VL to Tinker extends the platform’s reach from pure text to the visual domain, enabling multi-modal reasoning workflows under the same unified training API.
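The announcement does not reproduce the exact request format, but conceptually a classification query interleaves instruction text with image content. The sketch below uses the OpenAI chat-style image_url convention purely as a stand-in; Tinker's own SDK represents images as ImageChunk objects whose constructor is not shown in the post, so treat every field name here as illustrative.

```python
# Illustrative interleaved image + text prompt for a vision-language model such
# as Qwen3-VL. The message layout follows the OpenAI chat "image_url" convention
# as a stand-in; it is not Tinker's ImageChunk API.
import base64

# In a real run you would encode an actual image, e.g.:
#   image_b64 = base64.b64encode(open("flower.jpg", "rb").read()).decode()
image_b64 = base64.b64encode(b"<raw image bytes>").decode()  # placeholder

labels = ["sunflower", "daisy", "rose", "tulip"]

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Classify the flower in this photo. "
                     f"Answer with exactly one of: {', '.join(labels)}."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }
]

print(messages[0]["content"][0]["text"])  # inspect the text half of the prompt
```

In the one-example-per-class regime from the comparison above, a single labeled reference image per class would simply be prepended to the same interleaved prompt.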

Mind Lab’s Adaptive Systems: Experiential Intelligence in Action

On the research front, Mind Lab – a new frontier research lab affiliated with Macaron AI – is tackling the challenge of making AI agents truly adaptive and experiential. Mind Lab’s ethos is that “real intelligence comes from real experience, not just bigger pre-training”[17]. In other words, simply scaling up models on static datasets is not enough; the next leap in AI will come from systems that learn continually from interactions, much like humans accumulating experience. Mind Lab frames this vision as Experiential Intelligence – moving from static “brains” to adaptive “minds” that can form internal world models, update their knowledge through feedback, have explicit goals or values, and even reflect on their own actions[18]. This is a direct response to the limitations of current LLMs, which are often powerful but frozen after pre-training[18]. By introducing mechanisms for genuine adaptation – such as continual reinforcement learning and dynamic memory – Mind Lab aims to create agents that evolve with use.

Two core pillars of Mind Lab’s work are: (1) Efficient RL fine-tuning of massive models to instill new behaviors, and (2) Advanced memory systems that allow agents to retain and utilize long-term knowledge. Both are geared toward making AI more agentic (autonomously deciding and improving) and tightly coupling research advances with product deployment.

LoRA-Based Trillion-Parameter RL with 10% GPUs


One of Mind Lab’s headline achievements is demonstrating reinforcement learning at trillion-parameter scale – and doing so in a practical, cost-effective way. In December 2025 they announced the first end-to-end RL pipeline on the 1.04T-parameter Kimi K2 reasoning model, achieved with only ~10% of the GPU resources that such training would normally require[19]. How was this possible? The team built a specialized training engine that combines parameter-efficient finetuning (LoRA) with hybrid parallelism across the model’s Mixture-of-Experts structure[20][21].

Instead of tuning all trillion weights, Mind Lab’s approach injects low-rank adaptation matrices into selected layers of Kimi K2 (both in the dense backbone and within expert layers) and updates only those during RL[22]. This dramatically reduces the number of trainable parameters (for example, a LoRA rank of a few tens or hundreds per layer, instead of full matrices) and hence cuts memory and compute usage by an order of magnitude. At the same time, training a model of this size requires distributing the workload across many GPUs efficiently. The team employed a hybrid-parallel strategy: a coordinated use of tensor parallelism, pipeline parallelism, expert parallelism (for the MoE experts), and sequence parallelism (for long sequence training), all made compatible with sharded LoRA updates[23]. In practice, this meant leveraging existing large-model training frameworks (NVIDIA’s Megatron and ByteDance’s VolcEngine RL), augmenting them to handle LoRA on MoE, and carefully balancing the computation across 64 GPUs in a cluster[24]. The result was stable on-policy RL training (akin to a PPO-style algorithm) on the full Kimi K2 model with a reward model providing feedback on reasoning quality[22] – something previously thought infeasible for most teams due to cost.
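For readers unfamiliar with LoRA, the sketch below shows the core idea on a single frozen linear layer: the pretrained weight matrix stays untouched while two small low-rank matrices are trained and added to its output. This is a generic PyTorch illustration under assumed rank and scaling values, not Mind Lab's Megatron/VERL implementation, which additionally shards such adapters across expert and tensor-parallel partitions.

```python
# Generic LoRA illustration in PyTorch: freeze a pretrained linear layer and learn
# only a low-rank update around it. This is not Mind Lab's Megatron/VERL code;
# the rank and scaling below are arbitrary example values.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # pretrained weights stay frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank correction; only lora_a / lora_b receive gradients.
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")  # ~0.8% at rank 16
```

At rank 16 on a 4096×4096 layer, the adapters account for well under one percent of the layer's parameters, which is where the order-of-magnitude memory and compute savings come from.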

Equally important, it worked: the LoRA-finetuned Kimi K2 achieved significant improvements on long-horizon reasoning tasks, with smooth learning curves and no divergence[25]. Crucially, the adapted model retained the general skills of the base model (thanks to only minimal, focused weight changes) while gaining new task-specific behaviors[26]. This means the base model’s massive prior knowledge was not overwritten, only augmented – a key benefit of LoRA finetuning. In fact, Mind Lab’s experiments confirmed that larger models provide a stronger foundation for RL. Under a fixed training budget, a large model plus small LoRA adapters outperformed a smaller model trained with full tuning, both on in-domain tasks and when transferring to new ones[27]. As the team puts it, RL is “prior-limited” – if the base model can’t generate high-quality trajectories to begin with, RL has little signal to amplify[27]. A powerful pretrained prior like Kimi K2 gives RL a rich set of behaviors to home in on, whereas training a small model from scratch has to invent those behaviors anew. This insight flips the conventional wisdom: it can be more compute-efficient to do RL on a large model (with a strong prior and LoRA efficiency) than to do RL on a smaller model, even if the smaller model is cheaper per step[28]. Mind Lab’s contribution here is not just an algorithm, but an infrastructure strategy – a blueprint for making continuous learning feasible on the biggest models. They have upstreamed their methods into open-source projects (Megatron-Bridge, VERL)[29], so the community can reproduce and build on this work, potentially enabling many groups to fine-tune trillion-parameter agents on modest hardware budgets.

Memory Diffusion: Rethinking Agent Memory Beyond Vector DBs


Another frontier Mind Lab is exploring is how an AI agent can handle long-term memories of its interactions. Many current systems bolt on a vector database for retrieving past conversation snippets or use summary techniques to compress history. Mind Lab proposes a more integrated, “model-native” memory system called Memory Diffusion[30]. The idea is to treat the entire sequence of an agent’s dialogue or trajectory as editable memory within the model’s context, rather than something stored externally. Memory Diffusion works by iteratively maintaining a fixed-size window of context via a mask–allocate–refill loop[30]. At each step, the model decides which tokens (pieces of past conversation) to mask out and drop and which to keep, then refills the freed space with newly incoming content – all while respecting a strict token budget for the context length[30]. Essentially, the model is learning to manage its own context, compressing or forgetting less relevant details and retaining important facts as the interaction grows. This is analogous to intelligent forgetting, where the goal isn’t to remember everything indefinitely (which isn’t feasible given context length limits), but to remember usefully under real constraints[30].
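Mind Lab has not published the algorithm's internals, so the fragment below is only a toy caricature of the mask–allocate–refill idea: a fixed token budget, a stub importance score standing in for the model's learned keep/drop decision, and a loop that evicts the lowest-scored tokens to make room for new content.

```python
# Toy caricature of a mask-allocate-refill loop over a fixed context budget.
# In Memory Diffusion the keep/drop decision is made by the model itself; here a
# stub scoring function stands in for that learned component.
from typing import Callable, List

def refill_context(context: List[str],
                   incoming: List[str],
                   budget: int,
                   importance: Callable[[str], float]) -> List[str]:
    """Evict the least important past tokens until the incoming turn fits."""
    overflow = len(context) + len(incoming) - budget
    if overflow > 0:
        # Mask: rank past tokens by (stubbed) importance and drop the lowest-scored.
        ranked = sorted(range(len(context)), key=lambda i: importance(context[i]))
        dropped = set(ranked[:overflow])
        context = [tok for i, tok in enumerate(context) if i not in dropped]
    # Allocate + refill: append the new content within the fixed budget.
    return context + incoming

score = lambda tok: float(len(tok))  # stub importance: longer tokens "matter more"

ctx: List[str] = []
turns = ["hi", "my", "name", "is", "Ada", "please", "remember", "my", "deadline", "Friday"]
for turn in turns:
    ctx = refill_context(ctx, [turn], budget=6, importance=score)
print(ctx)  # the window never exceeds 6 tokens, however long the dialogue runs
```

The essential property this illustrates is that the context never grows beyond the budget, so per-step cost stays constant no matter how long the interaction runs.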

By operating at the token sequence level, Memory Diffusion avoids the need for external embeddings or similarity search; the “memory” lives in the same representational space as the model’s working context. Mind Lab reports that this approach achieves state-of-the-art long-horizon memory performance, meaning the agent can carry on extended conversations or tasks without losing pertinent information, all through learned in-model mechanisms[31]. It also runs in constant time relative to context size – no explosion of retrieval cost as history grows, since the context length is fixed and managed via the mask/refill operations[31]. In practical terms, an agent with Memory Diffusion could engage in a conversation lasting thousands of turns, and while it cannot explicitly keep every detail, it will continuously decide what to keep in mind. Important user preferences or unresolved questions will persist, while trivial chit-chat from much earlier might be pruned away. This approach treats memory as a first-class component of the model’s cognition, aligning with Mind Lab’s view that memory should be an active, learning part of the system rather than a passive datastore[30].


Research–Product Co-Design: A Continuous Learning Loop

Tinker's infrastructural affordances and Mind Lab's algorithmic efficiencies form a natural symbiosis. Tinker enables direct application of Mind Lab's hybrid LoRA RL to Kimi K2 and Qwen3-VL, facilitating multimodal agentic loops.

In research-product co-design—Mind Lab's core tenet—this manifests as:

  1. Instrumentation for Feedback: Deployed agents (e.g., via Tinker-served models) generate structured episodes from user interactions, tool outcomes, and corrections (a hypothetical episode schema is sketched after this list).
  2. Online RL Pipelines: Hybrid parallelism supports continual updates on live signals, evolving value functions and policies without offline batches.
  3. Multimodal Adaptation: Vision inputs allow RL on perceptual tasks, refining world models for GUI navigation, document understanding, or visual reasoning.
  4. Safety and Stability: Colocated rollouts minimize distribution shift; streaming rewards (as in Mind Lab's HTML aesthetics example) prevent reward hacking.
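As a concrete illustration of the instrumentation step above, a deployment might log each interaction as a structured episode like the hypothetical record below. Every field name is invented for illustration and is not Mind Lab's actual schema.

```python
# Hypothetical schema for the structured episodes a deployed agent could log.
# Field names are invented for illustration; this is not Mind Lab's actual format.
from __future__ import annotations
from dataclasses import dataclass, field, asdict
import json
import time

@dataclass
class Episode:
    user_prompt: str                      # what the user asked
    agent_trajectory: list[str]           # reasoning steps / tool calls taken
    tool_results: list[str]               # observations returned by tools
    final_answer: str                     # what the agent showed the user
    user_feedback: float | None = None    # e.g. +1.0 thumbs-up, -1.0 thumbs-down
    correction: str | None = None         # explicit user correction, if any
    timestamp: float = field(default_factory=time.time)

episode = Episode(
    user_prompt="Summarize this quarterly report.",
    agent_trajectory=["open_document", "extract_tables", "draft_summary"],
    tool_results=["12 tables extracted"],
    final_answer="Revenue grew 8% quarter over quarter...",
    user_feedback=1.0,
)
print(json.dumps(asdict(episode), indent=2))  # ready to stream into an RL data pipeline
```

Records like this can be batched into supervised fine-tuning data or scored by a reward model to drive the online RL pipelines in step 2.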

Strategically, this paradigm accelerates iteration: products become experimental testbeds, yielding high-fidelity data that refines research hypotheses. For instance, few-shot vision classification gains from Tinker can seed RL objectives in deployed visual agents, progressively aligning perceptual policies with user preferences.

Traditionally, AI research would produce a model or algorithm, and then separately a product team might figure out how to deploy it, with relatively slow iteration between the two. Mind Lab instead operates on a philosophy of research–product co-design: every new technique is quickly tested in a live agent setting, and real user interactions generate data to refine the research[32].

“Research and product are no longer separate tracks. They are a closed feedback loop: user experience → data → RL training → deployment → better UX → richer data → repeat.”[33] In practice, this means that when Mind Lab improves their RL algorithm or memory system, they integrate it into an actual user-facing agent (for example, Macaron’s personal AI assistant) and observe how it performs with real users. The usage data – what questions users ask, where the agent fails or succeeds, explicit feedback – is then fed back as training signal (through supervised fine-tuning or reinforcement learning) for the next model update. This tight loop greatly accelerates learning: the product is the experiment.

One implication is the use of streaming reward models and online RLHF (Reinforcement Learning from Human Feedback). Instead of collecting a static dataset of human preference comparisons and training a reward model once, Mind Lab’s framework envisions continuously updating the reward model as new feedback comes in during deployment. For example, if an agent is solving tasks for users and occasionally gets a thumbs-down or correction, those signals can be streamed into the reward model to refine its notion of “good” behavior on the fly. The next time RL is run (which could be in a scheduled cadence or even asynchronously), the updated reward model guides the policy to better align with user preferences. This streaming RL paradigm turns deployment into an extension of training – the longer the agent runs in the real world, the more experience it gathers, and the better it becomes. The OpenAI-compatible interface provided by Tinker actually complements this strategy: it allows these continuously-learned models to be plugged into existing products and tools easily, meaning a research lab can rapidly push new model versions to a product and observe results, without needing to rebuild the integration each time.
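Neither Tinker nor Mind Lab publishes code for this loop, so the fragment below is only a schematic of the streaming idea: feedback events incrementally update a reward model, and the policy is re-tuned on a cadence against the latest snapshot. Every class and function here is a stub invented for illustration.

```python
# Schematic of a streaming-RLHF cadence: user feedback events continually update
# a reward model, and the policy is periodically re-tuned against the latest
# snapshot. All components are stubs; this is neither Mind Lab's nor Tinker's code.
from collections import deque

class StreamingRewardModel:
    """Stub reward model: averages recent human feedback per behavior tag."""
    def __init__(self, window: int = 1000):
        self.events = deque(maxlen=window)  # (behavior_tag, feedback) pairs

    def update(self, behavior_tag: str, feedback: float) -> None:
        self.events.append((behavior_tag, feedback))  # e.g. ("cites_sources", +1.0)

    def score(self, behavior_tag: str) -> float:
        scores = [f for tag, f in self.events if tag == behavior_tag]
        return sum(scores) / len(scores) if scores else 0.0

def run_rl_update(reward_model: StreamingRewardModel) -> None:
    # Placeholder for a PPO-style step that would score sampled trajectories with
    # reward_model.score(...) and push the policy toward higher-scored behavior.
    snapshot = {tag: round(reward_model.score(tag), 2) for tag, _ in reward_model.events}
    print("policy updated against reward snapshot:", snapshot)

rm = StreamingRewardModel()
stream = [("cites_sources", 1.0), ("hallucinated_fact", -1.0), ("cites_sources", 1.0)]
for tag, fb in stream:
    rm.update(tag, fb)            # streamed in from live deployments
    if len(rm.events) % 3 == 0:   # scheduled retraining cadence (here: every 3 events)
        run_rl_update(rm)
```

The point of the sketch is the cadence, not the components: feedback keeps flowing in between RL runs, and each run sees a fresher picture of what users consider "good" behavior.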

From Tinker’s side, the platform’s ability to sample from a model mid-training[10] could facilitate such iterative loops by enabling intermediate evaluations and fine-grained tuning decisions. On Mind Lab’s side, the co-design loop ensures that their innovations (like trillion-scale RL or memory diffusion) are stress-tested in real use cases. This approach surfaces practical challenges early (e.g., how to handle latency or unexpected user inputs) and closes the gap between cutting-edge research and user-facing AI products. The strategic payoff is that improvements are driven by real-world needs and directly validated against real-world use. As Mind Lab notes, genuine progress comes from “continuous learning from user–product interactions”[33], and an agent that can adapt in situ will ultimately deliver a far better user experience than one that is fixed at deployment.

Implications for Agentic AI and Future Co-Designed Systems

Taken together, the advances from Tinker and Mind Lab highlight a profound shift in how we build AI systems – from static models to adaptive agents co-designed with their environments. Several key implications emerge:

  • Foundation Models to Foundation Agents: The introduction of agentic models like Kimi K2 (with tool-use and reasoning baked in) and techniques to continually fine-tune them suggests that large language models are evolving into platforms for behavior, not just knowledge. Instead of one-time trained models that only imitate text, we get agents that can plan, act, and incorporate feedback. This blurs the line between an AI model and an AI product: the model is increasingly the agent you interact with, and it can update itself to serve you better. Building such agents requires uniting model-centric research (new architectures, training methods) with product-centric thinking (user experience, deployment constraints) in a single development cycle.
  • Tool-Augmented Reasoning as the Norm: With Tinker’s OpenAI-compatible interface and models explicitly built for tool use, we can foresee AI agents seamlessly invoking external tools, APIs, or databases as part of their reasoning process. Kimi K2’s design and Mind Lab’s agentic experiments both emphasize that solving complex tasks often requires an AI to consult tools or simulate environments[34][35]. Future systems will likely integrate tool APIs at the core of the model’s training (as Kimi’s large-scale agentic data synthesis did[36]), yielding out-of-the-box tool-using capabilities. Strategically, this means AI products will be more than a monolithic model – they’ll be tool orchestration platforms, where the model serves as a brain that knows when and how to call other services. The ease of integrating Tinker models via familiar APIs lowers the barrier for developers to create such tool-using AI workflows in practice.
  • Stateful Interaction and Personalized AI: Memory innovations like Memory Diffusion point toward AI that can maintain long-term state about interactions. Instead of treating each session or query in isolation, future agents will carry a memory of previous interactions, preferences, and contexts in a principled, bounded way. This will enable much more personalized and context-aware AI assistants – ones that don’t reset every time, but truly learn who they are interacting with and what has been happening. Importantly, Mind Lab’s approach shows that this can be done without infinite context windows; through learned memory management, agents can get smarter about what to remember. For users, this means a more fluid experience: a personal AI that remembers past conversations will feel more like an ongoing dialogue or a consistent assistant, rather than a series of disconnected uses. It also raises new design questions: how do we ensure the right things are remembered or forgotten? The answer likely lies in techniques like memory diffusion that incorporate human-like forgetting and emphasis.
  • Hybrid Infrastructure as a Competitive Advantage: The technical groundwork laid by these projects – e.g. hybrid parallel training, LoRA-on-MoE, distributed RL – will be a game-changer for AI development teams. Groups that adopt these methods can fine-tune the largest models with relatively modest compute, which could democratize the ability to build specialized high-performance AI agents. Instead of only big tech companies being able to deploy trillion-parameter models, any lab or startup could leverage an open model like Kimi K2 and adapt it via LoRA on a smaller GPU cluster[37][21]. This flattens the playing field and also encourages experimentation with large models in niche domains (since cost is less prohibitive). We may see an explosion of tailored trillion-scale agents – some focused on medical reasoning, others on legal research, others on creative design – all made feasible by efficient fine-tuning frameworks. The open-source integrations (Megatron, etc.) further ensure that these innovations spread quickly. Moreover, a hybrid parallel approach means that for any given hardware budget, one can squeeze out more effective training by smart scheduling and parallelizing, rather than just accepting a smaller model. This is critical as we push models to incorporate more modalities and longer contexts, which will further increase computational demands.
  • Continuous Learning and Human–AI Interaction: Finally, the notion of a closed-loop learning system transforms the user’s role in AI evolution. Every user interaction becomes a potential training example, and every deployment is an experiment. In practical terms, this could lead to AI services that improve dramatically overnight as they retrain on the previous day’s data – much like how software updates roll out. Users might start to expect that if they correct an AI today, it won’t repeat the mistake tomorrow. This sets up a virtuous cycle: better products attract more usage, yielding more data to learn from, which in turn improves the product. However, it also demands careful co-design of evaluation and safety – if an agent is learning from its own interactions, we need robust reward models and guardrails to ensure it learns the right lessons (avoiding reinforcing undesirable behaviors). Mind Lab’s work on incorporating human preference rewards and self-critique into RL is an early template for this[35]. In the long run, such research–product co-design may become standard practice: instead of a research paper ending with “we fine-tuned a model and achieved X,” the success criterion will be “we deployed an adaptive agent to users and it sustainably improved its performance/utility by Y% over time.”

Toward Adaptive Minds: A Concluding Vision

As static scaling laws plateau, the synthesis exemplified by Tinker's accessible trillion-scale customization and Mind Lab's efficient experiential RL heralds a transformative era. By embedding adaptation into the product loop, we move beyond brittle brains toward resilient minds—systems that not only reason and perceive at frontier levels but grow symbiotically with their environments. This co-evolutionary trajectory promises AI that is not merely capable, but continually becoming more attuned to human needs and the complexities of the real world.


[1] [34] [35] [36] [2507.20534] Kimi K2: Open Agentic Intelligence

https://ar5iv.labs.arxiv.org/html/2507.20534

[2] [3] [8] [9] Tinker - Thinking Machines Lab

https://thinkingmachines.ai/tinker/

[4] [5] [6] [10] [11] [12] [13] [14] [15] [16] Tinker: General Availability and Vision Input - Thinking Machines Lab

https://thinkingmachines.ai/blog/tinker-general-availability/

[7] [20] [21] [22] [23] [24] [25] [26] [27] [28] [37] How We Build Trillion Parameter Reasoning RL with 10% GPUs

https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus

[17] [30] [33] Macaron AI | LinkedIn

https://www.linkedin.com/company/macaronaiofficial

[18] [19] [29] [31] [32] Introducing Mind Lab — Macaron AI's Research Arm

https://www.linkedin.com/pulse/introducing-mind-lab-macaron-ais-research-arm-macaronaiofficial-tkz2e

Boxu earned his Bachelor's degree at Emory University, majoring in Quantitative Economics. Before joining Macaron, Boxu spent most of his career in the Private Equity and Venture Capital space in the US. He is now the Chief of Staff and VP of Marketing at Macaron AI, handling finances, logistics, and operations, and overseeing marketing.
