From Scaling to Experiential Intelligence: Ilya Sutskever’s Vision & Macaron’s Approach

Author: Boxu Li

The End of the “Scaling” Era and a Return to Research

In a recent conversation with Dwarkesh Patel, Ilya Sutskever – co-founder of OpenAI and now head of the startup Safe Superintelligence (SSI) – reflected on the state of AI and where it’s headed. Sutskever argues that the AI industry is moving past the era of “just make it bigger” scaling and back into an age of fundamental research[1]. Over roughly 2012–2020, deep learning progress was driven by new ideas (the “age of research”), followed by 2020–2025’s focus on scaling up data and parameters (the “age of scaling”)[1]. But now, simply increasing model size or dataset size is yielding diminishing returns. As Sutskever bluntly puts it, “if you just 100× the scale, [not] everything would be transformed… it’s back to the age of research again, just with big computers.”[2][3] In other words, future breakthroughs will come not from brute-force scale, but from new training recipes and smarter algorithms.

A core problem motivating this shift is what Sutskever calls the generalization gap. Today’s large models can ace benchmarks yet still stumble on practical tasks – a paradox that has become increasingly obvious. “These models somehow just generalize dramatically worse than people. It’s super obvious. That seems like a very fundamental thing,” Sutskever notes[4]. Models that score top marks on coding competitions or language exams can still make bizarre errors – repeating the same bug fix back and forth, or failing at simple commonsense decisions – that no competent human would[4][5]. This highlights a fragility: neural networks don’t truly understand or adapt as robustly as humans do, despite their impressive narrow skills. As one summary of Sutskever’s talk explains, even though we’ve built models that perform well on evaluations, their real-world reliability remains “a fragility evidenced by … high performance on evaluations contrasted with real-world errors.”[6]

Why do current models fall short on generalization? Sutskever suggests it’s partly an artifact of our training paradigm. In the era of large-scale pre-training, we simply fed the model everything (internet-scale text) and hoped breadth of data would yield broad capabilities. It did – up to a point. But after pre-training, companies fine-tune models with reinforcement learning (RL) on specific benchmarks or user instructions. Sutskever suspects this RL stage often overspecializes models to do well on tests rather than genuinely improving their understanding[7]. In his conversation, he gives a vivid analogy: one “student” (analogous to an AI model) spends 10,000 hours practicing competitive programming problems and becomes a savant at coding contests, whereas another student practices more modestly and focuses on broad computer science intuition[8][9]. The first might win competitions but the second ends up a more versatile engineer in the real world. Today’s models are like the over-prepped specialist – they excel in the narrow conditions they were tuned for, but they lack the “it factor” that humans have for adapting skills to new, messy problems[10][11]. In short, our AIs have not yet achieved the robust, fluid generalization that we humans gain through a lifetime of experience.

Why Humans Learn Better: Sample Efficiency and Continual Learning

A major theme in Sutskever’s discussion is the sample efficiency of human learning. Humans need astonishingly little data to learn complex tasks. For instance, Yann LeCun has pointed out that a teenager can learn to drive a car with maybe 10 hours of practice – a vanishingly small dataset by AI standards[12]. Young children learn to recognize cars (and thousands of other concepts) from just everyday life exposure[12]. By contrast, current AI models often require enormous training sets and still can’t match human flexibility. Sutskever notes that evolution preloads us with some useful inductive biases – e.g. millions of years of vision and locomotion shaped our brains – but that alone isn’t the whole story[13][12]. Even in domains not honed by evolution (like reading, math, or programming), humans rapidly outlearn today’s algorithms[14][15]. This suggests that “whatever it is that makes people good at learning” goes beyond just built-in knowledge – we have a fundamentally more efficient learning algorithm[14][15].

What might that algorithm be? One clue, Sutskever argues, is that humans learn continually and interactively, not in one giant batch. We don’t ingest terabytes of text and then freeze our brains; instead, we learn from ongoing experience, constantly updating our knowledge. He points out that a human being at age 15 has vastly less total data intake than a large language model’s corpus, yet by 15 we achieve a deeper understanding and make far fewer obvious mistakes[16][17]. The difference is that humans keep learning throughout life – we don’t consider our “training phase” done at adolescence. “A human being is not an AGI… instead, we rely on continual learning,” Sutskever says, highlighting that even a superintelligent AI might need to be deployed more like a 15-year-old prodigy than an all-knowing oracle[18][19]. Such an AI would have a strong foundation but “lacks a huge amount of knowledge” initially – it would then learn on the job in various roles, just as a bright young human goes out into the world to train as a doctor or engineer[19][20]. In fact, Sutskever’s vision of a safe superintelligence is explicitly not a static model that “knows how to do every job,” but a system that “can learn to do every single job” and keeps getting better[20][21]. In other words, real AI success may mean creating masters of learning, not just masters of any fixed task.

Another aspect of human learning is our built-in feedback mechanisms. Humans have emotions and intuition that act like an internal reward signal, guiding us as we learn new skills. Sutskever recounts a striking case: a man who lost the ability to feel emotion (due to brain damage) became catastrophically bad at decision-making, struggling even to choose which socks to wear[22][23]. Without emotional cues, he had no internal sense of what mattered. This suggests that our brains leverage a kind of value function – a running estimate of how well things are going – to learn efficiently and make decisions[24][25]. In reinforcement learning terms, we don’t wait until the very end of an experience to get a reward; we generate intrinsic rewards at intermediate steps (pleasure, frustration, curiosity, etc.), which hugely accelerates learning. Sutskever argues that today’s RL algorithms lack this richness – they often wait for a final score and are thus extremely inefficient on long-horizon tasks[26][27]. “If you are doing something that goes for a long time…it will do no learning at all until [the end],” he explains of naive RL[28]. The fix is to give AI agents a better sense of progress – a value function to short-circuit long feedback delays[29][30]. Incorporating such internal feedback could make training far more efficient. Sutskever even likens it to how emotions function for humans[31], calling it a promising direction to “use your compute more productively” than brute-force trial and error[30]. In sum, a combination of continual learning and richer self-supervision (value signals) might be the key to closing the generalization gap.
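To make the contrast concrete, here is a minimal sketch (not drawn from the interview or from Macaron’s codebase) of the difference between “wait for the final score” reinforcement learning and value-function learning on a toy long-horizon task. The chain environment, learning rate, and episode count are illustrative assumptions; the point is only that the value-function variant receives an error signal at every step, while the naive variant learns nothing until the episode ends.

```python
import random

# Toy long-horizon task: a chain of states where the only reward (+1)
# arrives at the final state. With naive "wait for the final score" RL
# there is no learning signal until the episode ends; with a value
# function, every step produces a bootstrapped error signal.
N_STATES = 20
GAMMA = 0.99
ALPHA = 0.1

def step(state):
    """Noisy forward progress; returns (next_state, reward, done)."""
    next_state = state + (1 if random.random() < 0.9 else 0)
    done = next_state >= N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def run_monte_carlo(values, episodes=200):
    """Naive RL: record the trajectory, update only once the final reward is known."""
    for _ in range(episodes):
        state, trajectory, done = 0, [], False
        while not done:
            next_state, reward, done = step(state)
            trajectory.append((state, reward))
            state = next_state
        ret = 0.0
        for s, r in reversed(trajectory):            # credit assignment happens only here
            ret = r + GAMMA * ret
            values[s] += ALPHA * (ret - values[s])

def run_td(values, episodes=200):
    """Value-function learning: an update fires at every step, like an
    internal sense of "how well is this going", long before the end."""
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            next_state, reward, done = step(state)
            target = reward + (0.0 if done else GAMMA * values[next_state])
            values[state] += ALPHA * (target - values[state])   # per-step signal
            state = next_state

if __name__ == "__main__":
    mc, td = [0.0] * N_STATES, [0.0] * N_STATES
    run_monte_carlo(mc)
    run_td(td)
    print("MC estimate of early state:", round(mc[1], 3))
    print("TD estimate of early state:", round(td[1], 3))
```

Both variants eventually estimate similar values on this tiny problem; the structural difference is where the learning signal comes from. The value function supplies intermediate feedback at every transition, which is the property Sutskever likens to emotions guiding human learning.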

Key insight: Current AI models need far more data than humans and still aren’t as adaptable. Humans learn efficiently by continuously gathering experience and by using internal feedback (our “emotional” value function) to guide learning. Building AI that learns in a similar interactive, incremental way – and that can judge its own progress – could dramatically improve generalization[32][4].

Beyond Pre-Training: Toward Experiential Intelligence

These insights resonate deeply with our philosophy at Macaron AI. We often distill it in one line: Real intelligence learns from real experience. Rather than betting solely on bigger models or larger offline datasets, Macaron’s research focuses on experiential learning – training AI through active interactions, feedback, and long-horizon memory, much like a human gaining skills over time. This approach, which we call Experiential Intelligence, is about models whose capabilities grow from the quality and diversity of experiences they learn from, not just the quantity of data they ingest. It’s a conscious departure from the era of blind scaling. As Sutskever himself emphasized, simply piling on more data or parameters yields diminishing returns[2]; the next leap forward will come from algorithms that can learn more from less by leveraging the right experiences.

Concretely, Macaron’s Mind Lab research division has been pioneering techniques to enable continual, feedback-driven learning in large models. We don’t throw out our foundation model and pre-train a new one from scratch for every upgrade. Instead, we extend strong base models with iterative post-training: reinforcement learning on real tasks, human-in-the-loop feedback, and long-term memory integration. For example, our team recently became the first in the world to run high-performance RL fine-tuning on a 1-trillion-parameter open-source model – using parameter-efficient LoRA adapters – while consuming only ~10% of the usual GPU budget. This was a breakthrough in making large-scale post-training feasible. In essence, we showed that giving a colossal model new experiences (and learning from them) can be done orders-of-magnitude more efficiently than naive methods. The result? Instead of just squeezing out a slightly lower perplexity on static data, we taught the model new skills via interaction – and did so in a tractable, cost-effective way. (Notably, we’ve open-sourced the techniques behind this and contributed them to popular training frameworks like NVIDIA’s Megatron and ByteDance’s VEGA, so the broader community can build on them.)
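To give a flavor of why parameter-efficient adapters make post-training on very large models tractable, here is a minimal, generic LoRA-style layer in PyTorch. This is not Macaron’s actual training stack, and the layer size, rank, and scaling are chosen purely for illustration: the base weight is frozen and only a small low-rank update is trained, so gradient and optimizer memory scale with the adapter rather than the full model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update.

    Only lora_A and lora_B (rank r) receive gradients, so optimizer state and
    gradient memory scale with r * (d_in + d_out) instead of d_in * d_out.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze the pretrained weight
        d_out, d_in = base.weight.shape
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))  # zero init: adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

if __name__ == "__main__":
    base = nn.Linear(4096, 4096)
    lora = LoRALinear(base, r=8)
    trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
    total = sum(p.numel() for p in lora.parameters())
    print(f"trainable params: {trainable:,} of {total:,} "
          f"({100 * trainable / total:.2f}%)")       # well under 1% at this size
```

Applied across the layers of a trillion-parameter model, this kind of adapter keeps the RL fine-tuning footprint a small fraction of what full-parameter training would require, which is the general principle behind the GPU savings described above.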

Memory: Learning to Forget Wisely

Another pillar of Macaron’s approach is memory – not in the trivial sense of a chat history window, but as a learned component of the model that accumulates and curates knowledge over time. Humans don’t treat every piece of input equally; we remember important events and readily forget the rest. This ability to forget wisely is crucial to handle long-term dependencies without overload. Inspired by this, our researchers developed a novel memory system called Memory Diffusion. Unlike brute-force caching or retrieval, Memory Diffusion teaches the model how information should evolve over a long conversation or usage history. The model learns to “diffuse” out irrelevant details and sharpen the salient facts as context grows. Empirically, this method has outperformed classic memory baselines (like fixed-length context or heuristic retrieval) in maintaining long-horizon coherence. More intuitively, it gives the model a kind of working memory that prioritizes what matters – much as your brain quickly forgets the billboards you passed on your commute but retains where you’re headed and why. By letting the model learn which signals to keep and which to let go, we end up with a system that can carry forward important learnings from one task to the next, enabling more continuous learning. This memory mechanism has become a key piece of Macaron’s agent architecture, alongside our advances in reasoning and tool-use. It’s another example of how we favor architectural smarts over raw scale: instead of just expanding a context window to 1 million tokens (which is inefficient), we give the model a way to intelligently compress and recall knowledge from its own experience.
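Memory Diffusion itself is internal research, so the sketch below is only a loose, hypothetical analogy: a salience-scored buffer in which entries fade over time unless they were important when written. The capacity, half-life, and scoring function are all illustrative assumptions, not a description of Macaron’s system, but the sketch captures the “forget wisely” behavior described above.

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryItem:
    text: str
    salience: float                  # importance score assigned at write time
    created: float = field(default_factory=time.time)

class DecayingMemory:
    """A toy working memory: entries fade over time unless they were salient,
    so the buffer keeps 'where you're headed' and forgets 'the billboards'.

    Illustrative stand-in only; this is not Macaron's Memory Diffusion.
    """
    def __init__(self, capacity: int = 5, half_life_s: float = 60.0):
        self.capacity = capacity
        self.half_life_s = half_life_s
        self.items: list[MemoryItem] = []

    def write(self, text: str, salience: float) -> None:
        self.items.append(MemoryItem(text, salience))
        self._evict()

    def _score(self, item: MemoryItem) -> float:
        age = time.time() - item.created
        decay = 0.5 ** (age / self.half_life_s)     # exponential forgetting
        return item.salience * decay

    def _evict(self) -> None:
        # Keep only the highest-scoring entries: forget wisely, not first-in-first-out.
        self.items.sort(key=self._score, reverse=True)
        self.items = self.items[: self.capacity]

    def recall(self, k: int = 3) -> list[str]:
        return [m.text for m in sorted(self.items, key=self._score, reverse=True)[:k]]

if __name__ == "__main__":
    mem = DecayingMemory(capacity=3)
    mem.write("User's goal: plan a trip to Kyoto in April", salience=0.9)
    mem.write("Mentioned the weather in passing", salience=0.1)
    mem.write("Budget is about $2,000", salience=0.8)
    mem.write("Typed 'lol' twice", salience=0.05)
    print(mem.recall())   # the goal and budget survive; the small talk is dropped
```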

Real-World Feedback Loops

Crucially, Macaron’s research doesn’t happen in isolation from our product. We believe in a tight research↔product loop: improvements in the lab are directly validated by user experience, and insights from the product inform new research. For instance, Macaron’s personal AI app actively logs anonymized feedback on where the AI’s responses fall short or when users seem dissatisfied. These signals feed into our reinforcement learning training as an additional reward signal. We’ve found that training on real user feedback often yields larger gains in capability than simply adding more internet text to pre-training. This aligns with Sutskever’s observation that what you train on can matter more than how much – a small amount of targeted experience can teach a model something that billions of static tokens couldn’t[7]. By closing the loop between deployment and research, we ensure our AI actually improves at the tasks people care about. In Sutskever’s terms, we are giving our models the “it factor” that comes from experiencing the world, not just memorizing it.
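As a purely hypothetical sketch of how product signals can feed back into training, the snippet below blends an offline reward-model score with explicit and implicit user feedback. The signal names, weights, and data format are assumptions for illustration, not Macaron’s actual pipeline; the idea it shows is simply reward shaping from real usage.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    prompt: str
    response: str
    thumbs_up: bool | None       # explicit user feedback, if any was given
    user_retried: bool           # implicit signal: the user rephrased the same request

def shaped_reward(rm_score: float, ix: Interaction,
                  w_explicit: float = 0.5, w_implicit: float = 0.3) -> float:
    """Blend an offline reward-model score with live product signals.

    Illustrative only: the weights and choice of signals are assumptions,
    not a description of Macaron's training pipeline.
    """
    reward = rm_score
    if ix.thumbs_up is not None:
        reward += w_explicit * (1.0 if ix.thumbs_up else -1.0)
    if ix.user_retried:
        reward -= w_implicit       # a retry suggests the answer fell short
    return reward

if __name__ == "__main__":
    ix = Interaction("plan my week", "Here's a draft schedule...",
                     thumbs_up=None, user_retried=True)
    print(shaped_reward(rm_score=0.62, ix=ix))   # 0.32: penalized by the retry
```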

Convergence: A New Paradigm for AI

It’s encouraging to see a growing consensus among AI leaders that continual, experiential learning is the way forward. Sutskever’s vision of a superintelligence that learns like a human – constantly and adaptively – is precisely the path Macaron has been pursuing. We’re not alone in this shift. Google’s Pathways vision, for example, also advocates training one model on many tasks and modalities so it can learn new skills over time, moving beyond single-purpose models. And researchers like Jason Wei and Jeff Dean have discussed the need for architectures that can accumulate knowledge incrementally and efficiently, rather than relying solely on gargantuan one-shot training runs. This represents a broader industry momentum toward what might be called “learning-centric AI” (as opposed to today’s model-centric AI). In this new paradigm, the question becomes: How quickly can an AI acquire a new ability or adapt to a new situation? – rather than how many parameters it has or how much data was used to pre-train it. By that measure, humans still hold the crown. But the gap is closing.

At Macaron AI, our bet is that Experiential Intelligence – AI that learns from real experience – will unlock the next wave of performance and reliability. We’re already seeing proof points: our models trained with reinforcement learning and human feedback are not only performing better on benchmarks, but more importantly, they feel more aligned with user needs in practice. They make fewer off-the-wall errors and recover from mistakes more gracefully, because their training has taught them to notice and correct mistakes (much like a human would). Our memory mechanisms similarly give them continuity that pure transformers lack, allowing a conversation or task to carry over months without resetting. All of these advantages stem from treating intelligence as a process, not a static artifact. As Sutskever put it, a deployed AI might go through a “learning trial-and-error period” during deployment[19][21] – and that’s a feature, not a bug, so long as it’s controlled and aligned.

Alignment, of course, is paramount when we talk about AI learning on its own. Interestingly, Sutskever suggested that it may even be easier to align an AI that truly learns and understands over time – potentially one that values sentient life and can model the world and others empathetically – than to align a static super-genius that was trained behind closed doors[33]. If an AI grows up interacting with humans, there’s an opportunity to instill human values throughout its development (and to observe and correct missteps). This echoes our view that transparency and gradual deployment are key to safe AI. Macaron’s platform, by engaging users directly and learning from them, provides a natural sandbox for this incremental approach. We intentionally roll out new learning capabilities in stages, monitoring behavior and gathering feedback, rather than unleashing a black-box model trained in a vacuum. In short, experiential learning not only makes AI smarter – it can make AI safer and more human-aligned too.

Conclusion: Embracing Experiential Intelligence

Both Ilya Sutskever’s forward-looking perspective and Macaron’s development journey point to the same conclusion: the next breakthrough AI will be a master learner, not just a bigger memorizer. An AI that can learn from experience, internalize feedback, remember and adapt over the long term – essentially, an AI that can grow – is one that can generalize to the messiness of the real world. This represents a profound shift in mindset from earlier years: it’s not just about how much knowledge the model starts with, but how effectively it can gain new knowledge. Sutskever’s imagined “superintelligent 15-year-old” encapsulates this idea[18][19]. At Macaron, we are working to build that kind of continually learning AI side by side with our community of users.

The implications of experiential, continual learning AI are sweeping. Technically, it means higher sample efficiency – doing more with less – and models that can quickly adapt to any domain or distribution. Economically, it promises AI workers who can be retrained on the fly, vastly accelerating innovation and productivity (Sutskever predicts potentially rapid growth once such AI proliferates[34][35]). And for society, it means AI systems that are more understandable, because we will see them learn and can shape their development, rather than being handed a fully formed enigma.

Achieving this will not be easy. It demands advances in algorithms, systems, and our theoretical understanding of learning. Yet the pieces are coming together: from value functions and advanced RL to lifelong memory architectures and human-in-the-loop training. As we integrate these pieces, we move closer to AI that truly thinks and learns on its feet. This is the ethos driving Macaron’s research, and it aligns closely with the vision articulated by leaders like Sutskever. The age of scaling taught us a great deal, but the age of Experiential Intelligence is now dawning. In this new age, the frontier is not just bigger models – it’s smarter, more adaptable, more human-like learners. And that is exactly what we’re striving to build.

Sources:

· Ilya Sutskever’s interview with Dwarkesh Patel (Nov 2025) – Dwarkesh Podcast: “Moving from the Age of Scaling to the Age of Research.” Highlights available at Dwarkesh’s blog[1][4][18][19].

· Summary of Sutskever’s key points by Best of AI digest[36].

· LeCun’s observation on human driving efficiency (referenced by Sutskever)[12].

· Macaron AI Mind Lab – Internal research briefs on Experiential Intelligence and Memory (2025).

· Macaron AI open-source contributions on large-scale RL training (Megatron-Bridge & VEGA integration, 2025).


[1] [2] [3] [4] [5] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [32] [34] [35] Ilya Sutskever – We're moving from the age of scaling to the age of research

https://www.dwarkesh.com/p/ilya-sutskever-2

[6] [31] [33] [36] Driving Forces in AI: Scaling to 2025 and Beyond (Jason Wei, OpenAI) by Best AI papers explained

https://creators.spotify.com/pod/profile/ehwkang/episodes/Driving-Forces-in-AI-Scaling-to-2025-and-Beyond-Jason-Wei--OpenAI-e30rd59

Boxu earned his Bachelor's Degree at Emory University, majoring in Quantitative Economics. Before joining Macaron, Boxu spent most of his career in the Private Equity and Venture Capital space in the US. He is now the Chief of Staff and VP of Marketing at Macaron AI, handling finances, logistics, and operations, and overseeing marketing.
