
Hey, buddy. I'm Anna. How's it going? I've been asking Macaron to generate mini-apps since GLM-4.7 was the default. Meal planners that forget dietary restrictions. Travel itineraries with impossible timing. Fitness trackers that reset progress randomly.
GLM-5 dropped February 11. I ran the same prompts again. Some things got better. Some things are still broken.

Macaron isn't trying to replace Notion. It's a personal AI platform that generates small, interactive tools on demand—"mini-apps." You describe what you need, and it builds something functional within seconds.
These aren't polished SaaS products. They're quick, single-purpose utilities. A meal planner that remembers dietary restrictions. A reading tracker with notes. A reflection journal that surfaces patterns.
As of mid-February 2026, these run on GLM-5, which launched with 744B parameters and built-in agentic intelligence. Before that, GLM-4.7. The quality difference between those two models is what I've been testing.

GLM-4.7 generated layouts that looked exactly like what they were: auto-generated. Buttons in the wrong places. Misaligned text fields. Painful color schemes.
GLM-5 is noticeably better. When I asked for a weekly meal planner, it gave me a clean grid with sensible spacing, readable fonts, and colors that didn't hurt. The improvement comes from GLM-5's "native Agent Mode capabilities" extending to interface generation.
Still not flawless. Complex layouts—nested sections, conditional visibility—come out wonky about 30% of the time. I asked for a fitness tracker with collapsible goal categories, and clicking "expand" on cardio also expanded strength goals. Close, but not quite.
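To make the collapsible-categories bug concrete, here's a minimal TypeScript sketch; the code and names are mine, not Macaron's output. The generated tracker behaved as if every section shared a single expanded flag instead of tracking one per category.

```typescript
type Category = "cardio" | "strength" | "mobility";

// What the generated app behaved like: one shared flag for every section.
let allExpanded = false;
function toggleBuggy(_category: Category): void {
  allExpanded = !allExpanded; // expanding cardio flips strength and mobility too
}

// What the prompt actually asked for: independent state per category.
const expanded = new Map<Category, boolean>([
  ["cardio", false],
  ["strength", false],
  ["mobility", false],
]);
function toggle(category: Category): void {
  expanded.set(category, !(expanded.get(category) ?? false));
}
```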
Logic and state handling is where the jump felt most significant.
GLM-4.7 could build interactive elements, but logic broke predictably. Meal planners forgot favorites. Travel itineraries couldn't handle time zones. Reading trackers crashed on deletions.
GLM-5 handles state better. Macaron's February 2026 analysis noted it "remembers soft preferences within a session" and "handles backtracking better." I tested this with a travel day-planner, then changed constraints mid-generation. GLM-4.7 would reset everything. GLM-5 adjusted the timeline and kept good suggestions.
Interactivity improved subtly. Buttons respond consistently. Input validation works more often. Dropdown menus don't break randomly. GLM-5 is better at graceful degradation—when something fails, it doesn't destroy the entire app.
Despite the improvements, GLM-5 mini-apps still hit predictable failure modes.
Complex date/time logic: Recurring events, time zones, "every other Thursday except holidays"—these work approximately. My fitness tracker with adaptive rest days sometimes scheduled two consecutive rest days, sometimes none. (A sketch of what that kind of rule has to juggle appears a few lines below.)
Memory persistence: Macaron's deep memory stores preferences, but relying on it for mini-app state is unreliable. A reflection journal should pull past patterns. In practice, it does this inconsistently.
Multi-user contexts: Shared meal planning or group itineraries? GLM-5 can generate the interface but struggles with conflicting preferences or merged inputs.
Large datasets: Feed a reading tracker 50+ books and performance degrades. Scrolling gets choppy. Search breaks.
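Take the date/time failure above. Even a stripped-down "every other Thursday except holidays" rule has several places to go wrong: the weekday check, the two-week cycle, the holiday exclusion, and UTC-versus-local date traps. This TypeScript sketch is mine, not anything Macaron generated, and the holiday list is hypothetical:

```typescript
// Is `date` an "every other Thursday, except holidays" occurrence?
// `anchor` is assumed to be a known Thursday that starts the two-week cycle.
function isEveryOtherThursday(date: Date, anchor: Date, holidays: Set<string>): boolean {
  const isThursday = date.getUTCDay() === 4; // UTC avoids local-timezone drift
  const msPerWeek = 7 * 24 * 60 * 60 * 1000;
  const weeksSinceAnchor = Math.floor((date.getTime() - anchor.getTime()) / msPerWeek);
  const onCycle = weeksSinceAnchor >= 0 && weeksSinceAnchor % 2 === 0;
  const isHoliday = holidays.has(date.toISOString().slice(0, 10));
  return isThursday && onCycle && !isHoliday;
}

// Hypothetical holiday list. Note the rule only drops a holiday occurrence;
// deciding whether it should shift to the next week is another edge case entirely.
const holidays = new Set(["2026-11-26"]);
const anchor = new Date("2026-02-12T00:00:00Z"); // a Thursday
console.log(isEveryOtherThursday(new Date("2026-02-26T00:00:00Z"), anchor, holidays)); // true
console.log(isEveryOtherThursday(new Date("2026-03-05T00:00:00Z"), anchor, holidays)); // false: off-cycle week
```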

Here are the mini-apps that actually worked well enough to keep using past the initial test.
Prompt: "Build me a weekly meal planner that remembers I'm lactose intolerant and vegetarian. Let me mark recipes as favorites and suggest meals based on what I've liked before."
GLM-5 generated a clean interface with meal slots for each day, a favorites list, and basic filtering. The dietary restriction memory worked consistently—it didn't suggest cheese-heavy dishes after I'd specified lactose intolerance once.
The "suggest based on favorites" feature is less reliable. Sometimes it pulls from your saved recipes, sometimes it seems to forget and offers random suggestions. But for quick meal planning without having to re-state preferences every time, it's useful.
Prompt: "Create a travel day planner for a weekend trip. Include time estimates between locations, notes about each stop, and flag if the timing gets too tight."
This one impressed me. GLM-5 built an interface where you can add stops, input estimated durations, and get automatic warnings if you've scheduled overlapping activities or unrealistic transit times. When I tried to put a museum visit 15 minutes before a dinner reservation across town, it flagged the conflict.
The time estimation isn't perfect—it doesn't account for traffic or real-world transit delays—but as a planning tool to catch obvious scheduling mistakes, it works.
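The warning behaves roughly like a running-total check over the day's schedule: the previous stop's start time plus its duration plus estimated transit has to land before the next stop begins. A small sketch of that idea; the `Stop` shape and field names are my own assumptions, not Macaron's actual output.

```typescript
interface Stop {
  name: string;
  start: number;              // minutes since midnight
  durationMin: number;
  transitToNextMin?: number;  // rough estimate; ignores traffic and delays
}

function findTightTransitions(stops: Stop[]): string[] {
  const warnings: string[] = [];
  for (let i = 0; i < stops.length - 1; i++) {
    const current = stops[i];
    const next = stops[i + 1];
    const arrival = current.start + current.durationMin + (current.transitToNextMin ?? 0);
    if (arrival > next.start) {
      warnings.push(`Timing too tight: ${current.name} -> ${next.name}`);
    }
  }
  return warnings;
}

// Museum ends at 18:45, 30 minutes of transit, dinner booked for 19:00 across town.
console.log(findTightTransitions([
  { name: "Museum", start: 17 * 60, durationMin: 105, transitToNextMin: 30 },
  { name: "Dinner", start: 19 * 60, durationMin: 90 },
]));
// -> ["Timing too tight: Museum -> Dinner"]
```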

Prompt: "Build a fitness tracker that logs workouts and adjusts weekly goals based on what I completed last week."
GLM-4.7 couldn't handle "adaptive" reliably. GLM-5 gets closer. It generates a tracker where you set initial goals, log workouts, and at week's end it suggests adjustments based on completion rate. Hit 100%? It suggests increasing by 10-20%. Under 50%? It suggests maintaining or reducing.
Where it breaks: track more than three workout types and the interface clutters. Goal suggestions don't account for context (injury, travel), so treat them as starting points.
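The adjustment rule fits in a few lines. This sketch uses my own assumed thresholds, not Macaron's actual code, and it also shows why the suggestions ignore injuries or travel: nothing but the completion rate goes in.

```typescript
// Suggest next week's workout goal from this week's completion rate.
function suggestNextWeekGoal(goal: number, completed: number): number {
  const rate = goal > 0 ? completed / goal : 0;
  if (rate >= 1.0) return Math.ceil(goal * 1.15); // roughly the 10-20% bump
  if (rate < 0.5) return Math.max(1, goal - 1);   // scale back rather than pile on
  return goal;                                    // otherwise hold steady
}

console.log(suggestNextWeekGoal(4, 4)); // 5
console.log(suggestNextWeekGoal(4, 1)); // 3
console.log(suggestNextWeekGoal(4, 3)); // 4
```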
Prompt: "Create a reading tracker where I can add books, rate them, and save notes. Let me filter by rating or search notes."
Most straightforward mini-app, which is why it works well. GLM-5 generates a clean table with title, author, rating, notes. Search works for exact matches, less so for fuzzy searches. The basic note field (no formatting) actually helps—encourages concise writing.
Friction: no auto-save. Close without manually saving and your notes disappear.
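The search behavior is consistent with a plain case-insensitive substring match plus a minimum-rating filter, which would explain why exact terms hit and fuzzy or misspelled queries miss. A minimal sketch with a made-up `Book` shape:

```typescript
interface Book {
  title: string;
  author: string;
  rating: number; // 1-5
  notes: string;
}

// Keep books at or above the rating threshold whose notes contain the query.
function filterBooks(books: Book[], minRating: number, query: string): Book[] {
  const q = query.trim().toLowerCase();
  return books.filter(
    (b) => b.rating >= minRating && (q === "" || b.notes.toLowerCase().includes(q))
  );
}
```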
Prompt: "Build a weekly reflection journal where I answer the same three questions: What went well? What didn't? What will I try differently? Show me patterns."
Pattern detection is where this gets interesting. After a few weeks, the mini-app surfaces recurring themes. GLM-5 with deep memory could say things like "You tried A and B for three weeks: A stuck when you did it at 7 a.m., B never made it past one or two attempts."
The detection isn't sophisticated—mostly keyword matching—but even basic pattern surfacing helps when trying to notice what's actually working.
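Keyword-level pattern surfacing looks roughly like this sketch (field names are made up): count which non-trivial words keep reappearing across weekly entries and surface the ones that recur for several weeks. Anything smarter, like stemming or synonym grouping, is beyond what the journal currently seems to do.

```typescript
interface Entry {
  week: string;            // e.g. "2026-W07"
  wentWell: string;
  didntGoWell: string;
  tryDifferently: string;
}

const STOP_WORDS = new Set(["the", "a", "and", "to", "i", "it", "of", "was", "that"]);

// Return words that appear in at least `minWeeks` distinct weekly entries.
function recurringThemes(entries: Entry[], minWeeks = 3): string[] {
  const weeksSeen = new Map<string, Set<string>>();
  for (const e of entries) {
    const text = `${e.wentWell} ${e.didntGoWell} ${e.tryDifferently}`.toLowerCase();
    for (const word of text.match(/[a-z]+/g) ?? []) {
      if (STOP_WORDS.has(word) || word.length < 4) continue;
      if (!weeksSeen.has(word)) weeksSeen.set(word, new Set());
      weeksSeen.get(word)!.add(e.week);
    }
  }
  return [...weeksSeen.entries()]
    .filter(([, weeks]) => weeks.size >= minWeeks)
    .map(([word]) => word);
}
```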
Tracking patterns and iterating on mini-apps can get messy—logs, prompts, and small tweaks scattered across sessions. We’ve been there too. With Macaron, you can store your mini-app prompts, iteration notes, and feedback all in one place, making it easy to refine apps without losing context.
Try it with your next mini-app →

I've generated probably 30+ mini-apps at this point, and a few patterns have emerged about which prompts work better with GLM-5.
Be specific about constraints upfront: Instead of "build me a meal planner," try "build me a meal planner for 5 weekday dinners, vegetarian, with a grocery list feature." GLM-5 handles explicit constraints better than vague requests.
Describe the core interaction first: "I want to add books and rate them" is clearer than "I want a reading app." Focus the prompt on the primary action users will take, then add secondary features.
Mention failure modes you want to avoid: "Don't reset data when I close the app" or "warn me if times overlap" tells GLM-5 what edge cases to handle. It doesn't always work, but explicitly stating these constraints increases the odds.
Ask for simple first, iterate later: Generate a basic version first, test it, then ask for specific additions. "Add a search feature to this reading tracker" works better than trying to specify every feature in the initial prompt.
Use concrete examples: Instead of "adaptive fitness goals," say "if I complete 80% of my workouts, suggest increasing my weekly target by one session." Specific examples help GLM-5 understand the logic you want.
Test immediately and describe what broke: Generate, use it for 5 minutes, then tell Macaron what failed. "The save button doesn't work when I have more than 10 items" gives GLM-5 concrete information to fix. The iteration loop is faster than trying to anticipate every edge case upfront.
GLM-5 mini-apps aren't replacing dedicated software, and they shouldn't. But for small, personal utilities that would take 20 minutes to build manually and 3 seconds to generate, the quality jumped enough to make them genuinely useful. I'm keeping the meal planner and the reading tracker. The others I'll revisit when I need them.
I'm curious whether the pattern detection in the reflection journal holds up past the first month. And whether the fitness tracker's adaptive logic stays helpful or starts suggesting absurd goals after enough data accumulates. For now, they work well enough that I'm not manually tracking these things in spreadsheets anymore.