
Hey, I'm Anna. I wasn't planning to diagnose a language model on a Tuesday morning. I only opened the app because I wanted a quick outline for a short client email, and the response sat there spinning while my coffee cooled. That small pause, annoying and rhythm-breaking, pushed me into poking around GLM-5. Over the past week I ran a few short tests, hit repeated slow loads, and found one or two surprises about fallback behavior. This piece is what I actually saw, step by step: the common early glitches, three practical fixes I tried, and how I decided when to report the issue.

The first thing I noticed was the delay. Not a millisecond hiccup, actual waiting. I launched a short prompt I'd used dozens of times with GLM-4.7 and saw a spinner for 15–30 seconds before any token appeared. In one run, the model failed to produce anything until I refreshed the page.
Reaction: small frustration. I didn't assume it was the model at first; my go-to checks were the network and the app itself. But the pattern repeated across different prompts and devices, so it stopped feeling like coincidence.
Friction: slow responses break the moment. When I'm juggling a few tiny tasks, a 20-second stall is enough to make me abandon the tool. That's the kind of tiny friction that accumulates and makes features feel optional.
A second pattern cropped up once the model did return answers: the tone and concision I'd relied on from GLM-4.7 were sometimes off. Short prompts produced longer, roundabout replies. Instructions that used to get crisp lists now returned looser, less-targeted prose.
What caught me off guard was the subtlety: the model wasn't hallucinating wildly, but it was less predictable. For instance, asking for a three-bullet summary of an article resulted in five sentences of general context and only two usable bullets. Not catastrophic, but annoying when I want a compact block of output for copying into notes.
Reaction: mild disappointment and a quiet recalibration. I started qualifying my prompts with more structure to compensate ("Three bullets, each one line"), which improved results but added extra friction.

I tested memory lightly: small preferences or facts I asked the model to remember across a few prompts. With GLM-4.7, a short memory check ("Remember I prefer brief replies") reliably shifted style for the session. With GLM-5, memory seemed inconsistent. Sometimes the model honored the preference; other times it ignored it and defaulted to more expansive answers.
Friction: this inconsistency is the kind of thing that erodes trust. If a model alternates between following and ignoring simple preferences, I find myself double-checking responses or adding instructions repeatedly. That negates the convenience memory is supposed to deliver.
Limitations and uncertainty: I wasn't able to probe deep state or server logs, so I can't say whether the issue was session-routing, a partial rollout, or deliberate model behavior changes. I only report what I observed across about a dozen short sessions over a week.
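If you want to turn that kind of spot-check into something repeatable, here's a minimal sketch. It assumes a session-aware `send(prompt)` function you supply yourself, and it uses a crude word-count threshold as a stand-in for "brief"; neither reflects anything official about how GLM-5 handles memory.

```python
def memory_preference_rate(send, follow_ups, max_words=60):
    """After stating a preference for brief replies, measure how often
    later replies in the same session actually stay brief.

    `send` is assumed to be your own session-aware chat function;
    the word-count cutoff is an arbitrary proxy for "brief".
    """
    send("Remember I prefer brief replies.")
    kept = 0
    for prompt in follow_ups:
        reply = send(prompt)
        if len(reply.split()) <= max_words:
            kept += 1
    return kept / len(follow_ups)

# Hypothetical usage:
# rate = memory_preference_rate(my_session_send,
#                               ["Summarize this paragraph.", "List next steps."])
# print(f"Preference honored in {rate:.0%} of follow-ups")
```

Even a rough percentage like this is more useful in a bug report than "it sometimes forgets."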

First, I check the obvious: is the model actually serving in my environment? If you're using a platform that shows model status (an API dashboard, status page, or in-app indicator), look for outages or degraded-performance notices. When I checked the provider status page during one of my slow runs, there was a brief "degraded" flag that matched the timing.
What to do: if the status page shows a problem, wait 10–20 minutes and retry. If nothing is posted, try a simple sanity test: switch a short prompt to a different model (if available) and note the difference in response timing and content. That helps separate local issues from provider-side problems.
Why it matters in practice: you'll save time reporting a bug that's already known, and you'll avoid changing settings that don't need changing.
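If you want to make that sanity test less subjective, a few lines of scripting help. This is a minimal sketch, assuming you already have some `call_model(prompt, model=...)` wrapper around your provider's API; the model names in the comments are placeholders for whatever your account actually exposes.

```python
import time

def time_model(call_model, prompt, model, runs=3):
    """Time a few runs of the same short prompt against one model."""
    timings = []
    for _ in range(runs):
        start = time.monotonic()
        call_model(prompt, model=model)  # your own API wrapper goes here
        timings.append(round(time.monotonic() - start, 1))
    return timings

# Hypothetical usage, comparing the new model against the old one:
# prompt = "Outline a short client email about a delayed delivery."
# for name in ("glm-5", "glm-4.7"):  # placeholder model identifiers
#     print(name, time_model(my_call, prompt, name), "seconds per run")
```

If one model's timings cluster around a couple of seconds and the other's around twenty, that's a much stronger signal to include in a report than "it felt slow."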
This sounds like old-school browser troubleshooting, and sometimes that's exactly what it is. I cleared local cache and cookies, restarted the browser, and reran the prompt. In one situation that fixed an unexplained load hang. In others it didn't.
What to do: clear cache, hard-refresh, or try an incognito window. If you're on mobile, force-quit the app and reopen. If that fixes things, the problem was likely a stale session or token. If not, the issue is probably elsewhere.
Practical note: clearing cache is quick and often the difference between filing a bug report and moving on. Do this before you escalate.
If you have access to GLM-4.7 (or another reliable model), switch back for immediate needs. That's what I did for an urgent short brief: switch, generate, paste, done. The older model behaved predictably and saved me time.
Why this is a pragmatic choice: new models can have growing pains. A graceful fallback keeps your work moving while the newer model stabilizes. If you depend on consistent tone or memory behavior, falling back avoids the micro-friction of rewording prompts multiple times.
Caveat: if you don't have an older model available, consider simplifying prompts or splitting tasks into smaller calls to reduce the chance of long, failed responses.
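If you do have both models available through an API, the fallback can be scripted rather than done by hand. This is a sketch only: `call_model` is assumed to be your own blocking wrapper, the model identifiers are placeholders, and the 20-second cutoff mirrors the stalls I saw rather than any documented provider setting.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def generate_with_fallback(call_model, prompt, primary="glm-5",
                           fallback="glm-4.7", timeout_s=20):
    """Try the primary model; if it stalls past the timeout, use the fallback."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call_model, prompt, model=primary)
    try:
        result, used = future.result(timeout=timeout_s), primary
    except FutureTimeout:
        # The stalled primary call keeps running in its worker thread;
        # don't block on it, just answer from the older model.
        result, used = call_model(prompt, model=fallback), fallback
    pool.shutdown(wait=False, cancel_futures=True)
    return result, used
```

Returning which model actually answered also makes it easier to explain tone differences in whatever you paste into client-facing text.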
I'm picky about filing formal bug reports. Here's how I decided what to report during my week with GLM-5.
Report when: the problem reproduces across prompts and devices, survives a cache clear and a fresh session, and nothing on the provider status page explains it.
Wait when: the status page already shows a degradation that matches your timing, a refresh or cache clear makes it go away, or you've only hit the issue once.
How I filed my report: brief context, exact prompts, screenshots of timing, and a note about which fallbacks worked. Keep it concise and reproducible; engineers appreciate short, repeatable cases more than long narratives.
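For the "exact prompts and timing" part, I find it easier to let a tiny script do the recording. Another hedged sketch: `call_model` is again your own wrapper, and the JSONL log format is just something compact enough to paste into a ticket.

```python
import json
import time
from datetime import datetime, timezone

def record_repro(call_model, prompt, model, path="glm5_repro.jsonl"):
    """Run one prompt and append a timestamped, reproducible record to a log file."""
    start = time.monotonic()
    reply = call_model(prompt, model=model)
    entry = {
        "when": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "seconds": round(time.monotonic() - start, 1),
        "reply_preview": reply[:200],  # enough to show style drift without dumping everything
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

A handful of these entries, collected over a few days, is exactly the kind of short, repeatable case a support engineer can act on.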
What I saw: during one of the slow GLM-5 sessions, the app briefly served an error and then returned a response that matched GLM-4.7's style. I hadn't manually changed models. When I dug into the session logs the app exposed (small, user-facing logs), I saw a line indicating a fallback event ("primary model timeout, using fallback model"), and the response that followed was quicker and more predictable.
Reaction: quiet relief. I liked that I didn't have to switch anything. The fallback saved the workflow without me noticing until I checked logs.
Limits and what I don't know: I don't know the exact timeout threshold Macaron uses, nor whether it retries GLM-5 later in the same session. I also can't say how widely this fallback is enabled across accounts or regions. Those are details Macaron would need to confirm.

Why this matters for ordinary use: automatic fallback reduces disruption. For someone like me who treats AI as a helpful companion for small tasks, a seamless switch means fewer interruptions and less decision fatigue.
When a model stalls, most tools just stop and wait. At Macaron, we’ve seen how disruptive that moment can be. In some situations, Macaron may switch away from a slow or unresponsive model and return a response using a more stable fallback, without requiring manual intervention.
→ Learn more about how Macaron approaches model reliability
If you're curious to test it: try a controlled prompt that previously timed out, and watch the app's logs or response headers if available. If you see a fallback message, that's a useful trust signal. If you don't, ask support for their fallback policy or configuration details.
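If your provider surfaces anything in response headers, a request-level check looks roughly like this. Everything here is an assumption to adapt: the endpoint URL, the payload shape, and especially the header name, which will differ (or simply not exist) depending on the platform.

```python
import requests

def which_model_served(url, payload, header="x-model-served"):
    """Send one request and report which model the response metadata claims served it.

    The endpoint, payload shape, and header name are all placeholders;
    substitute whatever your provider actually documents, or ask support.
    """
    resp = requests.post(url, json=payload, timeout=60)
    served = resp.headers.get(header, "<no model header present>")
    print("status:", resp.status_code, "| served by:", served)
    return served
```

If nothing like this exists in the headers or logs, that's a fair question to put to support alongside their fallback policy.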
A small lingering thought: I appreciate when tools handle errors quietly, but I also value transparency. Showing a non-intrusive notice ("Response served by fallback model") would help users understand differences in answer style or memory behavior without forcing them to hunt through logs.
I'll keep using GLM-5 in short bursts while relying on fallback when it matters. Your mileage may vary, and if you depend on razor-sharp consistency for client-facing text, keeping an older model on standby is a reasonable safety net.