GLM-5 vs MiniMax M2.5: Coding & Agent Race Begins (2026)

Meta Description: GLM-5 vs MiniMax M2.5 launched on the same night (Feb 11, 2026). We break down the Agent and coding race, Anthropic’s Opus 4.6 risk report, Claude Cowork on Windows, and 12 other AI releases worth tracking.
glm-5-vs-minimax-m2-5-2026
Hey fellow AI tinkerers — if you're the type who had 8 tabs open last night because two Chinese AI labs decided to ship flagship models at the exact same time, you're in the right place.
I'm Hanks. I test AI tools and build workflows for a living — 3+ years of breaking things on purpose and rebuilding from scratch. And honestly? Yesterday was one of those nights where I kept thinking there's no way another one just dropped. But there it was.
Here's what actually matters from today's 15 headlines. Not all of them do.
Foundation Models
MiniMax M2.5 & ZhipuAI GLM-5: Same Night. Coincidence? I Don't Buy It.

Two of China's top AI labs shipped flagship models within hours of each other on February 11. The timing is too clean to be accidental. This is China's version of the Anthropic vs OpenAI dynamic — and it's starting to look deliberate.
MiniMax M2.5 skipped M2.2 entirely and jumped straight to M2.5. That kind of version jump usually signals that the capability delta was too large to call it a minor release. Focus areas: Coding and Agent performance.

ZhipuAI's GLM-5, by contrast, arrived with more architectural detail in its announcement.
Here's the part I found genuinely interesting — GLM-5 had been quietly live on OpenRouter under the alias Pony Alpha before the official launch. Community testers ran it. Feedback was solid. That "stealth test first, announce loud later" approach tells you they were confident in the numbers before making noise.
My honest read: both models are gunning hard at Agent + Coding. But strong benchmarks aren't the same as stable production behavior. I've seen too many models ace demos and fall apart in real workflows. The stability question takes time to answer.
vLLM Drops Streaming Input + Real-Time WebSocket API — Bigger Than It Sounds
This one's flying under the radar and it shouldn't.
vLLM, in collaboration with Meta and Mistral AI, just shipped streaming input and a real-time WebSocket API — making it the first mainstream open-source LLM inference engine to support this.
Why does that matter? Traditional LLM inference runs like this: wait for the full input → begin processing → output. Streaming input means the model starts processing as tokens arrive. For long-context tasks and real-time conversation systems, that's a meaningful latency cut — not a marginal one.
If you're running local LLMs or self-hosted inference, this is worth a version bump and a real-task test.
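To make the streaming-input idea concrete, here's a minimal client sketch. The WebSocket endpoint path and the message schema below are assumptions for illustration only, not vLLM's documented protocol; check the vLLM release notes for the real spec before wiring anything up.

```python
import json


def chunk_text(text: str, size: int = 64) -> list[str]:
    """Split a long prompt into chunks so the server can start
    prefilling while the rest of the input is still arriving."""
    return [text[i:i + size] for i in range(0, len(text), size)]


async def stream_prompt(uri: str, prompt: str) -> None:
    # pip install websockets -- imported lazily so the chunking helper
    # stays usable without the dependency.
    import websockets

    async with websockets.connect(uri) as ws:
        # Send the input incrementally instead of as one blob.
        for chunk in chunk_text(prompt):
            await ws.send(json.dumps({"type": "input", "text": chunk}))
        await ws.send(json.dumps({"type": "end"}))
        # Read generated output as it streams back.
        async for message in ws:
            print(json.loads(message))


# Usage (endpoint path is hypothetical):
# import asyncio
# asyncio.run(stream_prompt("ws://localhost:8000/v1/realtime", my_long_prompt))
```

The point of the sketch is the shape of the interaction: input goes out in pieces as it becomes available, and the server can begin prefill before the final chunk lands.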
Tencent Hunyuan 3D 3.1 Lands on Replicate
Replicate now hosts Tencent Hunyuan 3D 3.1, supporting up to 8 input views for generating high-fidelity 3D models with accurate geometry and textures.
I haven't deep-tested this one yet. But the 8-view input spec is meaningful — single-view 3D generation tends to produce geometric inconsistencies. Multi-view constraints tighten the geometry significantly. If 3D generation is part of your workflow, put this on the watchlist.
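If you want to poke at it, the call shape on Replicate is roughly the following. The model slug and the input field names here are guesses for illustration; take the actual ones from the model page on Replicate.

```python
def build_input(image_urls: list[str]) -> dict:
    """Assemble a multi-view payload (the model accepts up to 8 views).
    The field-naming scheme is an assumption, not the documented schema."""
    if not 1 <= len(image_urls) <= 8:
        raise ValueError("Hunyuan 3D 3.1 takes between 1 and 8 input views")
    return {f"image_{i + 1}": url for i, url in enumerate(image_urls)}


def generate_3d(image_urls: list[str]):
    # pip install replicate; needs REPLICATE_API_TOKEN in the environment.
    import replicate

    return replicate.run(
        "tencent/hunyuan3d-3.1",  # hypothetical slug, check the model page
        input=build_input(image_urls),
    )
```

More views should mean tighter geometry, so if you're testing, compare a single-view run against the full 8-view input on the same object.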
Research
Anthropic's Claude Opus 4.6 Catastrophic Risk Report — This One's Worth Reading Slowly

This is the highest-density item in today's digest, and probably the least-clicked. That's a problem.
Anthropic published a catastrophic risk report for Claude Opus 4.6, fulfilling a commitment they made earlier. The context: as model capabilities approach AI Safety Level 4 — the threshold for "capable of autonomous AI R&D" — Anthropic promised systematic risk evaluation before crossing it.
This is not a marketing document. It's a public technical statement about capability boundaries and potential failure modes.
My first reaction was something like: a company voluntarily publishing a report that says "our model is approaching a dangerous threshold" deserves to be taken seriously. That kind of transparency is rare, and it's the kind of thing that separates safety theater from actual safety work.
If you're building anything with autonomous AI Agents, this report belongs in your reading queue.
OpenAI Deep Research Now Runs on GPT-5.2

ChatGPT's Deep Research feature is now powered by GPT-5.2, rolling out today. The model upgrade comes with new interaction features: app connectors for searching specific sites, real-time progress tracking with mid-run follow-up questions, and a full-screen report view.
Product Releases
Claude Cowork Comes to Windows — Full Feature Parity With macOS
This is a real unlock for non-Mac users.
Anthropic announced that Claude's Cowork feature now fully supports Windows, with complete feature parity with macOS. That includes file access, multi-step task execution, plugins, and MCP connectors.
The official announcement is light on specifics, but full parity means the Mac-only wall is gone. If you were waiting on this, you can start running now.
Supabase × ByteDance TRAE IDE: One Less Handoff in Your Dev Stack

Supabase announced integration with ByteDance's TRAE IDE, adding one-click backend deployment, Supabase Platform Kit support, and MCP connectivity.
The logic here is simple: TRAE is an AI-assisted coding environment, and Supabase handles backend infrastructure. Connecting them collapses a step in the "write code → deploy → run" chain. For solo developers and small teams, fewer tool-switching moments mean fewer context drops.
OpenAI Deep Research: Interaction Upgrades Worth Noting Separately
Beyond the GPT-5.2 model upgrade, the UX changes deserve their own mention:
- Connect specific apps to search targeted sites
- Real-time progress tracking with mid-run interruption support
- Full-screen report view
None of these are headline features on their own. But they solve a real problem — Deep Research used to feel like a black box. You kicked it off and waited. Now you can see what it's doing and steer it while it's running. That's a meaningful workflow difference.
LLaDA 2.1: 100B Discrete Diffusion Language Model Goes Open-Source

Ant Group open-sourced LLaDA 2.1 via LMSys. The specs:
- Parameters: 100B
- Architecture: Discrete Diffusion Language Model
- Generation: Fast parallel generation with real-time error correction
- Training: Large-scale block-level reinforcement learning
- Inference support: SGLang added support immediately
Discrete diffusion LMs aren't a new concept — but a 100B open-source version is rare. SGLang's day-one support means the inference stack is ready. If you want to experiment with non-autoregressive generation at scale, this is the clearest on-ramp right now.
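A plausible on-ramp, assuming SGLang's usual OpenAI-compatible serving path (the model repo id below is a placeholder; use the one from the release):

```python
import json
import urllib.request

# Serve the model first (standard SGLang launch pattern;
# the repo id is a placeholder, not the real release id):
#   python -m sglang.launch_server --model-path <llada-2.1-repo> --port 30000

SGLANG_URL = "http://localhost:30000/v1/chat/completions"


def build_request(prompt: str, model: str = "<llada-2.1-repo>") -> dict:
    """Build an OpenAI-style chat request for the local SGLang server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def query(prompt: str) -> str:
    req = urllib.request.Request(
        SGLANG_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because SGLang exposes the OpenAI-compatible endpoint, you don't need diffusion-specific client code; the non-autoregressive generation is handled server-side.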
Browserbase Functions: Serverless Compute With a Built-In Browser

Browserbase launched Functions — code and browser running co-located, no infrastructure management required.
The pitch is direct: stop managing Chrome processes. For teams doing web scraping, automated testing, or building AI Agents that need to interact with browsers, this removes one of the most annoying infrastructure layers in that stack.
Industry & Events
Vercel on AI Agents in Production: "Building Isn't the Bottleneck Anymore"

Vercel put it plainly: building is no longer the hard part. The real challenge for AI Agents in production is safety, reliability, and auditability. Their "self-driving infrastructure" product is their answer to that.
This framing is accurate and I'd push back on anyone who disagrees with it. The gap between "this Agent works in a demo" and "this Agent works reliably in production" is the actual problem most teams are stuck on right now.
NVIDIA GTC Golden Ticket Giveaway
Includes Jensen Huang keynote VIP seating + DGX Spark. Deadline: February 15. Check NVIDIA's GTC page if you're interested.
n8n Launches a Dedicated RAG Resource Hub

n8n dropped a RAG-specific page consolidating use cases, workflow breakdowns, supported LLMs, vector database integrations, templates, and tutorials. If you're building RAG pipelines on n8n, this is faster than digging through docs.
The Bottom Line
One sentence: Chinese AI labs are now competing at the flagship model layer on Agent + Coding, while Anthropic is doing something different — drawing capability boundaries and making risk public.
Both things are worth tracking. They're just not solving the same problem.
At Macaron, we built our personal AI around exactly this kind of signal-from-noise problem — turning daily AI updates into structured, actionable decisions you can actually run with. If you want to test whether your own workflow can absorb what's happening in AI right now without losing the thread, try Macaron free and run it on a real task. Judge the results yourself.