
Hey fellow AI builders — if you've shipped at least one agent that worked flawlessly in a demo and then broke spectacularly on the third real user request, you're the exact audience for this.
I'm Hanks, and I've been assembling and stress-testing AI agent pipelines for a few years now. Not proof-of-concepts — systems that actually run. The shift to DeepSeek for agent work has been interesting because the model is genuinely capable, the pricing makes iteration cheap, and the tool-calling behavior has matured significantly with V3.2. But there are gaps in the docs that will catch you.
This guide walks through the architecture choices, a working code review agent built from scratch, LangChain integration, deployment considerations, and the failure modes that will cost you time if you don't know about them in advance.

Four things make DeepSeek the current best-value model for agent work:
Tool calling is stable. As of DeepSeek-V3.2, the API supports tool use in thinking mode as well as standard mode. This means you can run the model in thinking mode for complex multi-step planning and still call external tools — a combination that wasn't possible in earlier versions.
Strict mode eliminates schema drift. In strict mode, the model's function-call output is guaranteed to conform to the JSON schema you defined for the function. For production agents, this matters — you stop writing defensive parsing code that handles half-formed JSON.
Parallel tool calls. A single request can define up to 128 functions, and the model can emit multiple tool calls in one response. For agents that need to gather information from multiple sources simultaneously, this is the difference between a fast agent and one that serializes everything and feels slow.
Cost. At $0.28/M input tokens (cache miss) and $0.028/M on cache hits, iterating on agent logic is cheap. Running 500 test passes to debug a ReAct loop doesn't break the budget.
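To make that concrete, here is a back-of-envelope cost check for those 500 test passes. The 20K-input / 2K-output tokens per run and the $0.42/M output price are my assumptions for illustration (only the input prices come from this article); verify against DeepSeek's current pricing page.

```python
# Rough cost of 500 test passes, assuming ~20K input tokens per run
# (all cache misses) and ~2K output tokens per run.
INPUT_PER_M = 0.28    # $/M input tokens on cache miss (from the article)
OUTPUT_PER_M = 0.42   # $/M output tokens -- assumed, verify on the price page

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single agent run."""
    return (input_tokens / 1e6 * INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M)

total = 500 * run_cost(20_000, 2_000)
print(f"${total:.2f}")  # → $3.22
```

Even with generous token estimates, a full debugging campaign stays in single-digit dollars.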
One important caveat upfront: deepseek-reasoner does not support tool calling or structured output. Those features are available only through the chat model, specified via model="deepseek-chat". Keep this distinction in your mental model — the thinking model is for reasoning, the chat model is for tool use.

There are two patterns worth considering for most agent tasks. The right choice depends on whether your tasks are dynamic and reactive, or structured and decomposable.

ReAct (Reason + Act) is the simpler pattern and the right starting point for most agents. The loop is:
Thought → Action → Observation → Thought → …
The model receives a task, reasons about what tool to call next, calls it, receives the result as an observation, and repeats until it reaches a stopping condition. Each iteration is a single API call with the full conversation history.
When to use ReAct: tasks that are dynamic and reactive, where each next step depends on what the previous tool call returned.
ReAct loop state machine:

[USER TASK]
     │
     ▼
[THINK: What do I need to call?]
     │
     ├─ tool_calls present? ──► [EXECUTE TOOL] ──► [APPEND RESULT]
     │                                                   │
     │                              back to THINK ◄──────┘
     │
     └─ finish_reason == "stop"? ──► [RETURN FINAL ANSWER]
Plan-and-Execute splits the task into two phases: a planner model produces a structured task list, then an executor processes each step. The two models can be different — you might use deepseek-reasoner for planning and deepseek-chat for execution.
When to use Plan-and-Execute: tasks that are structured and decomposable, where the full step list can be laid out before execution starts.
The tradeoff: Plan-and-Execute is more complex to debug, and replanning when the executor hits an unexpected state requires careful design. Start with ReAct. Graduate to Plan-and-Execute when you hit its limits.
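The two-phase split can be sketched as a planning pass with deepseek-reasoner followed by a deepseek-chat executor. The numbered-list plan format and the parse_steps helper are illustrative assumptions, not a prescribed API:

```python
import os

def parse_steps(plan_text: str) -> list[str]:
    """Extract the step text from a numbered list like '1. Read the file'."""
    steps = []
    for line in plan_text.splitlines():
        s = line.strip()
        if s[:1].isdigit() and "." in s:
            steps.append(s.split(".", 1)[1].strip())
    return steps

def plan_and_execute(task: str) -> list[str]:
    # Deferred import so parse_steps stays usable without the SDK installed.
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                    base_url="https://api.deepseek.com")
    # Phase 1: the reasoning model produces a structured plan.
    plan_text = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user",
                   "content": f"Produce a short numbered step list for: {task}"}],
    ).choices[0].message.content
    # Phase 2: the chat model executes each step in turn.
    results = []
    for step in parse_steps(plan_text):
        out = client.chat.completions.create(
            model="deepseek-chat",
            messages=[{"role": "user", "content": step}],
        ).choices[0].message.content
        results.append(out)
    return results
```

A real executor would carry tool definitions and shared state between steps; this sketch only shows the planner/executor handoff.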

Let's build a code review agent that can read files, run static analysis, and produce structured review comments. This is realistic enough to expose all the interesting edge cases.
Start with your tool definitions. Use strict mode from the start — it saves debugging time later.
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "strict": True,
            "description": "Read the contents of a file from the repository.",
            "parameters": {
                "type": "object",
                "properties": {
                    "filepath": {
                        "type": "string",
                        "description": "Relative path to the file, e.g. 'src/utils.py'"
                    }
                },
                "required": ["filepath"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_linter",
            "strict": True,
            "description": "Run a linter on a specific file and return violations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "filepath": {
                        "type": "string",
                        "description": "Relative path to the file to lint"
                    },
                    "linter": {
                        "type": "string",
                        "enum": ["flake8", "pylint", "ruff"],
                        "description": "Which linter to run"
                    }
                },
                "required": ["filepath", "linter"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "post_review_comment",
            "strict": True,
            "description": "Post a structured review comment for a specific line.",
            "parameters": {
                "type": "object",
                "properties": {
                    "filepath": {"type": "string"},
                    "line_number": {"type": "integer"},
                    "severity": {
                        "type": "string",
                        "enum": ["critical", "warning", "suggestion"]
                    },
                    "comment": {"type": "string"}
                },
                "required": ["filepath", "line_number", "severity", "comment"],
                "additionalProperties": False
            }
        }
    }
]
The enum constraint on linter and severity is worth adding — it prevents the model from inventing values that your dispatch logic doesn't handle.
The ReAct loop itself. This is the core engine — study this pattern carefully:
import os
import json
import subprocess

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com"
)

# --- Tool implementations ---

def read_file(filepath: str) -> str:
    try:
        with open(filepath, "r") as f:
            return f.read()
    except FileNotFoundError:
        return f"Error: File '{filepath}' not found."

def run_linter(filepath: str, linter: str) -> str:
    result = subprocess.run(
        [linter, filepath],
        capture_output=True, text=True, timeout=30
    )
    return result.stdout or result.stderr or "No issues found."

def post_review_comment(filepath: str, line_number: int,
                        severity: str, comment: str) -> str:
    # In production: call your code review API (GitHub, GitLab, etc.)
    print(f"[{severity.upper()}] {filepath}:{line_number} — {comment}")
    return "Comment posted."

TOOL_MAP = {
    "read_file": read_file,
    "run_linter": run_linter,
    "post_review_comment": post_review_comment
}

# --- ReAct loop ---

def run_agent(task: str, max_iterations: int = 15) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a precise code review agent. "
                "Use tools to read files and run linters before posting comments. "
                "Always read a file before commenting on it. "
                "Stop when all relevant files have been reviewed."
            )
        },
        {"role": "user", "content": task}
    ]

    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="deepseek-chat",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_tokens=2048
        )
        choice = response.choices[0]
        messages.append(choice.message)  # Append the full assistant message

        # Agent is done
        if choice.finish_reason == "stop":
            return choice.message.content

        # Execute tool calls
        if choice.finish_reason == "tool_calls":
            for tool_call in choice.message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)

                # Validate before calling
                if fn_name not in TOOL_MAP:
                    result = f"Error: Unknown tool '{fn_name}'"
                else:
                    try:
                        result = TOOL_MAP[fn_name](**fn_args)
                    except Exception as e:
                        result = f"Tool error: {e}"

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })

    return "Max iterations reached. Partial review completed."

# Run it
if __name__ == "__main__":
    result = run_agent(
        "Review the file src/utils.py for style issues and logic problems. "
        "Post comments for any issues you find."
    )
    print(result)
Two things I always validate before calling a tool: the function name is in TOOL_MAP (prevents arbitrary dispatch on hallucinated tool names), and the call is wrapped in a try/except (a tool failure should become a tool result, not a crash).
Short-term memory is the conversation history — you already have that. The problem is session-to-session persistence: the agent has no memory of what it reviewed yesterday.
A minimal SQLite-backed session store:
import sqlite3
from datetime import datetime, timezone

DB_PATH = "agent_memory.db"

def init_db():
    conn = sqlite3.connect(DB_PATH)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS sessions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT,
            role TEXT,
            content TEXT,
            timestamp TEXT
        )
    """)
    conn.commit()
    conn.close()

def save_message(session_id: str, role: str, content: str):
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "INSERT INTO sessions VALUES (NULL, ?, ?, ?, ?)",
        (session_id, role, content, datetime.now(timezone.utc).isoformat())
    )
    conn.commit()
    conn.close()

def load_session(session_id: str, limit: int = 20) -> list[dict]:
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT role, content FROM sessions WHERE session_id = ? "
        "ORDER BY id DESC LIMIT ?",
        (session_id, limit)
    ).fetchall()
    conn.close()
    return [{"role": r, "content": c} for r, c in reversed(rows)]
Use load_session() to prepend prior context to your messages list at agent startup. Cap the history at 20–30 messages — beyond that you're more likely to confuse the model than help it.
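The prepending step looks like this. This sketch takes the restored history as a plain list (it would come from load_session above); the build_messages name and the 20-message cap are illustrative choices:

```python
def build_messages(history: list[dict], system_prompt: str, task: str) -> list[dict]:
    """Assemble the startup message list: system prompt first, then a
    capped slice of restored history, then the new user task."""
    return ([{"role": "system", "content": system_prompt}]
            + history[-20:]   # cap restored context to the most recent turns
            + [{"role": "user", "content": task}])
```

The system prompt always stays in position zero; only the restored turns are windowed.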
LangChain has first-class DeepSeek support. The langchain-deepseek package integrates DeepSeek's hosted chat models, with deepseek-chat supporting tool calling and structured output.
Install:
pip install langchain-deepseek langchain-core
Basic agent with bind_tools:
from langchain_deepseek import ChatDeepSeek
from langchain_core.messages import HumanMessage, SystemMessage
from pydantic import BaseModel, Field

# Pydantic tool definitions — cleaner than raw JSON schema

class ReadFile(BaseModel):
    """Read a file from the repository."""
    filepath: str = Field(description="Relative path to the file")

class RunLinter(BaseModel):
    """Run a linter on a file and return violations."""
    filepath: str = Field(description="Path to the file")
    linter: str = Field(description="Linter name: flake8, pylint, or ruff")

llm = ChatDeepSeek(
    model="deepseek-chat",
    temperature=0,
    max_retries=3
)
llm_with_tools = llm.bind_tools([ReadFile, RunLinter])

# Run one turn
messages = [
    SystemMessage(content="You are a code review agent. Use tools to inspect files."),
    HumanMessage(content="Check src/auth.py for security issues.")
]
response = llm_with_tools.invoke(messages)
print(response.tool_calls)
Two key LangChain gotchas to know upfront:
First, strict=True in bind_tools uses a beta endpoint and the DeepSeek API may ignore the parameter according to LangChain's own docs. If you need strict schema enforcement, pass it directly in the raw API call rather than through LangChain's bind_tools.
Second, deepseek-reasoner does not support bind_tools. Attempting to use tool calling with deepseek-reasoner will fail because it is not supported by that model. If you want reasoning + tool calling, use deepseek-chat with thinking mode enabled via the API's thinking parameter — not deepseek-reasoner.
For production agents, LangGraph is worth the additional setup over bare LangChain chains. It gives you explicit state management, human-in-the-loop checkpoints, and built-in support for cycles (the ReAct loop pattern) — without you having to manage the message history manually.
A few things that will catch you in production that the tutorials skip:
Session isolation. If you're running multiple concurrent agent sessions, each must have its own isolated message history. A shared messages list across requests is a classic concurrency bug — one agent's tool results bleed into another agent's context.
Timeout budgets. Long-running agents need a total wall-clock timeout, not just a per-call timeout. Set a 120-second outer limit and let the agent's max_iterations catch the loop case independently.
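The outer budget can be sketched as a wrapper around the iteration loop. Here step_fn is an assumed callback that performs one model-plus-tools iteration and returns the final answer, or None to continue:

```python
import time

class AgentTimeout(Exception):
    """Raised when the agent exceeds its total wall-clock budget."""

def run_with_budget(step_fn, max_iterations: int = 15, budget_s: float = 120.0):
    """Call step_fn() until it returns a final answer (non-None), enforcing
    both an iteration cap and a total wall-clock deadline."""
    deadline = time.monotonic() + budget_s
    for _ in range(max_iterations):
        if time.monotonic() > deadline:
            raise AgentTimeout(f"Exceeded {budget_s}s wall-clock budget")
        result = step_fn()
        if result is not None:
            return result
    return "Max iterations reached."
```

The deadline check runs before each iteration, so a slow tool call can overshoot the budget by at most one step; a stricter version would also pass per-call timeouts down into step_fn.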
Tool sandboxing. If your agent calls run_linter via subprocess, run it in a restricted environment. An agent that can call arbitrary subprocesses can be prompted into running arbitrary code. Use Docker, subprocess allowlists, or a tool execution sandbox.
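As one layer of that sandboxing, here is a minimal allowlist wrapper around the run_linter subprocess call. The ALLOWED_BINARIES set and the repo-root path check are illustrative; this is not a substitute for OS-level isolation like Docker:

```python
import shutil
import subprocess
from pathlib import Path

ALLOWED_BINARIES = {"flake8", "pylint", "ruff"}
REPO_ROOT = Path(".").resolve()

def safe_run_linter(filepath: str, linter: str) -> str:
    # Reject any binary name the agent invents.
    if linter not in ALLOWED_BINARIES:
        return f"Error: linter '{linter}' is not allowlisted."
    # Reject paths that escape the repository root (../ traversal).
    target = (REPO_ROOT / filepath).resolve()
    if not target.is_relative_to(REPO_ROOT):
        return "Error: path escapes the repository root."
    # Resolve the binary via PATH rather than trusting the raw string.
    binary = shutil.which(linter)
    if binary is None:
        return f"Error: '{linter}' is not installed."
    result = subprocess.run([binary, str(target)],
                            capture_output=True, text=True, timeout=30)
    return result.stdout or result.stderr or "No issues found."
```

Note that errors come back as strings, matching the pattern in the ReAct loop: a blocked call becomes a tool result the model can react to, not a crash.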
Cost monitoring. DeepSeek doesn't expose a usage API endpoint — monitor costs via the dashboard, and set a balance alert before your agent goes live. A runaway loop can exhaust a prepaid balance surprisingly fast.
A minimal deployment checklist: per-session message isolation verified, an outer wall-clock timeout in place, tool execution sandboxed, and a balance alert configured on the DeepSeek dashboard.
These are the failure modes I've hit personally — each cost at least an afternoon.
Pitfall 1: Forgetting to validate tool arguments before dispatch.
Even in strict mode, the model does not always generate valid JSON, and it may hallucinate parameters not defined in your function schema. Validate the arguments in your code before dispatching. A try/except around every tool call, returning the error string as the tool result, is the minimum viable protection.
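A minimal pre-dispatch validation sketch, assuming the TOOL_MAP pattern from the ReAct loop above. The EXPECTED_ARGS map is a hand-rolled stand-in; validating against the full JSON schema with the jsonschema package would be the fuller version:

```python
import json

# Hand-maintained map of tool name -> required argument names (illustrative).
EXPECTED_ARGS = {
    "read_file": {"filepath"},
    "run_linter": {"filepath", "linter"},
}

def validate_call(fn_name: str, raw_arguments: str):
    """Return (args, None) on success, or (None, error_string) so the error
    can be fed back to the model as the tool result."""
    if fn_name not in EXPECTED_ARGS:
        return None, f"Error: Unknown tool '{fn_name}'"
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return None, f"Error: Arguments were not valid JSON: {e}"
    extra = set(args) - EXPECTED_ARGS[fn_name]
    missing = EXPECTED_ARGS[fn_name] - set(args)
    if extra or missing:
        return None, (f"Error: bad arguments. Extra: {sorted(extra)}, "
                      f"missing: {sorted(missing)}")
    return args, None
```

Returning the error as a string keeps the ReAct loop alive: the model sees what went wrong and can retry with corrected arguments.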
Pitfall 2: Using deepseek-reasoner for tool-calling agents.
I tested this. The model produces a "reasoning_content" block and then returns no tool calls. The fix is to use deepseek-chat for any agent that needs to call tools. If you want the reasoning model's thinking for planning, run a separate planning pass with deepseek-reasoner first, then hand the plan to a deepseek-chat executor.
Pitfall 3: Not appending the full assistant message object.
When the model returns tool calls, you must append the entire choice.message object to your history (not just a role/content dict), then append each tool result with its matching tool_call_id. If the IDs don't match, the next API call returns a 400 error. This is the most common implementation mistake I see in tutorials — they show simplified pseudocode that skips the ID matching.
Pitfall 4: Thinking mode + tool calls requires passing reasoning_content back.
Tool invocation in thinking mode requires you to pass reasoning_content back to the API so the model can continue reasoning; if your code drops it, the API returns a 400 error. If you're using deepseek-chat with thinking mode enabled and seeing 400 errors mid-loop, this is why.
Pitfall 5: Unbounded context growth in long sessions.
Each iteration appends tool calls and results to the history. A 15-iteration agent reviewing a large codebase can hit 30K+ tokens before you know it. Implement a sliding window or summarization step when the history exceeds a threshold — don't let it grow unbounded.
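A sliding-window trim can be sketched as below, assuming the messages list shape from the ReAct loop. The 4-characters-per-token estimate is a crude heuristic; a real implementation should also drop assistant/tool message pairs together so tool_call_ids stay matched:

```python
def trim_history(messages: list[dict], max_tokens: int = 30_000) -> list[dict]:
    """Keep the system prompt, drop the oldest turns once a rough token
    budget is exceeded."""
    def rough_tokens(msg: dict) -> int:
        # Crude heuristic: ~4 characters per token.
        return len(str(msg.get("content") or "")) // 4

    system, rest = messages[:1], messages[1:]
    while rest and sum(rough_tokens(m) for m in system + rest) > max_tokens:
        rest.pop(0)   # drop the oldest non-system message
    return system + rest
```

Call it once per iteration, right before the API call, so the window is enforced continuously rather than after the context has already blown up.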
At Macaron, the agent pattern that consistently works in daily workflows isn't the one with the most tools — it's the one where the model remembers what it did last session and doesn't have to re-establish context from scratch on every task. If you're building agent workflows on top of DeepSeek and want persistent memory between sessions without maintaining your own SQLite layer, that's the gap Macaron fills — try running a multi-session agent task and see how much context survives the handoff.
Q: Can I use DeepSeek V4's 1M context window for my agent's memory? Not yet via the API (as of March 2026 — the API still uses V3.2 with 128K). The 1M context is live in the web/app interface but hasn't been exposed through the API endpoint. For now, implement explicit memory management. At V4 API launch, very long context will reduce the need for aggressive windowing.
Q: Should I use deepseek-reasoner or deepseek-chat for my agent? Use deepseek-chat for any agent that needs tool calling, and deepseek-reasoner for tasks that are pure reasoning without tool calls — math proofs, complex analysis, code architecture planning. For a hybrid, run a reasoning planning pass with deepseek-reasoner, extract the plan as text, then execute with deepseek-chat.
Q: How do I prevent my agent from looping indefinitely? Two mechanisms working together: a max_iterations counter (15–20 is usually sufficient), and a total session timeout (120–180 seconds for most tasks). The counter handles the "model can't figure out how to stop" case; the wall-clock timeout handles tool call latency adding up unexpectedly.
Q: What's the right tool_choice setting? Use "auto" (the default when tools are present) for ReAct agents — let the model decide when to call tools. Use "required" only when you know the task always needs a tool call and you want to force one or more, and "none" when the model should skip tools entirely and generate a plain message. Avoid "required" for general agents — it forces tool calls even when a natural language answer would be better.
Q: Does LangChain support async agent execution with DeepSeek? Yes. ChatDeepSeek supports ainvoke, astream, and abatch. For high-throughput agents processing many tasks in parallel, use asyncio.gather with llm.ainvoke rather than running sequential blocking calls.
Q: How do I handle a tool that takes a long time to execute? Return a pending result immediately and implement a polling tool. Define a check_status(job_id) tool alongside your long-running start_job(...) tool, and let the ReAct loop poll for completion. This keeps the conversation history progressing without blocking the event loop on a slow external call.
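The start_job/check_status pair described above can be sketched like this. The in-memory job registry and the thread-based worker are illustrative; in production the registry would be a database or queue:

```python
import threading
import uuid

_JOBS: dict[str, dict] = {}

def start_job(command: str) -> str:
    """Kick off the slow work in a background thread and return a job id
    immediately, so the ReAct loop is never blocked."""
    job_id = uuid.uuid4().hex
    _JOBS[job_id] = {"status": "pending", "result": None}

    def work():
        # Stand-in for the real long-running call (linter, build, API job).
        _JOBS[job_id]["result"] = f"finished: {command}"
        _JOBS[job_id]["status"] = "done"

    threading.Thread(target=work, daemon=True).start()
    return f"Started job {job_id}. Poll with check_status."

def check_status(job_id: str) -> str:
    """Polling tool: returns 'pending' until the job completes."""
    job = _JOBS.get(job_id)
    if job is None:
        return f"Error: unknown job id '{job_id}'"
    if job["status"] != "done":
        return "pending"
    return job["result"]
```

Register both as tools; the model learns from the "pending" results to keep polling until the real output appears.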
Q: What's the maximum number of tools I can register? A maximum of 128 functions are supported. In practice, agents with more than 10–15 tools tend to make worse routing decisions — the model has to navigate a larger decision space. Group related actions into higher-level tools rather than exposing every granular operation.
Coming up in this series:
DeepSeek V4 vs R1: Which Model Should You Actually Use?
DeepSeek V4 Parameters: 671B MoE Architecture Explained
DeepSeek V4 Benchmarks: MMLU, HumanEval & SWE-bench
DeepSeek V4 Version History: V3 → V3-0324 → V4 Timeline (2026)