The ReAct pattern is a loop: Thought -> Action -> Observation -> Thought, repeated until the agent produces a final answer. Introduced in Yao et al. (ICLR 2023), it now powers Claude Code, LangGraph, CrewAI, and the OpenAI Agents API. This guide skips the paper summary and builds a working ReAct agent in 50 lines of Python (no framework), then rebuilds the same agent in 10 lines with the Claude Agent SDK so you can see exactly what frameworks add and what they hide.

What is the ReAct pattern in AI agents?

ReAct (Reasoning + Acting) is an agent pattern that interleaves natural-language reasoning traces with tool calls inside a single loop. The model emits a Thought, picks an Action, reads back the Observation, and repeats until it produces an Answer.

It was introduced by Yao et al. at ICLR 2023 (Princeton + Google Research). The paper showed that letting the model both reason and act, instead of doing one or the other, materially reduces hallucination and improves task success.

On the ALFWorld text-game benchmark, ReAct hit 71% success vs 45% for action-only baselines and 37% for the imitation-learning BUTLER baseline. On HotpotQA, ReAct combined with chain-of-thought outperformed either approach alone.

The loop is dead simple:

flowchart LR
    U[User question] --> T[Thought]
    T --> A[Action: tool call]
    A --> O[Observation: tool result]
    O --> T
    T -->|done| F[Final Answer]

Every production agent framework today -- Claude Agent SDK, LangGraph, CrewAI, OpenAI Agents API -- is a variant of this state machine with retries, streaming, and observability bolted on.
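Stripped of those extras, the shared skeleton fits in a dozen lines. A minimal sketch with stub functions (the names here are illustrative, not any framework's API):

def call_model(history):
    # stub: a real agent sends the history to an LLM here
    return "Answer: 42"

def parse_action(text):
    # stub: extract (tool, arg) from an Action line, or None if the model answered
    return None

def run_tool(action):
    # stub: dispatch to a tool registry
    tool, arg = action
    return f"Observation: ran {tool}({arg})"

def react_loop(question, max_turns=10):
    history = [f"Question: {question}"]
    for _ in range(max_turns):
        step = call_model(history)        # Thought + Action, or final Answer
        history.append(step)
        action = parse_action(step)
        if action is None:                # stop state: the model answered
            return step
        history.append(run_tool(action))  # Observation feeds the next turn
    return "Max turns reached."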

ReAct vs Baselines on ALFWorld (Success Rate)

Method | Success rate
Act-only | 45%
BUTLER (imitation) | 37%
ReAct | 71%

Source: Yao et al., ReAct paper (ICLR 2023)

How does ReAct differ from chain-of-thought prompting?

Chain-of-thought (CoT) reasons in a closed loop using only the model's parametric knowledge. ReAct lets the model reason AND call external tools, reading the tool output back into the next reasoning step. That single change cuts hallucination and unlocks tasks that depend on information outside the model's training data.

Here is the difference on a real query:

Step | Chain-of-Thought | ReAct
1 | Thought: I think Django was created in 2003... | Thought: I should look this up.
2 | Answer: 2003 (probably wrong) | Action: wikipedia("Django web framework")
3 | -- | Observation: Released July 2005.
4 | -- | Answer: 2005 (correct)

This is exactly what the original paper measured: by grounding its reasoning in a Wikipedia API, ReAct overcame the fact hallucination and error propagation that plague CoT on HotpotQA.

The follow-up work FireAct (Chen et al., 2023) measured this directly: few-shot ReAct on GPT-3.5 hit 31.4 EM on HotpotQA, fine-tuning on 500 ReAct trajectories pushed it to 39.2 EM (+25%), and a ReAct + CoT mix hit 41.0 EM (+31%).

The practical takeaway: use CoT when the answer is fully inside the model's training data. Use ReAct the moment you need fresh information, computation, or side effects.

ReAct on HotpotQA: Few-shot vs Fine-tuned (Exact Match)

Setup | EM
Few-shot ReAct (GPT-3.5) | 31.4
Fine-tuned ReAct (500 traj.) | 39.2
Fine-tuned ReAct + CoT | 41.0

Source: Chen et al., FireAct (2023)

How do you implement ReAct from scratch in 50 lines of Python?

A working ReAct agent needs four things: a system prompt that locks the model into the Thought/Action/Observation grammar, a tool registry, a regex parser for the Action line, and a while loop with a stop condition. No framework required.

Here is the full implementation. Save it as react.py:

import re
import anthropic

client = anthropic.Anthropic()

SYSTEM = """You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer.

Use Thought to describe your reasoning.
Use Action to call one of the tools below, then return PAUSE.
Observation will be the result of running that action.

Available tools:
- calculate: evaluate a Python math expression. e.g. calculate: 4 * 7 / 3
- wikipedia: look up a topic on Wikipedia. e.g. wikipedia: Django

Example:
Question: What is the capital of France times 2?
Thought: I need the capital, then multiply.
Action: wikipedia: France
PAUSE
"""

TOOLS = {
    "calculate": lambda q: str(eval(q, {"__builtins__": {}})),  # demo only: eval is not a safe sandbox
    "wikipedia": lambda q: f"[stub article on {q}]",  # stub -- swap in a real API call
}

ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)  # parses "Action: tool: arg"

def react(question: str, max_turns: int = 6) -> str:
    messages = [{"role": "user", "content": question}]
    for turn in range(max_turns):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            system=SYSTEM,
            messages=messages,
            max_tokens=1024,
            stop_sequences=["PAUSE"],
        )
        text = resp.content[0].text
        print(f"--- Turn {turn} ---\n{text}\n")
        messages.append({"role": "assistant", "content": text})
        if "Answer:" in text:
            return text.split("Answer:", 1)[1].strip()
        match = ACTION_RE.search(text)
        if not match:
            return text  # model gave up the grammar
        tool, arg = match.group(1), match.group(2).strip()
        obs = TOOLS[tool](arg) if tool in TOOLS else f"Unknown tool: {tool}"
        messages.append({"role": "user", "content": f"Observation: {obs}"})
    return "Max turns reached."

if __name__ == "__main__":
    print(react("What is 23 times 47, plus the year Django was first released?"))

That is 47 lines including imports and a runnable __main__. Walk through what each piece does:

  1. System prompt locks the output format. The model now emits Thought:, Action:, and Answer: lines on every turn.
  2. stop_sequences=["PAUSE"] halts generation the moment the model finishes an Action, so you do not pay for tokens after the tool call.
  3. Regex parser extracts (tool_name, argument) from the Action line. One regex, no parser combinators (see the snippet after this list).
  4. Tool registry is a plain dict. Adding a tool is one line.
  5. Stop condition is a 3-way OR: Answer: in text, no Action match, or max_turns exhausted. Without this you get infinite loops.
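To see the parser from step 3 in isolation -- the same one-regex grammar, run against a sample model turn:

import re

ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)

sample = "Thought: I need the release year.\nAction: wikipedia: Django"
match = ACTION_RE.search(sample)
print(match.group(1), match.group(2))  # -> wikipedia Django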

The full file is published as a public GitHub gist -- fork it and swap in real tools.

How does the Claude Agent SDK implement ReAct in 10 lines?

The Claude Agent SDK collapses the same loop into a query() call that yields messages as the agent thinks, calls tools, and observes results. You define tools with a decorator and the SDK handles parsing, retries, history, and budget caps.

Same agent, rebuilt:

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, tool, create_sdk_mcp_server

@tool("calculate", "Evaluate a math expression", {"expr": str})
async def calculate(args):
    return {"content": [{"type": "text", "text": str(eval(args["expr"], {"__builtins__": {}}))}]}

async def main():
    server = create_sdk_mcp_server(name="tools", tools=[calculate])
    options = ClaudeAgentOptions(mcp_servers={"tools": server}, allowed_tools=["mcp__tools__calculate"], max_turns=6)
    async for msg in query(prompt="What is 23 * 47?", options=options):
        print(msg)

asyncio.run(main())
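Assuming the SDK is installed (pip install claude-agent-sdk) and ANTHROPIC_API_KEY is set, this prints each message event as the loop runs. Note the allowed_tools entry: SDK-defined MCP tools are exposed under the naming convention mcp__<server>__<tool>.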

What the SDK is doing under the hood (per the agent loop docs) -- a simplified sketch of that loop follows this list:

  • Runs a while loop that keeps calling Claude until stop_reason != "tool_use".
  • Maintains the conversation history across turns by appending tool_use and tool_result blocks (you don't manage messages yourself).
  • Yields AssistantMessage events for each model response and UserMessage events for each tool result.
  • Enforces max_turns (counts only tool-use turns) and max_budget_usd caps so a runaway loop cannot drain your account.
  • Handles parallel tool calls, streaming, and MCP servers out of the box.
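For intuition, here is roughly that loop written against the raw Anthropic tool-use API -- a simplified sketch with no retries, streaming, parallel dispatch, or budget caps, where run_tool stands in for the SDK's MCP routing:

import anthropic

client = anthropic.Anthropic()

# One tool, declared in the raw API's JSON-schema format.
tools = [{
    "name": "calculate",
    "description": "Evaluate a math expression",
    "input_schema": {"type": "object", "properties": {"expr": {"type": "string"}}, "required": ["expr"]},
}]

def run_tool(name: str, args: dict) -> str:
    # stand-in for tool dispatch; the SDK routes this through your MCP server
    return str(eval(args["expr"], {"__builtins__": {}})) if name == "calculate" else f"Unknown tool: {name}"

messages = [{"role": "user", "content": "What is 23 * 47?"}]
while True:
    resp = client.messages.create(model="claude-sonnet-4-5", max_tokens=1024,
                                  tools=tools, messages=messages)
    messages.append({"role": "assistant", "content": resp.content})  # history managed for you
    if resp.stop_reason != "tool_use":  # the loop's exit condition
        break
    messages.append({"role": "user", "content": [  # feed every tool result back
        {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b.name, b.input)}
        for b in resp.content if b.type == "tool_use"
    ]})

print(next(b.text for b in resp.content if b.type == "text"))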

The 10-line version is doing strictly more than the 50-line version: structured tool schemas, automatic retries on transient API errors, parallel tool execution, and budget enforcement. The 50-line version is for understanding. The SDK is for production.

When does ReAct fail and what are the alternatives?

ReAct fails in three predictable ways: hallucinated tool names, infinite action loops, and cost blow-ups from quadratic context growth. Once your workflows stabilize, you usually graduate to a different pattern.

The failure modes (Towards Data Science, 2025):

  1. Hallucinated tool names. The model invents a tool that does not exist (web_browse instead of wikipedia). Each hallucination burns a full round-trip.
  2. Loop traps. The agent calls the same tool with the same argument 5 times in a row, getting the same answer, never converging. A cheap guard is sketched after this list.
  3. Quadratic token cost. Every turn re-sends the entire history, so total input tokens grow as roughly n(n+1)/2 in the number of steps: a 10-step task costs roughly 50x a 1-step task in tokens.
  4. No strategic foresight. ReAct optimizes the next action, not the full plan. It cannot parallelize independent subtasks.
  5. Stochastic failures. Same task, different run, different outcome. SLAs become impossible to commit to.
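Failure modes 1 and 2 are cheap to guard against in the from-scratch loop above. A sketch (the seen set and the feedback strings are illustrative), replacing the single dispatch line inside react():

seen = set()  # initialize once, above the for-loop

# then, in place of: obs = TOOLS[tool](arg) if tool in TOOLS else f"Unknown tool: {tool}"
if tool not in TOOLS:
    # echo the real tool list so the model can self-correct instead of guessing again
    obs = f"Unknown tool: {tool}. Available tools: {', '.join(TOOLS)}"
elif (tool, arg) in seen:
    obs = "You already ran this exact action and saw its result. Try a different action or give an Answer."
else:
    seen.add((tool, arg))
    obs = TOOLS[tool](arg)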

The main alternatives:

  • Plan-and-Execute (dev.to deep-dive): one planner LLM call produces a DAG of subtasks, an executor runs tools deterministically, and a replanner fires only on failure. Drastically fewer LLM calls, and the plan is inspectable before execution; a minimal skeleton is sketched after this list.
  • ReWOO (Nutrient blog): plans once with placeholder variables, runs all tools in parallel, synthesizes at the end. Two LLM calls total. About 5x more token-efficient than ReAct, but breaks if a tool returns something unexpected.
  • Reflexion (Shinn et al. 2023): ReAct plus a self-critique step that retries with the critique in memory. Best when you have a verifiable success signal.
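For contrast with the ReAct loop, a minimal Plan-and-Execute skeleton looks like this -- a sketch in which plan_with_llm stands in for the single planner call and the Step format is illustrative:

from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    arg: str

TOOLS = {
    "calculate": lambda q: str(eval(q, {"__builtins__": {}})),
    "wikipedia": lambda q: f"[stub article on {q}]",
}

def plan_with_llm(task: str) -> list[Step]:
    # stand-in for one planner LLM call that emits the whole plan up front
    return [Step("wikipedia", "Django web framework"), Step("calculate", "23 * 47")]

def execute(task: str) -> list[str]:
    plan = plan_with_llm(task)  # one LLM call; the plan is inspectable before anything runs
    observations = []
    for step in plan:           # deterministic executor: no LLM calls inside this loop
        observations.append(TOOLS[step.tool](step.arg))
    return observations         # a replanner would fire here only if a step failed

print(execute("What is 23 * 47, and when was Django first released?"))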

When should you roll your own vs reach for a framework?

Roll your own when you are learning the pattern, when you have fewer than five tools, or when token spend is a hard constraint. Reach for a framework when you need streaming, parallel tools, MCP servers, persistent memory, subagents, or human-in-the-loop approval. Production agents almost always graduate to a framework.

A decision matrix:

Situation | Roll your own | Use a framework
Learning ReAct | YES | No
Demo / prototype | YES | YES (either works)
< 5 tools, simple flow | YES | Optional
Streaming UI | No | YES
Parallel tool calls | Hard | Built-in
MCP server integration | No | YES (Claude SDK, mcp-go)
Subagents / orchestration | Hard | YES
Human-in-the-loop approval | Hard | YES
Budget enforcement | DIY | YES (Claude SDK max_budget_usd)
Persistent memory across sessions | DIY | YES

The 50-line version teaches you the loop. The Claude Agent SDK runs it in production. Read the original ReAct paper, build the from-scratch version once so the abstraction is no longer magic, then never write your own again unless you have a specific reason to.

Pattern | LLM calls per task | Adaptiveness | Cost / Latency | Best for
ReAct | N (one per step) | High -- replans every turn | High | Open-ended tasks, unknown step count
Plan-and-Execute | 1 plan + N tool calls + 1 replan on failure | Low -- follows the plan | Low | Predictable workflows with clear subtasks
ReWOO | 2 (plan + synthesize) | None -- placeholders only | Lowest (~5x cheaper) | Tasks where tool outputs don't change strategy
Reflexion | N + critique loop | High -- self-corrects | Highest | Tasks with verifiable success signals