The ReAct pattern is a loop: Thought -> Action -> Observation -> Thought, repeated until the agent produces a final answer. Introduced in Yao et al. (ICLR 2023), it now powers Claude Code, LangGraph, CrewAI, and the OpenAI Agents API. This guide skips the paper summary and builds a working ReAct agent in 50 lines of Python (no framework), then rebuilds the same agent in 10 lines with the Claude Agent SDK so you can see exactly what frameworks add and what they hide.

What is the ReAct pattern in AI agents?

ReAct (Reasoning + Acting) is an agent pattern that interleaves natural-language reasoning traces with tool calls inside a single loop. The model emits a Thought, picks an Action, reads back the Observation, and repeats until it produces an Answer.

It was introduced by Yao et al. at ICLR 2023 (Princeton + Google Research). The paper showed that letting the model both reason and act, instead of doing one or the other, materially reduces hallucination and improves task success.

On the ALFWorld text-game benchmark, ReAct hit 71% success vs 45% for action-only baselines and 37% for the imitation-learning BUTLER baseline. On HotpotQA, ReAct combined with chain-of-thought outperformed either approach alone.

The loop is dead simple:

flowchart LR
    U[User question] --> T[Thought]
    T --> A[Action: tool call]
    A --> O[Observation: tool result]
    O --> T
    T -->|done| F[Final Answer]

Every production agent framework today -- Claude Agent SDK, LangGraph, CrewAI, OpenAI Agents API -- is a variant of this state machine with retries, streaming, and observability bolted on.
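Stripped of those extras, the shared skeleton fits in a dozen lines. A minimal sketch with stub functions (the names here are illustrative, not any framework's API):

def call_model(history):
    # stub: a real agent sends the history to an LLM here
    return "Answer: 42"

def parse_action(text):
    # stub: extract (tool, arg) from an Action line, or None if the model answered
    return None

def run_tool(action):
    # stub: dispatch to a tool registry
    tool, arg = action
    return f"Observation: ran {tool}({arg})"

def react_loop(question, max_turns=10):
    history = [f"Question: {question}"]
    for _ in range(max_turns):
        step = call_model(history)        # Thought + Action, or final Answer
        history.append(step)
        action = parse_action(step)
        if action is None:                # stop state: the model answered
            return step
        history.append(run_tool(action))  # Observation feeds the next turn
    return "Max turns reached."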

ReAct vs Baselines on ALFWorld (Success Rate)

Method | Success rate
Act-only | 45%
BUTLER (imitation) | 37%
ReAct | 71%

Source: Yao et al., ReAct paper (ICLR 2023)

How does ReAct differ from chain-of-thought prompting?

Chain-of-thought (CoT) reasons in a closed loop using only the model's parametric knowledge. ReAct lets the model reason AND call external tools, reading the tool output back into the next reasoning step. That single change cuts hallucination and unlocks tasks that depend on information outside the model's training data.

Here is the difference on a real query:

Step | Chain-of-Thought | ReAct
1 | Thought: I think Django was created in 2003... | Thought: I should look this up.
2 | Answer: 2003 (probably wrong) | Action: wikipedia("Django web framework")
3 | -- | Observation: Released July 2005.
4 | -- | Answer: 2005 (correct)

This is exactly what the original paper measured: by grounding its reasoning in a Wikipedia API, ReAct overcame the fact hallucination and error propagation that plague CoT on HotpotQA.

The follow-up work FireAct (Chen et al., 2023) measured this directly: few-shot ReAct on GPT-3.5 hit 31.4 EM on HotpotQA, fine-tuning on 500 ReAct trajectories pushed it to 39.2 EM (+25%), and a ReAct + CoT mix hit 41.0 EM (+31%).

The practical takeaway: use CoT when the answer is fully inside the model's training data. Use ReAct the moment you need fresh information, computation, or side effects.

ReAct on HotpotQA: Few-shot vs Fine-tuned (Exact Match)

Setup | EM
Few-shot ReAct (GPT-3.5) | 31.4
Fine-tuned ReAct (500 traj.) | 39.2
Fine-tuned ReAct + CoT | 41.0

Source: Chen et al., FireAct (2023)

How do you implement ReAct from scratch in 50 lines of Python?

A working ReAct agent needs four things: a system prompt that locks the model into the Thought/Action/Observation grammar, a tool registry, a regex parser for the Action line, and a while loop with a stop condition. No framework required.

Here is the full implementation. Save it as react.py:

import re
import anthropic

client = anthropic.Anthropic()

SYSTEM = """You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer.

Use Thought to describe your reasoning.
Use Action to call one of the tools below, then return PAUSE.
Observation will be the result of running that action.

Available tools:
- calculate: evaluate a Python math expression. e.g. calculate: 4 * 7 / 3
- wikipedia: look up a topic on Wikipedia. e.g. wikipedia: Django

Example:
Question: What is the capital of France times 2?
Thought: I need the capital, then multiply.
Action: wikipedia: France
PAUSE
"""

TOOLS = {
    "calculate": lambda q: str(eval(q, {"__builtins__": {}})),  # demo only: eval is not a safe sandbox
    "wikipedia": lambda q: f"[stub article on {q}]",  # stub -- swap in a real API call
}

ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)  # parses "Action: tool: arg"

def react(question: str, max_turns: int = 6) -> str:
    messages = [{"role": "user", "content": question}]
    for turn in range(max_turns):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            system=SYSTEM,
            messages=messages,
            max_tokens=1024,
            stop_sequences=["PAUSE"],
        )
        text = resp.content[0].text
        print(f"--- Turn {turn} ---\n{text}\n")
        messages.append({"role": "assistant", "content": text})
        if "Answer:" in text:
            return text.split("Answer:", 1)[1].strip()
        match = ACTION_RE.search(text)
        if not match:
            return text  # model gave up the grammar
        tool, arg = match.group(1), match.group(2).strip()
        obs = TOOLS[tool](arg) if tool in TOOLS else f"Unknown tool: {tool}"
        messages.append({"role": "user", "content": f"Observation: {obs}"})
    return "Max turns reached."

if __name__ == "__main__":
    print(react("What is 23 times 47, plus the year Django was first released?"))

That is 47 lines including imports and a runnable __main__. Walk through what each piece does:

  1. System prompt locks the output format. The model now emits Thought:, Action:, and Answer: lines on every turn.
  2. stop_sequences=["PAUSE"] halts generation the moment the model finishes an Action, so you do not pay for tokens after the tool call.
  3. Regex parser extracts (tool_name, argument) from the Action line. One regex, no parser combinators (see the snippet after this list).
  4. Tool registry is a plain dict. Adding a tool is one line.
  5. Stop condition is a 3-way OR: Answer: in text, no Action match, or max_turns exhausted. Without this you get infinite loops.
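To see the parser from step 3 in isolation -- the same one-regex grammar, run against a sample model turn:

import re

ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)

sample = "Thought: I need the release year.\nAction: wikipedia: Django"
match = ACTION_RE.search(sample)
print(match.group(1), match.group(2))  # -> wikipedia Django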

The full file is published as a public GitHub gist -- fork it and swap in real tools.

How does the Claude Agent SDK implement ReAct in 10 lines?

The Claude Agent SDK collapses the same loop into a query() call that yields messages as the agent thinks, calls tools, and observes results. You define tools with a decorator and the SDK handles parsing, retries, history, and budget caps.

Same agent, rebuilt:

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, tool, create_sdk_mcp_server

@tool("calculate", "Evaluate a math expression", {"expr": str})
async def calculate(args):
    return {"content": [{"type": "text", "text": str(eval(args["expr"], {"__builtins__": {}}))}]}

async def main():
    server = create_sdk_mcp_server(name="tools", tools=[calculate])
    options = ClaudeAgentOptions(mcp_servers={"tools": server}, allowed_tools=["mcp__tools__calculate"], max_turns=6)
    async for msg in query(prompt="What is 23 * 47?", options=options):
        print(msg)

asyncio.run(main())
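Assuming the SDK is installed (pip install claude-agent-sdk) and ANTHROPIC_API_KEY is set, this prints each message event as the loop runs. Note the allowed_tools entry: SDK-defined MCP tools are exposed under the naming convention mcp__<server>__<tool>.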

What the SDK is doing under the hood (per the agent loop docs) -- a simplified sketch of that loop follows this list:

  • Runs a while loop that keeps calling Claude until stop_reason != "tool_use".
  • Maintains the conversation history across turns by appending tool_use and tool_result blocks (you don't manage messages yourself).
  • Yields AssistantMessage events for each model response and UserMessage events for each tool result.
  • Enforces max_turns (counts only tool-use turns) and max_budget_usd caps so a runaway loop cannot drain your account.
  • Handles parallel tool calls, streaming, and MCP servers out of the box.
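For intuition, here is roughly that loop written against the raw Anthropic tool-use API -- a simplified sketch with no retries, streaming, parallel dispatch, or budget caps, where run_tool stands in for the SDK's MCP routing:

import anthropic

client = anthropic.Anthropic()

# One tool, declared in the raw API's JSON-schema format.
tools = [{
    "name": "calculate",
    "description": "Evaluate a math expression",
    "input_schema": {"type": "object", "properties": {"expr": {"type": "string"}}, "required": ["expr"]},
}]

def run_tool(name: str, args: dict) -> str:
    # stand-in for tool dispatch; the SDK routes this through your MCP server
    return str(eval(args["expr"], {"__builtins__": {}})) if name == "calculate" else f"Unknown tool: {name}"

messages = [{"role": "user", "content": "What is 23 * 47?"}]
while True:
    resp = client.messages.create(model="claude-sonnet-4-5", max_tokens=1024,
                                  tools=tools, messages=messages)
    messages.append({"role": "assistant", "content": resp.content})  # history managed for you
    if resp.stop_reason != "tool_use":  # the loop's exit condition
        break
    messages.append({"role": "user", "content": [  # feed every tool result back
        {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b.name, b.input)}
        for b in resp.content if b.type == "tool_use"
    ]})

print(next(b.text for b in resp.content if b.type == "text"))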

The 10-line version is doing strictly more than the 50-line version: structured tool schemas, automatic retries on transient API errors, parallel tool execution, and budget enforcement. The 50-line version is for understanding. The SDK is for production.

When does ReAct fail and what are the alternatives?

ReAct fails in three predictable ways: hallucinated tool names, infinite action loops, and cost blow-ups from quadratic context growth. Once your workflows stabilize, you usually graduate to a different pattern.

The failure modes (Towards Data Science, 2025):

  1. Hallucinated tool names. The model invents a tool that does not exist (web_browse instead of wikipedia). Each hallucination burns a full round-trip.
  2. Loop traps. The agent calls the same tool with the same argument 5 times in a row, getting the same answer, never converging. A cheap guard is sketched after this list.
  3. Quadratic token cost. Every turn re-sends the entire history, so total input tokens grow as roughly n(n+1)/2 in the number of steps: a 10-step task costs roughly 50x a 1-step task in tokens.
  4. No strategic foresight. ReAct optimizes the next action, not the full plan. It cannot parallelize independent subtasks.
  5. Stochastic failures. Same task, different run, different outcome. SLAs become impossible to commit to.
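Failure modes 1 and 2 are cheap to guard against in the from-scratch loop above. A sketch (the seen set and the feedback strings are illustrative), replacing the single dispatch line inside react():

seen = set()  # initialize once, above the for-loop

# then, in place of: obs = TOOLS[tool](arg) if tool in TOOLS else f"Unknown tool: {tool}"
if tool not in TOOLS:
    # echo the real tool list so the model can self-correct instead of guessing again
    obs = f"Unknown tool: {tool}. Available tools: {', '.join(TOOLS)}"
elif (tool, arg) in seen:
    obs = "You already ran this exact action and saw its result. Try a different action or give an Answer."
else:
    seen.add((tool, arg))
    obs = TOOLS[tool](arg)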

The main alternatives:

  • Plan-and-Execute (dev.to deep-dive): one planner LLM call produces a DAG of subtasks, an executor runs tools deterministically, and a replanner fires only on failure. Drastically fewer LLM calls, and the plan is inspectable before execution; a minimal skeleton is sketched after this list.
  • ReWOO (Nutrient blog): plans once with placeholder variables, runs all tools in parallel, synthesizes at the end. Two LLM calls total. About 5x more token-efficient than ReAct, but breaks if a tool returns something unexpected.
  • Reflexion (Shinn et al. 2023): ReAct plus a self-critique step that retries with the critique in memory. Best when you have a verifiable success signal.
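For contrast with the ReAct loop, a minimal Plan-and-Execute skeleton looks like this -- a sketch in which plan_with_llm stands in for the single planner call and the Step format is illustrative:

from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    arg: str

TOOLS = {
    "calculate": lambda q: str(eval(q, {"__builtins__": {}})),
    "wikipedia": lambda q: f"[stub article on {q}]",
}

def plan_with_llm(task: str) -> list[Step]:
    # stand-in for one planner LLM call that emits the whole plan up front
    return [Step("wikipedia", "Django web framework"), Step("calculate", "23 * 47")]

def execute(task: str) -> list[str]:
    plan = plan_with_llm(task)  # one LLM call; the plan is inspectable before anything runs
    observations = []
    for step in plan:           # deterministic executor: no LLM calls inside this loop
        observations.append(TOOLS[step.tool](step.arg))
    return observations         # a replanner would fire here only if a step failed

print(execute("What is 23 * 47, and when was Django first released?"))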

When should you roll your own vs reach for a framework?

Roll your own when you are learning the pattern, when you have fewer than five tools, or when token spend is a hard constraint. Reach for a framework when you need streaming, parallel tools, MCP servers, persistent memory, subagents, or human-in-the-loop approval. Production agents almost always graduate to a framework.

A decision matrix:

Situation | Roll your own | Use a framework
Learning ReAct | YES | No
Demo / prototype | YES | YES (either works)
< 5 tools, simple flow | YES | Optional
Streaming UI | No | YES
Parallel tool calls | Hard | Built-in
MCP server integration | No | YES (Claude SDK, mcp-go)
Subagents / orchestration | Hard | YES
Human-in-the-loop approval | Hard | YES
Budget enforcement | DIY | YES (Claude SDK max_budget_usd)
Persistent memory across sessions | DIY | YES

The 50-line version teaches you the loop. The Claude Agent SDK runs it in production. Read the original ReAct paper, build the from-scratch version once so the abstraction is no longer magic, then never write your own again unless you have a specific reason to.

Pattern | LLM calls per task | Adaptiveness | Cost / Latency | Best for
ReAct | N (one per step) | High -- replans every turn | High | Open-ended tasks, unknown step count
Plan-and-Execute | 1 plan + N tool calls + 1 replan on failure | Low -- follows the plan | Low | Predictable workflows with clear subtasks
ReWOO | 2 (plan + synthesize) | None -- placeholders only | Lowest (~5x cheaper) | Tasks where tool outputs don't change strategy
Reflexion | N + critique loop | High -- self-corrects | Highest | Tasks with verifiable success signals