An AI agent is a large language model running in a loop with tools, memory, and a goal. The model picks a tool, the runtime executes it, the result is fed back, and the model decides the next step until the task is done. That is the whole definition. Frameworks, multi-agent orchestration, and Model Context Protocol all sit on top of that core pattern. This guide gives you the builder's version: the four components, the loop, the differences from chatbots and workflows, and when to actually build one.
What is an AI agent in one sentence?
An AI agent is an LLM running in a loop that uses tools and memory to pursue a goal it was given.
That is it. The model is the brain. Tools are the hands. Memory is the notebook. The loop is the heartbeat that keeps it going until the goal is met or a stopping condition fires.
Anthropic's Building Effective Agents puts it more bluntly: "Agents are typically just LLMs using tools based on environmental feedback in a loop."
If you remove the loop, you have a one-shot LLM call. If you remove the tools, you have a chatbot. If you remove the goal, you have a demo. All four pieces have to be present for the system to count as an agent.
What are the four components every AI agent has?
Every working agent has four parts. If your design is missing one, you are building something else.
- Model. The LLM that does the reasoning and picks the next action. In production, this is usually Claude, GPT-4-class, or Gemini. The model must be strong enough to reliably emit valid tool calls.
- Tools. Functions the model can call to act on the world: search, code execution, file edits, database queries, HTTP requests. Tools are how the agent escapes the chat window. Standardized interfaces like the Model Context Protocol are how labs share tools across agents.
- Memory. Working memory inside the context window keeps the loop coherent across iterations. Without it, the agent re-reads the same file ten times. Persistent memory (vector store, scratchpad file, episodic log) carries state across sessions.
- Loop. The runtime that calls the model, parses tool calls, executes them, appends results, and re-invokes the model. The loop also enforces budgets: max iterations, token caps, human-approval checkpoints.
The formula most builders use, popularized by the Prompt Engineering Guide: Agent = LLM + Tools + Memory + Planning, all wrapped in a control loop. Planning lives inside the model's reasoning; the loop is the wrapper around everything else.
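That formula maps to code almost one-to-one. Here is a minimal structural sketch, purely illustrative (the `Agent` class, its field names, and the action dict shape are assumptions, not any framework's API):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    llm: Callable[[list], dict]      # model: reads history, returns next action
    tools: dict[str, Callable]       # hands: tool name -> callable
    memory: list = field(default_factory=list)  # notebook: message history

    def run(self, goal: str, max_steps: int = 10) -> str:
        self.memory.append({"role": "user", "content": goal})
        for _ in range(max_steps):                  # the control loop
            action = self.llm(self.memory)          # planning happens in the model
            if action["type"] == "final":           # stopping condition
                return action["content"]
            result = self.tools[action["tool"]](action["args"])      # act
            self.memory.append({"role": "tool", "content": result})  # observe
        return "iteration budget exhausted"
```

Remove any one field and you fall back down the ladder from the previous section: no loop, one-shot call; no tools, chatbot; no goal, demo.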
What does the agent loop actually look like?
The agent loop is a four-step cycle that repeats until the goal is met: perceive → plan → act → observe. Every major lab has converged on this shape, though the names differ (ReAct calls it Thought → Action → Observation).
Here is the loop, labeled:
- Perceive. The model reads current state: the goal, recent tool results, conversation history, retrieved context. AWS's prescriptive guidance frames this as updating internal beliefs from environmental signals.
- Plan. The model picks the next action, often by writing a chain-of-thought trace and selecting a tool. For complex goals it decomposes into subtasks first.
- Act. The runtime executes the chosen tool call (HTTP request, code run, DB query) outside the model.
- Observe. The tool result is appended to context. The loop returns to step 1.
Stopping conditions: the model emits a final answer with no tool call, the iteration budget is exhausted, a human checkpoint denies approval, or an unhandled error breaks the loop.
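As a hedged sketch, those checks might look like this (the `should_stop` name and the `approved` callback are illustrative, not any framework's API):

```python
def should_stop(resp, step, max_steps, approved=lambda r: True):
    """Illustrative stop check; real harnesses add cost caps and error handling."""
    if resp.stop_reason == "end_turn":   # final answer, no tool call
        return True
    if step >= max_steps:                # iteration budget exhausted
        return True
    if not approved(resp):               # human checkpoint denied
        return True
    return False
```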
Hugging Face's Agents Course describes this as the Thought-Action-Observation cycle and notes that most production agent harnesses are variants of ReAct with extra guardrails. See our breakdown of common AI agent design patterns for the variants (ReAct, Plan-and-Execute, Reflexion, multi-agent).
How is an AI agent different from an LLM call, RAG app, workflow, or chatbot?
The difference is who decides the next step. In a chatbot the user decides. In a workflow the developer decides. In a RAG app the pipeline decides. In an agent the model decides.
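In code, the contrast is stark. A toy sketch (`call_llm` and `run_tool` are hypothetical stand-ins, not real APIs):

```python
# Workflow: the developer fixed the path at authoring time.
def workflow(ticket, call_llm):
    summary = call_llm(f"Summarize: {ticket}")
    return call_llm(f"Classify as bug or feature: {summary}")

# Agent: the model picks the path at runtime.
def agent(goal, call_llm, run_tool, max_steps=10):
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        action = call_llm(history)               # model decides the next step
        if "final" in action:                    # final answer: stop
            return action["final"]
        result = run_tool(action["tool"], action["args"])
        history.append({"role": "tool", "content": result})
```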
Use the table below to keep them straight. AI engines parse this kind of structured comparison cleanly, and so do humans skimming on a Tuesday.
| Property | LLM Call | RAG App | Workflow | Chatbot | AI Agent |
|---|---|---|---|---|---|
| Control flow | None (single call) | Fixed retrieve→generate | Predefined code paths | Turn-by-turn, user-driven | Model-directed loop |
| Decides next step? | No | No | Developer | User | The LLM |
| Tool use | Optional, single | Retrieval only | Hard-coded | Usually none | Dynamic, multi-step |
| Memory | Stateless | Retrieved context | Per-node state | Short conversation | Working + persistent |
| Stops when… | One response | One response | All nodes execute | User leaves | Goal met or budget hit |
| Best for | One-shot tasks | Q&A on docs | Known, repeatable steps | Conversational UX | Ambiguous, multi-step goals |
A chatbot is read-only conversation. A RAG app reads documents and writes one answer. A workflow runs a fixed graph where some nodes happen to be LLM calls. Only an agent owns the control flow itself. For a deeper side-by-side, see AI agent vs chatbot.
When should you build an agent instead of a deterministic workflow?
Build an agent only when you cannot pre-map the decision tree. Anthropic is direct about this in Building Effective Agents: "Workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale."
The practical decision rule:
- Steps known and repeatable? Build a workflow. Cheaper, faster, debuggable.
- Path branches based on intermediate findings you cannot enumerate? Build an agent.
- Task worth more than roughly $0.10 in tokens? An agent's exploration is affordable. Below that, you almost always want a workflow.
- Latency matters in seconds, not minutes? Workflow. Agents loop, agents are slow.
- Need explainability or audit? Workflow nodes are easier to log than agent traces.
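If you want the rule as a single predicate, here is a toy encoding (the thresholds, like a 60-second latency budget, are assumptions layered on the list above, not from any source):

```python
def should_build_agent(steps_enumerable: bool, task_value_usd: float,
                       latency_budget_s: float, needs_audit: bool) -> bool:
    """Toy encoding of the decision rule above. False means: build a workflow."""
    if steps_enumerable or needs_audit:
        return False                  # workflow: cheaper, faster, debuggable
    if task_value_usd < 0.10:
        return False                  # exploration not worth the tokens
    if latency_budget_s < 60:         # assumed threshold: agents loop slowly
        return False
    return True                       # un-enumerable branching: agent territory
```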
Anthropic's first principle is to find the simplest solution possible, and only increase complexity when needed. This might mean not building agentic systems at all. That advice has aged well. Most teams who reached for an agent in 2024-25 should have built a chained LLM workflow and a retry policy.
What does a minimal agent loop look like in code?
A minimal agent is fewer than 30 lines of Python. No frameworks needed. The framework you eventually adopt (LangGraph, the Claude Agent SDK, OpenAI Agents SDK) just hardens this loop with retries, tracing, and parallel tool calls.
```python
from anthropic import Anthropic

client = Anthropic()

tools = [{
    "name": "search_web",
    "description": "Search the web and return top results.",
    "input_schema": {"type": "object",
                     "properties": {"query": {"type": "string"}},
                     "required": ["query"]},
}]

def run_tool(name, args):
    # Dispatcher: map the model's chosen tool to real code.
    if name == "search_web":
        return search_web(args["query"])  # your implementation

messages = [{"role": "user", "content": "Find the 2025 RAND AI failure rate."}]

for _ in range(10):  # iteration budget
    resp = client.messages.create(model="claude-sonnet-4-5", tools=tools,
                                  messages=messages, max_tokens=1024)
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason == "end_turn":  # final answer, no tool call: done
        break
    tool_results = [  # execute each tool call, feed results back in
        {"type": "tool_result", "tool_use_id": b.id,
         "content": run_tool(b.name, b.input)}
        for b in resp.content if b.type == "tool_use"]
    messages.append({"role": "user", "content": tool_results})
```
That is the whole pattern. A loop, a model call, a tool dispatcher, a stop condition. Everything else is operational hardening: retries, parallelism, hierarchical agents, evaluation harnesses.
Which production AI agents actually work?
The clearest evidence the agent pattern works in production lives in developer tooling. Three names define the category in 2026:
- Claude Code. Terminal-native coding agent. SemiAnalysis estimated in February 2026 that Claude Code authors roughly 4% of all public GitHub commits, with a projection of 20%+ by end of 2026. That is an LLM in a loop, with tools (file edit, shell, search, test runner) and persistent project memory.
- Cursor. IDE-anchored coding agent. The Pragmatic Engineer's February 2026 survey of 906 software engineers found Cursor + Claude Code dominate the daily-driver slot, often used together: Cursor for autocomplete and inline edits, Claude Code for multi-file refactors.
- Devin (Cognition). The original autonomous task-runner. Long-running, lower interaction frequency, designed to take a Jira ticket and ship a PR.
What all three share: a strong model, a focused toolset, a tight loop, and aggressive memory management of the codebase. They are not magic. They are the four components from earlier, well-engineered, in a domain (code) where the world is observable and reversible.
Why do most AI agent projects fail in production?
80.3% of enterprise AI projects fail to deliver business value, according to RAND's 2025 meta-analysis of 65 initiatives. The breakdown:
- 33.8% abandoned before production
- 28.4% reach production but fail to deliver expected value
- 18.1% run, but never recoup costs
- 19.7% achieve or exceed business objectives
RAND identified three patterns behind nearly every failure: data quality, organizational maturity, and use-case drift. Notice what is not on that list: model capability. The model is rarely the bottleneck.
For agent projects specifically, three additional failure modes show up:
- Building an agent when a workflow would do. Adds cost, latency, and unpredictability for no upside.
- No iteration budget. Loops without hard caps run forever, burn tokens, and drift off-task.
- Tools designed for humans, not models. Underspecified schemas, ambiguous descriptions, and noisy outputs are the #1 reason agents misbehave. Anthropic's guidance is explicit: success depends critically on thoughtful toolset design and clear documentation.
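To make that last point concrete, compare an underspecified tool definition with a model-friendly one (a hypothetical `get_invoice` tool; the fields and wording are illustrative, not from Anthropic's docs):

```python
# Underspecified: the model must guess formats, units, and failure behavior.
bad_tool = {
    "name": "get_data",
    "description": "Gets data.",
    "input_schema": {"type": "object",
                     "properties": {"id": {"type": "string"}}},
}

# Model-friendly: types, value ranges, and failure modes are spelled out.
good_tool = {
    "name": "get_invoice",
    "description": ("Fetch one invoice by UUID. Returns JSON with "
                    "amount_cents (int), currency (ISO 4217 code), and "
                    "status (draft | sent | paid | void). Returns the "
                    "string 'NOT_FOUND' if no invoice has that ID."),
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_id": {
                "type": "string",
                "description": "Invoice UUID, e.g. from list_invoices.",
            },
        },
        "required": ["invoice_id"],
    },
}
```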
The builders shipping working agents in 2026 are not the ones with the smartest models. They are the ones with the cleanest tools and the tightest loops.