Module 2 · 24 min read · Agentic AI and Autonomous Systems

The Agent Loop: Perceive, Reason, Act

Every AI agent, from the simplest to the most sophisticated, executes some variant of the same fundamental cycle: perceive the current state of the world, reason about what that state means and what should be done, take an action, observe what changed, and repeat. This observe-think-act loop is not an implementation detail — it is the defining structure of agency itself. Understanding it deeply, in all its nuances and variations, is the foundation for building agents that work and diagnosing agents that do not.

The observe-think-act cycle unpacked

The canonical agent loop has three phases. In the observe phase, the agent gathers information about its current state — this might mean reading the output of a previous tool call, loading the contents of a file, querying a database, or simply reading the most recent message in a conversation. The key insight is that observation is always mediated: agents do not perceive the world directly but through specific channels that return specific data types. What an agent can observe is determined entirely by its toolset and the information placed in its context window.

In the think phase, the language model processes the current context — everything the agent knows, has done, and has observed — and produces a response. This response may include reasoning (explicit chain-of-thought), a decision about which action to take, parameters for that action, and any other text the model generates. The think phase is where the model's reasoning capability determines the quality of the agent's decisions. A better model, given the same context, will generally make better decisions — but a well-designed context can compensate for a weaker model to a significant degree.

In the act phase, the agent executes whatever action the model decided on. This might be calling a tool, sending a message, writing to a file, or — crucially — deciding that the task is complete and returning a final answer. The act phase is where the agent interacts with the external world, and where the consequences of decisions become real.

The Loop as Code

In its simplest form, the agent loop is literally a while loop: while the task is not complete, observe the current state, call the model, parse the response, execute any actions, add the results to context, and loop. Sophisticated agentic systems do not replace this loop; they enrich each phase with better observation, better context management, more careful action parsing, and richer tool ecosystems.

How agents process environmental state

The "environment" of an AI agent is the totality of what the agent can observe. For a code-writing agent, the environment might be the codebase it is editing, the terminal output of tests it runs, and the error messages produced by the compiler. For a research agent, the environment might be web search results, document contents, and the conversation history. For a computer-use agent, the environment might be screenshots of the current screen state, rendered as image tokens alongside text.

State representation is one of the most important design decisions in agent architecture. The agent can only reason about what is in its context window — everything outside the context is invisible until the agent explicitly retrieves it. This creates a fundamental tension: you want the context to contain enough information for good decisions, but context windows are finite and expensive, and too much irrelevant information degrades reasoning quality.

Expert practitioners develop a clear sense of what state their agents actually need. A coding agent fixing a bug does not need the entire codebase in context: it needs the file containing the bug, the related test file, and perhaps the relevant parts of the documentation. Loading an entire 50,000-line codebase would overwhelm the context and degrade performance. Selective state representation, deciding what to include and what to exclude, is as important as any other architectural decision.
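The selection step can be sketched as a greedy pick under a token budget. This is a minimal sketch, assuming relevance scores are computed elsewhere (by retrieval, by the model, or by hand) and using a crude four-characters-per-token estimate in place of a real tokenizer; `ContextItem`, `approx_tokens`, and `select_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str         # e.g. a file path
    text: str         # the content that would be placed in the prompt
    relevance: float  # higher means more useful for the current task (scored elsewhere)


def approx_tokens(text: str) -> int:
    # Crude approximation: roughly 4 characters per token for English and code.
    # Swap in your model's real tokenizer for accurate budgeting.
    return max(1, len(text) // 4)


def select_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most relevant items that still fit in the token budget."""
    selected: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        cost = approx_tokens(item.text)
        if used + cost <= budget:
            selected.append(item)
            used += cost
    return selected


# Usage: the buggy file and its test fit; a large unrelated file does not.
files = [
    ContextItem("src/parser.py", "x" * 4000, relevance=0.9),         # ~1,000 tokens
    ContextItem("tests/test_parser.py", "x" * 2000, relevance=0.8),  # ~500 tokens
    ContextItem("src/unrelated.py", "x" * 40000, relevance=0.1),     # ~10,000 tokens
]
chosen = select_context(files, budget=2000)
print([f.name for f in chosen])  # ['src/parser.py', 'tests/test_parser.py']
```

Greedy selection by relevance is the simplest policy. It can skip a large, highly relevant file that does not fit while still admitting smaller, less relevant ones, which is one reason systems often chunk large files before selecting.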

How agents decide what to do

The decision-making process inside the think phase is where the language model's capability most directly determines agent quality. The model reads the current context and must produce a decision: what action to take next, or whether to terminate and return an answer. Different prompting strategies produce different decision quality:

Direct action selection
The model reads context and immediately outputs an action — no intermediate reasoning. Fast and token-efficient, but produces lower-quality decisions on complex tasks because the model has no explicit reasoning trace to build on. Appropriate for simple, well-defined tasks.
Chain-of-thought reasoning
The model reasons through the problem step-by-step in text before committing to an action. The reasoning trace is part of the output — the model thinks out loud. This significantly improves decision quality on complex tasks because the explicit reasoning reduces errors that arise from making implicit leaps.
ReAct (Reason + Act)
Interleaved reasoning and action: the model thinks, then acts, then thinks about what it observed, then acts again. This is the dominant paradigm for tool-using agents because it lets the model incorporate tool results into its reasoning before deciding the next action. On tasks that depend on information the model must look up, it typically produces better multi-step behavior than either pure action or pure reasoning alone.
Plan-then-execute
The model first produces a complete plan for how to accomplish the task, then executes each step. This works well when the structure of the task is known in advance and is worse suited to tasks where early actions produce information that should change the plan, so plan-then-execute agents often need a replanning capability to handle such discoveries.
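Plan-then-execute with replanning reduces to a small control loop. In this sketch the planner and executor are hypothetical callables (in a real agent both would be model or tool calls), and a step signals that the remaining plan is stale by returning a `surprise` flag, which is an assumption of this example rather than a standard convention.

```python
from typing import Callable

Step = str
History = list[tuple[Step, dict]]


def plan_then_execute(
    task: str,
    make_plan: Callable[[str, History], list[Step]],
    execute_step: Callable[[Step], dict],
    max_replans: int = 2,
) -> History:
    """Run a plan step by step; replan when a step reports a surprise."""
    plan = make_plan(task, [])  # initial plan, made before any observations
    history: History = []
    replans = 0
    while plan:
        step = plan.pop(0)
        result = execute_step(step)
        history.append((step, result))
        # A 'surprise' flag means the observation invalidates the remaining plan.
        if result.get("surprise") and replans < max_replans:
            plan = make_plan(task, history)  # replan using everything learned so far
            replans += 1
    return history


# Usage with toy stand-ins for the planner and executor.
def toy_planner(task: str, history: History) -> list[Step]:
    if not history:
        return ["search", "fetch", "summarize"]
    return ["search_historical", "summarize"]


def toy_executor(step: Step) -> dict:
    # Pretend the fetched page only covers the current year, forcing a replan.
    return {"output": f"did {step}", "surprise": step == "fetch"}


steps = [s for s, _ in plan_then_execute("cloud market share", toy_planner, toy_executor)]
print(steps)  # ['search', 'fetch', 'search_historical', 'summarize']
```

The `max_replans` cap keeps a planner that keeps being surprised from looping forever; with a cap of zero the agent degrades to pure plan-then-execute.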

Action execution and feedback loops

When an agent calls a tool, the orchestration layer executes that tool and returns the result. This result — whether it is a web page's HTML, a code execution output, a database query result, or an API response — is added to the agent's context and becomes part of the state for the next reasoning cycle. This is the feedback loop: actions produce observations, observations inform the next decision.

The quality of this feedback loop is determined by two factors: the richness of the tool output and the agent's ability to extract the signal from it. A web search that returns the full HTML of a webpage might contain the answer the agent needs buried in thousands of tokens of navigation links, ads, and boilerplate — the agent has to extract the relevant content. This is why many agent systems preprocess tool outputs before adding them to context: web pages get cleaned to main content, code execution outputs get truncated to manageable length, API responses get filtered to relevant fields.
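Two of those preprocessing steps, truncation and field filtering, are easy to sketch with the standard library. The 2,000-character cap and the head-and-tail truncation policy are assumptions chosen for illustration; keeping the tail matters because error messages and tracebacks usually appear at the end of an output.

```python
import json

MAX_CHARS = 2000  # assumed per-observation cap; tune for your context budget


def truncate(text: str, limit: int = MAX_CHARS) -> str:
    """Keep the head and tail of long output, marking how much was dropped."""
    if len(text) <= limit:
        return text
    half = limit // 2
    dropped = len(text) - 2 * half
    return f"{text[:half]}\n...[{dropped} characters truncated]...\n{text[-half:]}"


def filter_fields(api_json: str, keep: set[str]) -> str:
    """Reduce a JSON object response to the fields the agent actually needs."""
    data = json.loads(api_json)
    return json.dumps({k: v for k, v in data.items() if k in keep})


# Usage: strip a noisy debug field, and cap a very long output.
raw = json.dumps({"id": 7, "title": "Q1 report", "summary": "text", "debug_trace": "x" * 5000})
print(filter_fields(raw, {"title", "summary"}))  # {"title": "Q1 report", "summary": "text"}
print(len(truncate("a" * 10000)) < 10000)        # True
```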

Feedback loops can also fail in characteristic ways. The most common failure mode is silent failure: a tool call that technically succeeds but returns an output that does not contain what the agent expected, and the agent treats this null result as confirmation rather than investigating further. An agent that searches for information, gets back a search result page rather than the specific answer, and proceeds as if it found the answer has fallen into a silent failure. Designing agent systems to be skeptical of tool outputs — explicitly checking whether they contain the expected information before proceeding — is a critical reliability practice.
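One way to build in that skepticism is to check each observation before the agent acts on it. The sketch below uses deliberately simple heuristics (empty output, common failure phrases, and a list of terms the agent expected to see); the marker phrases and the `check_observation` name are illustrative assumptions, and a real system might instead ask the model itself to verify the observation against the goal.

```python
from dataclasses import dataclass


@dataclass
class Check:
    ok: bool
    reason: str


# Illustrative failure phrases; a real list would be tuned per tool.
FAILURE_MARKERS = ("error", "no results", "not found", "access denied")


def check_observation(output: str, expected_terms: list[str]) -> Check:
    """Decide whether a tool output plausibly contains what the agent asked for.

    A call that technically succeeded is not the same as one that answered the
    question, so check for empty output, failure phrases, and missing terms.
    """
    text = output.strip()
    if not text:
        return Check(False, "empty output")
    lowered = text.lower()
    for marker in FAILURE_MARKERS:
        if marker in lowered:
            return Check(False, f"output contains '{marker}'")
    missing = [t for t in expected_terms if t.lower() not in lowered]
    if missing:
        return Check(False, f"missing expected terms: {missing}")
    return Check(True, "expected content present")


# Usage: a results page without numbers fails; a page with them passes.
print(check_observation("Top results: cloud market overview", ["AWS", "%"]).ok)  # False
print(check_observation("AWS x%, Azure y%, GCP z%", ["AWS", "%"]).ok)            # True
```

When a check fails, its `reason` can be appended to the context so that the next think phase investigates instead of proceeding.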

The Confirmation Bias Problem

Language models tend to resolve ambiguous evidence in whichever direction lets them keep going rather than stopping to investigate, a tendency related to what researchers call sycophancy. In an agent context, this means agents tend to read ambiguous tool outputs as successes and keep moving forward, even when a more careful reading would reveal that the tool call did not accomplish what was intended. This is one of the most important failure modes to design against explicitly.

Reactive vs deliberative agents

A useful conceptual distinction in agent design is between reactive and deliberative architectures. Reactive agents are purely stimulus-response: they observe some state and immediately produce an action without any internal deliberation or planning. Purely reactive designs have a long history in robotics, from Braitenberg's thought-experiment vehicles to Brooks's subsumption architecture, and they remain useful for narrow tasks where the latency of deliberation is unaffordable.

Deliberative agents maintain an internal model of the world, form goals, and reason about plans to achieve those goals. Modern language model-based agents are deliberative by nature — the language model's context window serves as the internal model, and the chain-of-thought output is the deliberation. The quality of deliberation scales with model capability and context quality.

Most practical agent architectures are hybrid: deliberative at the level of task planning and action selection, but with reactive sub-routines for fast, simple responses to well-defined stimuli. A research agent might deliberatively plan which sources to consult, but reactively decide whether a search result is relevant based on simple heuristics before passing it to the model for deeper analysis.
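The reactive relevance check in that example can be as simple as word overlap between the query and a result snippet. This is a hypothetical heuristic (the 0.3 threshold and the tokenization are assumptions) meant only to show what a no-deliberation filter looks like: a fixed rule applied to each result, with no model call.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric words; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def quick_relevance(query: str, snippet: str, threshold: float = 0.3) -> bool:
    """Keep a result if a large enough fraction of query words appears in it."""
    query_words = tokenize(query)
    if not query_words:
        return False
    overlap = len(query_words & tokenize(snippet)) / len(query_words)
    return overlap >= threshold


# Usage: only results that pass the cheap filter go to the model for deeper analysis.
query = "cloud provider market share AWS Azure GCP"
results = [
    "Cloud market share report: AWS, Azure and GCP compared",
    "Ten easy weeknight pasta recipes",
]
relevant = [r for r in results if quick_relevance(query, r)]
print(relevant)  # ['Cloud market share report: AWS, Azure and GCP compared']
```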

Tool use as the core action primitive

The most important category of agent action is tool use. Tools are functions that the agent can call to interact with the world: search engines, code interpreters, file systems, databases, APIs, web browsers, email clients, calendars. The set of tools available to an agent defines the boundary of what it can do.

Tool use is implemented via function calling APIs (covered in depth in Module 3), where the model outputs a structured specification of which function to call and with what arguments. The orchestration layer intercepts this, executes the function, and returns the result. From the model's perspective, tool calls appear in its context as structured input/output pairs.
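The orchestration side of function calling can be sketched as a registry plus a dispatcher. This assumes the model's tool call arrives as a tool name and a JSON-encoded argument string, as it does in some function-calling APIs; the `tool` decorator, `TOOLS` registry, `dispatch` function, and toy `add` tool are hypothetical. Errors are returned as text so they land in the context as observations instead of crashing the loop.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the orchestration layer can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    """Toy tool: add two numbers."""
    return a + b


def dispatch(name: str, arguments_json: str) -> str:
    """Execute a model-requested call and return the result as text for the context."""
    fn = TOOLS.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'. Available: {sorted(TOOLS)}"
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return f"Error: arguments were not valid JSON ({exc})"
    try:
        return str(fn(**args))
    except Exception as exc:  # bad argument names, wrong types, tool failures
        return f"Error: {type(exc).__name__}: {exc}"


# Usage
print(dispatch("add", '{"a": 2, "b": 3}'))  # 5
print(dispatch("multiply", "{}"))           # Error: unknown tool 'multiply'. Available: ['add']
```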

The design of the tool interface — what tools exist, what they accept, what they return, how errors are communicated — has an enormous effect on agent capability. A web search tool that returns raw HTML will produce a very different agent than one that returns cleaned, truncated main content. A code execution tool that returns only the final output will produce different agent behavior than one that returns stdout, stderr, and a return code separately.
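A code execution tool that reports stdout, stderr, and the return code separately can be sketched with Python's `subprocess` module. The `run_python` name and the 10-second timeout are assumptions, and this version runs code with no sandboxing, so it is only suitable for trusted snippets.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 10.0) -> dict:
    """Run a snippet in a fresh interpreter; report streams and exit code separately.

    No sandboxing: untrusted code needs real isolation (containers, VMs, etc.).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"timed out after {timeout}s", "returncode": None}
    return {"stdout": proc.stdout, "stderr": proc.stderr, "returncode": proc.returncode}


# Usage: a clean run versus a crash that still printed partial output.
ok = run_python("print(2 + 2)")
bad = run_python("print('partial'); 1/0")
print(ok["returncode"], bad["returncode"])  # 0 1
```

With only the final output, the model could not distinguish the partial output of a crashed run from a clean result; the separate `returncode` and `stderr` make that distinction explicit.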

A concrete worked example: Research agent loop

Let us trace through a concrete example to make the agent loop tangible. Suppose we have a research agent tasked with: "What is the current market share of the top three cloud providers, and how has it changed in the last two years?"

Iteration 1 — Observe: The agent sees only the initial task. Think: The model reasons that to answer this, it needs current market share data from a reliable source. It decides to search for recent cloud market share reports. Act: It calls the search tool with query "cloud provider market share 2024 2025 AWS Azure GCP".

Iteration 2 — Observe: The agent reads the search results — a list of URLs and snippets. Think: It identifies which results look most authoritative (Synergy Research Group, Gartner, IDC reports). It decides to fetch the most relevant URL. Act: It calls the fetch_url tool on the most promising result.

Iteration 3 — Observe: The agent reads the fetched page content, which contains market share percentages for Q1 2025. Think: It has 2025 data but needs 2023 data for comparison. It searches for historical data. Act: Another search call.

Iterations 4–6: The same pattern repeats: fetch, read, identify what is still missing, search for it.

Iteration 7 — Observe: The agent now has data from both 2023 and 2025. Think: It has enough to answer the question. It reasons through the numbers, identifies the trend, and decides to generate a final answer. Act: It returns the final structured answer with the data and trend analysis, terminating the loop.

What Made This Agent Work

Several design elements made this hypothetical agent succeed: a clear termination criterion (the agent knows when it has enough data), tools that return useful structured content (not raw noisy HTML), a model that can identify what information is missing and form targeted follow-up queries, and a context window large enough to hold all the gathered information for final synthesis. Remove any of these and the agent quality degrades significantly.

Stopping criteria and termination

One of the most underappreciated design problems in agent architecture is knowing when to stop. An agent that runs forever on a task that has no clear stopping point will exhaust its token budget and fail. An agent that stops too early will return incomplete results. Getting this right requires explicit design attention.

The most common stopping criteria are: the model explicitly decides to return a final answer (the model self-terminates), a maximum number of iterations is reached (a safety circuit breaker), a maximum token budget is consumed, or an explicit stopping signal is returned from a tool (e.g., a test suite that returns "all tests passing").
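The four criteria can be combined into a single check run at the end of each iteration. This is a minimal sketch: `LoopState` and `should_stop` are hypothetical names, the default limits are arbitrary placeholders, and matching a completion phrase in tool output stands in for whatever structured signal a real tool would return. Success signals are checked before the circuit breakers so that a run finishing on its last allowed iteration is reported as a success.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    iteration: int
    tokens_used: int
    model_returned_final_answer: bool
    last_tool_output: str


def should_stop(
    state: LoopState,
    max_iterations: int = 20,
    max_tokens: int = 200_000,
    done_signal: str = "all tests passing",
) -> Optional[str]:
    """Return the reason to stop, or None to keep looping."""
    # Success signals first.
    if state.model_returned_final_answer:
        return "model self-terminated"
    if done_signal and done_signal in state.last_tool_output.lower():
        return "tool reported completion"
    # Safety circuit breakers second.
    if state.iteration >= max_iterations:
        return "iteration limit reached"
    if state.tokens_used >= max_tokens:
        return "token budget exhausted"
    return None


# Usage
print(should_stop(LoopState(3, 12_000, False, "5 failed, 12 passed")))  # None
print(should_stop(LoopState(4, 15_000, False, "All tests passing")))    # tool reported completion
print(should_stop(LoopState(20, 90_000, False, "")))                   # iteration limit reached
```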

The model's ability to self-terminate reliably is a function of how clearly the task and completion criteria are defined in the system prompt. Agents given vague tasks tend to continue indefinitely, generating increasingly speculative actions. Agents given clearly defined success criteria terminate appropriately. This is one of the most important contributions of good prompt engineering to agent reliability: not better reasoning, but better stopping.

The role of context management in the loop

As the agent loop runs, the context window fills up. Each tool call adds observation data; each reasoning step adds output text. In long-running agents, the context can grow to hundreds of thousands of tokens. Context management — deciding what to keep, what to summarize, what to discard — becomes critical.

Naive context management (keep everything until you hit the limit, then error out) is common, and it is the worst option. Better approaches include summarization (compressing earlier parts of the conversation into summaries, which loses some detail but preserves key facts), selective retention (keeping only the most recent N tool calls and discarding older ones), and external memory (offloading older context to a vector database and retrieving it as needed, covered in Module 4).
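Selective retention with an optional summarization hook can be sketched as a function that rewrites the message list before each model call. It assumes the first two messages are the system prompt and the task, and `summarize` is a hypothetical callable (for example, a separate model call); without it, older messages are simply dropped with a note. The summary is folded into the task message so no extra turn is inserted.

```python
from typing import Callable, Optional

Message = dict  # e.g. {"role": "tool", "content": "..."}


def compact_context(
    messages: list[Message],
    keep_recent: int = 4,
    summarize: Optional[Callable[[list[Message]], str]] = None,
) -> list[Message]:
    """Keep the system prompt, the task, and the most recent messages; compress the rest.

    Assumes messages[0] is the system prompt and messages[1] is the task.
    """
    system, task, body = messages[0], messages[1], messages[2:]
    if len(body) <= keep_recent:
        return list(messages)
    old, recent = body[:-keep_recent], body[-keep_recent:]
    note = summarize(old) if summarize else f"{len(old)} earlier messages omitted."
    # Fold the note into a copy of the task message instead of adding a new turn.
    compacted_task = {**task, "content": f"{task['content']}\n\n[Earlier work: {note}]"}
    return [system, compacted_task] + recent


# Usage: ten tool results shrink to the four most recent plus a note.
msgs = [
    {"role": "system", "content": "You are a research agent."},
    {"role": "user", "content": "Compare cloud market share."},
] + [{"role": "tool", "content": f"result {i}"} for i in range(10)]
out = compact_context(msgs, keep_recent=4)
print(len(out))  # 6
```

With real APIs, choose the cut point at a turn boundary: a tool result separated from the assistant message that requested it is rejected by some APIs and can confuse the model.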

The choice of context management strategy has a direct effect on agent quality over long runs. Agents that naively truncate their context lose important earlier observations. Agents that summarize intelligently can maintain coherent operation over much longer horizons. This is one of the key architectural decisions that separates toy agents from production-grade systems.

Worked example: The agent loop in code

Here is a simplified but realistic representation of what an agent loop looks like in Python. Understanding this structure makes it possible to reason about any agent framework you encounter:

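# SYSTEM_PROMPT, `model` (a chat client), and `execute_tool` (a tool dispatcher)
# are placeholders for your provider SDK and tool layer; the message shapes follow
# a generic chat format and will need adapting to a specific API.

def run_agent(task: str, tools: list, max_iterations: int = 20) -> str:
    context = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task},
    ]

    for _ in range(max_iterations):
        # THINK: call the model with the full accumulated context
        response = model.chat(messages=context, tools=tools)

        # No tool calls means the model chose to stop: treat it as the final answer
        if not response.tool_calls:
            return response.content

        # Record the assistant turn, including its tool calls, so the next
        # model call can see which actions it already requested
        context.append({
            "role": "assistant",
            "content": response.content,
            "tool_calls": response.tool_calls,
        })

        # ACT: execute each requested tool call
        for tool_call in response.tool_calls:
            try:
                result = execute_tool(tool_call.name, tool_call.args)
            except Exception as exc:
                # Surface failures to the model instead of crashing the loop
                result = f"Error: {exc}"

            # OBSERVE: add the result to context, linked to the call that produced it
            context.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })

    return "Max iterations reached"  # safety termination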

This structure (context accumulation, model call, tool execution, result observation) is the skeleton of every agent, regardless of how sophisticated the surrounding framework is. LangChain, AutoGen, CrewAI, and Anthropic's agent patterns all implement variants of this same loop, with varying degrees of abstraction and capability layered on top.