Module 122 min read · Agentic AI and Autonomous Systems

What Are AI Agents?

Something fundamental shifted in AI around 2023. Models that had previously answered questions in isolation began taking actions in the world — browsing websites, writing and executing code, sending emails, booking calendar events, and completing multi-step tasks with minimal human involvement. This shift from model to agent represents one of the most significant transitions in the history of artificial intelligence, with implications that reach far beyond any individual technology product. Understanding what an AI agent actually is — and how it differs from everything that came before — is the prerequisite to everything else in this course.

The problem with the word "agent"

Before we can define what an AI agent is, we need to clear away some conceptual debris. The term "agent" is used loosely in popular discourse to describe everything from a simple chatbot with a friendly persona to a fully autonomous system that can complete week-long tasks without human input. This ambiguity is not just semantic; it causes real confusion when organizations try to evaluate what agentic AI can and cannot do for them today.

In the technical literature, the definition is more precise. An AI agent is a system that perceives its environment, reasons about what to do, takes actions to affect that environment, and continues this cycle autonomously toward a goal. The key distinguishing elements are: perception of state, autonomous reasoning, action-taking capability, and goal-directedness across multiple steps. Each of these elements is necessary; none alone is sufficient.

By this definition, a large language model accessed via API is not an agent. It is a model — a sophisticated function that maps inputs to outputs. It perceives nothing independently, takes no actions, and operates on a single turn. When you add the scaffolding to let that model perceive state, choose tools, and iterate toward a goal, you have built an agent. The model is the brain; the agent is the body plus the brain operating in an environment.

Working Definition

An AI agent is a system that perceives inputs from an environment, uses an AI model to reason about those inputs, selects and executes actions (including tool calls), observes the results, and repeats this loop autonomously until a goal condition is met or a stopping criterion is reached.

The key word is autonomously. An agent is not being told what to do at each step by a human. It decides its own next action based on its current state and goal.

The spectrum from model call to full autonomy

It is most useful to think of agentic capability as a spectrum rather than a binary. At one end sit simple, single-turn model API calls — you send a prompt, you receive a response, the interaction ends. These are immensely useful but strictly non-agentic. At the other end sit fully autonomous long-horizon agents that operate for days or weeks, managing complex multi-step workflows across many tools and environments with minimal human oversight.

Between these poles lies most of the interesting territory. Multi-turn conversational systems like ChatGPT maintain state across turns but are still largely reactive — they do not initiate actions without human prompting. Tool-augmented assistants can call functions like web search or code execution, but only when a human requests it and only for one step at a time. ReAct-style agents (Reasoning plus Acting) can chain multiple tool calls autonomously to answer a question, but still terminate when they produce a final answer. Task agents can work through complex, multi-step tasks with limited human checkpoints. Autonomous agents at the far end can self-assign subtasks, manage their own memory, spawn other agents, and operate continuously.

System Type	Autonomy	Actions	Examples
Model API call	None	Text generation only	Raw GPT-4 API, Claude API
Chatbot	Conversational	Multi-turn text	ChatGPT, Claude.ai
Tool-augmented assistant	Step-wise	Single tool calls on request	ChatGPT with browsing
ReAct agent	Chain of tools	Multi-tool reasoning loops	LangChain agents
Task agent	Goal-directed	Multi-step autonomous work	Devin, Claude computer use
Autonomous agent	Full	Self-directed long-horizon operation	AutoGPT, OpenAI Operator

Why agents represent a new paradigm

The shift from single-turn inference to agentic operation is not merely quantitative — it is qualitative. When a model makes a single inference, the scope of possible error is bounded and the consequences are easily reversed: you read the output, decide it is wrong, and try again. When an agent executes a sequence of actions in the world, each action can have real consequences that may be difficult or impossible to undo. An agent that sends an email, makes a purchase, modifies a file, or executes code has changed the state of the world, not just produced text about it.

This distinction changes what "reliability" means. A language model that is right 95% of the time is excellent. An agent that executes 20 steps and is right at each step with 95% probability has only a 36% chance of completing the entire task correctly without any error — because errors compound multiplicatively. Understanding this compounding property of agent reliability is one of the most important conceptual shifts for practitioners moving from model deployment to agent deployment.

The paradigm shift also changes the nature of the interface between AI and humans. With a model API, the human is always in the loop — they read each response and decide the next step. With an agent, the human defines a goal and the agent handles the intermediary steps. This is enormously more powerful and enormously more risky. The interface design problem — when to ask for human clarification, when to proceed autonomously, how to handle ambiguity — becomes a first-class engineering concern, not an afterthought.

The current state of agentic AI

As of mid-2020s, agentic AI is genuinely capable but also genuinely limited. This is important to understand clearly, because both the hype and the dismissal surrounding AI agents tend to be wrong in characteristic ways.

What agents do well today: well-defined, bounded tasks with clear success criteria and recoverable failure modes. An agent tasked with "research the top 5 competitors of company X and summarize their pricing strategies" can do this reliably. An agent tasked with "write all the unit tests for this Python module" can handle it with strong results. An agent given access to a browser and asked to "fill out this form on this website" will usually succeed. These are tasks that are hard for humans to scale but easy to define and verify.

What agents struggle with: long-horizon tasks requiring persistent coherent strategy across many steps, tasks that require understanding of nuanced human social context, tasks where small mistakes early cascade into large failures later, and tasks that require genuinely novel problem-solving rather than sophisticated recombination of known patterns.

Common Misconception

AI agents today are not general-purpose autonomous workers that can be given any task and left to complete it. They are powerful tools for specific categories of structured, semi-structured, and information-processing tasks. The gap between "impressive demo" and "reliably deployed at scale in production" is real and substantial. Practitioners who paper over this gap with optimism will be disappointed; those who understand it will build systems that work.

Current state-of-the-art agents succeed on roughly 15–50% of complex real-world software engineering benchmarks (like SWE-Bench), depending on the task type and scaffolding. The best agents on the best benchmarks approach 50%; most deployed agents are lower. This is genuinely impressive — and genuinely limited.

Real systems: Devin, Claude computer use, AutoGPT, and Operator

Devin, released by Cognition AI in early 2024, was the first AI system marketed explicitly as an "AI software engineer." Devin has access to a code editor, terminal, and browser. Given a software task, it plans an approach, writes code, runs tests, reads error messages, and iterates. It can complete tasks spanning multiple hours. Devin highlighted both the genuine capability of agentic coding systems and their current limitations — it handles routine coding tasks well, but struggles with novel architectural decisions and long-horizon consistency.

Claude computer use, released by Anthropic in late 2024, allows Claude to control a computer — literally moving the cursor, clicking, typing, taking screenshots, and observing the results. This is the most literal form of agentic operation: an AI navigating real software interfaces the same way a human would. Computer use is powerful precisely because it requires no API integration — any software with a GUI becomes available to the agent. It is slow and error-prone compared to API-based tool use, but it unlocks legacy systems and complex interfaces that have no programmatic access.

AutoGPT, released as an open-source project in 2023, was one of the earliest demonstrations of a fully autonomous looping agent. AutoGPT could spawn sub-agents, manage its own memory, browse the web, write and execute code, and continue working toward a goal without human input. It became a viral phenomenon and also a lesson in the limits of unconstrained autonomy — AutoGPT frequently lost coherent track of its goals over long runs, generated substantial token costs, and produced mixed results on real tasks. It was, however, enormously important as an early demonstration of what the agentic paradigm looked like in practice.

OpenAI Operator represents the commercial productization of agentic capability: an agent that can use web browsers to complete tasks on behalf of users — making reservations, filling out forms, navigating e-commerce flows. Operator illustrates where the commercial application of agents is headed: not general research assistants, but specialized agents with narrow but reliable capabilities embedded in specific workflows.

Comparing agents to chatbots and raw model calls

Understanding what makes an agent different from a chatbot requires examining the role of state and action. A chatbot like the standard ChatGPT interface maintains conversational state (what was said before) but takes no independent actions in the world. Every response is the terminal output of one conversation turn. The chatbot does not search the web unless told to, does not write to files, does not make decisions about what to do next — it only responds.

A raw model API call is even more stripped down. There is no persistent state, no memory of previous interactions. You send a prompt, the model returns a completion. The model has no concept of "what it should do next" because there is no "next" — each call is independent and atomic.

The agent, by contrast, has a goal that persists across multiple reasoning cycles. It has memory of what it has done. It has the ability to take actions that change its environment. And it has a loop — a mechanism to observe the results of its actions, update its understanding, and decide what to do next. This loop is the essential architectural difference between a model and an agent.

Analogy

A model API call is like asking someone a question and receiving an answer. A chatbot is like a sustained conversation. An agent is like hiring a contractor: you describe what you want built, and they plan the work, acquire the materials, execute the construction, handle unexpected problems, and report back when done — or when they need a decision from you.

The building blocks of any agent

Every agent system, regardless of sophistication, is built from a small set of core components. Understanding these components makes it possible to analyze any agentic system clearly:

The model (brain)

The language model that does the reasoning. This is typically a frontier model — GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — though smaller models are increasingly used for specific subtasks. The model reads the current state, reasons about it, and decides what to do next.

The context window (working memory)

Everything the model can "see" at one time: the system prompt defining its role and capabilities, the conversation history, tool call results, retrieved documents, and any other information it needs to reason about the current state. Context windows in frontier models now range from 128K to over 1M tokens.

Tools (actions)

The set of functions the agent can call to interact with the world. These might include web search, code execution, file read/write, database queries, API calls, email sending, or browser automation. Tool use is the mechanism by which agents change the state of their environment.

The orchestration loop (control flow)

The scaffolding code that runs the agent: send state to model, receive response, parse any tool calls, execute those tools, add results to context, repeat. This loop can be simple (a basic while loop) or complex (branching, parallel execution, handoffs to sub-agents).

Memory systems (persistence)

Mechanisms for storing and retrieving information beyond what fits in the context window — vector databases, conversation summaries, file systems, databases. Memory allows agents to work on tasks longer than a single context window and to learn from past interactions.

Capabilities and limits today

The honest assessment of agentic AI capability in the current moment requires acknowledging both what is genuinely impressive and where the walls are. Practitioners who build with agents quickly discover that the failure modes are specific and consistent — they are not random noise but predictable categories of limitation that good design can partially mitigate.

Current agents excel at: web research and synthesis, code generation and execution, document processing and analysis, structured data extraction, form-filling and interface navigation, and coordination of multi-step information workflows. These capabilities, while bounded, are commercially valuable and practically useful today.

Current agents struggle with: maintaining coherent strategy across very long task horizons (more than ~50 steps), tasks requiring deep world knowledge or physical intuition, complex social and organizational navigation, tasks where the right answer requires genuine creativity rather than sophisticated recombination, and any task where a mistake early on is catastrophic and unrecoverable.

The Practitioner's Orientation

The most productive stance toward current agentic AI is: bounded autonomy with human oversight. Design agents to handle well-defined subtasks reliably, build in checkpoints where humans can review and correct before proceeding to consequential actions, and build recovery mechanisms for the inevitable failures. Agents deployed with this philosophy outperform both fully manual workflows (because the agent handles the tedious steps) and fully autonomous agents (because humans catch the agent's mistakes before they compound). The goal is not to remove humans from the loop — it is to put humans in the right parts of the loop.

Why this matters now

The reason to care about agentic AI right now — not in five years, not when the technology is more mature — is that the transition from single-turn to agentic operation is happening in production systems across every industry. Organizations that understand the paradigm now are building the competencies to deploy agents safely and effectively. Those that wait will find the conceptual gap harder to close as systems grow more complex and the competitive landscape shifts.

The rest of this course builds the technical foundation for working with agents seriously: the perception-reasoning-action loop, tool use, memory systems, multi-agent architectures, planning and task decomposition, retrieval-augmented generation, safety and reliability, major frameworks, and the trajectory of the field. By the end, you will have the conceptual vocabulary and technical understanding to build, evaluate, and deploy agentic systems in production — and to make informed decisions about when agents are the right tool and when they are not.