Module 932 min read · Agentic AI and Autonomous Systems

Building with Agent Frameworks

The gap between understanding how agents work in theory and shipping an agent that works reliably in production is where most practitioners get stuck. Agent frameworks — LangChain, LangGraph, AutoGen, CrewAI, and the emerging wave of newer entrants — exist to close that gap. They provide abstractions for the agent loop, memory management, tool orchestration, and multi-agent coordination that would otherwise require hundreds of lines of scaffolding code. This module examines the major frameworks, their trade-offs, and the practical decisions involved in building production-grade agentic systems.

Why frameworks exist

An agent built from scratch using only a raw LLM API quickly accumulates boilerplate: you need to handle the tool call → execute → observe loop, manage conversation history so it doesn't exceed context limits, route between multiple tools, handle retries on API failures, log every step for debugging, and integrate with persistent storage. Each of these is solvable, but solving all of them repeatedly across projects is expensive and error-prone.

Frameworks abstract these concerns. They provide a defined interface for tools (register a function and the framework handles schema generation, call routing, and result injection), built-in memory management patterns, composable chain-of-processing abstractions, and increasingly, first-class support for multi-agent workflows. The cost is abstraction overhead: framework code adds indirection that can make debugging harder and makes it more difficult to deviate from the framework's opinionated design when your use case doesn't fit its model.

The decision of when to use a framework versus building custom is not primarily about capability — anything a framework does, you can implement yourself — but about development velocity and maintenance burden. For prototyping and common patterns, frameworks win. For highly specialized requirements, custom implementations often end up cleaner.

LangChain: the dominant but divisive framework

LangChain is the most widely adopted agent framework, with millions of downloads and extensive community support. It provides abstractions for chains (sequences of LLM calls and operations), agents (LLM-driven decision makers with tools), and memory (persistent conversation state). Its ecosystem is large: there are community integrations for hundreds of LLM providers, vector databases, and external services.

Core concepts

LangChain's architecture centers on the LCEL (LangChain Expression Language) syntax for composing processing pipelines. You define components — prompts, LLMs, output parsers, tools — and compose them with the pipe operator (|). The result is a chain that takes an input dict and produces an output, with each step's output feeding the next. For agents specifically, LangChain provides AgentExecutor, which wraps the agent loop: call the LLM, check whether it wants to use a tool, execute the tool, inject the result, repeat until the LLM produces a final answer.

LangChain's strengths

The integration ecosystem is LangChain's primary strength. If you need to connect to a specific vector database, cloud storage provider, or external service, there is almost certainly a LangChain integration that handles authentication, schema mapping, and error handling. This reduces the cost of switching infrastructure components — you can swap the underlying vector store from Pinecone to Weaviate with a one-line change rather than rewriting integration code. For teams iterating quickly on infrastructure decisions, this matters enormously.

LangChain's criticisms

LangChain has faced persistent criticism for abstraction layers that make simple tasks unnecessarily complex, frequent breaking changes across major versions, and performance overhead from deeply nested chain invocations. The framework's early rapid growth led to architectural decisions that accumulated technical debt, and some teams have found that LangChain's abstractions obscure what is actually happening in a way that makes debugging slow.

The criticism is not unfair, but it is also not universal. For teams building CRUD-style LLM apps — document question-answering, conversational interfaces, structured data extraction — LangChain's abstractions fit well and the integration ecosystem provides genuine value. For teams building complex custom agent logic, the framework's opinions can become constraints.

LangGraph: stateful agent workflows

LangGraph, built by the LangChain team, addresses a fundamental limitation of simple chain-based agents: they cannot handle state that persists across multiple decision points, loops, or conditional branches. LangGraph models agent execution as a directed graph where nodes are processing steps (LLM calls, tool executions, human interactions) and edges represent transitions that can be conditional, looping, or parallel.

When graphs beat chains

Consider an agent that must: (1) search for information, (2) determine whether the search result is sufficient, (3) if not, refine the query and search again, (4) if yes, draft a response, (5) validate the response against the original requirements, and (6) if validation fails, revise. This workflow has conditional branches and loops that a simple sequential chain cannot represent. LangGraph encodes this as a graph with nodes for each step and edges that include conditional logic for routing.

The graph representation makes complex agent workflows explicit and visualizable. You can inspect the state at any node, add human-in-the-loop interrupts at specific edges, and reason about what paths through the graph are possible in a way that sequential code does not support.

LangGraph's killer feature: persistence

LangGraph has first-class support for persistent checkpoints. The graph's state — every node's output, the current position in the workflow, accumulated context — can be saved to a database after each step. This enables: resuming workflows that were interrupted (a web server restart, a user abandoning a long task), implementing human-in-the-loop approval gates that require the workflow to pause until a human takes action outside the system, and debugging by replaying a failed workflow from any checkpoint. For production agents that run long tasks, persistence is not a nice-to-have — it is essential.

AutoGen: multi-agent conversation

Microsoft Research's AutoGen takes a different architectural stance: it models agent systems as conversations between multiple specialized agents. Where LangChain and LangGraph think in terms of chains and graphs, AutoGen thinks in terms of agents sending messages to each other. A typical AutoGen workflow involves an orchestrator agent that manages a task, specialist agents that have domain-specific tools and instructions, and a human proxy agent that allows humans to participate in the conversation at defined points.

AutoGen's conversation model maps naturally to problems that genuinely require diverse expertise: a coding task might involve a planner agent that breaks down requirements, a coder agent that writes code, a reviewer agent that checks the code for bugs, and a tester agent that executes tests and reports results. Each agent has its own system prompt, tool access, and decision authority. The conversation between them is the actual work product.

AutoGen's trade-offs

The conversation-centric model creates natural structure for multi-agent workflows but can lead to verbose, expensive token usage — agents are essentially sending each other full context windows. AutoGen works best when the diversity of agent specializations is genuine (different tools, different domain knowledge, different decision authorities) rather than artificially imposed structure. Using AutoGen to orchestrate two agents that both have the same tools and instructions is worse than using a single agent.

CrewAI: role-based agent teams

CrewAI provides a higher-level abstraction than AutoGen, centering on the concept of a "crew" — a team of role-defined agents with explicit task assignments and a defined process (sequential or hierarchical) for executing those assignments. The API is deliberately opinionated: you define agents by role ("Senior Data Analyst", "Report Writer", "QA Reviewer"), assign tasks to agents, and define whether tasks execute sequentially or whether a manager agent routes tasks to the most appropriate specialist.

CrewAI's opinionation makes it faster to get started with multi-agent patterns but less flexible for custom orchestration logic. It is particularly well-suited for document processing pipelines, research assistants, and content generation workflows where the role structure maps naturally to how a human team would approach the problem.

Choosing a framework: a decision framework

Single-agent, linear workflow
Use LangChain with LCEL or a direct API implementation. The framework's chain abstractions and integration ecosystem give you integrations with databases, vector stores, and external services without boilerplate. LangGraph is overkill here — you don't need the graph machinery.
Single-agent, complex stateful workflow
Use LangGraph. Workflows with conditional branches, loops, or human-in-the-loop requirements are exactly what LangGraph was built for. The checkpoint persistence system is a major practical advantage for anything running longer than a few seconds.
Multi-agent with diverse specialist roles
AutoGen or CrewAI. Use CrewAI if the role structure maps naturally and you want faster setup. Use AutoGen if you need fine-grained control over the conversation protocol or if some agents need different LLM backends (AutoGen supports mixing models across agents).
Highly customized or performance-critical
Consider a minimal custom implementation using only the LLM SDK directly. Frameworks add latency and abstraction overhead. If your use case has unique requirements that consistently fight the framework's design, the battle to make the framework do what you want will cost more than building the specific scaffolding you need.

Production concerns common to all frameworks

Regardless of which framework you choose, the same production concerns apply to every agentic deployment.

Observability. You need structured logs of every LLM call, tool invocation, tool result, and decision point. LangSmith (for LangChain/LangGraph), Arize Phoenix, and Weights & Biases Weave all provide agent-specific tracing. Without tracing, debugging a failed run requires reconstructing what happened from incomplete outputs — a painful process that gets worse as agent complexity grows.

Cost controls. Agents can consume surprisingly large amounts of tokens if they loop, retrieve large documents, or run in parallel. Set hard limits on tokens per run, implement budget checks before each LLM call, and alert when per-task cost exceeds a threshold. An agent that loops 50 times instead of the expected 5 can cost 10x more than expected — and without cost controls, you won't notice until the billing cycle closes.

Graceful degradation. Production agents encounter rate limits, API outages, malformed tool responses, and unexpected inputs. Every tool call should have retry logic with exponential backoff. Every agent run should have a total timeout. Every agent should have a defined fallback behavior when it cannot complete the task — returning partial results with an explanation is almost always preferable to timing out silently or throwing an unhandled exception.

Framework lock-in

Agent frameworks are evolving quickly, and the framework that is dominant today may not be the best choice in two years. Architecturally, this argues for keeping your business logic — the actual task instructions, tool implementations, and domain-specific logic — as decoupled from the framework as possible. If your tool implementations are pure Python functions that happen to be wrapped by LangChain's tool decorator, switching frameworks costs a wrapper change. If your tool logic is deeply entangled with LangChain's abstractions, switching costs a rewrite. Design for portability from the start.

In the final module, we zoom out from the mechanics of building agents to consider where the technology is heading: what autonomous AI systems will be capable of in the near term, what societal and governance questions they raise, and what it means to build responsibly in a space that is moving this fast.