Planning and Task Decomposition
The gap between a user's ambiguous request and the sequence of concrete actions needed to fulfill it is bridged by planning. An agent that can plan well can take on complex, multi-step tasks that would be impossible to accomplish in a single inference step. An agent that plans poorly will either attempt to do too much at once and fail, or decompose tasks in ways that don't respect real-world dependencies. Planning is simultaneously one of the most powerful capabilities in agentic AI and one of its most fragile — the place where compounding errors most visibly accumulate.
How agents break down ambiguous tasks
When a user says "write me a competitive analysis of the top five players in the EV battery market," they are giving the agent a goal, not a plan. The agent must convert that goal into a sequence of concrete, executable steps. This conversion process — task decomposition — is one of the core cognitive operations in agentic reasoning.
Effective task decomposition requires the agent to:
- clarify ambiguities in the goal (which aspects of "competitive analysis" are most important?)
- identify what information is needed and where to get it
- determine the dependencies between subtasks (market share data must be gathered before it can be compared)
- estimate what resources and tools are required
- structure the steps in a logical sequence that builds toward the goal
Models accomplish this primarily through their general reasoning capabilities, guided by prompting. A simple and widely used technique is to explicitly ask the model to "think step by step" before acting. This makes the model surface its decomposition reasoning rather than jumping to the first action that comes to mind.
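A minimal sketch of this prompting pattern follows. It assumes a hypothetical prompt template (`DECOMPOSE_PROMPT`) that asks the model to reason about ambiguity, information needs, and dependencies before emitting a numbered plan under a `PLAN:` marker. The hypothetical `parse_plan` helper then extracts the steps so later code can act on them. No real model is called; the canned `fake_output` string stands in for a model response.

```python
import re

# Hypothetical template: reasoning first, then a numbered plan under "PLAN:".
DECOMPOSE_PROMPT = """Goal: {goal}

Before taking any action, think step by step:
1. What is ambiguous about this goal, and what is the most reasonable reading?
2. What information is needed, and where can it be found?
3. Which steps depend on the results of other steps?

Then write the plan as a numbered list after the line PLAN:"""


def build_decomposition_prompt(goal: str) -> str:
    return DECOMPOSE_PROMPT.format(goal=goal)


def parse_plan(model_output: str) -> list[str]:
    """Extract numbered steps that appear after the 'PLAN:' marker."""
    _, _, plan_section = model_output.partition("PLAN:")
    steps = []
    for line in plan_section.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            steps.append(match.group(1))
    return steps


# Canned stand-in for a real model response (hypothetical).
fake_output = """Reasoning: "competitive analysis" most likely means market share,
technology, and strategy. Company research depends on identifying the companies first.
PLAN:
1. Identify the top five EV battery companies by market share
2. Research each company's technology approach
3. Research each company's strategic positioning
4. Synthesize findings
5. Draft and format the report"""

steps = parse_plan(fake_output)
print(len(steps), steps[0])
```

Keeping the reasoning and the plan in separately marked sections means the reasoning can be free-form while the plan stays easy to parse.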
Hierarchical planning: goals, subtasks, and actions
Complex tasks benefit from hierarchical decomposition, where the planning happens at multiple levels of abstraction simultaneously.
At the goal level, the task is expressed in terms of desired outcomes: "Produce a competitive analysis report covering market share, technology differentiation, and strategic positioning for five EV battery companies." This is what the user wants; it is abstract and not directly executable.
At the subtask level, the goal is broken into major components, each representing a meaningful unit of work: (1) identify the top five EV battery companies by market share, (2) research each company's technology approach, (3) research each company's strategic positioning, (4) compile and synthesize findings, (5) draft and format the report. Each subtask is concrete enough to assign but still represents multiple individual actions.
At the action level, each subtask is broken into the specific tool calls and operations needed to execute it: "search web for 'EV battery market share 2025'", "extract relevant data from search results", "search for '[company name] battery technology approach'", and so on. Actions are the unit of actual execution — the things that interact with the world.
Hierarchical planning lets the agent maintain a coherent view of the overall task while working at the action level. Without it, agents often lose track of why they are doing a particular action and can drift away from the original goal.
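One way to keep the three levels connected is to store them as nested records, so that every action can be traced back to its subtask and goal. The sketch below is illustrative only; the `Goal`, `Subtask`, and `Action` classes and their fields are hypothetical, not a standard schema. `context_for` restates the goal, subtask, action chain, which is the kind of reminder an agent can be given before each action to keep it from drifting.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    description: str  # a concrete tool call or operation
    done: bool = False


@dataclass
class Subtask:
    description: str
    actions: list[Action] = field(default_factory=list)

    @property
    def done(self) -> bool:
        # A subtask is complete only when it has actions and all are done.
        return bool(self.actions) and all(a.done for a in self.actions)


@dataclass
class Goal:
    description: str
    subtasks: list[Subtask] = field(default_factory=list)

    def next_action(self):
        """Return (subtask, action) for the first unfinished action, or None."""
        for subtask in self.subtasks:
            for action in subtask.actions:
                if not action.done:
                    return subtask, action
        return None

    def context_for(self, subtask: Subtask, action: Action) -> str:
        # Restate why this action is being executed.
        return (f"Goal: {self.description}\n"
                f"Subtask: {subtask.description}\n"
                f"Action: {action.description}")


goal = Goal(
    "Competitive analysis of five EV battery companies",
    [
        Subtask("Identify the top five companies by market share",
                [Action("web_search('EV battery market share 2025')"),
                 Action("extract market share data from results")]),
        Subtask("Research each company's technology approach",
                [Action("web_search('CATL battery technology')")]),
    ],
)
goal.subtasks[0].actions[0].done = True
subtask, action = goal.next_action()
print(goal.context_for(subtask, action))
```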
Task graphs and dependency management
The relationships between subtasks can be represented as a directed acyclic graph (DAG) in which nodes are subtasks and edges represent dependencies. If subtask B depends on subtask A, B cannot start until A is complete. Subtasks with no dependency path between them, direct or indirect, can run in parallel.
Explicit dependency tracking enables several important capabilities: parallelism detection (which subtasks can run simultaneously), scheduling (in what order must tasks execute), and failure propagation analysis (if subtask A fails, which downstream subtasks are blocked?). In multi-agent systems, the dependency graph can guide which subagents to launch first and when to aggregate their results.
```python
# Simple task graph representation
task_graph = {
    "identify_companies": {"depends_on": [], "status": "pending"},
    "research_company_A": {"depends_on": ["identify_companies"], "status": "pending"},
    "research_company_B": {"depends_on": ["identify_companies"], "status": "pending"},
    "research_company_C": {"depends_on": ["identify_companies"], "status": "pending"},
    "research_company_D": {"depends_on": ["identify_companies"], "status": "pending"},
    "research_company_E": {"depends_on": ["identify_companies"], "status": "pending"},
    "synthesize_findings": {"depends_on": [
        "research_company_A", "research_company_B", "research_company_C",
        "research_company_D", "research_company_E"
    ], "status": "pending"},
    "draft_report": {"depends_on": ["synthesize_findings"], "status": "pending"}
}
# After identify_companies completes, research_company_A through E
# can all run in parallel: 5 subagents simultaneously
```
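A graph in this shape becomes useful once code reads it. The sketch below assumes the same `{"depends_on": [...], "status": ...}` structure and introduces two hypothetical helpers. `execution_waves` is a level-by-level variant of Kahn's topological sort: it groups tasks into waves whose members can all run in parallel, and it reports cycles. `blocked_by` answers the failure-propagation question by walking the reversed edges from a failed task.

```python
# Hypothetical helpers over the {"depends_on": [...], "status": ...} shape.
def execution_waves(graph: dict) -> list[list[str]]:
    """Group tasks into waves; all tasks in one wave can run in parallel.

    A level-by-level variant of Kahn's topological sort. Raises ValueError
    if some tasks can never become ready (a cycle or an unknown dependency).
    """
    remaining = {name: set(node["depends_on"]) for name, node in graph.items()}
    ready = sorted(name for name, deps in remaining.items() if not deps)
    waves, scheduled = [], 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready = []
        for name, deps in remaining.items():
            # Only tasks still waiting on something in this wave are affected.
            if deps and deps.intersection(ready):
                deps.difference_update(ready)
                if not deps:
                    next_ready.append(name)
        ready = sorted(next_ready)
    if scheduled != len(graph):
        raise ValueError("task graph has a cycle or an unknown dependency")
    return waves


def blocked_by(graph: dict, failed: str) -> set[str]:
    """Return every task that directly or transitively depends on `failed`."""
    dependents: dict[str, list[str]] = {}
    for name, node in graph.items():
        for dep in node["depends_on"]:
            dependents.setdefault(dep, []).append(name)
    blocked, stack = set(), [failed]
    while stack:
        for child in dependents.get(stack.pop(), []):
            if child not in blocked:
                blocked.add(child)
                stack.append(child)
    return blocked


# Rebuild the example graph so this block runs on its own.
companies = ["A", "B", "C", "D", "E"]
task_graph = {"identify_companies": {"depends_on": [], "status": "pending"}}
for c in companies:
    task_graph[f"research_company_{c}"] = {
        "depends_on": ["identify_companies"], "status": "pending"}
task_graph["synthesize_findings"] = {
    "depends_on": [f"research_company_{c}" for c in companies], "status": "pending"}
task_graph["draft_report"] = {"depends_on": ["synthesize_findings"], "status": "pending"}

for i, wave in enumerate(execution_waves(task_graph), start=1):
    print(f"wave {i}: {wave}")
print("blocked if C fails:", sorted(blocked_by(task_graph, "research_company_C")))
```

For this graph the schedule has four waves, with the five research tasks together in the second. A failure in `research_company_C` blocks `synthesize_findings` and `draft_report`, while the other research tasks can still finish. Python's standard library also offers `graphlib.TopologicalSorter`, which supports the same ready/done scheduling pattern.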
The ReAct framework in agentic contexts
The ReAct framework (Reason + Act) is one of the most widely used approaches to interleaving planning and execution in agentic systems. Rather than generating a complete plan upfront and then executing it, ReAct alternates between reasoning steps (thinking about what to do) and action steps (actually doing something), using the results of each action to inform the next reasoning step.
An illustrative ReAct trace (search results abbreviated) looks like this:
```text
Thought: I need to find the top 5 EV battery companies by market share.
         The most reliable source for this would be recent industry reports.
Action: web_search("EV battery market share 2025 top companies")
Observation: [search results returned: CATL 37%, BYD 16%, LGES 14%, ...]
Thought: CATL, BYD, LG Energy Solution, Panasonic, and Samsung SDI
         appear to be the top 5. Now I need to research each one's technology.
         I can run these searches in parallel.
Action: parallel [
    web_search("CATL battery technology LFP NMC 2025"),
    web_search("BYD Blade battery technology differentiation"),
    ...
]
Observation: [results for all parallel searches]
Thought: I now have the technology information. I notice that CATL and BYD
         are both heavily invested in LFP chemistry, while the Korean players
         focus on NMC. This is a key differentiator to highlight.
```
The key property of ReAct is that reasoning and action are interleaved — the agent doesn't have to commit to a full plan before starting. This is adaptive: if an early action returns surprising information, the agent can revise its plan for subsequent steps. This adaptability is critical for tasks where the information landscape can't be fully anticipated.
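The loop below is a minimal ReAct sketch under stated assumptions. The model is replaced by a hypothetical `scripted_model` function, and the only tool is a hypothetical `fake_web_search` that returns canned text. Each model step returns a dict with a thought plus either a tool call or a final answer. The point of the sketch is the adaptivity: when the first search returns nothing, the next decision is driven by that observation rather than by a fixed upfront plan. A `max_steps` budget keeps the loop bounded.

```python
from typing import Callable


# Hypothetical stand-in for a search tool; a real agent would call a search API.
def fake_web_search(query: str) -> str:
    canned = {"EV battery market share": "CATL, BYD, LGES, Panasonic, Samsung SDI"}
    for key, result in canned.items():
        if key in query:
            return result
    return "no results"


TOOLS: dict[str, Callable[[str], str]] = {"web_search": fake_web_search}


def react_loop(model: Callable[[list[str]], dict], question: str,
               max_steps: int = 5) -> str:
    """Alternate Thought -> Action -> Observation until the model finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)  # the model sees every observation so far
        transcript.append(f"Thought: {step['thought']}")
        if step["type"] == "finish":
            return step["answer"]
        tool = TOOLS.get(step["tool"])
        observation = tool(step["input"]) if tool else f"unknown tool {step['tool']}"
        transcript.append(f"Action: {step['tool']}({step['input']!r})")
        transcript.append(f"Observation: {observation}")
    return "stopped: step budget exhausted"


def scripted_model(transcript: list[str]) -> dict:
    """Stand-in for an LLM that picks its next step from the latest observation."""
    last = transcript[-1]
    if last == "Observation: no results":
        # Surprising result: revise the approach instead of following a fixed plan.
        return {"type": "act", "thought": "No results; broaden the query.",
                "tool": "web_search", "input": "EV battery market share"}
    if last.startswith("Observation:"):
        return {"type": "finish", "thought": "The search answered the question.",
                "answer": last.removeprefix("Observation: ")}
    return {"type": "act", "thought": "I need market share data.",
            "tool": "web_search", "input": "top battery suppliers 2025"}


print(react_loop(scripted_model, "Who are the top EV battery makers?"))
```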
Pure chain-of-thought prompting asks the model to reason before answering, but without taking real-world actions. ReAct extends this by letting the model execute actions between reasoning steps and observe their results. This grounds the reasoning in real information rather than letting it drift into confabulation. For factual, knowledge-intensive tasks, ReAct-style agents tend to hallucinate less than pure reasoning approaches because they can check claims against retrieved data rather than relying on training knowledge that may be incomplete or outdated.
Plan validation before execution
For high-stakes tasks, generating a plan and executing it immediately is risky. A safer approach is to generate a plan, validate it before execution, and proceed only once the plan meets quality criteria.
Plan validation can take several forms:
- Self-critique: the agent reviews its own plan and identifies potential issues, missing steps, or logical flaws before executing.
- Human review: for high-stakes or irreversible actions, the plan is presented to a human for approval before execution begins. This is a core element of supervised autonomy.
- Simulated execution: the agent traces through the plan logically to predict its outcomes, identifying failure points before real-world execution.
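Part of simulated execution and the human-review gate can be done with cheap programmatic checks before any model or tool is involved. The sketch assumes a hypothetical plan format: an ordered list of step dicts with `id`, `depends_on`, and optional `irreversible` and `approved` flags. Walking the list in order is a crude simulation: each dependency must exist and must have run before the step that needs it. Irreversible steps without approval are flagged for a human.

```python
def validate_plan(plan: list[dict]) -> list[str]:
    """Pre-execution checks on an ordered plan; returns problems (empty = OK).

    Step keys (id, depends_on, irreversible, approved) are a hypothetical format.
    """
    problems = []
    ids = {step["id"] for step in plan}
    completed = set()  # steps that would have run by this point in a dry run
    for step in plan:
        for dep in step["depends_on"]:
            if dep not in ids:
                problems.append(f"{step['id']}: depends on unknown step {dep!r}")
            elif dep not in completed:
                problems.append(f"{step['id']}: runs before its dependency {dep!r}")
        if step.get("irreversible") and not step.get("approved"):
            problems.append(f"{step['id']}: irreversible step needs human approval")
        completed.add(step["id"])
    return problems


plan = [
    {"id": "draft_report", "depends_on": ["synthesize"]},
    {"id": "synthesize", "depends_on": ["research"]},
    {"id": "email_report", "depends_on": ["draft_report"], "irreversible": True},
]
for problem in validate_plan(plan):
    print(problem)
```

This plan produces three problems: a step ordered before its dependency, a dependency on a step that does not exist, and an unapproved irreversible action. Self-critique, by contrast, catches semantic flaws that structural checks like these cannot see.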
A simple and often effective validation pattern is to ask the model "What could go wrong with this plan? What assumptions am I making that might be incorrect?" before beginning execution. This adversarial self-review can surface hidden assumptions and missing contingencies that are not apparent when the plan is first generated.
Replanning when steps fail
Even well-constructed plans encounter failures during execution. A web search returns no results. An API is unavailable. A subtask produces output that invalidates an assumption the plan was built on. Handling these failures gracefully is what separates robust agents from fragile ones.
The replanning loop consists of:
- detecting that a step has failed or produced unexpected output
- diagnosing why it failed
- determining whether to retry, substitute an alternative approach, or abandon the subtask
- updating the overall plan to reflect the new information
- continuing execution with the revised plan
The most important property of replanning is that it must be bounded. An agent that can replan indefinitely in response to failures can end up in infinite loops, exhausting compute and time budgets without ever completing the task. Implement explicit limits: a maximum number of retries per step, a maximum number of replanning cycles per task, and a graceful degradation path when limits are reached (return partial results with an explanation of what failed and why).
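A minimal sketch of a bounded replanning loop is shown below. The `execute` and `replan` callbacks are hypothetical placeholders: `execute` stands for whatever runs a step (a tool call or subagent), and `replan` stands for the diagnosis and revision step, which in a real agent would typically be a model call that receives the error. The limits mirror the ones described above: retries per step, replanning cycles per task, and a partial-results return when the budget runs out.

```python
def run_with_replanning(plan, execute, replan, max_retries=2, max_replans=3):
    """Run steps in order with bounded retries and a bounded number of replans.

    `execute(step)` returns a result or raises; `replan(pending, failed_step, error)`
    returns a revised list of remaining steps (it may substitute or drop steps).
    Both callbacks are hypothetical and supplied by the caller.
    """
    results, failures = {}, []
    replans = 0
    pending = list(plan)
    while pending:
        step = pending.pop(0)
        for _ in range(max_retries + 1):  # one attempt plus max_retries retries
            try:
                results[step] = execute(step)
                break
            except Exception as exc:  # detect the failure
                error = exc
        else:
            # Retries exhausted: record what failed and why.
            failures.append(f"{step}: {error}")
            if replans >= max_replans:
                # Graceful degradation: partial results plus an explanation.
                return {"status": "partial", "results": results,
                        "failures": failures, "skipped": pending}
            replans += 1
            pending = replan(pending, step, error)  # diagnose and revise
    return {"status": "done", "results": results, "failures": failures}


def execute(step):
    if step == "fetch_api_data":
        raise ConnectionError("API unavailable")
    return f"ok: {step}"


def replan(pending, failed_step, error):
    # Substitute an alternative approach for the failed step.
    if failed_step == "fetch_api_data":
        return ["search_web_instead"] + pending
    return pending


outcome = run_with_replanning(["fetch_api_data", "summarize"], execute, replan)
print(outcome["status"], outcome["failures"])
```

If `replan` kept reinserting the same failing step, the `max_replans` limit would end the loop with a `"partial"` status instead of running forever.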
Each planning and execution step has some probability of error. If each step of a 5-step plan succeeds 90% of the time and step failures are independent, the probability that all 5 steps succeed is only 0.9^5 ≈ 59%. For a 20-step plan, it drops to 0.9^20 ≈ 12%. Long-horizon tasks compound errors dramatically: an early mistake that isn't detected propagates forward and gets built upon, potentially invalidating many subsequent steps. This is the fundamental reliability challenge of agentic systems. Mitigation strategies include validating outputs at each step before using them as inputs to the next, building in checkpoints where an overseer reviews progress, and designing tasks to minimize the number of sequential dependent steps.
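The arithmetic above is easy to reproduce, and inverting it shows how demanding long plans are. This sketch keeps the same simplifying assumption as the text (independent step failures). The hypothetical `required_per_step` helper computes the per-step reliability needed for a whole plan to reach a target success rate.

```python
def plan_success_probability(per_step: float, steps: int) -> float:
    # Probability that every step succeeds, assuming independent failures.
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    # Per-step reliability needed for the whole plan to succeed with `target`.
    return target ** (1 / steps)


for n in (5, 20):
    print(n, round(plan_success_probability(0.9, n), 3))
print(round(required_per_step(0.9, 20), 4))
```

The first two lines reproduce the roughly 59% and 12% figures. The last shows that a 20-step plan needs about 99.5% reliability per step to succeed 90% of the time, which is one reason per-step validation and shorter dependency chains matter so much.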
When to generate a plan vs act immediately
Not every task benefits from upfront planning. For simple, well-defined tasks with a clear sequence of actions, generating a full plan before acting adds latency without adding value. For complex, ambiguous tasks where the path forward is unclear, planning is essential.
Practical heuristics for when to plan upfront:
- Plan when the task has more than 3–4 sequential steps, when subtasks have complex dependencies, when the task involves irreversible actions, or when partial failure would be very costly
- Act immediately when the task is a single tool call or simple lookup, when the path forward is unambiguous, or when latency is a higher priority than optimality
- Use ReAct as the default for most tasks — it provides the benefits of planning (structured reasoning) without the cost of committing to a full upfront plan that may be invalidated by early results
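These heuristics can be encoded as a simple routing function that an orchestrator runs before starting a task. The sketch is hypothetical throughout: `TaskProfile` and its fields are assumed features that would have to be estimated somehow (for example by a quick classification call), and `step_threshold=4` is one reading of "more than 3–4 sequential steps" that should be tuned rather than trusted.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical features estimated before execution starts.
    sequential_steps: int
    complex_dependencies: bool = False
    irreversible_actions: bool = False
    costly_partial_failure: bool = False
    ambiguous_path: bool = False
    latency_critical: bool = False


def choose_strategy(task: TaskProfile, step_threshold: int = 4) -> str:
    """Map the heuristics to 'plan', 'act', or 'react' (the default)."""
    if (task.irreversible_actions or task.costly_partial_failure
            or task.complex_dependencies or task.sequential_steps > step_threshold):
        return "plan"  # upfront plan, ideally validated before execution
    if task.sequential_steps <= 1 or (task.latency_critical and not task.ambiguous_path):
        return "act"  # single call or unambiguous, latency-sensitive task
    return "react"  # interleave reasoning and action


print(choose_strategy(TaskProfile(sequential_steps=1)))
print(choose_strategy(TaskProfile(sequential_steps=2, irreversible_actions=True)))
print(choose_strategy(TaskProfile(sequential_steps=3, ambiguous_path=True)))
```

Checking the risk conditions first means an irreversible action always triggers upfront planning, even for a short task.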
Plan representation formats
How a plan is represented — in the agent's context or in an external data structure — affects how easily it can be manipulated, validated, and tracked.
Natural language lists are the most common format for in-context plans. They are easy to generate and easy for the model to reason over, but difficult to parse programmatically or to track status in. They work well for short plans in single-agent settings.
Structured JSON task graphs provide machine-readable plans that can be tracked, modified, and executed programmatically. Each step has an ID, description, dependencies, status, and result. They are more complex to generate but far more amenable to monitoring and manipulation, and they are usually necessary in multi-agent systems where different agents execute different steps.
Markdown checklists occupy a middle ground: human-readable, easy to render in UIs, and partially parseable. The agent can update checkboxes as steps complete, giving users visibility into progress. Useful for tasks where human oversight is part of the design.
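The formats are not mutually exclusive: a JSON plan can be the source of truth while a Markdown checklist serves as its human-readable view. The sketch below assumes a hypothetical JSON schema trimmed to `description` and `status` per step (dependencies and results are omitted for brevity). `to_checklist` renders the plan, and `statuses_from_checklist` recovers statuses after a human or agent ticks boxes, which is the "partially parseable" property described above.

```python
import json
import re


def to_checklist(graph: dict) -> str:
    """Render a JSON-style plan as a Markdown checklist."""
    lines = []
    for step_id, step in graph.items():
        box = "x" if step["status"] == "done" else " "
        lines.append(f"- [{box}] {step_id}: {step['description']}")
    return "\n".join(lines)


def statuses_from_checklist(markdown: str) -> dict:
    """Recover step statuses from a checklist that may have been edited."""
    pattern = re.compile(r"^- \[([ xX])\] ([\w-]+):", re.MULTILINE)
    return {m.group(2): ("done" if m.group(1) in "xX" else "pending")
            for m in pattern.finditer(markdown)}


plan_json = """{
  "identify_companies": {"description": "Find top five by market share", "status": "done"},
  "research_company_A": {"description": "Research company A", "status": "pending"}
}"""
graph = json.loads(plan_json)
checklist = to_checklist(graph)
print(checklist)

# Simulate a user or agent ticking off a step in the rendered checklist.
edited = checklist.replace("- [ ] research_company_A", "- [x] research_company_A")
print(statuses_from_checklist(edited))
```

Only the checkbox and step ID are parsed back, so free-form edits to descriptions do not break status tracking.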
In practice, agents prompted to generate an explicit plan before acting often outperform agents that jump directly to action, even when the plan is never explicitly validated. Writing out a plan pushes the model to clarify its approach, identify dependencies, and surface hidden assumptions, all of which tend to improve execution quality. The cost is typically a few hundred extra tokens and the latency needed to generate them. For most multi-step tasks, that investment is worth it.