Prompt Chaining and Decomposition
The single biggest constraint on what a model can accomplish in one pass is attention quality: complex tasks degrade when you try to force everything into a single prompt. Prompt chaining, the practice of breaking work into coordinated sequences of smaller prompts, makes it practical to take on tasks that are too complex for a single prompt to handle well.
Why Single-Prompt Approaches Fail at Scale
Context windows have grown dramatically, but the underlying problem is not window size; it is attention quality. As prompts grow longer, models distribute attention across more content, and the quality of reasoning about any individual piece tends to degrade. The "lost in the middle" phenomenon, in which information buried in the center of a long context receives less attention than content at the edges, has been documented empirically across major models.
Beyond attention, single-prompt approaches suffer from compounding errors. When a complex task requires ten sequential judgments and each is 90% accurate, the final output has only about a 35% chance of being fully correct, assuming the errors are independent (0.9^10 ≈ 0.35). Chaining allows you to verify and correct at each step, preventing early errors from propagating through the entire pipeline.
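The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch that assumes each judgment succeeds or fails independently of the others; the function name is illustrative only.

```python
def chain_success_probability(step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` judgments is correct.

    Assumes the judgments are independent, which is a simplification.
    """
    return step_accuracy ** steps


# Ten sequential judgments at 90% accuracy each.
print(f"{chain_success_probability(0.9, 10):.3f}")
```

Under the same independence assumption, the figure falls quickly as steps are added, which is why verifying intermediate outputs matters more the longer the task.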
A third failure mode is scope collapse: when given a massive task, models tend to satisfice rather than optimize — producing something that looks like a complete answer without genuinely addressing every dimension. Decomposition forces completeness by making each subtask explicit and independently evaluable.
If your prompt has more than three distinct goals, more than five distinct constraints, or produces an output you cannot fully evaluate in a single read, it is a strong candidate for decomposition into a chain.
Sequential Chains
The simplest chain architecture is sequential: the output of Prompt N becomes an input to Prompt N+1. Each link handles exactly one well-defined responsibility. This is the most natural decomposition pattern and the right starting point for most tasks.
Consider a research report pipeline. Rather than asking the model to "write a comprehensive 3000-word report on X," a sequential chain gives each stage of the work its own prompt and passes each result on to the next.
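A minimal sketch of such a chain follows. It assumes a hypothetical `call_model` function (prompt in, completion out) that you supply for whatever model API you use. The three stage prompts are illustrative assumptions, not the stages of any particular pipeline.

```python
from typing import Callable, List

# Hypothetical signature: takes a prompt string, returns the model's text output.
ModelFn = Callable[[str], str]

# Illustrative stage prompts (assumed for this sketch). Each has one
# responsibility and a single {input} slot for the previous step's output.
REPORT_STAGES = [
    "List the key questions a report on this topic must answer:\n{input}",
    "Write a section-by-section outline that answers these questions:\n{input}",
    "Draft the full report from this outline:\n{input}",
]


def run_sequential_chain(
    call_model: ModelFn, templates: List[str], initial_input: str
) -> List[str]:
    """Run each template in order, feeding each output into the next prompt.

    Returns every intermediate output so each step can be inspected.
    """
    outputs: List[str] = []
    current = initial_input
    for template in templates:
        prompt = template.format(input=current)
        current = call_model(prompt)
        outputs.append(current)
    return outputs
```

Returning all intermediate outputs, rather than only the last one, keeps each link independently evaluable.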
Parallel Chains and Map-Reduce
Not all decomposition is sequential. When a complex task has multiple independent subtasks, you can run them in parallel and aggregate. Parallel chains reduce total latency and allow each prompt to be independently optimized for its specific subtask.
The map-reduce pattern is the canonical example. In the map phase, you apply the same prompt to many pieces of input independently. In the reduce phase, you aggregate the outputs into a final result. This is essential for tasks that exceed a single context window — document summarization at scale, corpus analysis, or batch evaluation.
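The two phases can be sketched as below. This assumes a hypothetical `call_model` function (prompt in, completion out) that is safe to call from several threads; the template placeholders `{chunk}` and `{partials}` are naming choices made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical signature: takes a prompt string, returns the model's text output.
ModelFn = Callable[[str], str]


def map_reduce(
    call_model: ModelFn,
    chunks: List[str],
    map_template: str,
    reduce_template: str,
    max_workers: int = 4,
) -> str:
    """Apply one prompt to every chunk in parallel, then aggregate the results."""
    # Map phase: the same prompt template, applied independently to each chunk.
    map_prompts = [map_template.format(chunk=chunk) for chunk in chunks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, not completion order.
        partials = list(pool.map(call_model, map_prompts))

    # Reduce phase: one prompt that sees all partial results.
    joined = "\n\n".join(partials)
    return call_model(reduce_template.format(partials=joined))
```

Threads are used here because model calls are typically network-bound; an async client would serve equally well.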
The aggregation step in a map-reduce chain is where many pipelines fail. If you feed 500 raw summaries into a single prompt and ask for a synthesis, the model faces the same attention problem you were trying to solve. Consider multi-level reduction: aggregate batches of 20 to 30, then aggregate the batch summaries.
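Multi-level reduction can be sketched as a loop that keeps aggregating in batches until one result remains. As before, `call_model` is a hypothetical prompt-in, text-out function, and the `{partials}` placeholder is an assumption of this sketch.

```python
from typing import Callable, List

# Hypothetical signature: takes a prompt string, returns the model's text output.
ModelFn = Callable[[str], str]


def hierarchical_reduce(
    call_model: ModelFn,
    items: List[str],
    reduce_template: str,
    batch_size: int = 25,
) -> str:
    """Aggregate items in batches, level by level, until one result remains."""
    if not items:
        raise ValueError("items must not be empty")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 so each level shrinks")

    level = list(items)
    while len(level) > 1:
        next_level: List[str] = []
        for start in range(0, len(level), batch_size):
            batch = level[start:start + batch_size]
            prompt = reduce_template.format(partials="\n\n".join(batch))
            next_level.append(call_model(prompt))
        level = next_level
    # A single input item is returned as-is, without a model call.
    return level[0]
```

With 500 summaries and a batch size of 25, this makes 20 first-level calls and one final call, so no single prompt ever sees more than 25 inputs.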
Using Outputs as Inputs: Context Passing
The defining characteristic of a chain is that outputs from earlier prompts become inputs to later ones. How you structure this handoff significantly affects chain quality. There are three approaches, each with different tradeoffs:
- Full output passing: Pass the complete output of step N to step N+1. Maximally preserves information but can bloat context rapidly in long chains. Best when every detail from the prior step is potentially relevant.
- Structured extraction: After each step, run a brief extraction prompt that pulls out only the essential outputs (key decisions, values, constraints) in a structured format. Pass only the structured extract to the next step. Keeps context lean and forces clarity about what actually matters.
- Running summary: Maintain a continuously updated summary of all prior steps. After each step, run a brief prompt: "Update this running summary with the key output of the step just completed." Pass only the summary forward. Best for very long chains where early context is mostly background.
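The three handoff styles above can be compared side by side in a short sketch. `call_model` is a hypothetical prompt-in, text-out function, and the exact prompt wording (including the choice of JSON for the structured extract) is an assumption made for illustration.

```python
from typing import Callable

# Hypothetical signature: takes a prompt string, returns the model's text output.
ModelFn = Callable[[str], str]


def pass_full_output(step_output: str) -> str:
    """Full output passing: hand the previous step's output on unchanged."""
    return step_output


def pass_structured_extract(call_model: ModelFn, step_output: str) -> str:
    """Structured extraction: keep only the essentials in a structured format."""
    prompt = (
        "Extract only the key decisions, values, and constraints from the "
        "text below as a JSON object.\n\n" + step_output
    )
    return call_model(prompt)


def pass_running_summary(call_model: ModelFn, summary: str, step_output: str) -> str:
    """Running summary: fold the latest step into one continuously updated summary."""
    prompt = (
        "Update this running summary with the key output of the step just "
        "completed.\n\n"
        f"Running summary:\n{summary}\n\n"
        f"Step output:\n{step_output}"
    )
    return call_model(prompt)
```

Note that the second and third styles each cost one extra model call per step, which is the price of keeping later prompts lean.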
Managing Context Across Chains
Context management is the hardest engineering challenge in prompt chaining. Two problems arise most often: context pollution (irrelevant prior outputs degrading later prompt quality) and context loss (necessary prior outputs being dropped as the chain grows).
The solution is explicit context architecture. Before building a chain, design a context object — a structured document specifying what information each step needs from prior steps and in what format. This makes context passing intentional rather than ad hoc.
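One possible shape for such a context object is sketched below. The class and field names (`StepSpec`, `ChainContext`, `needs`, `produces`) are hypothetical, chosen for this example. Each step declares the keys it reads, so it receives nothing else (limiting pollution), and a missing key fails loudly (surfacing loss).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StepSpec:
    """Declares what a step reads from, and writes to, the shared context."""
    name: str
    needs: List[str]   # context keys this step is allowed to see
    produces: str      # context key this step's output is stored under


@dataclass
class ChainContext:
    values: Dict[str, str] = field(default_factory=dict)

    def view_for(self, step: StepSpec) -> Dict[str, str]:
        """Return only the declared keys; raise if any of them is missing."""
        missing = [key for key in step.needs if key not in self.values]
        if missing:
            raise KeyError(f"step {step.name!r} is missing context keys: {missing}")
        return {key: self.values[key] for key in step.needs}

    def record(self, step: StepSpec, output: str) -> None:
        """Store a step's output under the key it declared."""
        self.values[step.produces] = output
```

Writing the `StepSpec` list before writing any prompts is one way to make the design decision explicit up front.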
Orchestration Patterns
Advanced chaining goes beyond fixed pipelines into dynamic orchestration — where the output of one step determines which prompts run next. This is the territory of agentic AI, where a model (or orchestration layer) decides the chain structure at runtime based on intermediate results.
The key orchestration patterns are routers, validator loops, and conditional branching.
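A minimal sketch of two such patterns, a router and a validator loop, is shown here. `call_model` is a hypothetical prompt-in, text-out function, and the classification and retry prompts are assumptions for illustration. The handler lookup in `route` is also the simplest form of conditional branching.

```python
from typing import Callable, Dict

# Hypothetical signature: takes a prompt string, returns the model's text output.
ModelFn = Callable[[str], str]


def route(
    call_model: ModelFn,
    task: str,
    handlers: Dict[str, Callable[[str], str]],
    default: str,
) -> str:
    """Router: ask the model for a label, then branch to the matching handler."""
    labels = ", ".join(sorted(handlers))
    prompt = (
        f"Classify this task as one of: {labels}. "
        f"Reply with the label only.\n\n{task}"
    )
    label = call_model(prompt).strip().lower()
    # Fall back to the default handler if the label is not recognized.
    handler = handlers.get(label, handlers[default])
    return handler(task)


def generate_with_validation(
    call_model: ModelFn,
    prompt: str,
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Validator loop: regenerate until the output passes a check or attempts run out."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        output = call_model(attempt_prompt)
        if is_valid(output):
            return output
        # Show the model its failed attempt on the next try.
        attempt_prompt = (
            f"{prompt}\n\nThe previous attempt failed validation. "
            f"Previous attempt:\n{output}\nPlease fix it."
        )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

The attempt cap matters: without it, a validator that can never be satisfied would loop indefinitely.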
In any chain longer than three steps, save intermediate outputs to persistent storage before proceeding. If a later step fails or produces poor output, you can restart from the last good checkpoint rather than re-running the entire chain from scratch. This is especially important when early steps involve expensive operations like web retrieval or large document processing.
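Checkpointing along these lines can be sketched with nothing more than JSON files on disk. The file naming scheme and the `(name, function)` step format are assumptions of this sketch, not a prescribed layout.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple

# Each step is a (name, function) pair; the function maps input text to output text.
Step = Tuple[str, Callable[[str], str]]


def run_with_checkpoints(steps: List[Step], initial_input: str, checkpoint_dir: Path) -> str:
    """Run steps in order, saving each output and reusing saved outputs on re-runs."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    current = initial_input
    for index, (name, step_fn) in enumerate(steps):
        path = checkpoint_dir / f"{index:02d}_{name}.json"
        if path.exists():
            # A checkpoint exists: load it instead of re-running the step.
            current = json.loads(path.read_text(encoding="utf-8"))["output"]
            continue
        current = step_fn(current)
        path.write_text(
            json.dumps({"step": name, "output": current}), encoding="utf-8"
        )
    return current
```

To restart from a given step, delete that step's file and all later ones. This sketch does not detect a changed initial input, so use a fresh directory for each new run.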
Prompt chaining solves the quality and scale limitations of single-pass prompting. Sequential chains handle multi-stage tasks with quality checkpoints at each step. Parallel chains reduce latency and allow subtask specialization. Map-reduce handles scale beyond a single context window. Context management — deciding what to pass forward and in what format — is the key engineering discipline. Dynamic orchestration patterns like routers, validator loops, and conditional branching enable AI pipelines that adapt to their own intermediate outputs.