Module 7 · Expert Track · 15 min read · Prompt Engineering Mastery

Prompt Chaining and Decomposition

The single biggest constraint on what AI can accomplish in one pass is attention quality — complex tasks degrade when you try to force everything into a single prompt. Prompt chaining, the practice of breaking work into coordinated sequences of smaller prompts, unlocks a qualitatively different class of task complexity.

Why Single-Prompt Approaches Fail at Scale

Context windows have grown dramatically, but the underlying problem is not window size; it is attention quality. As prompts grow longer, models distribute attention across more content, and the quality of reasoning about any individual piece tends to decrease. The phenomenon known as "lost in the middle," where information buried in the center of a long context receives less attention than content at the edges, has been documented empirically for a range of models.

Beyond attention, single-prompt approaches suffer from compounding errors. When a complex task requires ten sequential judgments and each is correct 90% of the time, the chance that all ten are correct is only about 35% (0.9^10 ≈ 0.349), assuming the errors are independent. Chaining allows you to verify and correct at each step, preventing early errors from propagating through the entire pipeline.
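
The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch that assumes every judgment fails independently of the others, which real tasks may not satisfy; the function name `chain_success_probability` is illustrative.

```python
def chain_success_probability(step_accuracy: float, steps: int) -> float:
    """Probability that every step is correct, assuming independent errors."""
    return step_accuracy ** steps


# Ten sequential judgments, each correct 90% of the time
p = chain_success_probability(0.90, 10)
print(f"{p:.1%}")  # prints 34.9%
```

Under the same assumption, raising per-step accuracy or cutting the number of unverified steps both improve the end-to-end figure, which is the motivation for checking outputs between links.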

A third failure mode is scope collapse: when given a massive task, models tend to satisfice rather than optimize — producing something that looks like a complete answer without genuinely addressing every dimension. Decomposition forces completeness by making each subtask explicit and independently evaluable.

Rule of Thumb

If your prompt has more than three distinct goals, more than five distinct constraints, or produces an output you cannot fully evaluate in a single read, it is a strong candidate for decomposition into a chain.

Sequential Chains

The simplest chain architecture is sequential: the output of Prompt N becomes an input to Prompt N+1. Each link handles exactly one well-defined responsibility. This is the most natural decomposition pattern and the right starting point for most tasks.

Consider a research report pipeline. Rather than asking the model to "write a comprehensive 3000-word report on X," a sequential chain looks like this:

Step 1 — Scope and Outline

Generate a detailed outline: main sections, key questions each section answers, and suggested sources or data types. Output: structured outline for human review before proceeding.

Step 2 — Section Drafting (repeated per section)

"Write a detailed draft of Section N: [title]. Context: [outline summary]. This section should answer: [questions]. Approximately 400-500 words." Run once per section with the shared outline as context.

Step 3 — Integration

"Here are N section drafts. Integrate them into a cohesive document: add transitions, eliminate redundancy, ensure consistent terminology, and write a 150-word executive summary."

Step 4 — Quality Review

"Review this report for factual consistency, logical flow, and completeness. Note any claims that need verification, any gaps in argumentation, and any sections that feel rushed."

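The four steps above can be sketched as a single function. This is an illustrative outline rather than a production pipeline: `llm` is assumed to be any callable that takes a prompt string and returns text, `run_report_chain` is a hypothetical name, and the "one title per line" outline format is a simplification chosen to keep parsing trivial.

```python
from typing import Callable

# Assumption: `llm` is any callable that takes a prompt string and returns text.
LLM = Callable[[str], str]


def run_report_chain(topic: str, llm: LLM) -> dict:
    """Run the four steps in order, feeding each output into the next."""
    # Step 1: scope and outline (one section title per line).
    outline = llm(
        f"Generate an outline for a report on {topic}. "
        "Return one section title per line and nothing else."
    )
    titles = [line.strip() for line in outline.splitlines() if line.strip()]

    # Step 2: draft each section with the shared outline as context.
    drafts = [
        llm(
            f"Write a detailed draft of the section titled '{title}'. "
            f"Context (full outline):\n{outline}\n"
            "Approximately 400-500 words."
        )
        for title in titles
    ]

    # Step 3: integrate the drafts into one document.
    report = llm(
        f"Here are {len(drafts)} section drafts. Integrate them into a cohesive "
        "document: add transitions, eliminate redundancy, ensure consistent "
        "terminology, and write a 150-word executive summary.\n\n"
        + "\n\n".join(drafts)
    )

    # Step 4: quality review of the integrated report.
    review = llm(
        "Review this report for factual consistency, logical flow, and "
        "completeness. Note any claims that need verification.\n\n" + report
    )
    return {"outline": outline, "drafts": drafts, "report": report, "review": review}
```

In practice the function would likely be split after Step 1 so that a person can review the outline before any drafting calls are made.
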
Parallel Chains and Map-Reduce

Not all decomposition is sequential. When a complex task has multiple independent subtasks, you can run them in parallel and aggregate. Parallel chains reduce total latency and allow each prompt to be independently optimized for its specific subtask.
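
One way to run independent subtasks concurrently is a thread pool, which suits model calls because they spend most of their time waiting on the network. The sketch below assumes `llm` is a thread-safe callable that takes a prompt and returns text; `run_parallel` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumption: `llm` is a thread-safe callable from prompt text to reply text.
LLM = Callable[[str], str]


def run_parallel(prompts: dict[str, str], llm: LLM, max_workers: int = 4) -> dict[str, str]:
    """Run independent prompts concurrently; return replies keyed by subtask name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the calls overlap in time.
        futures = {name: pool.submit(llm, prompt) for name, prompt in prompts.items()}
        # result() blocks until that call finishes and re-raises any exception.
        return {name: future.result() for name, future in futures.items()}
```

Keying results by subtask name, rather than relying on completion order, keeps the aggregation step deterministic.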

The map-reduce pattern is the canonical example. In the map phase, you apply the same prompt to many pieces of input independently. In the reduce phase, you aggregate the outputs into a final result. This is essential for tasks that exceed a single context window — document summarization at scale, corpus analysis, or batch evaluation.

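# Map-Reduce for Large Document Analysis (Python sketch)
# `llm` stands in for your model client: any callable that accepts a prompt
# string plus a `context` keyword argument and returns the model's text reply.

def analyze_feedback(documents, llm):
    """Summarize each feedback document, then synthesize one report."""
    # MAP: process each document independently
    chunk_summaries = []
    for doc in documents:
        summary = llm(
            "Extract: main complaint, sentiment (1-5), product area, "
            "and any feature requests. Output as JSON.",
            context=doc,
        )
        chunk_summaries.append(summary)

    # REDUCE: synthesize all summaries in a single call
    return llm(
        f"Here are {len(chunk_summaries)} structured feedback summaries. Identify: "
        "top 5 complaint themes with frequency counts, "
        "average sentiment by product area, "
        "top 10 feature requests ranked by mention count. "
        "Output a structured executive report.",
        # Join into one string so the prompt receives text, not a Python list
        context="\n".join(chunk_summaries),
    )

# Usage: final_report = analyze_feedback(load_all_customer_feedback(), llm)  # e.g. 500 documents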

Aggregation Prompt Design

The aggregation step in a map-reduce chain is where most pipelines fail. If you feed 500 raw summaries into a single prompt and ask for a synthesis, the model faces the same attention problem you were trying to solve. Consider multi-level reduction: aggregate batches of 20-30, then aggregate the batch summaries.
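
Multi-level reduction can be sketched as a loop that keeps merging batches until one summary remains. The names `hierarchical_reduce` and `batch_size`, the merge prompt wording, and the `llm` callable (prompt in, text out) are all assumptions made for illustration.

```python
from typing import Callable

# Assumption: `llm` is any callable that takes a prompt string and returns text.
LLM = Callable[[str], str]


def hierarchical_reduce(items: list[str], llm: LLM, batch_size: int = 25) -> str:
    """Reduce in rounds so no single prompt sees more than `batch_size` items."""
    if not items:
        raise ValueError("nothing to reduce")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    level = items
    while len(level) > 1:
        # Split the current level into consecutive batches.
        batches = [level[i:i + batch_size] for i in range(0, len(level), batch_size)]
        # Each batch is merged by its own call; the results form the next level.
        level = [
            llm(
                "Merge these summaries into one, keeping themes and counts:\n\n"
                + "\n---\n".join(batch)
            )
            for batch in batches
        ]
    return level[0]
```

With 500 summaries and a batch size of 25, this makes 20 first-level calls and one final call. A single input item is returned unchanged without calling the model.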

Using Outputs as Inputs: Context Passing

The defining characteristic of a chain is that outputs from earlier prompts become inputs to later ones. How you structure this handoff significantly affects chain quality. There are three approaches, each with different tradeoffs:

Full output passing: Pass the complete output of step N to step N+1. Maximally preserves information but can bloat context rapidly in long chains. Best when every detail from the prior step is potentially relevant.
Structured extraction: After each step, run a brief extraction prompt that pulls out only the essential outputs (key decisions, values, constraints) in a structured format. Pass only the structured extract to the next step. Keeps context lean and forces clarity about what actually matters.
Running summary: Maintain a continuously updated summary of all prior steps. After each step, run a brief prompt: "Update this running summary with the key output of the step just completed." Pass only the summary forward. Best for very long chains where early context is mostly background.
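
The three handoff styles can be compared side by side as small helper functions. This is a sketch under assumptions: `llm` is any callable from prompt text to reply text, and the function names and prompt wording are hypothetical.

```python
from typing import Callable

# Assumption: `llm` is any callable that takes a prompt string and returns text.
LLM = Callable[[str], str]


def pass_full(previous_output: str) -> str:
    """Full output passing: hand everything to the next step unchanged."""
    return previous_output


def pass_extract(previous_output: str, llm: LLM) -> str:
    """Structured extraction: keep only decisions, values, and constraints."""
    return llm(
        "Extract the key decisions, values, and constraints from the text below "
        "as a JSON object. Omit everything else.\n\n" + previous_output
    )


def update_summary(running_summary: str, previous_output: str, llm: LLM) -> str:
    """Running summary: fold the latest step into one rolling summary."""
    return llm(
        "Update this running summary with the key output of the step just "
        f"completed.\n\nSummary so far:\n{running_summary}\n\n"
        f"Latest step output:\n{previous_output}"
    )
```

The second and third helpers each cost one extra model call per step, which is the price paid for a leaner context downstream.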

Managing Context Across Chains

Context management is the hardest engineering challenge in prompt chaining. Two problems arise most often: context pollution (irrelevant prior outputs degrading later prompt quality) and context loss (necessary prior outputs being dropped as the chain grows).

The solution is explicit context architecture. Before building a chain, design a context object — a structured document specifying what information each step needs from prior steps and in what format. This makes context passing intentional rather than ad hoc.

CHAIN CONTEXT OBJECT (example structure; the // comments are annotations and would need to be removed for strict JSON)

{
  "task_goal": "Competitive analysis of payments market",
  "constraints": ["B2B focus", "US market only", "2023-2024"],
  "step_1_output": {
    "competitors_identified": ["Stripe", "Adyen", "Braintree"],
    "analysis_framework": "features / pricing / positioning"
  },
  "step_2_output": {
    "features_analysis": "...",   // from parallel step 2a
    "pricing_analysis": "...",    // from parallel step 2b
    "positioning_analysis": "..." // from parallel step 2c
  },
  "current_step": 3,
  "next_step_goal": "Synthesize into strategic recommendations"
}
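
A context object becomes most useful when each step declares which fields it is allowed to see. The sketch below assumes a hypothetical `STEP_INPUTS` mapping and a `build_step_prompt` helper; neither comes from a particular library, and the context values are abbreviated from the example above.

```python
import json

# Which context keys each step is allowed to see (hypothetical mapping).
STEP_INPUTS = {
    2: ["task_goal", "constraints", "step_1_output"],
    3: ["task_goal", "constraints", "step_2_output"],
}


def build_step_prompt(context: dict, step: int, instruction: str) -> str:
    """Render a prompt containing only the context fields this step needs."""
    # context[key] raises KeyError if a required field is missing, which
    # surfaces context loss early instead of silently dropping it.
    needed = {key: context[key] for key in STEP_INPUTS[step]}
    return f"{instruction}\n\nContext:\n{json.dumps(needed, indent=2)}"


context = {
    "task_goal": "Competitive analysis of payments market",
    "constraints": ["B2B focus", "US market only", "2023-2024"],
    "step_1_output": {"competitors_identified": ["Stripe", "Adyen", "Braintree"]},
    "step_2_output": {
        "features_analysis": "...",
        "pricing_analysis": "...",
        "positioning_analysis": "...",
    },
}

# Step 3 sees the goal, constraints, and step 2 results, but not step 1 output.
prompt = build_step_prompt(context, 3, "Synthesize into strategic recommendations.")
```

Filtering by an explicit allow-list addresses context pollution, while the hard failure on a missing key addresses context loss.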

Orchestration Patterns

Advanced chaining goes beyond fixed pipelines into dynamic orchestration — where the output of one step determines which prompts run next. This is the territory of agentic AI, where a model (or orchestration layer) decides the chain structure at runtime based on intermediate results.

The key orchestration patterns are:

Router Pattern

A classifier prompt examines the input and routes it to one of several specialist chains. A customer query might be routed to technical support, billing, or feature request chains based on the first step's classification output.
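
A router can be sketched as a classification call followed by a dictionary lookup. Here `llm` is assumed to be any callable from prompt text to reply text, `handlers` maps hypothetical route labels to specialist chains, and unexpected labels fall back to a default route.

```python
from typing import Callable

# Assumption: `llm` is any callable that takes a prompt string and returns text.
LLM = Callable[[str], str]
Handler = Callable[[str], str]


def route(query: str, llm: LLM, handlers: dict[str, Handler],
          default: str = "technical_support") -> str:
    """Classify the query, then dispatch it to the matching specialist chain."""
    label = llm(
        "Classify this customer query as exactly one of: "
        + ", ".join(handlers)
        + ". Reply with the label only.\n\n"
        + query
    ).strip().lower()
    # Models sometimes reply with something off-list; fall back to the default.
    handler = handlers.get(label, handlers[default])
    return handler(query)
```

Constraining the classifier to a fixed label set and normalizing its reply keeps the dispatch step from depending on exact model phrasing.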

Validator Loop

After each step, a validation prompt checks whether the output meets quality criteria. If not, it either retries the step with additional guidance or escalates to a human review queue. Set a maximum retry count to prevent infinite loops.
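
A bounded validator loop might look like the following. It assumes `llm` is a callable from prompt text to reply text and that `validate` returns `None` for an acceptable output or a feedback string otherwise; `validate` could itself wrap a validation prompt. All names are hypothetical.

```python
from typing import Callable, Optional

# Assumption: `llm` is any callable that takes a prompt string and returns text.
LLM = Callable[[str], str]
Validator = Callable[[str], Optional[str]]  # None means "accepted"


def run_with_validation(prompt: str, llm: LLM, validate: Validator,
                        max_retries: int = 2) -> tuple[str, bool]:
    """Return (output, accepted). accepted=False means retries were exhausted."""
    attempt_prompt = prompt
    output = ""
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        output = llm(attempt_prompt)
        feedback = validate(output)
        if feedback is None:
            return output, True
        # Retry with the validator's feedback appended as extra guidance.
        attempt_prompt = (
            f"{prompt}\n\nA previous attempt was rejected: {feedback}\n"
            "Please address this in your answer."
        )
    return output, False  # caller can escalate to a human review queue
```

Returning a flag instead of raising leaves the escalation policy to the caller, which may differ between steps.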

Conditional Branching

A decision prompt examines intermediate state and selects which branch of the pipeline to follow. "If the analysis reveals missing data in category X, run the data-gathering branch; otherwise proceed to synthesis."
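
The quoted decision can be sketched as a function that returns the name of the next branch. The branch names, the YES/NO reply convention, and the `llm` callable (prompt in, text out) are assumptions for illustration.

```python
import json
from typing import Callable

# Assumption: `llm` is any callable that takes a prompt string and returns text.
LLM = Callable[[str], str]


def choose_branch(state: dict, category: str, llm: LLM) -> str:
    """Return 'gather_data' or 'synthesize' based on the intermediate state."""
    answer = llm(
        f"Given this analysis state, is data missing in category '{category}'? "
        "Answer YES or NO only.\n\n" + json.dumps(state, indent=2)
    )
    # Anything other than a clear YES proceeds to synthesis.
    return "gather_data" if answer.strip().upper().startswith("YES") else "synthesize"
```

Unlike a router, which classifies the original input, this decision reads the state produced by earlier steps.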

Best Practice: Checkpointing

In any chain longer than three steps, save intermediate outputs to persistent storage before proceeding. If a later step fails or produces poor output, you can restart from the last good checkpoint rather than re-running the entire chain from scratch. This is especially important when early steps involve expensive operations like web retrieval or large document processing.
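
A minimal file-based version of checkpointing is sketched below. It assumes each step is a function that receives the results so far and returns a JSON-serializable value; `run_with_checkpoints` and the one-file-per-step layout are hypothetical choices.

```python
import json
from pathlib import Path
from typing import Callable

Step = Callable[[dict], object]  # receives prior results, returns JSON-serializable output


def run_with_checkpoints(steps: list[tuple[str, Step]], checkpoint_dir: Path) -> dict:
    """Run named steps in order, reusing any output already saved on disk."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    results: dict = {}
    for name, step in steps:
        path = checkpoint_dir / f"{name}.json"
        if path.exists():
            # A checkpoint exists: load it instead of re-running the step.
            results[name] = json.loads(path.read_text(encoding="utf-8"))
            continue
        results[name] = step(results)
        # Persist before moving on so a later failure cannot lose this output.
        path.write_text(json.dumps(results[name]), encoding="utf-8")
    return results
```

To redo a step, delete its checkpoint file and those of every later step; otherwise stale downstream outputs would be reused.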

Summary

Prompt chaining solves the quality and scale limitations of single-pass prompting. Sequential chains handle multi-stage tasks with quality checkpoints at each step. Parallel chains reduce latency and allow subtask specialization. Map-reduce handles scale beyond a single context window. Context management — deciding what to pass forward and in what format — is the key engineering discipline. Dynamic orchestration patterns like routers, validator loops, and conditional branching enable AI pipelines that adapt to their own intermediate outputs.