Module 6 · Expert Track · 15 min read · Prompt Engineering Mastery

Advanced Reasoning Techniques

Standard chain-of-thought prompting was just the beginning. The frontier of prompt engineering involves teaching models to explore multiple solution paths, critique their own outputs, and iteratively refine their reasoning, which often yields better results than a single-pass response. These techniques separate practitioners who occasionally get good outputs from those who engineer reliable ones.

Tree of Thought: Exploring the Solution Space

Chain-of-thought prompting asks a model to reason step by step along a single path. Tree of Thought (ToT) prompting, introduced by Yao et al. in 2023, asks the model to generate multiple candidate reasoning paths simultaneously, evaluate each branch, and pursue the most promising one — essentially a search algorithm over the space of possible solutions.

The practical implementation doesn't require special infrastructure. You can simulate ToT in a single prompt by asking the model to generate several candidate approaches, briefly evaluate the merits of each, and then develop the most promising one in full. This mimics the deliberation a good expert performs before committing to a solution.

    You are solving: [PROBLEM]

    Step 1: Generate three distinct approaches to this problem. For each approach, write 2-3 sentences describing the strategy and its core tradeoff.

    Step 2: Evaluate each approach. Which has the best combination of correctness, efficiency, and implementability for this specific context? Explain your reasoning in 3-4 sentences.

    Step 3: Implement the best approach in full detail.

When to Use Tree of Thought

ToT is most valuable for problems with multiple plausible solution strategies, where the "right" approach depends on tradeoffs that aren't obvious at the outset — algorithm selection, architectural decisions, strategic planning. For problems with a clear solution path, the overhead of ToT is unnecessary.
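
The generate, evaluate, develop flow can also be split across separate calls instead of a single prompt. The following is a minimal sketch, not the search procedure from Yao et al.: it branches only one level deep, and it assumes a hypothetical `llm` callable (prompt string in, model text out). The function names, prompt wording, and 1-10 scoring scheme are illustrative assumptions.

```python
import re
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, model text out


def parse_score(text: str) -> int:
    """Pull the first integer from 1 to 10 out of an evaluation reply; 0 if none."""
    match = re.search(r"\b(10|[1-9])\b", text)
    return int(match.group(1)) if match else 0


def tree_of_thought(problem: str, llm: LLM, n_candidates: int = 3) -> str:
    """One level of branching: propose, score, then expand the best branch."""
    # Step 1: generate distinct candidate approaches (the branches).
    candidates = [
        llm(
            f"Problem: {problem}\n"
            f"Propose approach #{i + 1}, distinct from the obvious alternatives. "
            "Describe the strategy and its core tradeoff in 2-3 sentences."
        )
        for i in range(n_candidates)
    ]
    # Step 2: evaluate each branch independently.
    scores = [
        parse_score(
            llm(
                f"Problem: {problem}\nApproach: {candidate}\n"
                "Rate correctness, efficiency, and implementability together "
                "on a scale of 1-10. Reply with the number first."
            )
        )
        for candidate in candidates
    ]
    # Step 3: develop only the highest-scoring branch in full.
    best = candidates[scores.index(max(scores))]
    return llm(f"Problem: {problem}\nImplement this approach in full detail:\n{best}")
```

Scoring each branch in its own call keeps the evaluations independent of one another, at the price of `2 * n_candidates + 1` calls per problem in this sketch.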

ReAct: Reason + Act

The ReAct (Reasoning + Acting) pattern, developed by researchers at Princeton and Google, structures the model's output as an interleaved sequence of reasoning steps ("Thought:") and actions ("Action:") followed by observations ("Observation:"). This creates a transparent trace of the model's decision-making process that is both easier to debug and more reliable for complex tasks.

In agentic contexts where the model has access to tools (web search, code execution, database queries), ReAct is effectively the standard interaction pattern. But even without real tools, you can use the ReAct structure to improve reasoning quality on tasks that benefit from explicit deliberation between steps:

    Solve this problem using the ReAct format:

    Thought: [Analyze what you know and what you need to figure out]
    Action: [What you will do or calculate next]
    Observation: [The result of that action]
    Thought: [What the observation tells you, what to do next]
    Action: [Next step]
    Observation: [Result]
    ... (continue until solution)
    Answer: [Final answer, synthesized from the reasoning trace]

    Problem: A company's revenue grew 15% in Q1, declined 8% in Q2, grew 22% in Q3, and declined 5% in Q4. If starting revenue was $2.4M, what was the year-end revenue and the net percentage change?

The power of ReAct is that it prevents the model from "skipping steps" — a common failure mode where a model leaps from a problem statement to a conclusion without showing (and therefore checking) the intermediate reasoning. Each Observation forces the model to ground its next thought in a concrete result.
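
The revenue problem in the sample prompt is a good case for checking a model's trace against an independent calculation. The short sketch below applies each quarterly change in sequence and prints one Action/Observation pair per step. The variable names and trace wording are illustrative, not a required format.

```python
start = 2_400_000.0
quarterly_changes = [("Q1", 0.15), ("Q2", -0.08), ("Q3", 0.22), ("Q4", -0.05)]

revenue = start
for quarter, change in quarterly_changes:
    # Each change compounds on the previous quarter's result, not on the start.
    revenue *= 1 + change
    print(f"Action: apply {change:+.0%} to get {quarter} revenue")
    print(f"Observation: ${revenue:,.2f}")

net_change = revenue / start - 1
print(f"Answer: year-end revenue ${revenue:,.2f}, net change {net_change:+.2%}")
```

The observations run $2,760,000.00, $2,539,200.00, $3,097,824.00, and $2,942,932.80, for a net change of about +22.62%. A response that skips the intermediate steps and simply sums the four percentages would report 24%, which is the kind of error the trace is meant to expose.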

Self-Critique and Reflection Prompts

One of the most powerful upgrades you can make to any prompt is to ask the model to critique its own initial response before finalizing it. Models are often better at evaluating an output than at producing a flawless one in a single pass, which mirrors how human experts work. A good writer revises; a good analyst double-checks; a good engineer reviews their own code.

The self-critique pattern works in two stages: generate, then evaluate. You can implement this in a single prompt or across two separate calls:

    Phase 1: Initial Response
    Answer the following question with your best analysis: [QUESTION]

    Phase 2: Self-Critique
    Now review your response above. Specifically:
    1. Are there any factual claims you are less than 90% confident in?
    2. Are there important counterarguments or nuances you omitted?
    3. Is your conclusion well-supported by the evidence you cited?
    4. What would a skeptical expert in this domain push back on?

    Phase 3: Revised Response
    Incorporating your critique, write a revised and improved answer. Mark any claims you remain uncertain about with [UNCERTAIN].

Why This Works

The model's initial response activates a set of associations and framings. The critique phase forces the model to step outside that frame and apply an evaluative rather than generative mode. The revision integrates both, typically producing substantially better output than either phase alone.
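
When the pattern is run across separate calls rather than in one prompt, the three phases might be wired together as in the sketch below. It assumes a hypothetical `llm` callable (prompt string in, text out); the function name and the exact prompt wording are illustrative and would need adapting to whatever client library you use.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, model text out

CRITIQUE_QUESTIONS = (
    "1. Are there any factual claims you are less than 90% confident in?\n"
    "2. Are there important counterarguments or nuances you omitted?\n"
    "3. Is your conclusion well-supported by the evidence you cited?\n"
    "4. What would a skeptical expert in this domain push back on?"
)


def answer_with_self_critique(question: str, llm: LLM) -> dict[str, str]:
    """Run generate -> critique -> revise and return all three texts."""
    # Phase 1: initial response.
    draft = llm(f"Answer the following question with your best analysis:\n{question}")
    # Phase 2: critique, with the draft passed back in explicitly.
    critique = llm(
        f"Question: {question}\n\nResponse under review:\n{draft}\n\n"
        f"Review the response above. Specifically:\n{CRITIQUE_QUESTIONS}"
    )
    # Phase 3: revision that sees both the draft and the critique.
    revised = llm(
        f"Question: {question}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
        "Incorporating the critique, write a revised and improved answer. "
        "Mark any claims you remain uncertain about with [UNCERTAIN]."
    )
    return {"draft": draft, "critique": critique, "revised": revised}
```

Returning the draft and critique alongside the revision keeps the intermediate steps available for inspection when the final answer looks wrong.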

Asking Models to Find Their Own Errors

A more targeted version of self-critique is to ask a model to actively search for errors in a specific response — its own or someone else's. This is particularly effective for mathematical reasoning, logical arguments, and code, where errors are discrete and verifiable.

The framing matters enormously. "Check your work" produces shallow review. "Assume this response contains at least one error. Your task is to find it" produces much more aggressive and useful error-finding behavior, because you've set an expectation that an error exists rather than giving the model permission to conclude everything is fine.

    The following solution to a logic problem is known to contain at least one error. Your task is to find and explain every error. Do not accept the solution as correct even if it appears sound; look harder.

    [SOLUTION TO REVIEW]

    For each error you find:
    - Quote the exact text containing the error
    - Explain what is wrong
    - Provide the correct reasoning or value

Limitation to Note

Models can miss errors in their own outputs, particularly in long mathematical derivations or complex logical chains. Self-critique improves accuracy substantially but does not eliminate errors. For high-stakes verification, always combine model self-review with external validation.
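
One inexpensive piece of external validation is to confirm that every quoted passage in a review actually occurs in the text that was reviewed, since the error-finding prompt asks for exact quotes. The sketch below assumes the review has already been parsed into records with `quote` and `explanation` fields; that structure, and the names `Finding` and `split_findings`, are hypothetical rather than any standard format.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    quote: str        # text the reviewer claims to have quoted from the solution
    explanation: str  # why the reviewer says it is wrong


def normalize(text: str) -> str:
    """Collapse whitespace and case so line wrapping does not cause false mismatches."""
    return " ".join(text.split()).lower()


def split_findings(
    solution: str, findings: list[Finding]
) -> tuple[list[Finding], list[Finding]]:
    """Separate findings whose quote occurs in the solution from those that do not."""
    haystack = normalize(solution)
    grounded = [
        f for f in findings
        if normalize(f.quote) and normalize(f.quote) in haystack
    ]
    ungrounded = [f for f in findings if f not in grounded]
    return grounded, ungrounded


solution = "All birds can fly.\nPenguins are birds, so penguins can fly."
findings = [
    Finding("All birds  can fly", "Overgeneralized premise."),
    Finding("Penguins are mammals", "This quote does not occur in the solution."),
]
grounded, ungrounded = split_findings(solution, findings)
```

A grounded quote only shows that the finding points at real text. Whether the claimed error is genuine still needs a separate check, such as rerunning the calculation or executing the code.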

Iterative Refinement

Iterative refinement is the practice of treating the first model output as a draft and explicitly requesting successive improvements, each targeting a specific dimension of quality. Unlike the single self-critique pass, iterative refinement uses multiple rounds, each with a targeted improvement goal.

Effective iterative refinement sequences look like this:

Round 1 — Completeness Pass
Generate the initial response. Then: "Review for completeness. What important points, edge cases, or stakeholder perspectives are missing? Add them without removing existing content."
Round 2 — Accuracy Pass
"Review for factual accuracy. Flag any claims that require verification, any numbers that seem off, and any logical inferences that don't follow. Correct what you can; mark what you can't."
Round 3 — Clarity Pass
"Rewrite for clarity and concision. Remove redundancy, simplify jargon where possible, and ensure every sentence earns its place. Target: 20% shorter without losing meaning."
Round 4 — Audience Pass
"Re-read this as [target audience]. Does the tone match their expectations? Is the assumed knowledge level correct? Adjust accordingly."

This multi-pass approach consistently outperforms single-pass prompting for documents, analyses, and complex technical explanations. The cost is proportionally more tokens, but the quality improvement often justifies it for important outputs.
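
The four rounds can be driven by a simple loop in which each pass receives the previous pass's output. This is a sketch under assumptions: `llm` is a hypothetical callable (prompt string in, text out), and the `---` separator and the `refine` function name are illustrative choices.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, model text out

PASSES = (
    ("completeness",
     "Review for completeness. What important points, edge cases, or stakeholder "
     "perspectives are missing? Add them without removing existing content."),
    ("accuracy",
     "Review for factual accuracy. Flag any claims that require verification, any "
     "numbers that seem off, and any logical inferences that don't follow. "
     "Correct what you can; mark what you can't."),
    ("clarity",
     "Rewrite for clarity and concision. Remove redundancy, simplify jargon where "
     "possible, and ensure every sentence earns its place. "
     "Target: 20% shorter without losing meaning."),
    ("audience",
     "Re-read this as {audience}. Does the tone match their expectations? "
     "Is the assumed knowledge level correct? Adjust accordingly."),
)


def refine(draft: str, llm: LLM, audience: str, passes=PASSES) -> list[tuple[str, str]]:
    """Apply each pass to the previous output; return every intermediate version."""
    history = [("initial", draft)]
    for name, instruction in passes:
        # Only the audience pass contains a placeholder; format() leaves the rest as-is.
        prompt = f"{instruction.format(audience=audience)}\n\n---\n{history[-1][1]}"
        history.append((name, llm(prompt)))
    return history
```

Keeping the full history makes it possible to see which pass introduced a regression and to drop passes that add tokens without improving the result.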

Step-Back Prompting

Step-back prompting, a technique developed at Google DeepMind, addresses a common failure mode: models answering the specific question exactly as posed without first considering the general principle behind it. By prompting the model to first identify the underlying principle or category the question belongs to, you get responses grounded in first principles rather than shallow pattern matching.

The technique works in two moves: first, ask for the abstraction; second, use that abstraction to answer the original question.

    Original question: "Should we use a relational database or a document database for our user profiles feature?"

    Step-back prompt:

    Step 1: Before answering, identify the underlying principles that govern database selection decisions. What are the key factors that determine when to use relational vs. document databases, independent of this specific case?

    Step 2: Now apply those principles to this specific context: [describe your use case, data structure, query patterns, scale requirements, team familiarity]

    Step 3: Give your recommendation with explicit reference to which principles drove it.

Step-back is especially powerful for technical decisions, medical reasoning, and policy analysis — domains where the correct answer requires applying general principles to specific cases, and where pattern-matching to superficially similar past cases is a common failure mode.
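
The two moves map naturally onto two calls: one that asks only for the abstraction, and one that answers with the abstraction supplied as context. The sketch below assumes a hypothetical `llm` callable (prompt string in, text out); the function name and prompt wording are illustrative.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, model text out


def step_back_answer(question: str, context: str, llm: LLM) -> dict[str, str]:
    """Ask for governing principles first, then answer using them."""
    # Move 1: the abstraction, requested without the case-specific context.
    principles = llm(
        "Before answering, identify the underlying principles that govern "
        "decisions of this kind, independent of any specific case.\n"
        f"Question: {question}"
    )
    # Move 2: the original question, answered with the principles in view.
    answer = llm(
        f"Principles:\n{principles}\n\nContext: {context}\n\nQuestion: {question}\n"
        "Apply the principles to this context and give a recommendation, "
        "stating explicitly which principles drove it."
    )
    return {"principles": principles, "answer": answer}
```

Withholding the case details from the first call is a deliberate choice in this sketch: it keeps the principles general instead of letting them be shaped by the specifics they are later used to judge.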

Structured Reasoning for Complex Multi-Step Problems

When a problem genuinely requires sustained multi-step reasoning — a business case analysis, a complex debugging session, a research synthesis — giving the model an explicit reasoning scaffold prevents the output from collapsing into vague generalities as the problem complexity increases.

The scaffold defines both what to think about and in what order. It forces the model to work through each step fully before proceeding, preventing the common failure of promising problem decomposition followed by inadequate execution of each part:

    Analyze the following business scenario using this structured framework:

    SITUATION ASSESSMENT
    - What is the core problem or decision?
    - Who are the key stakeholders and what do they want?
    - What constraints are non-negotiable?

    OPTION GENERATION
    - List at least 4 distinct courses of action
    - For each: what does this enable? what does it foreclose?

    EVIDENCE AND ASSUMPTIONS
    - What do we know with high confidence?
    - What are we assuming that could be wrong?
    - What data would change our recommendation?

    RECOMMENDATION
    - State the recommended option
    - Give the top 3 reasons in order of importance
    - State the biggest risk and how to mitigate it

    IMPLEMENTATION PATH
    - What are the first three actions to take?
    - What does success look like in 90 days?

    Scenario: [YOUR SCENARIO HERE]

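
Because the scaffold fixes both the sections and their order, a response can be checked mechanically for sections it skipped. The sketch below stores the framework as data, builds the prompt from it, and reports missing or out-of-order headings. The names `SCAFFOLD`, `build_scaffold_prompt`, and `missing_sections` are hypothetical, and the check assumes the model echoes each heading verbatim.

```python
SCAFFOLD = {
    "SITUATION ASSESSMENT": [
        "What is the core problem or decision?",
        "Who are the key stakeholders and what do they want?",
        "What constraints are non-negotiable?",
    ],
    "OPTION GENERATION": [
        "List at least 4 distinct courses of action",
        "For each: what does this enable? what does it foreclose?",
    ],
    "EVIDENCE AND ASSUMPTIONS": [
        "What do we know with high confidence?",
        "What are we assuming that could be wrong?",
        "What data would change our recommendation?",
    ],
    "RECOMMENDATION": [
        "State the recommended option",
        "Give the top 3 reasons in order of importance",
        "State the biggest risk and how to mitigate it",
    ],
    "IMPLEMENTATION PATH": [
        "What are the first three actions to take?",
        "What does success look like in 90 days?",
    ],
}


def build_scaffold_prompt(scenario: str) -> str:
    """Render the scaffold as a prompt, one heading and its questions per block."""
    lines = ["Analyze the following business scenario using this structured framework:", ""]
    for heading, questions in SCAFFOLD.items():
        lines.append(heading)
        lines.extend(f"- {question}" for question in questions)
        lines.append("")
    lines.append(f"Scenario: {scenario}")
    return "\n".join(lines)


def missing_sections(response: str) -> list[str]:
    """Headings the response skipped or emitted out of order."""
    position, missing = 0, []
    for heading in SCAFFOLD:
        found = response.find(heading, position)
        if found == -1:
            missing.append(heading)
        else:
            position = found + len(heading)
    return missing
```

A non-empty result from `missing_sections` is a reasonable trigger for a follow-up prompt that asks the model to complete only the sections it left out.
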
Summary

Advanced reasoning techniques move prompt engineering from input optimization to process design. Tree of Thought explores solution space before committing. ReAct makes reasoning transparent and verifiable. Self-critique and iterative refinement harness the model's evaluative capabilities. Step-back grounds answers in first principles. Structured scaffolds prevent complex reasoning from deteriorating. Each technique has a cost in tokens and time — apply them selectively where output quality justifies it.