Chain-of-thought (CoT)
Step-by-step reasoning to improve LLM outputs.
Definition
Chain-of-thought (CoT) prompting asks the model to output intermediate reasoning steps before the final answer. This often improves accuracy on math, logic, and multi-step tasks by forcing the model to make its reasoning explicit rather than leaping directly to a conclusion.
CoT works because language models are autoregressive: each generated token attends to prior tokens. By generating a chain of reasoning steps first, the model essentially conditions its final answer on a more structured and elaborated context — reducing errors caused by skipping steps or making implicit assumptions.
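One way to make the conditioning argument concrete (an informal sketch; the notation is introduced here, not taken from the papers): write $q$ for the question, $r$ for a reasoning chain, and $a$ for the final answer. Direct prompting samples $a \sim p(a \mid q)$, while CoT first samples a chain and then conditions the answer on it:

$$
\hat{r} \sim p(r \mid q), \qquad \hat{a} \sim p(a \mid q, \hat{r})
$$

Self-consistency (see the comparison table below) approximates the marginal $p(a \mid q) = \sum_{r} p(a \mid q, r)\, p(r \mid q)$ by sampling several chains and majority-voting the answers.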
It is one of the simplest reasoning patterns: no tools or search, just prompting. Use it when the task benefits from explicit steps (e.g. arithmetic, deduction) and you want to avoid fine-tuning. For exploring multiple solution paths, see tree of thoughts; for tool-using agents, see ReAct.
How it works
You give the model a question (or task) and ask it to reason step by step. The model produces Step 1, Step 2, … (intermediate reasoning) and then the answer, all in a single generation pass; you can optionally parse the steps and verify or score them (a minimal parsing sketch follows this section). Quality depends on prompt engineering and model capability.
Zero-shot CoT
Add "Let's think step by step" (or a similar trigger phrase) to the prompt; no examples are needed.
Few-shot CoT
Include example (question, steps, answer) triples so the model mimics the format.
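As a minimal sketch of the optional parse-and-verify step, the helper below splits a completion into steps and a final answer. It assumes the model was instructed to label each step as `Step N:` and to end with an `Answer:` line; both conventions are illustrative, not a standard output format.

```python
import re

def parse_cot(output: str) -> tuple[list[str], str | None]:
    """Split a CoT completion into reasoning steps and a final answer.

    Assumes the illustrative format:
        Step 1: ...
        Step 2: ...
        Answer: ...
    """
    steps = re.findall(r"^Step \d+:\s*(.+)$", output, flags=re.MULTILINE)
    match = re.search(r"^Answer:\s*(.+)$", output, flags=re.MULTILINE)
    answer = match.group(1).strip() if match else None
    return steps, answer

steps, answer = parse_cot(
    "Step 1: 2 cans x 3 balls = 6 balls.\n"
    "Step 2: 5 + 6 = 11 balls.\n"
    "Answer: 11"
)
print(steps)   # ['2 cans x 3 balls = 6 balls.', '5 + 6 = 11 balls.']
print(answer)  # 11
```

Once parsed, individual steps can be checked (e.g. re-evaluating the arithmetic) or scored by a separate verifier prompt.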
When to use / When NOT to use
| Scenario | Use CoT? | Why / caveat |
|---|---|---|
| Multi-step arithmetic or algebra | Yes | Intermediate steps prevent calculation errors; simple single-step math doesn't need it |
| Logical deduction or inference | Yes | Explicit steps make the reasoning auditable; pure factual recall doesn't benefit |
| Code planning or design decisions | Yes | Writing out steps before code reduces bugs; skip it for boilerplate generated from a template |
| High-volume, low-latency inference | No | Extra tokens increase cost and latency; avoid CoT for simple classification or extraction |
| Model with strong built-in reasoning | Maybe | Newer reasoning models (o1, o3) already reason internally; forcing explicit CoT adds redundancy |
Comparisons
| Criteria | CoT | Self-consistency | Step-back prompting |
|---|---|---|---|
| Core idea | Single reasoning chain | Multiple CoT paths + majority vote | Abstract question first, then answer |
| Reliability | Moderate — one path may err | High — voting filters errors | High — abstraction reduces confusion |
| Cost (API calls) | 1 call | N calls (typically 5–20) | 2 calls |
| Best for | Math, logic, multi-step tasks | Tasks with verifiable answers | Knowledge-heavy, complex questions |
| Composability | Standalone or as building block | Builds on CoT | Builds on CoT |
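To make the self-consistency column concrete, here is a minimal sketch of majority voting over sampled CoT chains, in the same OpenAI client style as the code examples below. The sample count `n=5`, the temperature, and the answer-extraction regex are illustrative choices, not prescribed values.

```python
import re
from collections import Counter

from openai import OpenAI

client = OpenAI()

def self_consistent_answer(question: str, n: int = 5) -> str | None:
    """Sample n CoT chains at nonzero temperature, then majority-vote the answers."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        n=n,              # n independent completions (the table counts these as N calls)
        temperature=0.8,  # nonzero temperature diversifies the reasoning paths
        messages=[{
            "role": "user",
            "content": f"{question}\nThink step by step, then end with 'Answer: <number>'.",
        }],
    )
    answers = [
        m.group(1)
        for choice in response.choices
        if (m := re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", choice.message.content))
    ]
    return Counter(answers).most_common(1)[0][0] if answers else None
```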
Pros and cons
| Pros | Cons |
|---|---|
| Simple to implement — just prompt engineering | Increases output length and token cost |
| No fine-tuning or special training needed | Model may generate plausible but incorrect steps |
| Makes reasoning inspectable and debuggable | Does not help with tasks that need external information |
| Works across many domains (math, logic, code) | Weaker benefit on small models vs. large ones |
Code examples
```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a careful reasoning assistant. "
    "When solving problems, always show your reasoning step by step "
    "before giving the final answer."
)

def cot_query(question: str) -> str:
    """Zero-shot CoT: the system prompt elicits step-by-step reasoning."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Few-shot CoT: one worked (question, steps, answer) triple sets the format.
FEW_SHOT = """
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 balls. How many does he have?
A: Roger starts with 5 balls. He buys 2 cans × 3 balls = 6 balls. Total: 5 + 6 = 11 balls.

Q: {question}
A:"""

def few_shot_cot(question: str) -> str:
    """Few-shot CoT: the model mimics the worked example's reasoning format."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": FEW_SHOT.format(question=question)}],
    )
    return response.choices[0].message.content

print(cot_query("A store has 40 apples. They sell 15 and receive 3 new shipments of 10. How many are left?"))
```
Practical resources
- Chain-of-Thought Prompting (Wei et al.) — Original paper introducing CoT prompting
- OpenAI – Prompt engineering — Includes reasoning and step-by-step guidance
- Self-consistency improves CoT (Wang et al.) — Majority-voting over multiple CoT paths for higher reliability
Sources
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) — Seminal paper introducing few-shot chain-of-thought prompting and demonstrating dramatic reasoning improvements.
- Large Language Models Are Zero-Shot Reasoners (Kojima et al., 2022) — Introduces zero-shot CoT ("Let's think step by step") without requiring examples.
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022) — Majority-voting over multiple CoT paths significantly improves reliability.
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Yao et al., 2023) — Extends CoT to multi-path search, providing context for CoT's single-path limitation.