Chain-of-thought (CoT)

Step-by-step reasoning to improve LLM outputs.

Definition

Chain-of-thought (CoT) prompting asks the model to output intermediate reasoning steps before the final answer. This often improves accuracy on math, logic, and multi-step tasks by forcing the model to make its reasoning explicit rather than leaping directly to a conclusion.

CoT works because language models are autoregressive: each generated token attends to prior tokens. By generating a chain of reasoning steps first, the model essentially conditions its final answer on a more structured and elaborated context — reducing errors caused by skipping steps or making implicit assumptions.

It is one of the simplest reasoning patterns: no tools or search, just prompting. Use it when the task benefits from explicit steps (e.g. arithmetic, deduction) and you want to avoid fine-tuning. For exploring multiple solution paths, see tree of thoughts; for tool-using agents, see ReAct.

How it works

You give the model a question (or task) and ask it to reason step by step. The model produces Step 1, Step 2, … (intermediate reasoning) and then the answer, generating the full sequence in one pass. You can optionally parse the steps and verify or score them. Quality depends on prompt engineering and model capability.

Zero-shot CoT

Add "Let's think step by step" (or a similar cue) to the prompt; no examples are needed.

Few-shot CoT

Include example (question, steps, answer) triples so the model mimics the format.
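The optional parse-and-verify step can start with simple pattern matching. The sketch below assumes the model was prompted to emit "Step N:" lines followed by a final "Answer:" line; that labeling convention and the `extract_steps` helper are illustrative assumptions, not part of any API.

```python
import re

def extract_steps(cot_output: str) -> tuple[list[str], str]:
    """Split a CoT response into reasoning steps and the final answer.

    Assumes the prompt asked for 'Step N: ...' lines and a closing
    'Answer: ...' line; adjust the patterns to match your own prompt.
    """
    steps = re.findall(r"Step \d+: (.+)", cot_output)
    match = re.search(r"Answer: (.+)", cot_output)
    answer = match.group(1).strip() if match else ""
    return steps, answer

# A hand-written response in the assumed format, for demonstration.
sample = (
    "Step 1: The store starts with 40 apples.\n"
    "Step 2: Selling 15 leaves 40 - 15 = 25.\n"
    "Step 3: Three shipments of 10 add 30, so 25 + 30 = 55.\n"
    "Answer: 55"
)
steps, answer = extract_steps(sample)
print(len(steps), answer)  # prints: 3 55
```

Once steps are isolated like this, each one can be checked individually, for example by re-verifying the arithmetic or asking a second model call to critique it.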

When to use / When NOT to use

| Scenario | Use CoT | Don't use CoT |
|---|---|---|
| Multi-step arithmetic or algebra | Yes: intermediate steps prevent calculation errors | No: simple single-step math doesn't need it |
| Logical deduction or inference | Yes: explicit steps make reasoning auditable | No: factual recall tasks don't benefit |
| Code planning or design decisions | Yes: writing out steps before code reduces bugs | No: generating boilerplate from a template |
| High-volume, low-latency inference | No: extra tokens increase cost and latency | Yes: avoid for simple classification or extraction |
| Model with strong built-in reasoning | Maybe: newer models reason internally (o1, o3) | Yes: forcing explicit CoT on thinking models adds redundancy |

Comparisons

| Criteria | CoT | Self-consistency | Step-back prompting |
|---|---|---|---|
| Core idea | Single reasoning chain | Multiple CoT paths + majority vote | Abstract question first, then answer |
| Reliability | Moderate: one path may err | High: voting filters errors | High: abstraction reduces confusion |
| Cost (API calls) | 1 call | N calls (typically 5–20) | 2 calls |
| Best for | Math, logic, multi-step tasks | Tasks with verifiable answers | Knowledge-heavy, complex questions |
| Composability | Standalone or as building block | Builds on CoT | Builds on CoT |
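The self-consistency column can be made concrete: sample several CoT chains and majority-vote on their final answers. In the sketch below, the sampler is passed in as a function so the voting logic stays separate from (and testable without) any particular LLM call; the function name and the stub answers are illustrative assumptions.

```python
from collections import Counter
from typing import Callable

def self_consistency(question: str, sample: Callable[[str], str], n: int = 5) -> str:
    """Sample n CoT answers for the same question, return the majority answer.

    `sample` is any callable that takes a question and returns a final
    answer string, e.g. a wrapper that calls an LLM at temperature ~0.7
    and parses the text after "Answer:". Injecting it keeps the voting
    logic independent of the API client.
    """
    answers = [sample(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub sampler standing in for an LLM: one of the five chains goes wrong,
# and voting filters it out.
canned = iter(["55", "55", "45", "55", "55"])
print(self_consistency("How many apples are left?", lambda q: next(canned), n=5))  # prints: 55
```

Note that sampling must use a nonzero temperature; at temperature 0 all n chains are (near-)identical and the vote adds cost without adding reliability.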

Pros and cons

| Pros | Cons |
|---|---|
| Simple to implement: just prompt engineering | Increases output length and token cost |
| No fine-tuning or special training needed | Model may generate plausible but incorrect steps |
| Makes reasoning inspectable and debuggable | Does not help with tasks that need external information |
| Works across many domains (math, logic, code) | Weaker benefit on small models vs. large ones |

Code examples

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a careful reasoning assistant. "
    "When solving problems, always show your reasoning step by step "
    "before giving the final answer."
)

# Zero-shot CoT: the system prompt alone elicits step-by-step reasoning.
def cot_query(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Few-shot CoT: a worked (question, steps, answer) example sets the format
# the model should mimic for the new question.
FEW_SHOT = """
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 balls. How many does he have?
A: Roger starts with 5 balls. He buys 2 cans × 3 balls = 6 balls. Total: 5 + 6 = 11 balls.

Q: {question}
A:"""

def few_shot_cot(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": FEW_SHOT.format(question=question)}],
    )
    return response.choices[0].message.content

print(cot_query("A store has 40 apples. They sell 15 and receive 3 new shipments of 10. How many are left?"))
```
