Few-shot learning

Learning from very few examples.

Definition

Few-shot learning is the ability of a model to generalize to new tasks or classes from a very small number of labeled examples — typically 1 to 5 per class (1-shot, 5-shot). Rather than requiring hundreds or thousands of labeled samples, few-shot learning systems leverage prior knowledge (from pretraining or meta-training) to extract maximum signal from minimal data. The challenge is distinct from standard supervised learning: the model must adapt quickly at test time, not just fit a large training set.

Three main paradigms have emerged. Meta-learning (learning to learn) trains models over many different few-shot tasks sampled from a meta-train set, so the model explicitly learns how to adapt. MAML (Model-Agnostic Meta-Learning) optimizes for a parameter initialization that can be fine-tuned in a few gradient steps on any new task. Metric-based methods (Prototypical Networks, Matching Networks) learn an embedding space where classification reduces to nearest-neighbor search relative to class prototypes computed from support examples.

The third paradigm — in-context learning — is specific to large language models (LLMs): the support examples are simply prepended to the prompt as demonstrations, and the model conditions on them without any gradient updates. GPT-3 popularized this approach, demonstrating that sufficiently large language models can perform novel tasks from just a handful of examples in the context window. Few-shot learning sits between transfer learning (which requires more labeled target data) and zero-shot learning (which requires none).

How it works

Episodic task structure

Every few-shot task is defined by a support set (N classes × K examples = N-way K-shot) and a query set (examples to classify). The model adapts to the support set and predicts labels for the query set.
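
A minimal sketch of assembling one such episode, assuming a toy dict-of-tensors dataset; the sample_episode helper and the data layout are illustrative assumptions, not a specific library API:

```python
import torch

def sample_episode(dataset_by_class, n_way=5, k_shot=1, n_query=15):
    """Build one N-way K-shot episode from {class_id: tensor of (num_examples, C, H, W)}.
    Hypothetical helper; benchmarks like Omniglot ship their own samplers."""
    class_ids = torch.randperm(len(dataset_by_class))[:n_way]
    xs, ys, xq, yq = [], [], [], []
    for new_label, c in enumerate(class_ids.tolist()):
        pool = dataset_by_class[c]
        idx = torch.randperm(len(pool))[:k_shot + n_query]
        xs.append(pool[idx[:k_shot]])       # K support examples for this class
        ys += [new_label] * k_shot          # labels are remapped to 0..N-1 per episode
        xq.append(pool[idx[k_shot:]])       # query examples for this class
        yq += [new_label] * n_query
    return torch.cat(xs), torch.tensor(ys), torch.cat(xq), torch.tensor(yq)

# 20 candidate classes with 30 examples each (random stand-in data)
data = {c: torch.randn(30, 1, 28, 28) for c in range(20)}
xs, ys, xq, yq = sample_episode(data, n_way=5, k_shot=1, n_query=15)
print(xs.shape, ys.shape, xq.shape, yq.shape)
# torch.Size([5, 1, 28, 28]) torch.Size([5]) torch.Size([75, 1, 28, 28]) torch.Size([75])
```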

Meta-learning (MAML)

MAML learns a model initialization θ such that a few gradient steps on the support set of any new task yield good performance on that task's query set. Concretely, the meta-objective minimizes the post-adaptation query loss across sampled tasks τ: min_θ Σ_τ L_τ^query(θ − α·∇_θ L_τ^support(θ)), where α is the inner-loop learning rate.
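
A minimal sketch of the MAML inner/outer loop on the classic sine-wave regression toy problem, assuming PyTorch 2.x (for torch.func.functional_call); the network size, learning rates, and task distribution here are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

def sample_task():
    """Toy task distribution: sine waves with random amplitude and phase."""
    amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * torch.pi
    def data(n):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return data

model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(),
                      nn.Linear(40, 40), nn.ReLU(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn, inner_lr = nn.MSELoss(), 0.01

for step in range(1000):
    meta_opt.zero_grad()
    for _ in range(4):                       # tasks per meta-batch
        data = sample_task()
        x_s, y_s = data(5)                   # support set (K = 5)
        x_q, y_q = data(10)                  # query set

        # Inner loop: one gradient step on the support set; create_graph=True
        # keeps the graph so the meta-gradient can flow through the update.
        params = dict(model.named_parameters())
        loss_s = loss_fn(functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(loss_s, list(params.values()), create_graph=True)
        fast = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

        # Outer loop: query loss of the adapted parameters; backward()
        # accumulates the meta-gradient into the original initialization θ.
        loss_q = loss_fn(functional_call(model, fast, (x_q,)), y_q)
        loss_q.backward()
    meta_opt.step()
```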

Metric-based methods

Prototypical Networks compute a prototype for each class by averaging the embeddings of its support examples. Query examples are classified by their distance to the nearest prototype in the embedding space.
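
At training time, Prototypical Networks turn these distances into a class distribution via a softmax over negative (squared Euclidean) distances and minimize cross-entropy on the query set:

p(y = c | x) = exp(−d(f(x), μ_c)) / Σ_c′ exp(−d(f(x), μ_c′))

where f is the embedding network and μ_c is the prototype (mean support embedding) of class c.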

In-context few-shot (LLMs)

No gradient updates occur. The prompt contains the support examples formatted as demonstrations, and the model completes the query based on pattern matching from pretraining.
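
The demonstration format is plain text. A raw completion-style prompt makes this concrete (the translation pairs below echo the illustrative examples from the GPT-3 paper):

```python
# Few-shot demonstrations are literally prepended to the query; the model's
# continuation of the last line is the prediction. Illustrative prompt only;
# a chat-based variant appears under Code examples below.
prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
```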

Episodic training

Meta-learning and metric-based methods are trained the way they are evaluated: the meta-train set is repeatedly sampled into N-way K-shot episodes, and the model is optimized on each episode's query loss, so training conditions match the few-shot setting seen at test time.
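
As a sketch, a single episodic training step for a Prototypical Network could look like the following, reusing the hypothetical sample_episode helper from above; model can be any embedding network, such as the PrototypicalNet defined under Code examples:

```python
import torch
import torch.nn.functional as F

def episode_step(model, optimizer, dataset_by_class, n_way=5, k_shot=1, n_query=15):
    """Run one N-way K-shot training episode (sketch; sample_episode is the
    hypothetical helper defined earlier)."""
    xs, ys, xq, yq = sample_episode(dataset_by_class, n_way, k_shot, n_query)
    support_emb, query_emb = model(xs), model(xq)
    # Class prototypes = mean support embedding per class
    prototypes = torch.stack([support_emb[ys == c].mean(0) for c in range(n_way)])
    # Softmax over negative distances, trained with cross-entropy on the query set
    logits = -torch.cdist(query_emb, prototypes)
    loss = F.cross_entropy(logits, yq)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```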

When to use / When NOT to use

| Scenario | Use few-shot learning? |
|---|---|
| Only 1–20 labeled examples per class | Yes — purpose-built for data scarcity; prefer standard supervised learning if data is sufficient |
| LLM inference with examples in the prompt | Yes — in-context few-shot is free at inference; fine-tuning is better for consistent, high-volume tasks |
| Rapid adaptation to new classes without retraining | Yes — prototypical networks or MAML; unnecessary if new classes are stable and labeled data can be collected |
| Entirely new domain with no pretrained model | No — pretraining is a prerequisite |
| High accuracy on a fixed, well-labeled dataset | No — supervised learning outperforms |

Comparisons

| Approach | Examples needed | Adaptation mechanism | Gradient updates at test time |
|---|---|---|---|
| Zero-shot learning | 0 | Prompt / text description | No |
| Few-shot learning (in-context) | 1–10 | In-context demonstrations | No |
| Few-shot learning (MAML) | 1–10 | Inner-loop gradient steps | Yes (few steps) |
| Transfer learning / fine-tuning | 100–10K+ | Full or partial fine-tuning | Yes (many steps) |
| Supervised learning | 1K–1M+ | Standard SGD | Yes |

Pros and cons

| Pros | Cons |
|---|---|
| Generalizes to new tasks with minimal labeled data | Performance typically below fully supervised approaches |
| In-context few-shot requires no training — just prompting | Sensitive to prompt format and example order for LLMs |
| Meta-learning enables fast adaptation across domains | Meta-training is compute-intensive (many tasks required) |
| Useful for rare categories and personalization | Support set quality heavily impacts predictions |

Code examples

Prototypical Network inference (few-shot image classification):

```python
import torch
import torch.nn as nn

class PrototypicalNet(nn.Module):
    """Simple CNN encoder for few-shot image classification."""
    def __init__(self, embedding_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, embedding_dim),
        )

    def forward(self, x):
        return self.encoder(x)

def prototypical_predict(model, support_images, support_labels, query_images, n_classes):
    """
    support_images: (N*K, C, H, W) — K examples per class, N classes
    support_labels: (N*K,)
    query_images:   (Q, C, H, W)
    Returns predicted labels for query_images.
    """
    model.eval()
    with torch.no_grad():
        support_emb = model(support_images)   # (N*K, D)
        query_emb   = model(query_images)     # (Q, D)

        # Compute class prototypes (mean embedding per class)
        prototypes = torch.stack([
            support_emb[support_labels == c].mean(0)
            for c in range(n_classes)
        ])  # (N, D)

        # Euclidean distance from each query to each prototype
        dists = torch.cdist(query_emb, prototypes)  # (Q, N)
        return dists.argmin(dim=1)  # Nearest prototype = predicted class

# Example: 5-way 1-shot, 10 query images (28x28 grayscale).
# The encoder is untrained here, so the predictions are illustrative only.
model = PrototypicalNet(embedding_dim=64)
support = torch.randn(5, 1, 28, 28)   # 1 example per class
labels  = torch.arange(5)             # Classes 0–4
queries = torch.randn(10, 1, 28, 28)

preds = prototypical_predict(model, support, labels, queries, n_classes=5)
print("Predicted labels:", preds)

In-context few-shot with an LLM via the OpenAI API:

```python
from openai import OpenAI

client = OpenAI()

# 3-shot sentiment classification via chat messages
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Classify the sentiment as positive or negative."},
        {"role": "user",   "content": "Review: 'Absolutely loved this movie!' Sentiment:"},
        {"role": "assistant", "content": "positive"},
        {"role": "user",   "content": "Review: 'Terrible experience, never coming back.' Sentiment:"},
        {"role": "assistant", "content": "negative"},
        {"role": "user",   "content": "Review: 'Best product I have ever bought.' Sentiment:"},
        {"role": "assistant", "content": "positive"},
        {"role": "user",   "content": "Review: 'Waste of money, very disappointed.' Sentiment:"},
    ]
)
print(response.choices[0].message.content)  # Expected: negative
```
