Deep learning

Deep neural networks and representation learning.

Definition

Deep learning uses neural networks with many layers to learn hierarchical representations from data. It has driven progress in vision, language, and other domains by scaling data and compute.

It extends machine learning by using differentiable, layered models (see neural networks) that learn features automatically instead of hand-crafted ones. Depth allows the model to build increasingly abstract representations (e.g. edges -> textures -> parts -> objects in vision).

The defining characteristic of deep learning is end-to-end learning: raw inputs (pixels, tokens, audio samples) are transformed through successive non-linear layers, and the entire pipeline is optimized jointly by gradient descent. This removes the need for domain-specific feature engineering that traditional ML relies on. The tradeoff is that deep models need substantially more data and compute — GPUs, TPUs, and large memory — and are harder to interpret than classical models.
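
As a toy sketch of that joint optimization (layer sizes, batch size, and learning rate here are illustrative, not from the text): a two-layer network maps raw inputs straight to class scores, and a single gradient step updates every weight in the pipeline at once.

# Toy end-to-end example: raw inputs -> layers -> loss -> joint gradient update
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(16, 4)           # a batch of 16 raw input vectors
y = torch.randint(0, 2, (16,))   # integer class targets
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()                  # gradients flow through every layer
with torch.no_grad():
    for p in model.parameters():
        p -= 0.1 * p.grad        # one gradient-descent step on all weights jointly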

How it works

Forward pass

Data is fed into the input layer. Each layer applies a linear transformation (matrix multiply + bias) followed by a nonlinearity (e.g. ReLU). Stacking layers produces progressively more abstract representations. The final layer maps to the task output (class scores, regression value, or token logits).
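
A minimal sketch of such a forward pass (the layer sizes are illustrative, chosen to match a flattened 28x28 image):

# Forward pass: linear transform + nonlinearity at each layer
import torch

x = torch.randn(1, 784)                 # flattened 28x28 input image
W1, b1 = torch.randn(784, 256), torch.zeros(256)
W2, b2 = torch.randn(256, 10), torch.zeros(10)

h = torch.relu(x @ W1 + b1)             # hidden representation
logits = h @ W2 + b2                    # raw class scores for 10 classes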

Backward pass and optimization

The loss (e.g. cross-entropy for classification) is computed between predictions and targets. Backpropagation uses the chain rule to compute gradients of the loss with respect to every weight in the network. An optimizer (SGD, Adam) then updates the weights in the direction that reduces loss.
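
A matching sketch of the backward pass and update (toy batch, hypothetical shapes):

# Backward pass: cross-entropy loss -> backprop -> optimizer update
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(784, 10)
opt = optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(32, 784)
targets = torch.randint(0, 10, (32,))

loss = nn.CrossEntropyLoss()(model(x), targets)
opt.zero_grad()
loss.backward()   # chain rule: d(loss)/d(weight) for every parameter
opt.step()        # move weights in the direction that reduces the loss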

Architectures

Architecture choice tailors connectivity to data type: CNNs exploit spatial locality for images; RNNs handle variable-length sequences; Transformers use global self-attention and now dominate both vision and language tasks at scale.
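
For concreteness, the corresponding PyTorch building blocks might look like this (all hyperparameters are illustrative):

# Architecture building blocks in PyTorch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)        # CNN: local spatial filters
rnn = nn.LSTM(input_size=128, hidden_size=256)           # RNN: stateful sequence processing
attn = nn.TransformerEncoderLayer(d_model=512, nhead=8)  # Transformer: global self-attention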

When to use / When NOT to use

Scenario | Use deep learning? | Notes
Large-scale image or video recognition | Yes | CNNs are the standard backbone
Text understanding or generation | Yes | Transformers set the state of the art across NLP
Small structured/tabular dataset | No | Gradient boosting typically outperforms
Need full model interpretability | No | Deep models are largely black boxes
Limited compute / edge deployment | With caution | Use quantized or distilled models (see the sketch after this table)
Speech and audio recognition | Yes | Deep models outperform classical signal processing
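
For the edge-deployment row above, one common option is post-training dynamic quantization; a minimal sketch with PyTorch's quantize_dynamic (the model here is a stand-in, not from the text):

# Shrink a model for edge deployment with dynamic quantization
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # int8 weights for Linear layers
)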

Comparisons

Aspect | Classical ML | Deep learning
Feature engineering | Manual | Automatic (end-to-end)
Data requirements | Low to medium | High
Compute requirements | Low | High (GPU/TPU)
Interpretability | High (e.g. trees) | Low
Performance on unstructured data | Moderate | Very high

Pros and cons

Pros | Cons
Automatic feature learning | Data hungry
State-of-the-art on vision and language | Requires GPU/TPU
End-to-end optimization | Hard to interpret
Transfer learning reduces data needs (sketch below) | Long training times
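
For the transfer-learning entry above, a typical pattern is to freeze a pretrained backbone and train only a new output head. A sketch assuming torchvision 0.13+ and a hypothetical 10-class task:

# Transfer learning: reuse a pretrained backbone, train a new head
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                           # freeze pretrained features
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new trainable classifier head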

Code examples

# Feedforward network with PyTorch for image classification (MNIST)
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data loaders
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_loader = DataLoader(
    datasets.MNIST('.', train=True, download=True, transform=transform),
    batch_size=64, shuffle=True
)
test_loader = DataLoader(
    datasets.MNIST('.', train=False, download=True, transform=transform),
    batch_size=1000
)

# Model definition
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, 128),     nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.net(x)

device  = "cuda" if torch.cuda.is_available() else "cpu"
model   = MLP().to(device)
opt     = optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training
for epoch in range(3):
    model.train()
    for X, y in train_loader:
        X, y = X.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

# Evaluation
model.eval()
with torch.no_grad():
    correct = sum(
        (model(X.to(device)).argmax(1) == y.to(device)).sum().item()
        for X, y in test_loader
    )
print(f"Test accuracy: {correct / len(test_loader.dataset):.2%}")
