AI Summary Hub

PyTorch

Deep learning framework with dynamic computation graphs.

Definition

PyTorch is a Python-first deep learning framework developed by Meta AI, characterized by dynamic computation graphs and an imperative programming model. Every operation executes immediately (eager mode), and the computational graph for backpropagation is constructed on the fly. This makes it straightforward to write, run, and debug neural network code using standard Python tools — print statements, debuggers, and the Python REPL all work exactly as expected.
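Eager execution is easy to see in a few lines: each statement below runs immediately, so shapes and values can be inspected mid-computation with ordinary Python (a minimal sketch; the tensor sizes are arbitrary).

```python
import torch

# Every operation executes immediately -- no separate graph-build step.
x = torch.randn(3, 4)
h = torch.relu(x @ torch.randn(4, 2))  # matmul then ReLU, evaluated now

print(h.shape)            # torch.Size([3, 2]) -- inspectable right away
assert h.min().item() >= 0  # plain Python assertions work mid-"graph"
```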

PyTorch has become the dominant framework in research and is the foundation of the modern ML ecosystem: Hugging Face's Transformers library defaults to PyTorch, most academic papers release PyTorch implementations, and libraries such as torchvision, torchaudio, torchtext, and PyTorch Geometric extend it to computer vision, audio, text, and graph domains. The framework supports CPU, GPU, Apple Silicon (MPS backend), and multi-GPU training through torch.distributed, with higher-level wrappers like HuggingFace Accelerate and PyTorch Lightning reducing distributed boilerplate.
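Backend selection in practice is a short cascade over the devices listed above; a sketch assuming PyTorch 1.12 or later, where `torch.backends.mps` exists.

```python
import torch

# Pick the best available backend: CUDA GPU, Apple Silicon (MPS), or CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():  # requires PyTorch >= 1.12
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.ones(2, 2, device=device)  # tensor allocated on the chosen device
```

The same pattern is what higher-level wrappers like Accelerate automate, along with moving the model and batches to `device`.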

Compared to TensorFlow, PyTorch is preferred for research and rapid prototyping due to its Python-native debugging experience and faster iteration cycle. TensorFlow maintains an advantage in mobile deployment (TFLite), TPU training, and production pipeline tooling. For deployment, PyTorch provides TorchScript (static graph for production), ONNX export (cross-framework interoperability), and PyTorch Mobile. Most LLM training and fine-tuning work happens in PyTorch through the HuggingFace ecosystem.
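The TorchScript path can be illustrated in a few lines (a sketch, not a full deployment recipe; `Tiny` is a made-up module for demonstration).

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) * 2

# Compile to a static TorchScript graph, runnable without a Python interpreter
# (e.g. from C++ via libtorch). scripted.save("tiny.pt") would serialize it.
scripted = torch.jit.script(Tiny())
out = scripted(torch.tensor([-1.0, 3.0]))
print(out)  # tensor([0., 6.])
```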

How it works

Training loop

PyTorch training loops are written explicitly in Python: for each batch, zero the accumulated gradients (optimizer.zero_grad()), run the forward pass, compute the loss, call loss.backward() to compute gradients via autograd, and call optimizer.step() to update the parameters. A full loop appears under Code examples below.

Deployment pipeline

A trained eager-mode model is exported for serving: torch.jit.script or torch.jit.trace produces a TorchScript static graph that runs without a Python interpreter, torch.onnx.export writes an ONNX graph for cross-framework interoperability, and PyTorch Mobile packages models for on-device inference.

Key abstractions

nn.Module — base class for all models; define __init__ (layers) and forward (computation).
autograd — automatic differentiation; loss.backward() computes gradients for all parameters.
DataLoader — batching, shuffling, and multi-process data loading.
torch.optim — optimizers (Adam, SGD, AdamW).
torch.distributed — data-parallel and model-parallel distributed training.
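The autograd abstraction fits in a few lines (a minimal sketch with a single scalar parameter standing in for a model's weights).

```python
import torch

# autograd records operations on tensors with requires_grad=True
w = torch.tensor(3.0, requires_grad=True)
loss = (w * 2 - 4) ** 2   # loss = (2w - 4)^2
loss.backward()           # d(loss)/dw = 2 * (2w - 4) * 2 = 8 at w = 3
print(w.grad)             # tensor(8.)
```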

When to use / When NOT to use

Scenario | Verdict
Research and experimenting with new architectures | Yes — eager mode, Python-native debugging
Fine-tuning HuggingFace models | Yes — default backend for HuggingFace
LLM training and inference workloads | Yes — dominant in LLM ecosystem
Mobile or edge deployment (iOS, Android) | No — TensorFlow Lite is more mature for this
Training on Google TPUs | No — TensorFlow or JAX have better TPU support
Production ML pipelines with managed serving | No — TF Serving + TFX provide a more integrated stack

Comparisons

Feature | PyTorch | TensorFlow / Keras
Execution mode | Eager (default) + TorchScript | Eager (default) + tf.function
Debugging experience | Python-native (pdb, print) | tf.function can obscure errors
Research adoption | Dominant | Decreasing
Mobile / edge | PyTorch Mobile (experimental) | TFLite (first-class)
HuggingFace ecosystem | Default backend | Supported but secondary
TPU support | Via PyTorch/XLA | First-class
High-level API | Lightning, Ignite (third-party) | Keras (built-in)

Pros and cons

Pros | Cons
Python-native debugging with eager execution | Distributed training requires more manual setup
Dominant in research; most papers release PyTorch code | No built-in high-level training API (need Lightning or similar)
Foundation of HuggingFace ecosystem | Mobile deployment less mature than TFLite
Flexible; easy to implement custom layers and losses | Model serialization (TorchScript) has limitations vs SavedModel

Code examples

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Define a simple feedforward network
class MLP(nn.Module):
    def __init__(self, in_features: int, hidden: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = MLP(in_features=784, hidden=256, num_classes=10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Synthetic stand-in data so the loop below is runnable
# (e.g. flattened 28x28 images, 10 classes)
X = torch.randn(512, 784)
y = torch.randint(0, 10, (512,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

# Explicit training loop
for epoch in range(5):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        logits = model(x_batch)
        loss = loss_fn(logits, y_batch)
        loss.backward()          # compute gradients
        optimizer.step()         # update weights

# Export for cross-framework deployment
dummy_input = torch.randn(1, 784)
torch.onnx.export(model, dummy_input, "mlp.onnx")
