Neural networks
Introduction to artificial neural networks and their building blocks.
Definition
Neural networks are function approximators built from layers of units (neurons) with learnable weights and nonlinear activations. They can approximate complex mappings from inputs to outputs when trained on data.
They are the building blocks of deep learning. Variants like CNNs and RNNs add inductive biases (e.g. locality, recurrence) for specific data types; the same training machinery (backprop, gradient descent) applies.
The universal approximation theorem guarantees that a sufficiently wide, single-hidden-layer network can approximate any continuous function on a compact domain to arbitrary accuracy. In practice, though, depth (stacking many layers) is far more parameter-efficient than width alone: each additional layer increases the model's ability to compose simpler features into more complex ones. Modern neural networks range from a few hundred parameters (tiny edge models) to hundreds of billions (frontier LLMs), all sharing the same fundamental building blocks: linear transformations, activation functions, and gradient-based optimization.
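To make the depth-versus-width trade-off concrete, this sketch (assuming PyTorch is available) counts parameters for a wide single-hidden-layer MLP versus a deeper, narrower one; the layer sizes are illustrative, not from any particular result.
# Illustrative parameter counts: one wide hidden layer vs. several narrow ones
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

wide = nn.Sequential(nn.Linear(100, 4096), nn.ReLU(), nn.Linear(4096, 10))
deep = nn.Sequential(
    nn.Linear(100, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
print(count_params(wide))  # ~455k parameters for the single wide layer
print(count_params(deep))  # ~47k parameters for three narrow layers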
How it works
Forward pass
Input is passed to the first layer. Each layer computes an affine transformation of its inputs (a weight matrix plus a bias vector) followed by a nonlinear activation (e.g. ReLU, sigmoid, GELU). The output of one layer becomes the input to the next; stacking layers allows the network to learn hierarchical features.
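A minimal sketch of one layer's forward step, assuming PyTorch; the dimensions are arbitrary for illustration.
# One layer's forward computation: affine transform followed by a nonlinearity
import torch

x = torch.randn(4, 8)         # batch of 4 inputs with 8 features each
W = torch.randn(16, 8)        # weight matrix: 8 inputs -> 16 hidden units
b = torch.zeros(16)           # bias vector
pre_activation = x @ W.T + b  # weighted sum of inputs plus bias
hidden = torch.relu(pre_activation)  # nonlinearity; output feeds the next layer
print(hidden.shape)           # torch.Size([4, 16])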
Loss and backpropagation
The final output layer maps to predictions (e.g. class scores or a scalar). A loss function (e.g. cross-entropy for classification, MSE for regression) measures how far predictions are from targets. Backpropagation applies the chain rule to compute the gradient of the loss with respect to every weight, working from the output layer back toward the input.
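The following sketch (toy shapes, PyTorch assumed) computes a cross-entropy loss and lets autograd run backpropagation; the resulting gradients live on each parameter's .grad attribute.
# Cross-entropy loss and backpropagation with autograd
import torch
import torch.nn as nn

layer = nn.Linear(8, 3)               # toy model: 8 features -> 3 class scores
x = torch.randn(4, 8)                 # batch of 4 samples
targets = torch.tensor([0, 2, 1, 0])  # true class indices

logits = layer(x)                     # forward pass
loss = nn.functional.cross_entropy(logits, targets)  # scalar loss
loss.backward()                       # chain rule, from output back to parameters
print(layer.weight.grad.shape)        # torch.Size([3, 8]), one gradient per weight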
Gradient descent and regularization
Gradient descent (or its stochastic variants: SGD, Adam, AdamW) updates weights to minimize the loss. Depth and width determine capacity; regularization (dropout, weight decay, batch normalization) and data size control overfitting.
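Tying the pieces together, here is a minimal training-loop sketch, assuming PyTorch and random stand-in data; AdamW's weight_decay argument supplies the weight-decay regularization mentioned above, and the layer sizes are illustrative.
# Minimal training loop: gradient-based optimization with AdamW
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 20)         # stand-in features
y = torch.randint(0, 3, (256,))  # stand-in labels

for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass + loss
    loss.backward()              # backpropagate
    optimizer.step()             # update weights to reduce the loss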
When to use / When NOT to use
| Scenario | Use neural networks? | Notes |
|---|---|---|
| Unstructured data (images, text, audio) | Yes | NNs learn features automatically |
| Small tabular datasets | No | Gradient boosting often outperforms |
| Need interpretable model | No | NNs are largely black boxes |
| Abundant labeled data + compute | Yes | NNs scale well with both |
| Real-time inference on constrained hardware | With caution | Quantize or use smaller architectures |
| Transfer learning available for your domain | Yes | Fine-tuning a pretrained NN beats training from scratch (sketch below) |
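As referenced in the table, a hedged fine-tuning sketch: it assumes a recent torchvision is installed and that a ResNet-18 backbone fits the task; only the new classification head is trained.
# Transfer learning sketch: freeze a pretrained backbone, train a new head
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False  # freeze pretrained weights

num_classes = 5  # hypothetical downstream task
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # new trainable head
trainable = [p for p in backbone.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))  # only the head's parameters will train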
Comparisons
| Architecture | Inductive bias | Best for | Key limitation |
|---|---|---|---|
| Feedforward (MLP) | None | Tabular, general | Ignores spatial/temporal structure |
| CNN | Spatial locality | Images, grids | Less effective for long sequences |
| RNN / LSTM | Temporal order | Sequences, time series | Sequential training is slow; plain RNNs suffer vanishing gradients |
| Transformer | Global attention | Text, multimodal | High memory at long context |
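Each row's inductive bias maps to a standard PyTorch module; the sketch below instantiates one of each with illustrative sizes to show that they share the same building-block style.
# The same building blocks, different inductive biases (illustrative sizes)
import torch.nn as nn

mlp = nn.Linear(64, 64)                       # no structural assumption
cnn = nn.Conv2d(3, 16, kernel_size=3)         # local spatial filters
rnn = nn.LSTM(input_size=64, hidden_size=64)  # step-by-step recurrence
attn = nn.TransformerEncoderLayer(d_model=64, nhead=4)  # global attention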
Pros and cons
| Pros | Cons |
|---|---|
| Universal function approximation | Requires significant data |
| Scales with data and compute | Computationally expensive |
| Transfer learning reduces labeled data needs | Difficult to interpret |
| Flexible architecture design | Sensitive to hyperparameters |
Code examples
# Basic feedforward neural network with PyTorch
import torch
import torch.nn as nn

# Define a simple two-hidden-layer network
class FeedForward(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# Instantiate, pass a dummy batch, inspect output shape
model = FeedForward(input_dim=20, hidden_dim=64, output_dim=3)
x = torch.randn(32, 20)  # batch of 32 samples, 20 features
logits = model(x)
print(f"Output shape: {logits.shape}")  # (32, 3)

# Count parameters
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params:,}")
Practical resources
- Neural Networks and Deep Learning (Nielsen) — Free online book with mathematical depth
- 3Blue1Brown – Neural networks — Visual and intuitive introduction
- PyTorch Tutorials — Official hands-on tutorials from simple to advanced