Neural networks

Introduction to artificial neural networks and their building blocks.

Definition

Neural networks are function approximators built from layers of units (neurons) with learnable weights and nonlinear activations. They can approximate complex mappings from inputs to outputs when trained on data.

They are the building blocks of deep learning. Variants like CNNs and RNNs add inductive biases (e.g. locality, recurrence) for specific data types; the same training machinery (backprop, gradient descent) applies.

The universal approximation theorem guarantees that a sufficiently wide network with a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy. In practice, however, depth (stacking many layers) is far more parameter-efficient than width alone: each additional layer lets the model compose simpler features into more complex ones. Modern neural networks range from a few hundred parameters (tiny edge models) to hundreds of billions (frontier LLMs), all sharing the same fundamental building blocks: linear transformations, activation functions, and gradient-based optimization.
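
To make the depth-versus-width trade-off concrete, the sketch below counts the parameters of two hypothetical MLPs; the layer sizes (one 4096-unit layer versus three 128-unit layers) are illustrative assumptions, not recommendations.

import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Sum the elements of every trainable tensor in the model
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# One very wide hidden layer (illustrative sizes)
wide = nn.Sequential(nn.Linear(20, 4096), nn.ReLU(), nn.Linear(4096, 3))

# Several narrower layers stacked: same input/output dims, more composition
deep = nn.Sequential(
    nn.Linear(20, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 3),
)

print(f"Wide: {count_params(wide):,} parameters")  # roughly 98,000
print(f"Deep: {count_params(deep):,} parameters")  # roughly 36,000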

How it works

Forward pass

Input is passed to the first layer. Each layer computes an affine transformation of its inputs (a weighted sum plus a bias) followed by a nonlinear activation (e.g. ReLU, sigmoid, GELU). The output of one layer becomes the input to the next; stacking layers lets the network learn hierarchical features.
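
A minimal sketch of this forward pass written out by hand in PyTorch so that each step is visible; the dimensions are arbitrary illustrative choices.

import torch

torch.manual_seed(0)
x = torch.randn(4, 5)            # batch of 4 inputs, 5 features each

# Layer 1: weighted sum plus bias, then nonlinear activation
W1, b1 = torch.randn(5, 8), torch.zeros(8)
h = torch.relu(x @ W1 + b1)      # hidden activations, shape (4, 8)

# Layer 2: the hidden output becomes the next layer's input
W2, b2 = torch.randn(8, 3), torch.zeros(3)
y = h @ W2 + b2                  # output scores, shape (4, 3)

print(h.shape, y.shape)          # torch.Size([4, 8]) torch.Size([4, 3])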

Loss and backpropagation

The final output layer maps to predictions (e.g. class scores or a scalar). A loss function (e.g. cross-entropy for classification, MSE for regression) measures how far predictions are from targets. Backpropagation applies the chain rule from the loss backward through the network to compute the gradient of the loss with respect to every weight.
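
A sketch of these two steps in PyTorch; the model shape, batch size, and class count are illustrative assumptions. Note that nn.CrossEntropyLoss expects raw logits, and a single backward() call fills in gradients for every parameter via the chain rule.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
loss_fn = nn.CrossEntropyLoss()          # expects raw logits, not probabilities

x = torch.randn(32, 20)                  # dummy batch of 32 samples
targets = torch.randint(0, 3, (32,))     # integer class labels

logits = model(x)                        # forward pass
loss = loss_fn(logits, targets)          # scalar loss value
loss.backward()                          # backpropagation via the chain rule

# Every parameter now holds the gradient of the loss with respect to itself
print(model[0].weight.grad.shape)        # torch.Size([64, 20])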

Gradient descent and regularization

Gradient descent, or one of its stochastic and adaptive variants (SGD, Adam, AdamW), updates the weights to minimize the loss. Depth and width determine capacity; regularization (dropout, weight decay, batch normalization) and dataset size control overfitting.
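
A sketch of a single optimization step with AdamW, which applies weight decay as a decoupled regularizer; the learning rate, weight decay, and dropout values here are illustrative, not recommendations.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Dropout(0.2),                      # regularization: randomly zero activations
    nn.Linear(64, 3),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

x, targets = torch.randn(32, 20), torch.randint(0, 3, (32,))

optimizer.zero_grad()                     # clear gradients from the previous step
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()                           # compute fresh gradients
optimizer.step()                          # update weights to reduce the loss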

When to use / When NOT to use

| Scenario | Use neural networks? | Notes |
| --- | --- | --- |
| Unstructured data (images, text, audio) | Yes | NNs learn features automatically |
| Small tabular datasets | No | Gradient boosting often outperforms |
| Need interpretable model | No | NNs are largely black boxes |
| Abundant labeled data + compute | Yes | NNs scale well with both |
| Real-time inference on constrained hardware | With caution | Quantize or use smaller architectures |
| Transfer learning available for your domain | Yes | Fine-tuning a pretrained NN beats training from scratch |

Comparisons

| Architecture | Inductive bias | Best for | Key limitation |
| --- | --- | --- | --- |
| Feedforward (MLP) | None | Tabular, general | Ignores spatial/temporal structure |
| CNN | Spatial locality | Images, grids | Less effective for long sequences |
| RNN / LSTM | Temporal order | Sequences, time series | Slow to train, vanishing gradients |
| Transformer | Global attention | Text, multimodal | High memory at long context |

Pros and cons

| Pros | Cons |
| --- | --- |
| Universal function approximation | Requires significant data |
| Scales with data and compute | Computationally expensive |
| Transfer learning reduces labeled data needs | Difficult to interpret |
| Flexible architecture design | Sensitive to hyperparameters |

Code examples

# Basic feedforward neural network with PyTorch
import torch
import torch.nn as nn

# Define a simple two-hidden-layer network
class FeedForward(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# Instantiate, pass a dummy batch, inspect output shape
model = FeedForward(input_dim=20, hidden_dim=64, output_dim=3)
x = torch.randn(32, 20)          # batch of 32 samples, 20 features
logits = model(x)
print(f"Output shape: {logits.shape}")  # (32, 3)

# Count parameters
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params:,}")
