Natural language processing (NLP)
AI for understanding and generating human language.
Definition
Natural language processing (NLP) is the branch of AI concerned with the interaction between computers and human language — enabling machines to read, understand, interpret, and generate text and speech. The field covers a wide spectrum of tasks: text classification (spam detection, sentiment analysis), structured extraction (named entity recognition, relation extraction), question answering, summarization, translation, and open-ended generation. Each task requires a model that can map variable-length natural language inputs to useful outputs.
Modern NLP is dominated by pretrained transformer models. The pretraining-finetuning paradigm — training a large model on massive corpora to learn general language representations, then adapting it to specific tasks — has replaced hand-engineered feature pipelines and task-specific architectures. Models like BERT (encoder-only and bidirectional, well suited to classification and extraction) and GPT (decoder-only and autoregressive, well suited to generation) represent the two main transformer variants. LLMs like GPT-4, Claude, and Llama 3 extend this further: a single model handles many tasks with the right prompt, reducing the need for separate fine-tuned models per task.
Inputs to NLP models are discrete tokens (subwords or words) produced by tokenization. Models learn rich contextual embeddings where the representation of a word depends on its context. RAG and agents extend NLP systems by adding retrieval and tool use on top of language models, enabling grounded question answering and multi-step task completion beyond what fits in a single context window.
How it works
Tokenization and embedding
Raw text is first split into subword tokens using algorithms like BPE (byte-pair encoding) or WordPiece. Each token is mapped to a learned embedding vector. Positional encodings are added to preserve word order. The result is a sequence of vectors that the transformer processes.
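The merge step behind BPE can be illustrated with a toy trainer in pure Python. This is a simplified sketch, not a production tokenizer: the tiny corpus, the merge count, and helper names like `bpe_train` and `bpe_encode` are invented for the example, and real implementations also handle word boundaries and byte-level fallbacks.

```python
from collections import Counter

def get_pairs(word):
    """Return adjacent symbol pairs in a tokenized word."""
    return [(word[i], word[i + 1]) for i in range(len(word) - 1)]

def bpe_train(corpus, num_merges):
    """Learn BPE merge rules: repeatedly merge the most frequent
    adjacent symbol pair across the corpus into a new symbol."""
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for word, freq in vocab.items():
            for pair in get_pairs(word):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = pair_counts.most_common(1)[0][0]
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

def bpe_encode(word, merges):
    """Segment a new word into subwords by replaying learned merges."""
    symbols = list(word)
    for a, b in merges:
        i, out = 0, []
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

corpus = ["low", "low", "lower", "lowest", "newer", "newest"]
merges = bpe_train(corpus, num_merges=4)
print(bpe_encode("lowest", merges))
```

After training, frequent character runs like "low" collapse into single subword symbols, which is why common words become one token while rare words split into several.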
Transformer encoding and task heads
Transformer layers apply multi-head self-attention and feed-forward sublayers to produce contextual representations — each token's embedding now reflects its full context. A task head maps these representations to outputs: a classification head adds a linear layer over the [CLS] token; a generation head predicts the next token autoregressively; a span head predicts start and end positions for QA.
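The core of each transformer layer — scaled dot-product self-attention — can be sketched in NumPy for a single head. The dimensions and random weights here are illustrative assumptions; real models use multiple heads, residual connections, and layer normalization around this operation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings (with positions added).
    Returns contextual representations and the attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)    # each row is a distribution over tokens
    return weights @ V, weights           # each output mixes all token values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)
```

Because each output row is a weighted average over all value vectors, every token's representation after this step depends on the whole sequence — which is exactly what "contextual" means above.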
Pretraining and adaptation
Models are pretrained on large corpora using self-supervised objectives (masked language modeling for BERT-style, next-token prediction for GPT-style). Adaptation to downstream tasks happens via fine-tuning (updating all or some weights on labeled data) or prompting (providing instructions and examples in context without weight updates). Parameter-efficient methods like LoRA allow fine-tuning with far fewer trainable parameters.
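The parameter savings of LoRA come from replacing a full weight update with a low-rank product. A minimal NumPy sketch (dimensions, rank, and scaling chosen for illustration, not taken from any specific model) shows the idea:

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, r = 64, 64, 4   # rank r << d_in, d_out: the low-rank bottleneck

# Frozen pretrained weight: never updated during adaptation
W = rng.normal(size=(d_in, d_out))

# Trainable LoRA factors; B starts at zero so the initial update is zero
A = rng.normal(size=(d_in, r)) * 0.01
B = np.zeros((r, d_out))

def lora_forward(x, alpha=8.0):
    """Forward pass: frozen weight plus scaled low-rank update A @ B."""
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(2, d_in))
full_params = d_in * d_out            # parameters in a full fine-tune of W
lora_params = r * (d_in + d_out)      # trainable parameters in A and B
print(f"full: {full_params}, LoRA: {lora_params}")
```

Only `A` and `B` would receive gradients during fine-tuning, so here 512 trainable parameters stand in for a 4096-parameter weight update; at LLM scale the same ratio cuts trainable parameters by orders of magnitude.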
When to use / When NOT to use
| Use when | Avoid when |
|---|---|
| Input or output is natural language text at any scale | Data is purely numerical, tabular, or structured — classical ML may suffice |
| Tasks include classification, extraction, QA, summarization, or generation | Strict latency or memory constraints rule out transformer inference |
| You want to leverage pretrained models to reduce labeled data needs | You need symbolic, rule-based processing that must be 100% auditable |
| Building chatbots, search, or document understanding pipelines | Domain vocabulary is so specialized that pretrained models need extensive retraining |
Comparisons
| Approach | Strengths | Limitations |
|---|---|---|
| BERT-style encoders | Strong classification and extraction | Not generative; needs fine-tuning per task |
| GPT-style decoders (LLMs) | Generalist, few-shot, generative | Higher compute; harder to constrain output format |
| Fine-tuned task models | High performance on specific tasks | Requires labeled data; one model per task |
| Prompt engineering (zero/few-shot) | Fast iteration, no training | Less reliable for complex structured tasks |
Pros and cons
| Pros | Cons |
|---|---|
| Pretrained models transfer well across tasks and domains | Large models are computationally expensive to run and fine-tune |
| A single LLM handles many tasks with prompting | Output quality depends heavily on prompt design and context |
| Rich ecosystem of open-source models and tooling | Tokenization introduces artifacts and limits handling of rare words |
| Strong zero-shot and few-shot capabilities | Hallucination and inconsistency remain challenges for generation tasks |
Code examples
Text classification with Hugging Face Transformers (Python)
```python
from transformers import pipeline

# Zero-shot classification — no fine-tuning needed
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

text = "The new firmware update fixed the battery drain issue on the smartphone."
candidate_labels = ["technology", "sports", "finance", "politics"]

result = classifier(text, candidate_labels)
print(f"Text: {text}")
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.3f}")
```
Practical resources
- Hugging Face – NLP course — Hands-on course covering transformers, fine-tuning, and deployment
- Stanford CS224N – NLP with Deep Learning — University course with lecture notes and assignments
- Hugging Face Model Hub — Thousands of pretrained models for every NLP task
- NLTK Book — Classic introduction to NLP fundamentals
- The Illustrated Transformer (Jay Alammar) — Visual explanation of transformer architecture