Vector databases
Storing and searching embeddings for RAG.
Definition
Vector databases store high-dimensional vectors (embeddings) and support fast similarity search, either exact k-nearest-neighbor (k-NN) search or approximate nearest neighbor (ANN) algorithms. They are the backbone of the retrieval layer in RAG systems, enabling semantic search at scale over millions of document chunks.
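Conceptually, exact k-NN over embeddings is just a top-k selection under a similarity metric. A minimal sketch in plain numpy (the `top_k` helper is illustrative, not a library API):

```python
import numpy as np

def top_k(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k vectors most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return np.argsort(-(v @ q))[:k]  # highest similarity first
```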
They sit between embeddings (which produce the vectors) and the RAG retriever (which needs the top-k chunks for a given query). Unlike traditional keyword-based databases, vector databases measure semantic distance: "customer support" can match "help desk" if the embedding model places them close together. Most vector databases also support metadata filtering — you can restrict retrieval to documents from a certain date, category, or source.
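For example, Chroma exposes metadata filtering through a `where` argument on queries; the toy vectors below stand in for real embeddings:

```python
import chromadb

client = chromadb.Client()
col = client.create_collection("tickets")
col.add(
    ids=["t1", "t2"],
    embeddings=[[0.1, 0.2, 0.3], [0.2, 0.1, 0.0]],  # toy vectors
    documents=["How do I reset my password?", "Refund request for order 1234"],
    metadatas=[{"category": "account"}, {"category": "billing"}],
)
# Restrict the semantic search to billing documents only.
results = col.query(
    query_embeddings=[[0.2, 0.1, 0.05]],
    n_results=1,
    where={"category": "billing"},
)
print(results["documents"][0])
```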
Choosing the right vector database depends on your requirements: managed vs. self-hosted, scale (thousands vs. hundreds of millions of vectors), metadata filtering capabilities, hybrid search support (dense + sparse), and whether you need multi-tenancy or access control. See RAG architecture for how the index fits into the full pipeline.
How it works
Indexing and querying
Documents are embedded and their vectors are written to an index (e.g. HNSW, IVF, or a flat index for small datasets). At query time, the query vector is compared against the index via k-NN (or approximate k-NN at scale); the index returns the top-k ids and, optionally, stored metadata. You then fetch the corresponding chunks and pass them to the LLM.
Index types
HNSW (Hierarchical Navigable Small World) is the most popular ANN algorithm — it offers sub-linear query time with high recall. IVF (inverted file) indexes partition the vector space into clusters and probe only the closest clusters at query time. Flat indexes are exact but O(n), so they are only suitable for small datasets.
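To make the tradeoff concrete, here is a minimal FAISS sketch (random vectors and illustrative parameters, not a production setup) that builds a flat index and an HNSW index over the same data and measures how closely the approximate results match the exact ones:

```python
import faiss
import numpy as np

d, n, k = 64, 10_000, 10
rng = np.random.default_rng(0)
xb = rng.random((n, d), dtype="float32")  # database vectors
xq = rng.random((5, d), dtype="float32")  # query vectors

flat = faiss.IndexFlatL2(d)        # exact search, O(n) per query
hnsw = faiss.IndexHNSWFlat(d, 32)  # ANN; 32 is the graph connectivity (M)
flat.add(xb)
hnsw.add(xb)

_, exact_ids = flat.search(xq, k)
_, ann_ids = hnsw.search(xq, k)

# recall@k: fraction of the true neighbors the ANN index also returned
recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(ann_ids, exact_ids)])
print(f"recall@{k}: {recall:.2f}")
```

Raising HNSW's search-time parameter (`efSearch`) trades latency for recall, which is typically the first knob to tune.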
When to use / When NOT to use
| Scenario | Use vector DB | Don't use vector DB |
|---|---|---|
| Semantic search over large document corpora | Yes — ANN indexes handle scale | No — keyword search if you only need exact phrase matching |
| RAG with millions of chunks | Yes — purpose-built for vector scale | No — relational DBs with pgvector may suffice below ~1M vectors |
| Hybrid search (semantic + BM25) | Yes — Weaviate and Qdrant support it natively (see the fusion sketch after this table) | No — pure dense retrieval if your queries are always semantic |
| Multi-tenant SaaS with isolated namespaces | Yes — Pinecone and Weaviate support namespacing | No — self-hosted FAISS has no multi-tenancy |
| Offline, local development | Yes — Chroma or FAISS with zero infra | No — managed cloud DBs add cost and network latency for dev |
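The hybrid-search row relies on fusing a sparse (keyword) ranking with a dense (embedding) ranking. Reciprocal rank fusion (RRF) is one common merge strategy; a minimal sketch with illustrative document ids:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge several ranked id lists into one."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_3", "doc_1", "doc_7"]   # sparse (keyword) ranking
dense_hits = ["doc_1", "doc_5", "doc_3"]  # dense (embedding) ranking
print(rrf([bm25_hits, dense_hits]))       # doc_1 and doc_3 rise to the top
```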
Comparisons
| Database | Hosting | Scale | Hybrid search | Metadata filters | Best for |
|---|---|---|---|---|---|
| Pinecone | Managed cloud | Very large (billions) | Yes (sparse + dense) | Yes | Production at scale, no infra management |
| Chroma | Self-hosted / embedded | Small–medium | No (dense only) | Yes | Local dev, prototyping, Python-native |
| Weaviate | Self-hosted or cloud | Large | Yes (BM25 + dense) | Yes | Production with hybrid search |
| FAISS | Self-hosted (library) | Large | No | No | Research, offline batch search |
| pgvector | PostgreSQL extension | Medium | Partial (pair with Postgres full-text search) | Yes (SQL) | Teams already on Postgres (see sketch below) |
| Qdrant | Self-hosted or cloud | Large | Yes | Yes | Low-latency, Rust-based, open-source |
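For the pgvector row, retrieval is plain SQL. A minimal sketch, assuming psycopg 3 and a Postgres instance with the pgvector extension installed (the connection string, table, and 3-dimensional vectors are illustrative):

```python
import psycopg  # assumes psycopg 3; connection details are illustrative

with psycopg.connect("dbname=docs") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id bigserial PRIMARY KEY, body text, embedding vector(3))"
    )
    conn.execute(
        "INSERT INTO chunks (body, embedding) VALUES (%s, %s::vector)",
        ("Returns are accepted within 30 days.", "[0.1, 0.2, 0.3]"),
    )
    # <=> is pgvector's cosine-distance operator; ORDER BY + LIMIT is the k-NN query.
    rows = conn.execute(
        "SELECT body FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
        ("[0.1, 0.2, 0.25]",),
    ).fetchall()
    for (body,) in rows:
        print(body)
```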
Pros and cons
| Pros | Cons |
|---|---|
| Sub-linear query time with ANN indexes | ANN introduces recall tradeoff vs. exact search |
| Supports semantic similarity out of the box | Vector storage is expensive at very high dimensions |
| Metadata filters let you combine semantic + structured queries | Managed services add ongoing cloud cost |
| Scales horizontally for large corpora | No native understanding of text — depends on embedding quality |
Code examples
```python
import chromadb
from openai import OpenAI

openai_client = OpenAI()
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("my_docs")

# Helper: embed text with OpenAI's embedding API
def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    ).data[0].embedding

# Index documents
documents = [
    "Returns are accepted within 30 days of purchase.",
    "Shipping takes 3–5 business days.",
    "Contact support at support@example.com.",
]
collection.add(
    documents=documents,
    embeddings=[embed(d) for d in documents],
    ids=[f"doc_{i}" for i in range(len(documents))],
)

# Query
query = "What is the return window?"
results = collection.query(
    query_embeddings=[embed(query)],
    n_results=2,
)
for doc in results["documents"][0]:
    print(doc)
```

Practical resources
- Chroma – Get started — Embedded vector store for Python, ideal for local development
- Pinecone – Vector database docs — Managed cloud vector DB with serverless and pod-based options
- Weaviate – Documentation — Open-source vector DB with native hybrid search
- FAISS – GitHub — Facebook AI Similarity Search library for local, high-performance indexing
- pgvector – GitHub — Vector similarity search extension for PostgreSQL