AI Summary Hub

Vector databases

Storing and searching embeddings for RAG.

Definition

Vector databases store high-dimensional vectors (embeddings) and support fast similarity search, either exact k-nearest neighbor (k-NN) or approximate nearest neighbor (ANN). They are the backbone of the retrieval layer in RAG systems, enabling semantic search at scale over millions of document chunks.
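At its core, exact k-NN over a flat set of vectors is a brute-force scan. A minimal sketch in NumPy (the toy vectors and query below are purely illustrative):

```python
import numpy as np

def knn(query: np.ndarray, vectors: np.ndarray, k: int = 2) -> np.ndarray:
    """Exact k-NN by cosine similarity: score every stored vector, take the top k."""
    # Normalize so the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                  # one similarity per stored vector: O(n)
    return np.argsort(-scores)[:k]  # indices of the k most similar vectors

# Toy 4-dimensional "embeddings"
vectors = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
print(knn(np.array([1.0, 0.05, 0.0, 0.0]), vectors))  # → [0 1]
```

This linear scan is exactly what a flat index does; ANN indexes exist to avoid scoring all n vectors per query.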

They sit between embeddings (which produce the vectors) and the RAG retriever (which needs the top-k chunks for a given query). Unlike traditional keyword-based databases, vector databases measure semantic distance: "customer support" can match "help desk" if the embedding model places them close together. Most vector databases also support metadata filtering — you can restrict retrieval to documents from a certain date, category, or source.
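Conceptually, metadata filtering restricts the candidate set before the similarity scan. A toy illustration in plain Python/NumPy; the `category` field and documents are hypothetical, and real vector databases apply such filters inside the index rather than in application code:

```python
import numpy as np

docs = [
    {"text": "Returns are accepted within 30 days.", "category": "policy"},
    {"text": "Shipping takes 3-5 business days.",    "category": "policy"},
    {"text": "Our founding story began in 2012.",    "category": "about"},
]
# Toy 3-d embeddings, one per document (a real system would use an embedding model)
embeddings = np.array([[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]])

def filtered_search(query_vec: np.ndarray, category: str, k: int = 1) -> list[str]:
    # 1. Metadata filter: keep only documents whose metadata matches
    candidates = [i for i, d in enumerate(docs) if d["category"] == category]
    # 2. Similarity search over the surviving candidates only
    scores = embeddings[candidates] @ query_vec
    best = np.argsort(-scores)[:k]
    return [docs[candidates[i]]["text"] for i in best]

print(filtered_search(np.array([1.0, 0.1, 0.0]), "policy"))
# → ['Returns are accepted within 30 days.']
```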

Choosing the right vector database depends on your requirements: managed vs. self-hosted, scale (thousands vs. hundreds of millions of vectors), metadata filtering capabilities, hybrid search support (dense + sparse), and whether you need multi-tenancy or access control. See RAG architecture for how the index fits into the full pipeline.

How it works

Indexing and querying

Index types

Documents are embedded and their vectors are written to an index (e.g. HNSW, IVF, or a flat index for small datasets). At query time, the query vector is compared against the index via k-NN (or approximate k-NN at scale); the index returns the top-k IDs and, optionally, stored metadata. You then fetch the corresponding chunks and pass them to the LLM.

HNSW (Hierarchical Navigable Small World) is the most popular ANN algorithm, offering sub-linear query time with high recall. Flat indexes are exact but O(n), so they are only suitable for small datasets.
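The speed/recall tradeoff of ANN indexes comes from scanning only a fraction of the data per query. A toy IVF-style sketch in NumPy: the clusters and centroids below are hand-picked for illustration, whereas a real IVF index learns them with k-means and typically probes several clusters, not one:

```python
import numpy as np

# Stored vectors, pre-assigned to two coarse clusters
clusters = {
    0: np.array([[1.0, 0.0], [0.9, 0.1]]),  # cluster near the x-axis
    1: np.array([[0.0, 1.0], [0.1, 0.9]]),  # cluster near the y-axis
}
centroids = np.array([[0.95, 0.05], [0.05, 0.95]])

def ivf_search(query: np.ndarray) -> np.ndarray:
    # 1. Probe only the nearest cluster (nprobe=1) — this is the "approximate" part
    nearest = int(np.argmin(np.linalg.norm(centroids - query, axis=1)))
    # 2. Exact scan inside that cluster only, a fraction of the full dataset
    members = clusters[nearest]
    best = int(np.argmin(np.linalg.norm(members - query, axis=1)))
    return members[best]

print(ivf_search(np.array([0.85, 0.2])))  # → [0.9 0.1]
```

Recall loss happens when the true nearest neighbor lives in a cluster that was never probed, which is why ANN indexes expose knobs (nprobe, efSearch) to trade latency for recall.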

When to use / When NOT to use

| Scenario | Use vector DB | Don't use vector DB |
|---|---|---|
| Semantic search over large document corpora | Yes — ANN indexes handle scale | No — keyword search if you only need exact phrase matching |
| RAG with millions of chunks | Yes — purpose-built for vector scale | No — relational DBs with pgvector may suffice below ~1M vectors |
| Hybrid search (semantic + BM25) | Yes — Weaviate, Qdrant support hybrid natively | No — pure dense if your queries are always semantic |
| Multi-tenant SaaS with isolated namespaces | Yes — Pinecone and Weaviate support namespacing | No — self-hosted FAISS has no multi-tenancy |
| Offline, local development | Yes — Chroma or FAISS with zero infra | No — managed cloud DBs add cost and network latency for dev |

Comparisons

| Database | Hosting | Scale | Hybrid search | Metadata filters | Best for |
|---|---|---|---|---|---|
| Pinecone | Managed cloud | Very large (billions) | Yes (sparse + dense) | Yes | Production at scale, no infra management |
| Chroma | Self-hosted / embedded | Small–medium | No (dense only) | Yes | Local dev, prototyping, Python-native |
| Weaviate | Self-hosted or cloud | Large | Yes (BM25 + dense) | Yes | Production with hybrid search |
| FAISS | Self-hosted (library) | Large | No | No | Research, offline batch search |
| pgvector | PostgreSQL extension | Medium | Partial (with FTS) | Yes (SQL) | Teams already on Postgres |
| Qdrant | Self-hosted or cloud | Large | Yes | Yes | Low-latency, Rust-based, open-source |
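Hybrid search merges a dense (semantic) ranking with a sparse (BM25) ranking. One common fusion method is reciprocal rank fusion (RRF); a minimal sketch in plain Python with made-up document IDs (engines like Weaviate and Qdrant expose their own built-in fusion options):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_2", "doc_7", "doc_1"]   # ranked by embedding similarity
bm25_hits  = ["doc_7", "doc_3", "doc_2"]   # ranked by keyword relevance
print(rrf([dense_hits, bm25_hits]))
# → ['doc_7', 'doc_2', 'doc_3', 'doc_1']
```

Documents that appear in both rankings (doc_7, doc_2) float to the top, which is the point of hybrid retrieval: neither signal alone has to be perfect.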

Pros and cons

| Pros | Cons |
|---|---|
| Sub-linear query time with ANN indexes | ANN introduces a recall tradeoff vs. exact search |
| Supports semantic similarity out of the box | Vector storage is expensive at very high dimensions |
| Metadata filters let you combine semantic + structured queries | Managed services add ongoing cloud cost |
| Scales horizontally for large corpora | No native understanding of text — depends on embedding quality |

Code examples

import chromadb
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("my_docs")

# Helper: embed text
def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    ).data[0].embedding

# Index documents
documents = [
    "Returns are accepted within 30 days of purchase.",
    "Shipping takes 3–5 business days.",
    "Contact support at support@example.com.",
]
collection.add(
    documents=documents,
    embeddings=[embed(d) for d in documents],
    ids=[f"doc_{i}" for i in range(len(documents))],
)

# Query
query = "What is the return window?"
results = collection.query(
    query_embeddings=[embed(query)],
    n_results=2,
)
for doc in results["documents"][0]:
    print(doc)

Practical resources

See also