LlamaIndex

Data framework for LLM applications and RAG.

Definition

LlamaIndex (formerly GPT Index) is a data framework that bridges large language models and your own data sources. Its primary purpose is ingesting, indexing, and querying documents, databases, and APIs so that LLMs can answer questions grounded in private or domain-specific information. It provides a high degree of control over every stage of retrieval-augmented generation: data loading, node parsing (chunking), embedding selection, index construction, retrieval strategy, reranking, and response synthesis.

Where LangChain emphasizes composable orchestration and agent loops, LlamaIndex is optimized for the data layer: you can swap chunking strategies, retrieval algorithms, and synthesis approaches without rebuilding the pipeline. It ships with query engines, chat engines, and sub-question decomposition out of the box. Multiple index types (vector, summary, knowledge graph, keyword) can be combined in a single query for hybrid retrieval.
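
As a concrete illustration, here is a minimal sketch of swapping the chunking strategy and building two index types over the same documents; the chunk size and overlap values are illustrative choices, not recommendations.

# Sketch: swap the node parser, then build two index types over the same data
from llama_index.core import Settings, SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Swapping the chunking strategy is a one-line change; the rest of the
# pipeline is untouched (chunk_size/chunk_overlap are illustrative)
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)

documents = SimpleDirectoryReader("./data").load_data()

# Two index types over the same documents, usable separately or combined
vector_index = VectorStoreIndex.from_documents(documents)  # semantic lookup
summary_index = SummaryIndex.from_documents(documents)     # corpus-wide summaries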

LlamaIndex also supports agents: query engines can be registered as tools, and agent reasoning loops (ReAct, OpenAI function calling) can select which engine to query. An evaluation suite (faithfulness, relevance, context precision) helps diagnose RAG quality and guides chunking or retrieval tuning for production.
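
A minimal sketch of both ideas, assuming the ReActAgent.from_tools API and reusing the query_engine built in the Code examples section below; the tool name and description are illustrative.

# Sketch: register a query engine as an agent tool, then evaluate a response
from llama_index.core.agent import ReActAgent
from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.core.tools import QueryEngineTool

# Wrap an existing query engine as a tool the agent can choose to call
docs_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,  # built as in the Code examples section
    name="docs",                # illustrative tool name
    description="Answers questions about the internal document corpus",
)
agent = ReActAgent.from_tools([docs_tool], verbose=True)
answer = agent.chat("Summarize what the corpus says about pricing")

# Faithfulness check: is the response grounded in the retrieved context?
evaluator = FaithfulnessEvaluator()  # uses Settings.llm by default
response = query_engine.query("What are the main topics covered?")
result = evaluator.evaluate_response(response=response)
print(result.passing, result.feedback)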

How it works

Ingestion pipeline
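
Data connectors load raw sources into Document objects; a node parser splits them into nodes; an embedding model vectorizes each node; and the resulting nodes are stored in an index. A minimal sketch of an explicit pipeline follows (the transformation choices are illustrative):

# Sketch: explicit ingestion pipeline producing nodes for an index
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader("./data").load_data()

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=50),  # chunk into nodes
        OpenAIEmbedding(),                                   # embed each node
    ]
)
nodes = pipeline.run(documents=documents)

# Build the index directly from the pre-processed nodes
index = VectorStoreIndex(nodes)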

Query pipeline
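
At query time a retriever fetches the top-k most relevant nodes, optional node postprocessors (such as rerankers) filter or reorder them, and a response synthesizer composes the final answer from the retrieved context. A minimal sketch assembling a query engine from explicit parts, reusing the index built in the ingestion sketch above (the top-k value, response mode, and query are illustrative):

# Sketch: compose retriever + synthesizer into a query engine
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import ResponseMode

retriever = index.as_retriever(similarity_top_k=5)  # fetch candidate nodes
synthesizer = get_response_synthesizer(response_mode=ResponseMode.COMPACT)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)
response = query_engine.query("How does billing work?")  # illustrative query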

Key abstractions

  • Node: the unit of retrieval; a chunk of a document with metadata.
  • Index: stores nodes and supports vector, keyword, or graph-based lookup.
  • Query engine: wraps index + retriever + synthesizer into a single callable.
  • Chat engine: a query engine that maintains conversation history.
  • Sub-question engine: decomposes complex queries into simpler ones distributed across multiple indices.
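
A minimal sketch of the last two abstractions, reusing the vector_index and summary_index from the earlier sketch; the chat mode, tool names, and descriptions are illustrative.

# Sketch: chat engine with history, and sub-question decomposition
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Chat engine: a query engine plus conversation memory
chat_engine = vector_index.as_chat_engine(chat_mode="condense_question")
print(chat_engine.chat("What products are covered?"))
print(chat_engine.chat("And what about pricing?"))  # follow-up resolved via history

# Sub-question engine: split a complex query across multiple indices
tools = [
    QueryEngineTool.from_defaults(
        query_engine=vector_index.as_query_engine(),
        name="docs", description="Detailed product documentation"),
    QueryEngineTool.from_defaults(
        query_engine=summary_index.as_query_engine(),
        name="summaries", description="High-level corpus summaries"),
]
sub_question_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)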

When to use / When NOT to use

Scenario | Use LlamaIndex | Do NOT use LlamaIndex
--- | --- | ---
RAG over large document corpora with chunking control | Yes: fine-grained node parsers and multiple index types |
Connecting LLMs to internal databases and APIs | Yes: data connectors for SQL, Notion, Slack, S3, etc. |
Evaluating retrieval faithfulness and relevance | Yes: built-in evaluation modules |
Multi-step agent workflows calling many external APIs | | Prefer LangChain for richer agent tooling
Simple single-turn completions without retrieval | | Overhead is unnecessary; call the LLM API directly
Production pipeline needing LangSmith tracing | | Integrate with LangChain or use a dedicated tracing tool

Comparisons

Feature | LlamaIndex | LangChain
--- | --- | ---
Primary focus | Data indexing and retrieval (RAG) | Orchestration, chains, agents
Chunking control | Fine-grained node parsers | High-level text splitters
Index types | Vector, keyword, graph, summary, hybrid | Primarily vector via retrievers
Evaluation | Built-in (faithfulness, relevance) | Via LangSmith
Agent support | Query engines as tools, ReAct | First-class LCEL agents
Best for | Deep RAG over large corpora | Multi-step agent orchestration

Pros and cons

Pros | Cons
--- | ---
Fine-grained control over every RAG stage | Steeper learning curve than simple LLM wrappers
Multiple index types, including knowledge graphs | Fewer non-RAG integrations than LangChain
Built-in evaluation suite for production RAG | Some abstractions add verbosity
Composable pipelines that swap components easily | Documentation can lag behind rapid releases

Code examples

# Simple RAG pipeline with LlamaIndex
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure LLM and embedding model explicitly
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# 1. Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# 2. Build a vector index (embeds and stores nodes automatically)
index = VectorStoreIndex.from_documents(documents)

# 3. Create a query engine with top-k retrieval
query_engine = index.as_query_engine(similarity_top_k=3)

# 4. Query
response = query_engine.query("What are the main topics covered?")
print(response)

Tips for effective use

  • Choose chunk size based on your documents: 256–512 tokens works well for factual Q&A; 1024+ for summarization tasks.
  • Use a reranker (e.g. SentenceTransformerRerank) to improve retrieval precision without changing the index (see the sketch after this list).
  • Combine a vector index for semantic search with a keyword index for exact-match retrieval using a QueryFusionRetriever (also sketched below).
  • Run the built-in evaluation suite periodically during development to catch regressions in retrieval quality.
  • Use IngestionPipeline with a RedisDocumentStore for incremental ingestion so documents are not re-embedded on re-run.
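
A minimal sketch of the reranker and hybrid-retrieval tips above, reusing an existing index (e.g. the one from the Code examples section); the model name and top-k values are illustrative, and BM25Retriever lives in the separate llama-index-retrievers-bm25 package.

# Sketch: rerank retrieved nodes, and fuse semantic + keyword retrieval
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Reranker: retrieve broadly (top 10), keep the 3 best after cross-encoding
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3)
query_engine = index.as_query_engine(
    similarity_top_k=10, node_postprocessors=[rerank])

# Hybrid retrieval: fuse vector (semantic) and BM25 (exact-match) results
fusion = QueryFusionRetriever(
    [
        index.as_retriever(similarity_top_k=5),
        BM25Retriever.from_defaults(docstore=index.docstore, similarity_top_k=5),
    ],
    num_queries=1,  # disable extra query generation; just fuse the two result sets
)
hybrid_engine = RetrieverQueryEngine.from_args(fusion)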
