Chunking

Simple Definition

Chunking is the process of dividing large pieces of text — like a long document, PDF, or website — into smaller sections before storing them in an AI system. Each chunk is stored and searched independently, so the AI can retrieve just the relevant part of a large document rather than trying to process the whole thing at once.

Why Chunking Is Necessary

AI models have a context window limit: they can only process a fixed amount of text at a time, and a 500-page document will often exceed it. Even when the document does fit, the model may struggle to focus on the specific section that answers your question.

By splitting documents into chunks (e.g., paragraphs, sections, 500-word blocks), the system can:

Index each chunk as a searchable unit
Retrieve only the relevant chunks for a given question
Pass a focused, manageable set of text to the AI for answering

How Chunking Works in Practice

A document (PDF, article, internal wiki) is uploaded to a RAG system
The system splits it into chunks, for example every 500 tokens, by paragraph, or by heading
Each chunk is converted into an embedding and stored in a vector database
When a user asks a question, the question is converted into an embedding too, and the chunks whose embeddings are most similar to it are retrieved and given to the AI
The AI answers based on those chunks — not the entire document
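
The steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "embedding" here is a bag-of-words count vector standing in for a real embedding model, the "vector database" is a plain list, and the names split_into_chunks, embed, build_index, and retrieve are hypothetical, not part of any library.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Split text into consecutive blocks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(document, max_words=40):
    """Chunk the document and store each chunk next to its embedding."""
    return [(chunk, embed(chunk))
            for chunk in split_into_chunks(document, max_words)]


def retrieve(index, question, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

In a real system the retrieved chunks would then be placed in the prompt sent to the AI, which is the final step in the list above.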

Chunk Size Matters

Getting chunk size right is a real challenge:

Too small — individual chunks lose context (a sentence without its surrounding paragraph may be meaningless)
Too large — chunks become unfocused, retrieval becomes less precise, and you hit context limits faster

A common starting point is 300–600 tokens per chunk, with some overlap between consecutive chunks to avoid cutting off important context at the boundaries.
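
A minimal sketch of fixed-size chunking with overlap might look like the following. It assumes the text has already been tokenized into a list (any tokenizer will do; splitting on whitespace is a rough stand-in), and the function name chunk_with_overlap and its default values are illustrative choices, not a standard.

```python
def chunk_with_overlap(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no chunk is pure overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, a 1,200-token document yields three chunks starting at tokens 0, 450, and 900, so the last 50 tokens of each chunk reappear at the start of the next one.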

Types of Chunking

Fixed size — split every N tokens, regardless of content
Sentence/paragraph splitting — split at natural language boundaries
Semantic chunking — split where meaning shifts, using embeddings to detect topic changes
Hierarchical chunking — store both summaries and detailed chunks for different retrieval needs
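
As one illustration of splitting at natural boundaries, the sketch below breaks text on blank lines and packs whole paragraphs into a chunk until a word budget is reached. It assumes paragraphs are separated by blank lines and counts words rather than model tokens; the name chunk_by_paragraph and the max_words default are hypothetical.

```python
import re


def chunk_by_paragraph(text, max_words=200):
    """Group whole paragraphs into chunks of roughly max_words words."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A paragraph longer than max_words is kept whole as its own chunk here; a fuller version might fall back to fixed-size splitting for such cases.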