Chunking
Simple Definition
Chunking is the process of dividing large pieces of text — like a long document, PDF, or website — into smaller sections before storing them in an AI system. Each chunk is stored and searched independently, so the AI can retrieve just the relevant part of a large document rather than trying to process the whole thing at once.
Why Chunking Is Necessary
AI models have a context window limit: they can only process so much text at a time. A 500-page document often won’t fit. And even when it does, the model can struggle to focus on the specific section that answers your question.
By splitting documents into chunks (e.g., paragraphs, sections, 500-word blocks), the system can:
- Index each chunk as a searchable unit
- Retrieve only the relevant chunks for a given question
- Pass a focused, manageable set of text to the AI for answering
How Chunking Works in Practice
- A document (PDF, article, internal wiki) is uploaded to a RAG system
- The system splits it into chunks — maybe every 500 tokens, or by paragraph, or by heading
- Each chunk is converted into an embedding and stored in a vector database
- When a user asks a question, the question is also converted into an embedding, and the chunks whose embeddings are most similar to it are retrieved and given to the AI
- The AI answers based on those chunks — not the entire document
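The steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production pipeline: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, a plain Python list stands in for the vector database, and all function names are made up for this example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive blocks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Assumption: a word-count vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(document: str, max_words: int = 40) -> list[tuple[str, Counter]]:
    """Chunk the document and store each chunk next to its embedding."""
    return [(chunk, toy_embed(chunk)) for chunk in split_into_chunks(document, max_words)]


def retrieve(index: list[tuple[str, Counter]], question: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks whose embeddings are most similar to the question."""
    query = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


if __name__ == "__main__":
    document = (
        "Refunds are issued within five days of a return request. "
        "Shipping takes two weeks for international orders overseas."
    )
    index = build_index(document, max_words=10)
    # Only the retrieved chunks, not the whole document, would be passed to the model.
    print(retrieve(index, "How long does shipping take for international orders?", top_k=1))
```

In a real system the toy embedding would be replaced by a learned embedding model and the list by a vector database, but the flow of chunk, embed, store, and retrieve stays the same.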
Chunk Size Matters
Getting chunk size right is a real challenge:
- Too small — individual chunks lose context (a sentence without its surrounding paragraph may be meaningless)
- Too large — chunks become unfocused, retrieval becomes less precise, and you hit context limits faster
A common starting point is 300–600 tokens per chunk, with some overlap between consecutive chunks to avoid cutting off important context at the boundaries.
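To make the idea of overlap concrete, here is a small sketch of fixed-size chunking where each chunk repeats the last few tokens of the previous one. The function name and default values are illustrative assumptions, and the tokens are just list items; a real system would use the tokenizer that matches its model.

```python
def chunk_with_overlap(tokens: list[str], chunk_size: int = 400, overlap: int = 50) -> list[list[str]]:
    """Split tokens into chunks of chunk_size, each sharing `overlap` tokens with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once this window has reached the end of the input,
        # so the tail is not emitted again as a redundant chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Whitespace splitting is used here only as a rough stand-in for real tokenization.
    words = "one two three four five six seven eight nine ten".split()
    for chunk in chunk_with_overlap(words, chunk_size=4, overlap=1):
        print(chunk)
```

A larger overlap protects more context at the boundaries but stores more duplicated text, so it is usually tuned together with the chunk size.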
Types of Chunking
- Fixed size — split every N tokens, regardless of content
- Sentence/paragraph splitting — split at natural language boundaries
- Semantic chunking — split where meaning shifts, using embeddings to detect topic changes
- Hierarchical chunking — store both summaries and detailed chunks for different retrieval needs
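As one illustration of splitting at natural boundaries, the sketch below groups whole paragraphs into chunks up to a word budget instead of cutting every N tokens. It assumes paragraphs are separated by blank lines and counts words rather than tokens; the function name and the default budget are hypothetical.

```python
import re


def chunk_by_paragraph(text: str, max_words: int = 300) -> list[str]:
    """Group whole paragraphs into chunks of roughly max_words words."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for paragraph in paragraphs:
        n_words = len(paragraph.split())
        # Close the current chunk when adding this paragraph would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "one two three\n\nfour five\n\nsix seven eight nine"
    for chunk in chunk_by_paragraph(sample, max_words=5):
        print(repr(chunk))
```

Note that a single paragraph longer than the budget is kept whole as its own chunk in this sketch; a fuller version might fall back to sentence or fixed-size splitting for such cases.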
Related Terms
- RAG — chunking is a foundational step in every RAG pipeline
- Context Window — the limit that makes chunking necessary
- Semantic Search — chunks are what semantic search retrieves
- Vector Database — where chunks are stored as embeddings
- Embedding — each chunk is converted into an embedding for storage and retrieval