Chunking Strategies That Actually Work

Effective chunking is an information architecture problem, not a text-splitting task. This article covers practical chunking strategies that improve retrieval accuracy in real-world RAG systems.

Level: intermediate · Topic: RAG · Tags: rag, chunking, retrieval, vector-database, production

Why Chunking Determines RAG Success

Most RAG failures happen before the LLM is even called:

  • Query returns irrelevant documents
  • Relevant information is split across chunks
  • Retrieved chunks lack necessary context

The root cause is usually poor chunking strategy.

Chunking is not “split text into N-character pieces.” It is information architecture: how do you structure knowledge so relevant information can be found and used?

This article covers:

  • Why naive chunking fails
  • Strategies that work in production
  • Trade-offs between approaches
  • How to evaluate chunking quality

The Problem with Naive Chunking

Approach 1: Fixed Character Count

# Simple but problematic
def chunk_by_chars(text: str, size: int = 500) -> list[str]:
    chunks = []
    for i in range(0, len(text), size):
        chunks.append(text[i:i+size])
    return chunks

# Problem: the 500-character boundary can land anywhere.
# Example of a resulting chunk that ends mid-word:
chunk = "The API rate limit is 1000 requests per hour. To increase your limit, con"
# Chunk breaks mid-sentence
# Key information is split across chunks

Why this fails:

  • Breaks semantic units arbitrarily
  • Context lost at boundaries
  • No concept of document structure

Approach 2: Fixed Token Count

def chunk_by_tokens(text: str, max_tokens: int = 128) -> list[str]:
    tokens = tokenize(text)
    chunks = []
    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i+max_tokens]
        chunks.append(detokenize(chunk_tokens))
    return chunks

# Better than characters, but still arbitrary

Why this is better but not enough:

  • Respects token boundaries (important for embeddings)
  • Still breaks semantic units
  • Ignores document structure
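
The `tokenize`, `detokenize`, and `count_tokens` helpers used throughout this article are left abstract. One possible realization, assuming the tiktoken package and its cl100k_base encoding (any tokenizer that matches your embedding model works):

import tiktoken

# Assumes tiktoken is installed; pick the encoding that matches your embedding model
_encoding = tiktoken.get_encoding("cl100k_base")

def tokenize(text: str) -> list[int]:
    return _encoding.encode(text)

def detokenize(tokens: list[int]) -> str:
    return _encoding.decode(tokens)

def count_tokens(text: str) -> int:
    return len(_encoding.encode(text))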

Approach 3: Sentence-Based

def chunk_by_sentences(text: str, sentences_per_chunk: int = 5):
    sentences = split_into_sentences(text)
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        chunk = " ".join(sentences[i:i+sentences_per_chunk])
        chunks.append(chunk)
    return chunks

Progress, but limitations:

  • Preserves sentence integrity
  • But loses section/paragraph context
  • May group unrelated sentences
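
The `split_into_sentences` helper is also assumed here and reused below. A minimal regex-based sketch; production systems typically use a library such as nltk or spaCy instead:

import re

def split_into_sentences(text: str) -> list[str]:
    # Naive split on sentence-ending punctuation followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]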

Strategy 1: Semantic Chunking

Principle

Chunk by meaning, not by length.

Each chunk should represent a coherent semantic unit:

  • A complete thought
  • A self-contained concept
  • An answerable piece of information

Implementation: Paragraph-Based

def chunk_by_paragraphs(text: str, max_tokens: int = 512) -> list[str]:
    """
    Chunk by paragraphs, respecting semantic boundaries.
    Combine small paragraphs, split large ones.
    """
    paragraphs = text.split('\n\n')
    chunks = []
    current_chunk = []
    current_tokens = 0

    for para in paragraphs:
        para_tokens = count_tokens(para)

        # Single paragraph too large → split at sentence boundaries
        if para_tokens > max_tokens:
            if current_chunk:
                chunks.append('\n\n'.join(current_chunk))
                current_chunk = []
                current_tokens = 0

            # Split large paragraph
            sentences = split_into_sentences(para)
            for sent in sentences:
                sent_tokens = count_tokens(sent)
                # Only flush when there is something to flush; otherwise a
                # single oversized sentence would emit an empty chunk
                if current_chunk and current_tokens + sent_tokens > max_tokens:
                    chunks.append(' '.join(current_chunk))
                    current_chunk = [sent]
                    current_tokens = sent_tokens
                else:
                    current_chunk.append(sent)
                    current_tokens += sent_tokens
        else:
            # Add paragraph to current chunk if it fits
            if current_tokens + para_tokens > max_tokens:
                chunks.append('\n\n'.join(current_chunk))
                current_chunk = [para]
                current_tokens = para_tokens
            else:
                current_chunk.append(para)
                current_tokens += para_tokens

    if current_chunk:
        chunks.append('\n\n'.join(current_chunk))

    return chunks

When to use:

  • Prose documents (articles, documentation, books)
  • Content written in natural paragraphs
  • When semantic coherence matters

Strategy 2: Structure-Aware Chunking

Principle

Respect document hierarchy and structure.

Documents have inherent structure:

  • Headings and sections
  • Lists and tables
  • Code blocks
  • Metadata

Implementation: Markdown-Aware

from markdown_it import MarkdownIt

def chunk_by_structure(markdown: str, max_tokens: int = 512) -> list[dict]:
    """
    Chunk markdown respecting structure.
    Each chunk includes its context (heading hierarchy).
    """
    md = MarkdownIt()
    tokens = md.parse(markdown)

    chunks = []
    current_section = []
    heading_context = []  # Track heading hierarchy

    for token in tokens:
        if token.type == 'heading_open':
            level = int(token.tag[1])  # h1 → 1, h2 → 2, etc.

            # Finalize previous section
            if current_section:
                chunks.append({
                    'content': '\n\n'.join(current_section),
                    'context': heading_context.copy(),
                    'level': len(heading_context)
                })
                current_section = []

            # Update heading hierarchy
            heading_context = heading_context[:level-1]

        elif token.type == 'heading_close':
            # Add heading to context
            heading_text = current_section[-1] if current_section else ""
            heading_context.append(heading_text)
            current_section = []

        elif token.content:
            # Collect text content (paragraph inlines, code blocks, etc.)
            current_section.append(token.content)

        # Check token limit
        chunk_tokens = count_tokens('\n\n'.join(current_section))
        if chunk_tokens > max_tokens:
            chunks.append({
                'content': '\n\n'.join(current_section),
                'context': heading_context.copy(),
                'level': len(heading_context)
            })
            current_section = []

    if current_section:
        chunks.append({
            'content': '\n\n'.join(current_section),
            'context': heading_context.copy(),
            'level': len(heading_context)
        })

    return chunks

# Example output:
# {
#   'content': 'To reset your password, click the "Forgot Password" link...',
#   'context': ['Account Management', 'Password Reset'],
#   'level': 2
# }

Augmenting Chunks with Context

def augment_chunk_with_context(chunk: dict) -> str:
    """
    Include heading hierarchy in chunk for better retrieval.
    """
    context_path = " > ".join(chunk['context'])
    return f"""
Section: {context_path}

{chunk['content']}
"""

# Query: "How do I reset my password?"
# Without context: Matches generic password text
# With context: Matches "Account Management > Password Reset" section
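
Usage with the example chunk shown above; the augmented string is what gets embedded and indexed in place of the raw content:

chunk = {
    'content': 'To reset your password, click the "Forgot Password" link...',
    'context': ['Account Management', 'Password Reset'],
    'level': 2
}
augmented = augment_chunk_with_context(chunk).strip()
# 'Section: Account Management > Password Reset\n\nTo reset your password, ...'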

When to use:

  • Structured documentation
  • Technical manuals
  • Knowledge bases with clear hierarchy
  • When section context improves retrieval

Strategy 3: Sliding Window with Overlap

Principle

Prevent information loss at chunk boundaries.

If relevant information spans a boundary, neither chunk is complete.

Implementation

def chunk_with_overlap(text: str, chunk_size: int = 512, overlap: int = 128):
    """
    Create overlapping chunks to preserve boundary context.
    """
    tokens = tokenize(text)
    chunks = []

    start = 0
    while start < len(tokens):
        end = start + chunk_size
        chunk_tokens = tokens[start:end]
        chunks.append({
            'content': detokenize(chunk_tokens),
            'start': start,
            'end': min(end, len(tokens))
        })

        if end >= len(tokens):
            break  # Document exhausted; avoid emitting a final pure-overlap chunk

        # Next chunk starts `overlap` tokens before current end
        start = end - overlap

    return chunks

# Example:
# Chunk 1: tokens 0-512
# Chunk 2: tokens 384-896  (overlap of 128 tokens)
# Chunk 3: tokens 768-1280 (overlap of 128 tokens)

Trade-offs

Pros:

  • Reduces boundary information loss
  • Improves recall for queries spanning sections

Cons:

  • Increases storage (duplicate content)
  • Increases retrieval complexity (duplicate results)
  • Higher embedding costs

Deduplication Strategy

def deduplicate_retrieved_chunks(chunks: list[dict]) -> list[dict]:
    """
    When retrieving overlapping chunks, merge and deduplicate.
    """
    # Sort by document position
    sorted_chunks = sorted(chunks, key=lambda c: (c['doc_id'], c['start']))

    deduplicated = []
    for chunk in sorted_chunks:
        if not deduplicated:
            deduplicated.append(chunk)
            continue

        last = deduplicated[-1]

        # Check if this chunk overlaps with previous
        if chunk['doc_id'] == last['doc_id'] and chunk['start'] < last['end']:
            # Merge overlapping chunks
            last['content'] = merge_overlapping_text(
                last['content'],
                chunk['content'],
                last['end'] - chunk['start']
            )
            last['end'] = max(last['end'], chunk['end'])
        else:
            deduplicated.append(chunk)

    return deduplicated
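
The `merge_overlapping_text` helper is assumed above. A minimal sketch, reusing the `tokenize`/`detokenize` helpers from earlier and assuming both chunks came from the token-based `chunk_with_overlap`:

def merge_overlapping_text(first: str, second: str, overlap_tokens: int) -> str:
    # `overlap_tokens` is the number of shared boundary tokens
    # (last['end'] - chunk['start'] in the caller above)
    remainder = tokenize(second)[overlap_tokens:]
    if not remainder:
        return first  # `second` is fully contained in `first`
    return first + detokenize(remainder)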

When to use:

  • Critical retrieval scenarios (medical, legal)
  • When information frequently spans boundaries
  • When storage cost is acceptable

Strategy 4: Query-Aware Chunking

Principle

Optimize chunks for how they will be queried.

Different query types need different chunking:

  • How-to queries: Need complete procedures
  • Definition queries: Need focused concepts
  • Comparison queries: Need related items together

Implementation: Multi-Granularity Chunking

class MultiGranularityChunker:
    """
    Create chunks at multiple levels of granularity.
    Retrieve at the granularity matching the query.
    """

    def chunk_document(self, doc: str) -> dict:
        return {
            'summary': self.extract_summary(doc),       # High-level
            'sections': self.chunk_by_sections(doc),    # Medium-level
            'paragraphs': self.chunk_by_paragraphs(doc) # Fine-grained
        }

    def retrieve(self, query: str):
        # Classify query type
        if self.is_summary_query(query):
            return self.search(query, index='summary')
        elif self.is_detailed_query(query):
            return self.search(query, index='paragraphs')
        else:
            return self.search(query, index='sections')

# Example queries:
# "What does this API do?" → Summary level
# "How do I authenticate?" → Section level
# "What does the timeout parameter mean?" → Paragraph level

Parent-Child Chunking

class HierarchicalChunker:
    """
    Embed small chunks, but retrieve larger parent context.
    """

    def create_chunks(self, doc: str):
        sections = split_by_sections(doc)
        chunks = []

        for section_id, section in enumerate(sections):
            paragraphs = split_by_paragraphs(section)

            for para_id, para in enumerate(paragraphs):
                chunks.append({
                    'id': f"{section_id}:{para_id}",
                    'content': para,              # Small chunk for embedding
                    'parent': section,             # Full section context
                    'metadata': {
                        'section_id': section_id,
                        'section_title': get_section_title(section)
                    }
                })

        return chunks

    def retrieve_with_context(self, query: str, top_k: int = 5):
        # Search at paragraph level (precise matching)
        matches = self.vector_search(query, top_k)

        # Return parent sections (full context)
        results = []
        for match in matches:
            results.append({
                'content': match['parent'],  # Full section, not just paragraph
                'relevance_score': match['score'],
                'metadata': match['metadata']
            })

        return results
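
One practical note: several paragraph-level matches often share the same parent section, so the returned contexts should be deduplicated before being passed to the LLM. A small sketch using the `section_id` metadata defined in `create_chunks`:

def deduplicate_parent_results(results: list[dict]) -> list[dict]:
    seen_sections = set()
    unique = []
    for result in results:
        section_id = result['metadata']['section_id']
        if section_id not in seen_sections:
            seen_sections.add(section_id)
            unique.append(result)
    return unique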

When to use:

  • Complex documents with nested structure
  • When precise matching but broad context needed
  • When query patterns are predictable

Strategy 5: Domain-Specific Chunking

Code Documentation

def chunk_code_docs(doc: str) -> list[dict]:
    """
    Chunk technical documentation by code elements.
    """
    chunks = []

    # Each function/class is a chunk
    elements = extract_code_elements(doc)  # Functions, classes, etc.

    for element in elements:
        chunks.append({
            'type': element['type'],        # 'function', 'class', etc.
            'name': element['name'],
            'signature': element['signature'],
            'description': element['description'],
            'examples': element['examples'],
            'parameters': element['parameters'],
            'returns': element['returns']
        })

    return chunks

# Query: "How do I use the authenticate() function?"
# Retrieves: Complete function documentation including signature, params, examples
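
The `extract_code_elements` helper is assumed above. For Python source specifically, a minimal sketch using the standard-library ast module recovers a subset of those fields (name, parameters, docstring); richer fields such as examples and return types usually come from a docstring parser:

import ast

def extract_python_elements(source: str) -> list[dict]:
    tree = ast.parse(source)
    elements = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            elements.append({
                'type': 'class' if isinstance(node, ast.ClassDef) else 'function',
                'name': node.name,
                # Positional parameter names only; classes have no args
                'signature': ', '.join(a.arg for a in node.args.args) if hasattr(node, 'args') else '',
                'description': ast.get_docstring(node) or ''
            })
    return elements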

Conversational Data

def chunk_conversations(messages: list[dict]) -> list[dict]:
    """
    Chunk chat logs by conversation turns or topics.
    """
    chunks = []
    current_topic_messages = []

    for msg in messages:
        # Detect topic changes
        if detect_topic_change(current_topic_messages, msg):
            if current_topic_messages:
                chunks.append({
                    'messages': current_topic_messages,
                    'summary': summarize_conversation(current_topic_messages),
                    'participants': get_participants(current_topic_messages)
                })
            current_topic_messages = [msg]
        else:
            current_topic_messages.append(msg)

    if current_topic_messages:
        chunks.append({
            'messages': current_topic_messages,
            'summary': summarize_conversation(current_topic_messages),
            'participants': get_participants(current_topic_messages)
        })

    return chunks
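
The `detect_topic_change`, `summarize_conversation`, and `get_participants` helpers are assumed. One simple approach to topic detection compares the embedding of the incoming message against the recent messages; this sketch assumes each message dict has a 'content' key and reuses the `embed`/`cosine_similarity` helpers sketched in the evaluation section below (the 0.5 threshold is illustrative):

def detect_topic_change(current_messages: list[dict], new_message: dict,
                        threshold: float = 0.5) -> bool:
    if not current_messages:
        return False
    # Compare the new message against the last few messages of the current topic
    recent_text = " ".join(m['content'] for m in current_messages[-3:])
    similarity = cosine_similarity(embed(recent_text), embed(new_message['content']))
    return similarity < threshold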

Tabular Data

import pandas as pd

def chunk_tables(table: pd.DataFrame, table_name: str = "table") -> list[dict]:
    """
    Chunk tables by rows or semantic groupings.
    """
    chunks = []

    # Strategy: each row becomes a chunk (works well for small tables)
    for idx, row in table.iterrows():
        chunks.append({
            'type': 'table_row',
            'table_name': table_name,  # passed in; DataFrames have no .name attribute
            'columns': table.columns.tolist(),
            'values': row.to_dict(),
            'text': format_row_as_text(row, table.columns)
        })

    return chunks

# Query: "What is John's email address?"
# Retrieves: Row where name='John', including email column
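
The `format_row_as_text` helper is assumed. A minimal sketch renders each row as "column: value" pairs so the embedding captures both the schema and the values:

def format_row_as_text(row: pd.Series, columns) -> str:
    return ", ".join(f"{col}: {row[col]}" for col in columns)

# e.g. "name: John, email: john@example.com, role: admin"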

When to use:

  • Specialized content types
  • When generic chunking loses critical structure
  • When domain knowledge improves retrieval

Evaluating Chunking Quality

Metric 1: Retrieval Precision

def evaluate_retrieval_precision(test_cases: list[dict], chunking_strategy):
    """
    What percentage of retrieved chunks are relevant?
    """
    results = []

    for case in test_cases:
        chunks = chunking_strategy(case['document'])
        retrieved = retrieve(case['query'], chunks, top_k=5)

        relevant_count = sum(
            1 for chunk in retrieved
            if is_relevant(chunk, case['expected_content'])
        )

        precision = relevant_count / len(retrieved)
        results.append(precision)

    return mean(results)

Metric 2: Answer Completeness

def evaluate_answer_completeness(test_cases, chunking_strategy):
    """
    Do retrieved chunks contain all information needed to answer?
    """
    results = []

    for case in test_cases:
        chunks = chunking_strategy(case['document'])
        retrieved = retrieve(case['query'], chunks, top_k=5)

        # Generate answer from retrieved chunks
        answer = generate_answer(case['query'], retrieved)

        # Check if answer contains expected information
        completeness = check_completeness(answer, case['expected_answer'])
        results.append(completeness)

    return mean(results)
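
Both metrics assume hand-labeled test cases. The field names used above imply a structure along these lines (the file path and values are hypothetical):

test_cases = [
    {
        'document': open("docs/rate_limits.md").read(),  # hypothetical path
        'query': "What is the API rate limit?",
        'expected_content': "1000 requests per hour",
        'expected_answer': "The API rate limit is 1000 requests per hour."
    }
]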

Metric 3: Chunk Coherence

def evaluate_chunk_coherence(chunks: list[str]) -> float:
    """
    Do chunks represent coherent semantic units?
    """
    coherence_scores = []

    for chunk in chunks:
        # Measure semantic coherence
        sentences = split_into_sentences(chunk)
        if len(sentences) < 2:
            coherence_scores.append(1.0)
            continue

        # Compare sentence embeddings
        embeddings = [embed(sent) for sent in sentences]
        similarities = []
        for i in range(len(embeddings) - 1):
            sim = cosine_similarity(embeddings[i], embeddings[i+1])
            similarities.append(sim)

        coherence_scores.append(mean(similarities))

    return mean(coherence_scores)
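
The `embed` and `cosine_similarity` helpers are assumed here (and in the topic-change sketch earlier). One possible realization, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model:

import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str) -> np.ndarray:
    return _model.encode(text)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))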

Practical Implementation Guide

Step 1: Analyze Your Data

def analyze_document_structure(docs: list[str]):
    """
    Understand document characteristics before choosing strategy.
    """
    analysis = {
        'avg_length': mean([len(doc) for doc in docs]),
        'has_structure': check_for_headings(docs),
        'content_type': detect_content_type(docs),  # prose, code, tables
        'typical_queries': analyze_query_patterns()
    }
    return analysis

Step 2: Choose Base Strategy

def select_chunking_strategy(analysis: dict):
    if analysis['has_structure']:
        return structure_aware_chunking
    elif analysis['content_type'] == 'code':
        return code_specific_chunking
    elif analysis['content_type'] == 'conversation':
        return conversation_chunking
    else:
        return semantic_chunking

Step 3: Test and Iterate

# Compare strategies empirically
strategies = [
    ('naive', chunk_by_tokens),
    ('semantic', chunk_by_paragraphs),
    ('structure', chunk_by_structure),
    ('overlap', lambda d: chunk_with_overlap(d, overlap=128))
]

results = []
for name, strategy in strategies:
    precision = evaluate_retrieval_precision(test_cases, strategy)
    completeness = evaluate_answer_completeness(test_cases, strategy)
    results.append({
        'strategy': name,
        'precision': precision,
        'completeness': completeness
    })

best_strategy = max(results, key=lambda r: r['precision'] + r['completeness'])

Common Pitfalls

Pitfall 1: Optimizing for Chunk Count

# Wrong: Trying to minimize number of chunks
# Right: Optimizing for retrieval quality

Pitfall 2: One-Size-Fits-All

# Wrong: Same chunking for all document types
# Right: Strategy matched to content type

Pitfall 3: Ignoring Query Patterns

# Wrong: Chunking without understanding queries
# Right: Analyze queries, optimize chunks accordingly

Pitfall 4: No Evaluation

# Wrong: Choose strategy based on intuition
# Right: Measure retrieval quality empirically

Conclusion

Chunking is not text splitting. It is information architecture.

Effective chunking requires:

  1. Understanding content structure: Respect document hierarchy
  2. Preserving semantic units: Avoid splitting coherent information
  3. Matching query patterns: Optimize for how content will be queried
  4. Empirical validation: Measure retrieval quality, not chunk count

Start with semantic or structure-aware chunking. Add complexity (overlap, multi-granularity, domain-specific) only when justified by measurement.

The best chunking strategy is the one that surfaces relevant information when users ask questions.
