How to set custom chunking on azure ai search?

Mason 50 Reputation points
2025-08-28T18:05:55.6666667+00:00

I want to upload documents to azure ai search. I would like each document be a single chunk, no matter how long or short the document is. How do I do this?

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

Answer accepted by question author
  1. Nikhil Jha (Accenture International Limited) 4,310 Reputation points Microsoft External Staff Moderator
    2025-09-01T08:10:26.43+00:00

    Hello Mason,

    Thanks for raising this question, and a big thank you to John Burkholder for sharing such a detailed response and the official documentation link 🙌.

    Let me add a few more workarounds that address your requirement:
    1) Set a Large Chunk Size to Prevent Splitting
    If you must use the Text Split skill (e.g., with Document Layout enrichment), set maximumPageLength to a number larger than your largest document (e.g., 50,000 or 100,000 characters).

    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "context": "/document",
      "textSplitMode": "pages",
      "maximumPageLength": 50000,
      "pageOverlapLength": 0,
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "textItems", "targetName": "pages" }
      ]
    }
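    When you place this skill in a skillset, its input typically maps from your content field (for example, /document/content) and its textItems output feeds downstream skills or index projections; adjust those mappings to your own field names.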
    
    

    2) Direct Document Upload Without Chunking

    When uploading via the Azure Search REST API or SDK, simply upload each document in full—no chunking is performed unless your pipeline explicitly splits content.

    Python SDK Example:

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    # Assumes an existing index with id, content, and title fields.
    search_client = SearchClient(
        endpoint="https://<your-service>.search.windows.net",
        index_name="<your-index-name>",
        credential=AzureKeyCredential("<your-admin-key>"),
    )

    search_client.upload_documents([{
        "@search.action": "upload",
        "id": "doc1",
        "content": entire_document_content,  # full text of the document, not split
        "title": "Document Title"
    }])
    
    

    Each document will be treated as a single searchable chunk.
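    As a quick sanity check, a search against the same index should return the whole document as a single result. This reuses the search_client from the example above; the query string is just a placeholder:

        results = search_client.search(search_text="a phrase from the document")
        for result in results:
            print(result["id"], result["title"])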

    Points to remember:
    1) Embedding Model Token Limits

    Most embedding models (e.g., text-embedding-ada-002) have a maximum token limit (usually 8,191 tokens, roughly 6,000 words).

    If your document exceeds this limit, the embedding will be truncated.

    For very large documents, consider summarizing or chunking only when necessary; a quick way to check a document's token count is sketched below.

    2) Performance Trade-offs

    While treating each document as a single chunk is possible, large documents can affect search response times and relevance. Consider the use case and test for optimal performance.
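    Here is a minimal sketch of that token-count check, assuming the tiktoken package is available and that entire_document_content holds the full document text:

        import tiktoken

        # cl100k_base is the encoding used by text-embedding-ada-002
        encoding = tiktoken.get_encoding("cl100k_base")

        token_count = len(encoding.encode(entire_document_content))
        if token_count > 8191:
            print(f"{token_count} tokens exceeds the embedding limit; summarize or split before embedding.")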

    Reference:
    Chunking Techniques


    Please feel free to accept this as an answer and upvote.
    Thank you for reaching out to the Microsoft Q&A Portal. 😊


1 additional answer

  1. John Burkholder 465 Reputation points
    2025-08-28T18:41:14.72+00:00

    To ensure each document is treated as a single chunk in Azure AI Search—regardless of its length—you’ll need to bypass or customize the default chunking behavior typically used for embedding and indexing. Here's how you can do it:

    ✅ Key Steps to Set Custom Chunking (One Chunk per Document)

    1. Avoid Built-in Chunking Skills:
      • Azure AI Search uses skills like Text Split or Document Layout to chunk documents by pages, sentences, or semantic structure.
      • To keep each document as a single chunk, do not use these skills in your indexing pipeline.
    2. Use Custom Skill or Preprocessing:
      • You can create a custom skill or preprocess your documents before uploading, ensuring each document is passed as a single unit.
      • This means your data source (e.g., blob storage or JSON) should contain one document per record, and you should not split it during ingestion.
    3. Disable Integrated Vectorization Chunking:
      • If you're using integrated vectorization, it automatically applies chunking. To override this:
      • Use external embedding generation (e.g., via LangChain or Semantic Kernel).
      • Generate embeddings for the entire document and upload them manually to the index (a sketch of this approach is shown after this list).
    4. Indexing Strategy:
      • Use one-to-one indexing: each document in your data source maps to one search document.
      • Avoid one-to-many indexing or any parsing that breaks documents into smaller parts.
    5. Example with LangChain: If you're using LangChain, you can set chunking parameters like this to effectively disable chunking:
          from langchain.text_splitter import RecursiveCharacterTextSplitter

          text_splitter = RecursiveCharacterTextSplitter(
              chunk_size=1000000,  # Very large chunk size
              chunk_overlap=0
          )
          chunks = text_splitter.split_documents([your_document])

       This ensures the entire document is treated as one chunk [[1]](https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-chunk-documents).
       
    6. **Embedding Size Consideration**:
    
       - Be aware of **token limits** for embedding models (e.g., 8191 tokens for `text-embedding-ada-002`).
       
       - If your document exceeds this, you may need to split it or truncate it manually before embedding.
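    As a minimal sketch of the external-embedding route from step 3, assuming the openai and azure-search-documents packages, an Azure OpenAI embedding deployment, and an index with id, content, and contentVector fields (all endpoint, key, deployment, and field names below are placeholders):

        from openai import AzureOpenAI
        from azure.core.credentials import AzureKeyCredential
        from azure.search.documents import SearchClient

        openai_client = AzureOpenAI(
            azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
            api_key="<your-aoai-key>",
            api_version="2024-02-01",
        )
        search_client = SearchClient(
            endpoint="https://<your-search-service>.search.windows.net",
            index_name="<your-index-name>",
            credential=AzureKeyCredential("<your-search-admin-key>"),
        )

        # Embed the whole document as one unit; no splitting anywhere.
        embedding = openai_client.embeddings.create(
            model="<your-embedding-deployment>",  # e.g., a text-embedding-ada-002 deployment
            input=entire_document_content,
        ).data[0].embedding

        # Upload the document and its single vector as one search document.
        search_client.upload_documents([{
            "@search.action": "upload",
            "id": "doc1",
            "content": entire_document_content,
            "contentVector": embedding,
        }])

    Because the embedding is generated once for the full text and uploaded alongside it, no chunking happens anywhere in the pipeline.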
       
    
     References
    
    [1] [Chunk documents in vector search - Azure AI Search](https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-chunk-documents)
    
    
