DocumentTokenChunker Class

Definition

Processes a document by tokenizing its content and dividing it into overlapping chunks of tokens.

public ref class DocumentTokenChunker sealed : Microsoft::Extensions::DataIngestion::IngestionChunker<System::String ^>
public sealed class DocumentTokenChunker : Microsoft.Extensions.DataIngestion.IngestionChunker<string>
type DocumentTokenChunker = class
    inherit IngestionChunker<string>
Public NotInheritable Class DocumentTokenChunker
Inherits IngestionChunker(Of String)

Inheritance
Object → IngestionChunker<String> → DocumentTokenChunker

Remarks

This class uses a tokenizer to convert the document's content into tokens and then splits the tokens into chunks of a specified size, with a configurable overlap between consecutive chunks.

Because chunk boundaries are determined by token count rather than by document structure, a table may be split in the middle of a row.
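
The splitting behavior described in these remarks can be illustrated with a small stand-alone sketch. It is not the DocumentTokenChunker implementation and does not call the library: the names TokenWindowSketch, Split, maxTokensPerChunk, and overlapTokens are hypothetical, and whitespace-separated words stand in for real tokenizer output. It assumes that each chunk after the first starts overlapTokens tokens before the end of the previous chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: overlapping fixed-size windows over a token sequence.
// All names here are hypothetical and are not part of the library's API.
public static class TokenWindowSketch
{
    public static List<string[]> Split(
        IReadOnlyList<string> tokens, int maxTokensPerChunk, int overlapTokens)
    {
        if (maxTokensPerChunk <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxTokensPerChunk));
        // The overlap must be smaller than the chunk size, or the window never advances.
        if (overlapTokens < 0 || overlapTokens >= maxTokensPerChunk)
            throw new ArgumentOutOfRangeException(nameof(overlapTokens));

        var chunks = new List<string[]>();
        int stride = maxTokensPerChunk - overlapTokens;

        for (int start = 0; start < tokens.Count; start += stride)
        {
            int length = Math.Min(maxTokensPerChunk, tokens.Count - start);
            chunks.Add(tokens.Skip(start).Take(length).ToArray());

            // Stop once a chunk reaches the end, so no trailing chunk
            // made up only of overlap tokens is emitted.
            if (start + length >= tokens.Count)
                break;
        }

        return chunks;
    }

    public static void Main()
    {
        // Whitespace splitting is a stand-in for a real tokenizer.
        string[] tokens = "a b c d e f g h i j".Split(' ');

        foreach (string[] chunk in Split(tokens, maxTokensPerChunk: 4, overlapTokens: 1))
            Console.WriteLine(string.Join(" ", chunk));
        // Prints:
        // a b c d
        // d e f g
        // g h i j
    }
}
```

In this sketch the window advances by the chunk size minus the overlap, so the last token of one chunk reappears as the first token of the next. Nothing in the loop looks at document structure, which is why a boundary can fall inside a table row.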

Constructors

Name                                             Description
DocumentTokenChunker(IngestionChunkerOptions)    Initializes a new instance of the DocumentTokenChunker class with the specified options.

Methods

Name                                                  Description
ProcessAsync(IngestionDocument, CancellationToken)    Splits a document into chunks asynchronously.
