DocumentTokenChunker Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Processes a document by tokenizing its content and dividing it into overlapping chunks of tokens.
```cpp
public ref class DocumentTokenChunker sealed : Microsoft::Extensions::DataIngestion::IngestionChunker<System::String ^>
```

```csharp
public sealed class DocumentTokenChunker : Microsoft.Extensions.DataIngestion.IngestionChunker<string>
```

```fsharp
type DocumentTokenChunker = class
    inherit IngestionChunker<string>
```

```vb
Public NotInheritable Class DocumentTokenChunker
    Inherits IngestionChunker(Of String)
```
Inheritance: Object → IngestionChunker<String> → DocumentTokenChunker
Remarks
This class uses a tokenizer to convert the document's content into tokens and then splits the tokens into chunks of a specified size, with a configurable overlap between consecutive chunks.
Because chunk boundaries are determined by token count rather than by document structure, tables may be split mid-row.
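The overlapping-window behavior described above can be illustrated with a minimal C# sketch. It does not use the Microsoft.Extensions.DataIngestion API and is not this class's implementation: `TokenWindowSketch`, `Tokenize`, `Chunk`, `maxTokens`, and `overlap` are hypothetical names, and splitting on whitespace stands in for a real tokenizer. Each chunk holds at most `maxTokens` tokens, and every chunk after the first begins with the last `overlap` tokens of the chunk before it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not part of Microsoft.Extensions.DataIngestion.
public static class TokenWindowSketch
{
    // Stand-in tokenizer (assumption): treats each whitespace-separated word as one token.
    public static string[] Tokenize(string text) =>
        text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

    // Splits a token sequence into chunks of at most maxTokens tokens,
    // repeating the last `overlap` tokens of each chunk at the start of the next.
    public static List<string[]> Chunk(IReadOnlyList<string> tokens, int maxTokens, int overlap)
    {
        if (maxTokens <= 0) throw new ArgumentOutOfRangeException(nameof(maxTokens));
        if (overlap < 0 || overlap >= maxTokens) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string[]>();
        int step = maxTokens - overlap; // how far the window advances each iteration

        for (int start = 0; start < tokens.Count; start += step)
        {
            int length = Math.Min(maxTokens, tokens.Count - start);
            chunks.Add(tokens.Skip(start).Take(length).ToArray());

            // Stop once a window reaches the end, so no chunk consists only of overlap.
            if (start + length >= tokens.Count) break;
        }

        return chunks;
    }
}
```

For example, `Chunk(Tokenize("a b c d e f g"), 3, 1)` produces the chunks `a b c`, `c d e`, and `e f g`. Because the window only counts tokens, a table row that straddles a boundary ends up split across two chunks.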
Constructors
| Name | Description |
|---|---|
| DocumentTokenChunker(IngestionChunkerOptions) | Initializes a new instance of the DocumentTokenChunker class with the specified options. |
Methods
| Name | Description |
|---|---|
| ProcessAsync(IngestionDocument, CancellationToken) | Splits a document into chunks asynchronously. |