text_chunker Module

A text splitter.

Split text into chunks, attempting to leave meaning intact. For plain text, split on new lines first, then on periods, and so on. For markdown, split on punctuation first, and so on.
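The idea of trying coarse separators first and finer ones only when a piece is still too long can be sketched as follows. This is a minimal illustration of the general technique, not the module's implementation: `split_recursive`, `word_count`, and `SEPARATOR_PATTERNS` are hypothetical names, the separator order is an assumption based on the plain-text description above, and counting words stands in for real token counting.

```python
import re
from collections.abc import Callable


def word_count(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


# Separators tried in order, coarsest first (assumed ordering):
# new lines, then sentence-ending punctuation, then any whitespace.
SEPARATOR_PATTERNS = [r"\n+", r"(?<=[.!?])\s+", r"\s+"]


def split_recursive(
    text: str,
    max_tokens: int,
    token_counter: Callable[[str], int] = word_count,
    level: int = 0,
) -> list[str]:
    """Split text on progressively finer separators until each piece fits."""
    text = text.strip()
    if not text:
        return []
    # Stop when the piece fits or there is no finer separator left to try.
    if token_counter(text) <= max_tokens or level >= len(SEPARATOR_PATTERNS):
        return [text]
    chunks: list[str] = []
    for piece in re.split(SEPARATOR_PATTERNS[level], text):
        chunks.extend(split_recursive(piece, max_tokens, token_counter, level + 1))
    return chunks


sample = "First line is short.\nSecond line has two sentences. Here is the other one."
for chunk in split_recursive(sample, max_tokens=5):
    print(chunk)
```

The sketch never merges small pieces back together, so it can produce more chunks than strictly necessary; a production splitter would typically also recombine neighbours that fit within the limit.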

Functions

split_markdown_lines

Split markdown into lines.

It splits on punctuation first, and then on spaces and new lines.

split_markdown_lines(text: str, max_token_per_line: int, token_counter: collections.abc.Callable = <function _token_counter>) -> list[str]

Parameters

Name                Description
text                Required
max_token_per_line  Required
token_counter       Default value: <function _token_counter>
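The signature types `token_counter` only as a `Callable`, so the sketch below assumes it receives a string and returns an integer token count. Both counters are hypothetical stand-ins written for illustration; neither is claimed to match the module's default `_token_counter`.

```python
def char_based_counter(text: str) -> int:
    """Rough estimate that treats about four characters as one token (an assumption)."""
    return len(text) // 4


def word_based_counter(text: str) -> int:
    """Count whitespace-separated words as tokens (an assumption)."""
    return len(text.split())


sample = "Chunking keeps related sentences together."
print(char_based_counter(sample), word_based_counter(sample))

# Passing a custom counter would then look like this, following the
# keyword names in the signature above (call left commented out so the
# sketch runs without the library installed):
# lines = split_markdown_lines(
#     markdown_text, max_token_per_line=20, token_counter=word_based_counter
# )
```

Whichever counter is used, it should approximate the tokenizer of the model that will consume the chunks, since the limits are only as accurate as the count.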

split_markdown_paragraph

Split markdown into paragraphs.

split_markdown_paragraph(text: list[str], max_tokens: int, token_counter: collections.abc.Callable = <function _token_counter>) -> list[str]

Parameters

Name           Description
text           Required
max_tokens     Required
token_counter  Default value: <function _token_counter>
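Because the paragraph functions take a list of lines and a `max_tokens` budget, one way to picture the step is as greedy packing: keep appending lines to the current paragraph until the next one would exceed the budget. The sketch below assumes that strategy purely for illustration; `pack_lines` and `word_count` are hypothetical names, and the module's own algorithm may differ.

```python
from collections.abc import Callable


def word_count(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


def pack_lines(
    lines: list[str],
    max_tokens: int,
    token_counter: Callable[[str], int] = word_count,
) -> list[str]:
    """Greedily join consecutive lines into paragraphs of at most max_tokens."""
    paragraphs: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for line in lines:
        line_tokens = token_counter(line)
        # Close the current paragraph if adding this line would exceed the budget.
        if current and current_tokens + line_tokens > max_tokens:
            paragraphs.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


lines = [
    "# Title",
    "Some intro text here.",
    "Another sentence follows it.",
    "Final line.",
]
for paragraph in pack_lines(lines, max_tokens=6):
    print(repr(paragraph))
```

In this sketch a single line that is already over the budget becomes its own paragraph, so the lines should be split to size first.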

split_plaintext_lines

Split plain text into lines.

It splits on new lines first, and then on punctuation.

split_plaintext_lines(text: str, max_token_per_line: int, token_counter: collections.abc.Callable = <function _token_counter>) -> list[str]

Parameters

Name                Description
text                Required
max_token_per_line  Required
token_counter       Default value: <function _token_counter>

split_plaintext_paragraph

Split plain text into paragraphs.

split_plaintext_paragraph(text: list[str], max_tokens: int, token_counter: collections.abc.Callable = <function _token_counter>) -> list[str]

Parameters

Name           Description
text           Required
max_tokens     Required
token_counter  Default value: <function _token_counter>