data Package

Packages

filter_clauses
record_definition
text_search
vector_search
vector_storage

Modules

const
kernel_search_results
search_filter
search_options

Classes

AnyTagsEqualTo

A filter clause for an "any tags equal to" comparison.

Args:
  field_name: The name of the field containing the list of tags.
  value: The value to compare against the list of tags.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

EqualTo

A filter clause for an equals comparison.

Args:
  field_name: The name of the field to compare.
  value: The value to compare against the field.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
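The semantics of the two filter clauses above can be illustrated with a minimal plain-Python sketch. The helper functions `equal_to` and `any_tags_equal_to` and the sample records are hypothetical and are not the library classes; how a given vector store actually evaluates these clauses depends on the store.

```python
from typing import Any


def equal_to(record: dict[str, Any], field_name: str, value: Any) -> bool:
    """Hypothetical helper: True when the named field equals the value."""
    return record.get(field_name) == value


def any_tags_equal_to(record: dict[str, Any], field_name: str, value: Any) -> bool:
    """Hypothetical helper: True when the named list-of-tags field contains the value."""
    tags = record.get(field_name) or []
    return value in tags


# Made-up sample records, for illustration only.
records = [
    {"id": "1", "category": "news", "tags": ["python", "search"]},
    {"id": "2", "category": "blog", "tags": ["vectors"]},
]

news = [r["id"] for r in records if equal_to(r, "category", "news")]
tagged = [r["id"] for r in records if any_tags_equal_to(r, "tags", "vectors")]
```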

KernelSearchResults

The result of a kernel search.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

OptionsUpdateFunctionType

Type definition for the options update function in Text Search.

SearchOptions

Options for a search.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

TextSearch

The base class for all text searches.

Note: This class is marked as 'experimental' and may change in the future.

TextSearchFilter

A filter clause for a text search query.

Note: This class is marked as 'experimental' and may change in the future.

Initialize a new instance of SearchFilter.

TextSearchOptions

Options for a text search.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

TextSearchResult

The result of a text search.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

VectorSearchBase

Method for searching vectors.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

VectorSearchFilter

A filter clause for a vector search query.

Note: This class is marked as 'experimental' and may change in the future.

Initialize a new instance of VectorSearchFilter.

VectorSearchOptions

Options for vector search, builds on TextSearchOptions.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

VectorSearchResult

The result of a vector search.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

VectorStore

Base class for vector stores.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

VectorStoreRecordCollection

Base class for a vector store record collection.

Note: This class is marked as 'experimental' and may change in the future.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

VectorStoreRecordDataField

Memory record data field.

Note: This class is marked as 'experimental' and may change in the future.

VectorStoreRecordDefinition

Memory record definition.

Args:
  fields: The fields of the record.
  container_mode: Whether the record is in container mode.
  to_dict: The to_dict function, should take a record and return a list of dicts.
  from_dict: The from_dict function, should take a list of dicts and return a record.
  serialize: The serialize function, should take a record and return the type specific to a datastore.
  deserialize: The deserialize function, should take a type specific to a datastore and return a record.

Note: This class is marked as 'experimental' and may change in the future.

VectorStoreRecordKeyField

Memory record key field.

Note: This class is marked as 'experimental' and may change in the future.

VectorStoreRecordUtils

Helper class to easily add embeddings to a vector store record or a set of records.

Note: This class is marked as 'experimental' and may change in the future.

Initializes the VectorStoreRecordUtils with a kernel.

VectorStoreRecordVectorField

Memory record vector field.

Most vector stores use a list[float] as the data type for vectors. This is the default, and all vector stores in SK use it internally. In your own class, however, you may want to use a numpy array or some other optimized type. To support that, set deserialize_function to a function that takes a list of floats and returns the optimized type, and supply a serialize_function that takes the optimized type and returns a list of floats.

For numpy, for instance, that would be serialize_function=np.ndarray.tolist and deserialize_function=np.array (with import numpy as np at the top of your file). If you want to set it up with more specific options, use a lambda, a custom function, or a partial.

Args:
  property_type (str, optional): Property type. For vectors this should be the inner type of the vector. By default the vector will be a list of numbers. If you want to use a numpy array or some other optimized format, set the deserialize_function to a function that takes a list of floats and returns a numpy array.
  local_embedding (bool, optional): Whether to embed the vector locally. Defaults to True.
  embedding_settings (dict[str, PromptExecutionSettings], optional): Embedding settings. The key is the name of the embedding service to use; there can be more than one.
  serialize_function (Callable[[Any], list[float | int]], optional): Serialize function, should take the vector and return a list of numbers.
  deserialize_function (Callable[[list[float | int]], Any], optional): Deserialize function, should take a list of numbers and return the vector.

Note: This class is marked as 'experimental' and may change in the future.
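As a rough illustration of the serialize/deserialize pairing described for this field, the sketch below uses the standard-library `array` type in place of numpy so that it runs without extra dependencies. The variable names mirror the `serialize_function` and `deserialize_function` arguments, but nothing here calls the library itself, and the use of `functools.partial` is just one way to pass "more specific options" (here, the type code).

```python
from array import array
from functools import partial

# Stand-ins for the two arguments; the optimized type here is array('d'), not numpy.
serialize_function = array.tolist           # optimized type -> list of floats
deserialize_function = partial(array, "d")  # list of floats -> array of doubles

stored = [0.25, 0.5, 0.75]                  # what a vector store would hold
vector = deserialize_function(stored)       # what the model class would see
round_tripped = serialize_function(vector)  # back to the stored form
```

The only requirement the round trip places on the pair is that serializing a deserialized vector returns the original list of numbers.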

VectorStoreTextSearch

Class that wraps a Vector Store Record Collection to expose as a Text Search.

Preferably, create instances of this class through its class methods. Otherwise, the search tries the following in order, depending on which store was set:

  1. vectorizable_text_search
  2. vector_text_search
  3. vectorized_search (after calling the embedding service)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.
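The fallback order listed for this class can be pictured with a small dispatch sketch. This is an assumption about the general shape of the logic, not the library's implementation: `run_search`, its keyword arguments, and the `embed` callable are hypothetical names that stand in for the three kinds of store and the embedding service.

```python
from typing import Callable, Optional, Sequence


def run_search(
    query: str,
    *,
    vectorizable_text_search: Optional[Callable[[str], str]] = None,
    vector_text_search: Optional[Callable[[str], str]] = None,
    vectorized_search: Optional[Callable[[Sequence[float]], str]] = None,
    embed: Optional[Callable[[str], Sequence[float]]] = None,
) -> str:
    """Hypothetical dispatcher: use the first search that was set, in the documented order."""
    if vectorizable_text_search is not None:
        return vectorizable_text_search(query)
    if vector_text_search is not None:
        return vector_text_search(query)
    if vectorized_search is not None and embed is not None:
        # Only this path needs the embedding service to turn text into a vector first.
        return vectorized_search(embed(query))
    raise ValueError("no search implementation was set")


first = run_search(
    "cats",
    vectorizable_text_search=lambda q: f"vectorizable:{q}",
    vector_text_search=lambda q: f"text:{q}",
)
last = run_search(
    "cats",
    vectorized_search=lambda v: f"vectorized:{len(v)}",
    embed=lambda q: [0.0, 1.0, 2.0],
)
```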

VectorTextSearchMixin

The mixin for text search, to be used in combination with VectorSearchBase.

Note: This class is marked as 'experimental' and may change in the future.

VectorizableTextSearchMixin

The mixin for searching with text that gets vectorized downstream.

To be used in combination with VectorSearchBase.

Note: This class is marked as 'experimental' and may change in the future.

VectorizedSearchMixin

The mixin for searching with vectors. To be used in combination with VectorSearchBase.

Note: This class is marked as 'experimental' and may change in the future.

Enums

DistanceFunction

Distance functions for similarity search.

Cosine Similarity: The cosine (angular) similarity between two vectors. Measures only the angle between the two vectors, without taking their lengths into account. Cosine Similarity = 1 - Cosine Distance.
  -1 means the vectors are opposite.
  0 means the vectors are orthogonal.
  1 means the vectors point in the same direction.

Cosine Distance: The cosine (angular) distance between two vectors. Measures only the angle between the two vectors, without taking their lengths into account. Cosine Distance = 1 - Cosine Similarity.
  2 means the vectors are opposite.
  1 means the vectors are orthogonal.
  0 means the vectors point in the same direction.

Dot Product: Measures both the length of and the angle between two vectors. Equivalent to cosine similarity when the vectors are normalized to unit length, but more performant.

Euclidean Distance: Measures the Euclidean distance between two vectors. Also known as l2-norm.

Euclidean Squared Distance: Measures the squared Euclidean distance between two vectors. Also known as l2-squared.

Manhattan: Measures the Manhattan distance between two vectors.

Hamming: The number of dimensions at which the two vectors differ.
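The distance functions listed above follow their standard mathematical definitions, which can be written out in a few lines of plain Python. This is an illustrative sketch with hypothetical function names; individual vector stores compute these natively and may differ in normalization or sign conventions.

```python
import math
from collections.abc import Sequence

Vector = Sequence[float]


def dot_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    # Dividing by both norms removes the effect of vector length.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    return 1.0 - cosine_similarity(a, b)


def euclidean_squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(euclidean_squared_distance(a, b))


def manhattan_distance(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a: Vector, b: Vector) -> int:
    # Counts the positions at which the two vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)
```

Note that for unit-length inputs `dot_product` and `cosine_similarity` return the same value, which is why stores that keep normalized embeddings can use the cheaper dot product.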

IndexKind

Index kinds for similarity search.

HNSW: Hierarchical Navigable Small World, which performs an approximate nearest neighbor (ANN) search. Lower accuracy than exhaustive k nearest neighbor, but faster and more efficient.

Flat: Does a brute-force search to find the nearest neighbors. Computes the distance from the query to every data point, so search time grows linearly with the number of points. Also referred to as exhaustive k nearest neighbor in some databases. High recall accuracy, but slower and more expensive than HNSW. Better with smaller datasets.

IVF Flat: Inverted File with Flat Compression. Designed to enhance search efficiency by narrowing the search area through the use of neighbor partitions or clusters. Also referred to as approximate nearest neighbor (ANN) search.

Disk ANN: Disk-based Approximate Nearest Neighbor algorithm designed for efficiently searching for approximate nearest neighbors (ANN) in high-dimensional spaces. The primary focus of DiskANN is to handle large-scale datasets that cannot fit entirely into memory, leveraging disk storage to store the data while maintaining fast search times.

Quantized Flat: Index that compresses vectors using DiskANN-based quantization methods for better efficiency in the kNN search.

Dynamic: A dynamic index that can switch automatically from a FLAT index to an HNSW index.
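To make the Flat (exhaustive k nearest neighbor) index kind concrete, here is a minimal brute-force search sketch in plain Python. The function name `flat_search` and the sample data are hypothetical; real stores implement this natively, and the approximate index kinds (HNSW, IVF Flat, Disk ANN) exist precisely to avoid scanning every vector as this does.

```python
import heapq
import math


def flat_search(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Hypothetical brute-force kNN: score every stored vector, keep the k closest keys."""
    scored = ((math.dist(query, vec), key) for key, vec in vectors.items())
    return [key for _, key in heapq.nsmallest(k, scored)]


# Made-up sample vectors, for illustration only.
store = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [5.0, 5.0],
}
nearest = flat_search([0.9, 0.9], store, k=2)
```

Every query touches all stored vectors, which is why recall is exact but cost grows with the size of the collection.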

Functions

create_options

Create search options.

If options are supplied, they are checked for the right type, and the kwargs are used to update the options.

If options are not supplied, they are created from the kwargs. If that fails, an empty options object is returned.

create_options(options_class: type[SearchOptions], options: SearchOptions | None, **kwargs: Any) -> SearchOptions

Parameters

Name             Description
options_class    Required. The class of the options.
options          Required. The existing options to update.
**kwargs         Required. The keyword arguments to use to create the options.

Returns

Type             Description
SearchOptions    The options.

Exceptions

Type               Description
ValidationError    If the options are not valid.
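The behavior described for create_options can be approximated with a small stand-alone sketch. `DemoOptions` is a hypothetical dataclass standing in for a SearchOptions subclass, and `create_options_sketch` is not the library function. In particular, treating options of the wrong type as if none were supplied is an assumption made here for brevity, and a dataclass performs no validation, so the ValidationError path is not modelled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DemoOptions:
    """Hypothetical stand-in for a SearchOptions subclass."""
    top: int = 5
    skip: int = 0


def create_options_sketch(
    options_class: type[DemoOptions],
    options: Optional[DemoOptions],
    **kwargs: Any,
) -> DemoOptions:
    if isinstance(options, options_class):
        # Options of the right type were supplied: update them from the kwargs.
        for name, value in kwargs.items():
            if hasattr(options, name):
                setattr(options, name, value)
        return options
    try:
        # No usable options were supplied: build them from the kwargs.
        return options_class(**kwargs)
    except TypeError:
        # Creation failed (for example an unknown keyword): fall back to defaults.
        return options_class()


created = create_options_sketch(DemoOptions, None, top=3)
existing = DemoOptions(top=1)
updated = create_options_sketch(DemoOptions, existing, skip=2)
fallback = create_options_sketch(DemoOptions, None, unknown=1)
```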

default_options_update_function

The default options update function.

This function is used to update the query and options with the kwargs. You can supply your own version of this function to customize the behavior.

default_options_update_function(query: str, options: SearchOptions, parameters: list[KernelParameterMetadata] | None = None, **kwargs: Any) -> tuple[str, SearchOptions]

Parameters

Name          Description
query         Required. The query.
options       Required. The options.
parameters    The parameters to use to create the options. Default value: None
**kwargs      Required. The keyword arguments to use to update the options.

Returns

Type                         Description
tuple[str, SearchOptions]    The updated query and options
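Since a custom version of this function can be supplied, the sketch below shows what one might look like when it follows the same call shape (query, options, optional parameters, keyword arguments, returning a tuple of query and options). `DemoOptions` and `my_options_update_function` are hypothetical, and the specific behavior (trimming the query, applying a `top` keyword) is only an example, not what the default function does.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DemoOptions:
    """Hypothetical stand-in for a SearchOptions subclass."""
    top: int = 5


def my_options_update_function(
    query: str,
    options: DemoOptions,
    parameters: Optional[list[Any]] = None,
    **kwargs: Any,
) -> tuple[str, DemoOptions]:
    # Example customization: honor a "top" keyword and normalize the query text.
    if "top" in kwargs:
        options.top = int(kwargs["top"])
    return query.strip(), options


new_query, new_options = my_options_update_function("  cats ", DemoOptions(), top="3")
```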

vectorstoremodel

Returns the class as a vector store model.

This decorator makes a class a vector store model. There are three things being checked:

  • The class must have at least one field with an annotation of type VectorStoreRecordKeyField, VectorStoreRecordDataField or VectorStoreRecordVectorField.

  • The class must have exactly one field with the VectorStoreRecordKeyField annotation.

  • A field with multiple VectorStoreRecordKeyField annotations will be set to the first one found.

Optionally, when there are VectorStoreRecordDataFields that specify an embedding property name, there must be a corresponding VectorStoreRecordVectorField with the same name.

Args:
  cls: The class to be decorated.

Raises:
  VectorStoreModelException: If the class does not implement the serialize and deserialize methods.
  VectorStoreModelException: If there are no fields with a VectorStoreRecordField annotation.
  VectorStoreModelException: If there are fields with no name.
  VectorStoreModelException: If there is no key field.
  VectorStoreModelException: If there is a field with an embedding property name but no corresponding field.
  VectorStoreModelException: If there is an ndarray field without a serialize or deserialize function.

Note: This function is marked as 'experimental' and may change in the future.

vectorstoremodel(cls: Any | None = None)

Parameters

Name    Description
cls     The class to be decorated. Default value: None
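The three checks listed for this decorator can be sketched with `typing.Annotated` and stand-in marker classes. Everything named here (`KeyField`, `DataField`, `VectorField`, `ModelError`, `check_model`) is hypothetical and only mirrors the described rules; the real decorator works with the VectorStoreRecord field classes and performs additional checks that are not reproduced.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class KeyField:
    """Hypothetical stand-in for VectorStoreRecordKeyField."""


class DataField:
    """Hypothetical stand-in for VectorStoreRecordDataField."""


class VectorField:
    """Hypothetical stand-in for VectorStoreRecordVectorField."""


class ModelError(Exception):
    """Hypothetical stand-in for VectorStoreModelException."""


FIELD_TYPES = (KeyField, DataField, VectorField)


def check_model(cls: type) -> type:
    found: dict[str, object] = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is not Annotated:
            continue
        markers = [m for m in get_args(hint)[1:] if isinstance(m, FIELD_TYPES)]
        if markers:
            found[name] = markers[0]  # with several markers, the first one found wins
    if not found:
        raise ModelError("no annotated fields found")
    key_names = [n for n, m in found.items() if isinstance(m, KeyField)]
    if len(key_names) != 1:
        raise ModelError(f"expected exactly one key field, found {len(key_names)}")
    return cls


@check_model
class Article:
    id: Annotated[str, KeyField()]
    text: Annotated[str, DataField()]
    embedding: Annotated[list[float], VectorField()]


class NoKey:
    text: Annotated[str, DataField()]


try:
    check_model(NoKey)
    error = None
except ModelError as exc:
    error = str(exc)
```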