Overview
This guide covers the major vector store updates introduced in Semantic Kernel version 1.34, which represents a significant overhaul of the vector store implementation to align with the .NET SDK and provide a more unified, intuitive API. The changes consolidate everything under semantic_kernel.data.vector and improve the connector architecture.
Key Improvements Summary
- Unified Field Model: A single `VectorStoreField` class replaces the multiple field types
- Integrated Embeddings: Direct embedding generation in vector field specifications
- Simplified Search: Easy creation of search functions directly on collections
- Consolidated Structure: Everything lives under `semantic_kernel.data.vector` and `semantic_kernel.connectors`
- Enhanced Text Search: Improved text search capabilities with streamlined connectors
- Deprecation: The old `memory_stores` are deprecated in favor of the new vector store architecture
1. Integrated Embeddings and Vector Store Models/Fields Updates
There are a number of changes to the way you define your vector store model; the biggest is that integrated embeddings are now supported directly in the vector store field definitions. This means that when you specify a field to be a vector, the content of that field is automatically embedded using the specified embedding generator, such as OpenAI's text embedding model. This simplifies the process of creating and managing vector fields.
When you define that field, you need to take care of two things, especially when using a Pydantic model:

- typing: The field will likely have three types: `list[float]` for the stored vector, `str` (or something else) for the input to the embedding generator, and `None` for when the field is unset.
- default value: The field must have a default value, typically `None`, so that no error occurs when getting records from `get` or `search` with `include_vectors=False`, which is now the default.
There are two concerns here. The first is that when decorating a class with `vectorstoremodel`, the first type annotation of the field is used to fill the type parameter of the `VectorStoreField` class, so you need to make sure the first type annotation is the right type for the vector store collection to be created with, often `list[float]`. The second is that, by default, the `get` and `search` methods do not include vectors in the results, so the field needs a default value and the typing needs to allow it; hence `None` is often permitted and used as the default. When the record is created, the values that need to be embedded are placed in this field, often strings, so `str` also needs to be included. The reason for this change is to allow more flexibility in what is embedded versus what is actually stored in data fields. This would be a common setup:
from dataclasses import dataclass
from typing import Annotated

from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from semantic_kernel.data.vector import VectorStoreField, vectorstoremodel

@vectorstoremodel
@dataclass
class MyRecord:
    content: Annotated[str, VectorStoreField('data', is_indexed=True, is_full_text_indexed=True)]
    title: Annotated[str, VectorStoreField('data', is_indexed=True, is_full_text_indexed=True)]
    id: Annotated[str, VectorStoreField('key')]
    vector: Annotated[list[float] | str | None, VectorStoreField(
        'vector',
        dimensions=1536,
        distance_function="cosine",
        embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small"),
    )] = None

    def __post_init__(self):
        if self.vector is None:
            self.vector = f"Title: {self.title}, Content: {self.content}"
Note the `__post_init__` method: it builds the value that gets embedded, which combines more than a single field. The three types are also present in the annotation of the vector field.
Before: Separate Field Classes
from semantic_kernel.data import (
VectorStoreRecordKeyField,
VectorStoreRecordDataField,
VectorStoreRecordVectorField
)
# Old approach with separate field classes
fields = [
VectorStoreRecordKeyField(name="id"),
VectorStoreRecordDataField(name="text", is_filterable=True, is_full_text_searchable=True),
VectorStoreRecordVectorField(name="vector", dimensions=1536, distance_function="cosine")
]
After: Unified VectorStoreField with Integrated Embeddings
from semantic_kernel.data.vector import VectorStoreField
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
# New unified approach with integrated embeddings
embedding_service = OpenAITextEmbedding(
ai_model_id="text-embedding-3-small"
)
fields = [
VectorStoreField(
"key",
name="id",
),
VectorStoreField(
"data",
name="text",
is_indexed=True, # Previously is_filterable
is_full_text_indexed=True # Previously is_full_text_searchable
),
VectorStoreField(
"vector",
name="vector",
dimensions=1536,
distance_function="cosine",
embedding_generator=embedding_service # Integrated embedding generation
)
]
Key Changes in Field Definition
- Single Field Class: `VectorStoreField` replaces all previous field types
- Field Type Specification: Use the `field_type: Literal["key", "data", "vector"]` parameter; it can be positional, so `VectorStoreField("key")` is valid
- Enhanced Properties: `storage_name` has been added; when set, it is used as the field name in the vector store, otherwise the `name` parameter is used. `dimensions` is now a required parameter for vector fields. `distance_function` and `index_kind` are both optional (vector fields only) and default to `DistanceFunction.DEFAULT` and `IndexKind.DEFAULT`; each vector store implementation has logic that chooses a suitable default for that store
- Property Renames: `property_type` → `type_` as an attribute and `type` in constructors; `is_filterable` → `is_indexed`; `is_full_text_searchable` → `is_full_text_indexed`
- Integrated Embeddings: Add an `embedding_generator` directly to vector fields; alternatively, set the `embedding_generator` on the vector store collection itself, which is then used for all vector fields in that store. The field-level value takes precedence over the collection-level embedding generator
2. New Methods on Stores and Collections
Enhanced Store Interface
from semantic_kernel.connectors.in_memory import InMemoryStore

store = InMemoryStore()

# Before: limited collection methods
collection = store.get_collection("my_collection", record_type=MyRecord)

# After: slimmer collection interface with new methods
collection = store.get_collection(MyRecord)
# If the record type has the `vectorstoremodel` decorator, it can carry both
# the collection_name and the definition for the collection.

# New methods for collection management
await store.collection_exists("my_collection")
await store.ensure_collection_deleted("my_collection")
# Both of these methods exist to streamline collection management tasks;
# they call the underlying `VectorStoreCollection` methods, see below.
Enhanced Collection Interface
from semantic_kernel.connectors.in_memory import InMemoryCollection
collection = InMemoryCollection(
    record_type=MyRecord,
    embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")  # Optional if an embedding generator is set on the record type
)
# If both the collection and the record type have an embedding generator set,
# the record type's embedding generator is used. If neither is set, it is assumed
# that the vector store itself can create embeddings, or that vectors are already
# included in the records; if that is not the case, the operation will likely raise an error.
# Enhanced collection operations
await collection.collection_exists()
await collection.ensure_collection_exists()
await collection.ensure_collection_deleted()
# CRUD methods
# Batch operations have been removed; every CRUD method now accepts either a single record or a list of records
records = [
MyRecord(id="1", text="First record"),
MyRecord(id="2", text="Second record")
]
ids = ["1", "2"]
# This method adds vectors automatically
await collection.upsert(records)
# `get` accepts one or more ids and returns a list of records
await collection.get(ids)
# You can also call `get` without ids, using the top, skip and order_by parameters
await collection.get(top=10, skip=0, order_by='id')
# The order_by parameter can be a string or a dict, with the key being the field name
# and the value True for ascending or False for descending order.
# At this time, not all vector stores support this method.
# Delete also allows for single or multiple ids
await collection.delete(ids)
query = "search term"
# New search methods, these use the built-in embedding generator to take the value and create a vector
results = await collection.search(query, top=10)
results = await collection.hybrid_search(query, top=10)
# You can also supply a vector directly
query_vector = [0.1, 0.2, 0.3] # Example vector
results = await collection.search(vector=query_vector, top=10)
results = await collection.hybrid_search(query, vector=query_vector, top=10)
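The `order_by` semantics described above (a string, or a dict mapping field name to an ascending flag) can be illustrated with a small plain-Python sketch that mimics the sorting locally, independent of any vector store:

```python
def apply_order_by(rows: list[dict], order_by) -> list[dict]:
    # A string means ascending order on that field; a dict maps
    # field name -> True (ascending) / False (descending).
    if isinstance(order_by, str):
        order_by = {order_by: True}
    # Apply keys right-to-left; stable sort preserves earlier keys' order
    for field_name, ascending in reversed(list(order_by.items())):
        rows = sorted(rows, key=lambda r: r[field_name], reverse=not ascending)
    return rows

rows = [{"id": "2"}, {"id": "3"}, {"id": "1"}]
assert [r["id"] for r in apply_order_by(rows, "id")] == ["1", "2", "3"]
assert [r["id"] for r in apply_order_by(rows, {"id": False})] == ["3", "2", "1"]
```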
3. Enhanced Filters for Search
The new vector store implementation moves from string-based FilterClause objects to more powerful and type-safe lambda expressions or callable filters.
Before: FilterClause Objects
from semantic_kernel.data.text_search import SearchFilter, EqualTo, AnyTagsEqualTo
from semantic_kernel.data.vector_search import VectorSearchFilter
# Creating filters using FilterClause objects
text_filter = SearchFilter()
text_filter.equal_to("category", "AI")
text_filter.equal_to("status", "active")
# Vector search filters
vector_filter = VectorSearchFilter()
vector_filter.equal_to("category", "AI")
vector_filter.any_tag_equal_to("tags", "important")
# Using in search
results = await collection.search(
"query text",
options=VectorSearchOptions(filter=vector_filter)
)
After: Lambda Expression Filters
# When the collection is defined with generic type hints, most IDEs can infer
# the record type, so you can use it directly in the lambda expressions.
collection = InMemoryCollection[str, MyRecord](MyRecord)
# Using lambda expressions for more powerful and type-safe filtering
# The code snippets below assume a data model with more fields than the one defined earlier.
# Direct lambda expressions
results = await collection.search(
"query text",
filter=lambda record: record.category == "AI" and record.status == "active"
)
# Complex filtering with multiple conditions
results = await collection.search(
"query text",
filter=lambda record: (
record.category == "AI" and
record.score > 0.8 and
"important" in record.tags
)
)
# Combining conditions with boolean operators
results = await collection.search(
"query text",
filter=lambda record: (
record.category == "AI" or record.category == "ML"
) and record.published_date >= datetime(2024, 1, 1)
)
# Range filtering (now possible with lambda expressions)
results = await collection.search(
"query text",
filter=lambda record: 0.5 <= record.confidence_score <= 0.9
)
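Because these filters are ordinary Python callables, you can unit test them locally before handing them to `search`. The sketch below uses a stand-in record class (in real use, the predicate receives your record type):

```python
from dataclasses import dataclass, field

@dataclass
class FakeRecord:
    category: str
    score: float
    tags: list[str] = field(default_factory=list)

# The same kind of predicate you would pass as filter=... to collection.search
is_relevant = lambda r: r.category == "AI" and r.score > 0.8 and "important" in r.tags

assert is_relevant(FakeRecord("AI", 0.9, ["important"]))
assert not is_relevant(FakeRecord("AI", 0.5, ["important"]))  # score too low
assert not is_relevant(FakeRecord("ML", 0.9, ["important"]))  # wrong category
```

Testing predicates this way catches logic errors before they reach the vector store, where filter translation failures can be harder to diagnose.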
Migration Tips for Filters
- Simple equality: `filter.equal_to("field", "value")` becomes `lambda r: r.field == "value"`
- Multiple conditions: Chain with `and`/`or` operators instead of multiple filter calls
- Tag/array containment: `filter.any_tag_equal_to("tags", "value")` becomes `lambda r: "value" in r.tags`
- Enhanced capabilities: Support for range queries, complex boolean logic, and custom predicates
4. Improved Ease of Creating Search Functions
Before: Search Function Creation with VectorStoreTextSearch
from semantic_kernel.connectors.in_memory import InMemoryCollection
from semantic_kernel.data import VectorStoreTextSearch
collection = InMemoryCollection(collection_name='collection', record_type=MyRecord)
search = VectorStoreTextSearch.from_vectorized_search(vectorized_search=collection, embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small"))
search_function = search.create_search(
function_name='search',
...
)
After: Direct Search Function Creation
collection = InMemoryCollection(MyRecord)
# Create search function directly on collection
search_function = collection.create_search_function(
function_name="search",
search_type="vector", # or "keyword_hybrid"
top=10,
vector_property_name="vector", # Name of the vector field
)
# Add to kernel directly
kernel.add_function(plugin_name="memory", function=search_function)
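Conceptually, `create_search_function` bakes the fixed options (function name, search type, top, vector field) into an async callable that only takes the query. The plain-Python sketch below mimics that closure pattern; it is an illustration of the idea, not the Semantic Kernel implementation:

```python
import asyncio

def make_search_function(docs: list[str], *, top: int):
    # Fixed options are captured in the closure, as create_search_function does
    async def search(query: str) -> list[str]:
        # Stand-in for: await collection.search(query, top=top)
        return [d for d in docs if query in d][:top]
    return search

docs = ["python intro", "python advanced", "rust intro"]
search_fn = make_search_function(docs, top=2)
results = asyncio.run(search_fn("python"))
assert results == ["python intro", "python advanced"]
```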
5. Connector Renames and Import Changes
Import Path Consolidation
# Before: Scattered imports
from semantic_kernel.connectors.memory.azure_cognitive_search import AzureCognitiveSearchMemoryStore
from semantic_kernel.connectors.memory.chroma import ChromaMemoryStore
from semantic_kernel.connectors.memory.pinecone import PineconeMemoryStore
from semantic_kernel.connectors.memory.qdrant import QdrantMemoryStore
# After: Consolidated under connectors
from semantic_kernel.connectors.azure_ai_search import AzureAISearchStore
from semantic_kernel.connectors.chroma import ChromaVectorStore
from semantic_kernel.connectors.pinecone import PineconeVectorStore
from semantic_kernel.connectors.qdrant import QdrantVectorStore
# Alternative after: Consolidated with lazy loading:
from semantic_kernel.connectors.memory import (
AzureAISearchStore,
ChromaVectorStore,
PineconeVectorStore,
QdrantVectorStore,
WeaviateVectorStore,
RedisVectorStore
)
Connector Class Renames
| Old Name | New Name |
|---|---|
| AzureCosmosDBforMongoDB* | CosmosMongo* |
| AzureCosmosDBForNoSQL* | CosmosNoSql* |
6. Text Search Improvements and Removed Bing Connector
Bing Connector Removed and Enhanced Text Search Interface
The Bing text search connector has been removed. Migrate to alternative search providers:
# Before: Bing Connector (REMOVED)
from semantic_kernel.connectors.search.bing import BingConnector
bing_search = BingConnector(api_key="your-bing-key")
# After: Use Brave Search or other providers
from semantic_kernel.connectors.brave import BraveSearch
# or
from semantic_kernel.connectors.search import BraveSearch
brave_search = BraveSearch()
# Create text search function
text_search_function = brave_search.create_search_function(
function_name="web_search",
query_parameter_name="query",
description="Search the web for information"
)
kernel.add_function(plugin_name="search", function=text_search_function)
Improved Search Methods
Before: Three separate search methods with different return types
from semantic_kernel.connectors.brave import BraveSearch
brave_search = BraveSearch()
# Before: Separate search methods
search_results: KernelSearchResult[str] = await brave_search.search(
query="semantic kernel python",
top=5,
)
search_results: KernelSearchResult[TextSearchResult] = await brave_search.get_text_search_results(
query="semantic kernel python",
top=5,
)
search_results: KernelSearchResult[BraveWebPage] = await brave_search.get_search_results(
query="semantic kernel python",
top=5,
)
After: Unified search method with output type parameter
from semantic_kernel.data.text_search import SearchOptions
# Enhanced search results with metadata
search_results: KernelSearchResult[str] = await brave_search.search(
query="semantic kernel python",
output_type=str, # can also be TextSearchResult or anything else for search engine specific results, default is `str`
top=5,
filter=lambda result: result.country == "NL", # Example filter
)
async for result in search_results.results:
assert isinstance(result, str) # or TextSearchResult if using that type
print(f"Result: {result}")
print(f"Metadata: {search_results.metadata}")
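The `output_type` dispatch can be pictured as one function whose return shape depends on a type parameter. This toy version (plain Python, not the Brave connector) returns plain strings by default and passes richer, engine-specific objects through otherwise:

```python
def unified_search(raw_hits: list[dict], output_type=str):
    # Default: plain-string snippets; any other type: engine-specific results
    if output_type is str:
        return [hit["snippet"] for hit in raw_hits]
    return raw_hits

hits = [{"snippet": "semantic kernel python", "rank": 1}]
assert unified_search(hits) == ["semantic kernel python"]
assert unified_search(hits, output_type=dict) == hits
```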
7. Deprecation of Old Memory Stores
All the old memory stores, based on `MemoryStoreBase`, have been moved into `semantic_kernel.connectors.memory_stores` and are now marked as deprecated. Most of them have an equivalent new implementation based on `VectorStore` and `VectorStoreCollection`, which can be found in `semantic_kernel.connectors.memory`.

These connectors will be removed completely:

- AstraDB
- Milvus
- Usearch

If you still need any of these, either copy the code from the deprecated module and the `semantic_kernel.memory` folder, or implement your own vector store collection based on the new `VectorStoreCollection` class.

If there is large demand based on GitHub feedback, we will consider bringing them back, but for now they are not maintained and will be removed in the future.
Migration from SemanticTextMemory
# Before: SemanticTextMemory (DEPRECATED)
from semantic_kernel.memory import SemanticTextMemory
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbeddingGenerationService
embedding_service = OpenAITextEmbeddingGenerationService(ai_model_id="text-embedding-3-small")
memory = SemanticTextMemory(storage=vector_store, embeddings_generator=embedding_service)
# Store memory
await memory.save_information(collection="docs", text="Important information", id="doc1")
# Search memory
results = await memory.search(collection="docs", query="important", limit=5)
# After: Direct Vector Store Usage
from dataclasses import dataclass
from typing import Annotated

from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from semantic_kernel.connectors.in_memory import InMemoryCollection
from semantic_kernel.data.vector import VectorStoreField, vectorstoremodel

# Define the data model
@vectorstoremodel
@dataclass
class MemoryRecord:
    id: Annotated[str, VectorStoreField('key')]
    text: Annotated[str, VectorStoreField('data', is_full_text_indexed=True)]
    embedding: Annotated[list[float] | str | None, VectorStoreField(
        'vector',
        dimensions=1536,
        distance_function="cosine",
        embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small"),
    )] = None

# Create the vector store collection with integrated embeddings
collection = InMemoryCollection(
    record_type=MemoryRecord,
    embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")  # Optional if set on the record type
)
# Store with automatic embedding generation
record = MemoryRecord(id="doc1", text="Important information", embedding='Important information')
await collection.upsert(record)
# Search with built-in function
search_function = collection.create_search_function(
function_name="search_docs",
search_type="vector"
)
Memory Plugin Migration
When you also want a plugin that can save information, you can create one like this:
# Before: TextMemoryPlugin (DEPRECATED)
from semantic_kernel.core_plugins import TextMemoryPlugin
memory_plugin = TextMemoryPlugin(memory)
kernel.add_plugin(memory_plugin, "memory")
# After: Custom plugin using vector store search functions
from semantic_kernel.functions import kernel_function

class VectorMemoryPlugin:
    def __init__(self, collection: VectorStoreCollection):
        self.collection = collection

    @kernel_function(name="save")
    async def save_memory(self, text: str, key: str) -> str:
        record = MemoryRecord(id=key, text=text, embedding=text)
        await self.collection.upsert(record)
        return f"Saved to {self.collection.collection_name}"

    @kernel_function(name="search")
    async def search_memory(self, query: str, limit: int = 5) -> str:
        results = await self.collection.search(
            query, top=limit, vector_property_name="embedding"
        )
        return "\n".join([r.record.text async for r in results.results])
# Register the new plugin
memory_plugin = VectorMemoryPlugin(collection)
kernel.add_plugin(memory_plugin, "memory")
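To see the plugin shape in isolation, here is a toy, dependency-free version with the same `save`/`search` surface; a naive substring match stands in for the vector search call:

```python
class ToyMemoryPlugin:
    # Mirrors VectorMemoryPlugin above, without Semantic Kernel or a vector store
    def __init__(self):
        self.records: dict[str, str] = {}

    def save(self, text: str, key: str) -> str:
        self.records[key] = text
        return f"Saved {key}"

    def search(self, query: str, limit: int = 5) -> str:
        hits = [t for t in self.records.values() if query in t][:limit]
        return "\n".join(hits)

plugin = ToyMemoryPlugin()
plugin.save("Important information", "doc1")
assert plugin.search("Important") == "Important information"
```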
Migration Checklist for Vector Search
Step 1: Update Imports
- [ ] Replace memory store imports with vector store equivalents
- [ ] Update field imports to use `VectorStoreField`
- [ ] Remove Bing connector imports
Step 2: Update Field Definitions
- [ ] Convert to the unified `VectorStoreField` class
- [ ] Update property names (`is_filterable` → `is_indexed`)
- [ ] Add integrated embedding generators to vector fields
Step 3: Update Collection Usage
- [ ] Replace memory operations with vector store methods
- [ ] Pass lists of records to the unified CRUD methods where applicable
- [ ] Implement new search function creation
Step 4: Update Search Implementation
- [ ] Replace manual search functions with `create_search_function`
- [ ] Update text search to use new providers
- [ ] Implement hybrid search where beneficial
- [ ] Migrate from `FilterClause` to lambda expressions for filtering
Step 5: Remove Deprecated Code
- [ ] Remove `SemanticTextMemory` usage
- [ ] Remove `TextMemoryPlugin` dependencies
Performance and Feature Benefits
Performance Improvements
- List Operations: CRUD methods that accept lists of records improve throughput
- Integrated Embeddings: Eliminates separate embedding generation steps
- Optimized Search: Built-in search functions are optimized for each store type
Feature Enhancements
- Hybrid Search: Combines vector and text search for better results
- Advanced Filtering: Enhanced filter expressions and indexing
Developer Experience
- Simplified API: Fewer classes and methods to learn
- Consistent Interface: Unified approach across all vector stores
- Better Documentation: Clear examples and migration paths
- Future-Proof: Aligned with .NET SDK for consistent cross-platform development
Conclusion
The vector store updates discussed above represent a significant improvement in the Semantic Kernel Python SDK. The new unified architecture provides better performance, enhanced features, and a more intuitive developer experience. While migration requires updating imports and refactoring existing code, the benefits in maintainability and functionality make this upgrade highly recommended.
For additional help with migration, refer to the updated samples in the samples/concepts/memory/ directory and the comprehensive API documentation.