Semantic Kernel Python Vector Store Migration Guide

Overview

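This guide covers the major vector store updates introduced in Semantic Kernel version 1.34. This release is a significant overhaul of the vector store implementation. It aligns the implementation with the .NET SDK and provides a more unified and intuitive API. The changes consolidate everything under semantic_kernel.data.vector and improve the connector architecture.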

Summary of key improvements

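Unified field model: a single VectorStoreField class replaces the multiple field types
Integrated embeddings: embedding generation is specified directly on the vector field
Simplified search: create search functions directly on collections
Consolidated structure: everything lives under semantic_kernel.data.vector and semantic_kernel.connectors
Enhanced text search: improved text search capabilities through streamlined connectors
Deprecation: the old memory_stores are deprecated in favor of the new vector store architecture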

1. Integrated embeddings and vector store model/field updates

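Many things have changed in how you define a vector store model. The biggest change is support for integrated embeddings directly in the vector store field definitions. When you specify a field as a vector, its contents are embedded automatically using the embedding generator you specify, such as an OpenAI text embedding model. This simplifies creating and managing vector fields.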

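When you define that field, especially with a Pydantic model, make sure of the following:

typing: The field will usually have three types. These are list[float], str (or another type) as the input to the embedding generator, and None for when the field is unset.
default value: The field must have a default value, None or something else. Otherwise an error occurs when records are retrieved with get or search using include_vectors=False, which is now the default.

There are two concerns here. The first applies when you decorate a class with vectorstoremodel. The first type annotation of the field is used to fill in the type parameter of the VectorStoreField class. So make sure that first annotation is the correct type for creating the vector store collection, which is often list[float]. The second concern is that get and search do not include vectors in their results by default. The field therefore needs a default value, and its typing must allow for that. This is why None is usually allowed and the default is set to None. When a record is created, this field holds the value to be embedded, often a string, so str must be included as well. The change gives more flexibility over what is embedded versus what is actually stored in the data fields. A common setup looks like this: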

from dataclasses import dataclass
from typing import Annotated

from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from semantic_kernel.data.vector import VectorStoreField, vectorstoremodel

@vectorstoremodel
@dataclass
class MyRecord:
    content: Annotated[str, VectorStoreField('data', is_indexed=True, is_full_text_indexed=True)]
    title: Annotated[str, VectorStoreField('data', is_indexed=True, is_full_text_indexed=True)]
    id: Annotated[str, VectorStoreField('key')]
    vector: Annotated[list[float] | str | None, VectorStoreField(
        'vector', 
        dimensions=1536, 
        distance_function="cosine",
        embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small"),
    )] = None

    def __post_init__(self):
        if self.vector is None:
            self.vector = f"Title: {self.title}, Content: {self.content}"

Note the __post_init__ method. It builds the value that gets embedded, and that value combines more than a single field. All three types are also present in the annotation.
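The lifecycle of this three-type vector field can be traced without any service or network call. The sketch below is a standard-library stand-in, not Semantic Kernel code, and it drops the decorator. `toy_embed` and `embed_pending_vectors` are hypothetical helpers. They stand in for the embedding generator and for what conceptually happens at upsert time, when a `str` in the vector field is replaced by a `list[float]`.

```python
from __future__ import annotations

from dataclasses import dataclass


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding generator: a tiny, deterministic 'vector'."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


@dataclass
class MyRecord:
    content: str
    title: str
    id: str
    # With @vectorstoremodel the first annotation (list[float]) sets the field type;
    # in this plain dataclass the annotation is documentation only.
    vector: list[float] | str | None = None

    def __post_init__(self) -> None:
        # Compose the text to embed from more than one field.
        if self.vector is None:
            self.vector = f"Title: {self.title}, Content: {self.content}"


def embed_pending_vectors(records: list[MyRecord]) -> None:
    """Hypothetical helper: replace str values in the vector field with embeddings."""
    for record in records:
        if isinstance(record.vector, str):
            record.vector = toy_embed(record.vector)


record = MyRecord(content="Vector stores", title="Intro", id="1")
print(record.vector)  # Title: Intro, Content: Vector stores
embed_pending_vectors([record])
print(record.vector)  # now a list[float]
```

A record constructed with an explicit vector keeps that vector, because `__post_init__` only fills in the text when the field is `None`.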

Before: separate field classes

from semantic_kernel.data import (
    VectorStoreRecordKeyField,
    VectorStoreRecordDataField, 
    VectorStoreRecordVectorField
)

# Old approach with separate field classes
fields = [
    VectorStoreRecordKeyField(name="id"),
    VectorStoreRecordDataField(name="text", is_filterable=True, is_full_text_searchable=True),
    VectorStoreRecordVectorField(name="vector", dimensions=1536, distance_function="cosine")
]

After: unified VectorStoreField with integrated embeddings

from semantic_kernel.data.vector import VectorStoreField
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding

# New unified approach with integrated embeddings
embedding_service = OpenAITextEmbedding(
    ai_model_id="text-embedding-3-small"
)

fields = [
    VectorStoreField(
        "key",
        name="id",
    ),
    VectorStoreField(
        "data",
        name="text",
        is_indexed=True,  # Previously is_filterable
        is_full_text_indexed=True  # Previously is_full_text_searchable
    ),
    VectorStoreField(
        "vector",
        name="vector",
        dimensions=1536,
        distance_function="cosine",
        embedding_generator=embedding_service  # Integrated embedding generation
    )
]

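Key changes to field definitions

Single field class: VectorStoreField replaces all previous field types
Field type specification: use the field_type: Literal["key", "data", "vector"] parameter. It can be passed positionally, so VectorStoreField("key") is valid.
Enhanced properties:
- storage_name has been added. When set, it is used as the field name in the vector store. Otherwise, the name parameter is used.
- dimensions is now a required parameter for vector fields.
- distance_function and index_kind are both optional and apply only to vector fields. When they are not specified, they are set to DistanceFunction.DEFAULT and IndexKind.DEFAULT respectively. Each vector store implementation has logic that picks the appropriate default for that store.
Property renames:
- property_type → type_ as an attribute, and type in the constructor
- is_filterable → is_indexed
- is_full_text_searchable → is_full_text_indexed
Integrated embeddings: add embedding_generator directly to a vector field. Alternatively, set embedding_generator on the vector store collection itself, and it will be used for every vector field in that collection. A field-level generator takes precedence over the collection-level one.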
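The renames and the generator-precedence rule above are mechanical enough to capture in a small helper when you port many field definitions. This is a hypothetical migration aid, not part of Semantic Kernel. `migrate_field_kwargs` and `resolve_embedding_generator` are illustrative names, and the mappings come only from the lists in this section.

```python
from typing import Any

# Old field classes mapped to the new positional field_type value.
OLD_CLASS_TO_FIELD_TYPE = {
    "VectorStoreRecordKeyField": "key",
    "VectorStoreRecordDataField": "data",
    "VectorStoreRecordVectorField": "vector",
}

# Renamed constructor keyword arguments.
RENAMED_KWARGS = {
    "property_type": "type",
    "is_filterable": "is_indexed",
    "is_full_text_searchable": "is_full_text_indexed",
}


def migrate_field_kwargs(old_class: str, **old_kwargs: Any) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: translate an old field definition into (field_type, new kwargs)."""
    field_type = OLD_CLASS_TO_FIELD_TYPE[old_class]
    # Rename known keywords; pass every other keyword through unchanged.
    new_kwargs = {RENAMED_KWARGS.get(k, k): v for k, v in old_kwargs.items()}
    if field_type == "vector" and "dimensions" not in new_kwargs:
        raise ValueError("dimensions is now required for vector fields")
    return field_type, new_kwargs


def resolve_embedding_generator(field_generator: Any, collection_generator: Any) -> Any:
    """The field-level generator wins; otherwise fall back to the collection-level one (may be None)."""
    return field_generator if field_generator is not None else collection_generator


print(migrate_field_kwargs("VectorStoreRecordDataField", name="text", is_filterable=True))
# ('data', {'name': 'text', 'is_indexed': True})
```

The returned pair is meant to be passed on as `VectorStoreField(field_type, **new_kwargs)`. Keywords that are not in the rename table pass through unchanged, so check them against the new constructor.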

2. New methods on stores and collections

Enhanced store interface

from semantic_kernel.connectors.in_memory import InMemoryStore

store = InMemoryStore()

# Before: Limited collection methods
collection = store.get_collection("my_collection", record_type=MyRecord)

# After: Slimmer collection interface with new methods
collection = store.get_collection(MyRecord)
# If the record type has the `vectorstoremodel` decorator, it can carry both the collection_name and the definition for the collection.

# New methods for collection management
await store.collection_exists("my_collection")
await store.ensure_collection_deleted("my_collection")
# Both methods create a lightweight collection internally to streamline collection-management tasks;
# they call the underlying `VectorStoreCollection` methods (see below).

Enhanced collection interface

from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from semantic_kernel.connectors.in_memory import InMemoryCollection

collection = InMemoryCollection(
    record_type=MyRecord,
    embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")  # Optional if no embedding generator is set on the record type
)
# If both the collection and the record type set an embedding generator, the record type's generator is used.
# If neither is set, the vector store is assumed to create embeddings itself, or the records are assumed to
# already contain vectors; if neither is true, an error will likely be raised.

# Enhanced collection operations
await collection.collection_exists()
await collection.ensure_collection_exists()
await collection.ensure_collection_deleted()

# CRUD methods
# Batch operations were removed: every CRUD operation now accepts either a single record or a list of records
records = [
    MyRecord(id="1", title="First", content="First record"),
    MyRecord(id="2", title="Second", content="Second record")
]
ids = ["1", "2"]
# this method adds vectors automatically
await collection.upsert(records)

# You can do get with one or more ids, and it will return a list of records
await collection.get(ids)  # Returns a list of records
# you can also do a get without ids, with top, skip and order_by parameters
await collection.get(top=10, skip=0, order_by='id')
# the order_by parameter can be a string or a dict, with the key being the field name and the value being True for ascending or False for descending order.
# At this time, not all vector stores support this method.

# Delete also allows for single or multiple ids
await collection.delete(ids)

query = "search term"
# New search methods, these use the built-in embedding generator to take the value and create a vector
results = await collection.search(query, top=10)
results = await collection.hybrid_search(query, top=10)

# You can also supply a vector directly
query_vector = [0.1, 0.2, 0.3]  # Example vector
results = await collection.search(vector=query_vector, top=10)
results = await collection.hybrid_search(query, vector=query_vector, top=10)
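A small dict-backed stand-in can show how the single-or-list CRUD behavior and the `top`/`skip`/`order_by` semantics of `get` fit together. `ToyCollection` and `Doc` are hypothetical names. The methods are synchronous for brevity, while the real collection methods are awaited. The handling of a dict-valued `order_by` with several keys is an assumption: here the first key has the highest priority.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Doc:
    id: str
    title: str
    score: float


class ToyCollection:
    """Hypothetical dict-backed collection mirroring the CRUD semantics above (not Semantic Kernel code)."""

    def __init__(self) -> None:
        self._rows: dict[str, Doc] = {}

    @staticmethod
    def _as_list(value: Any) -> list[Any]:
        # Accept a single item or a list of items, like the new CRUD methods.
        return list(value) if isinstance(value, (list, tuple)) else [value]

    def upsert(self, records: Doc | list[Doc]) -> list[str]:
        keys = []
        for record in self._as_list(records):
            self._rows[record.id] = record
            keys.append(record.id)
        return keys

    def get(
        self,
        keys: str | list[str] | None = None,
        *,
        top: int | None = None,
        skip: int = 0,
        order_by: str | dict[str, bool] | None = None,
    ) -> list[Doc]:
        if keys is not None:
            # Lookup by one or more keys; unknown keys are skipped.
            return [self._rows[k] for k in self._as_list(keys) if k in self._rows]
        rows = list(self._rows.values())
        # A plain string means "sort ascending by that field".
        if isinstance(order_by, str):
            order_by = {order_by: True}
        # Stable sorts applied from the last key to the first, so the first key has priority.
        for name, ascending in reversed(list((order_by or {}).items())):
            rows.sort(key=lambda r: getattr(r, name), reverse=not ascending)
        end = None if top is None else skip + top
        return rows[skip:end]

    def delete(self, keys: str | list[str]) -> None:
        for key in self._as_list(keys):
            self._rows.pop(key, None)


collection = ToyCollection()
collection.upsert([Doc("1", "a", 0.2), Doc("2", "b", 0.9), Doc("3", "c", 0.5)])
collection.upsert(Doc("4", "d", 0.7))  # a single record works too
print([d.id for d in collection.get(top=2, order_by={"score": False})])  # ['2', '4']
```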

3. Enhanced filters for search

The new vector store implementation replaces string-based FilterClause objects with more powerful, type-safe lambda expressions or callable filters.

Before: FilterClause objects

from semantic_kernel.data.text_search import SearchFilter, EqualTo, AnyTagsEqualTo
from semantic_kernel.data.vector_search import VectorSearchFilter, VectorSearchOptions

# Creating filters using FilterClause objects
text_filter = SearchFilter()
text_filter.equal_to("category", "AI")
text_filter.equal_to("status", "active")

# Vector search filters
vector_filter = VectorSearchFilter()
vector_filter.equal_to("category", "AI")
vector_filter.any_tag_equal_to("tags", "important")

# Using in search
results = await collection.search(
    "query text",
    options=VectorSearchOptions(filter=vector_filter)
)

After: lambda expression filters

from datetime import datetime

from semantic_kernel.connectors.in_memory import InMemoryCollection

# When the collection is defined with generic type hints, most IDEs can infer the record type, so you can use the record's fields directly in the lambda expressions.
collection = InMemoryCollection[str, MyRecord](MyRecord)

# Using lambda expressions for more powerful and type-safe filtering
# The snippets below assume a data model with more fields than defined earlier (category, status, score, tags, published_date, confidence_score).

# Direct lambda expressions
results = await collection.search(
    "query text", 
    filter=lambda record: record.category == "AI" and record.status == "active"
)

# Complex filtering with multiple conditions
results = await collection.search(
    "query text",
    filter=lambda record: (
        record.category == "AI" and 
        record.score > 0.8 and
        "important" in record.tags
    )
)

# Combining conditions with boolean operators
results = await collection.search(
    "query text",
    filter=lambda record: (
        record.category == "AI" or record.category == "ML"
    ) and record.published_date >= datetime(2024, 1, 1)
)

# Range filtering (now possible with lambda expressions)
results = await collection.search(
    "query text",
    filter=lambda record: 0.5 <= record.confidence_score <= 0.9
)

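Tips for migrating filters

Simple equality: filter.equal_to("field", "value") becomes lambda r: r.field == "value"
Multiple conditions: chain them with and/or operators instead of making multiple filter calls
Tag/array containment: filter.any_tag_equal_to("tags", "value") becomes lambda r: "value" in r.tags
Enhanced capabilities: support for range queries, complex boolean logic, and custom predicates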
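The new filters are ordinary Python predicates, so you can check their intended meaning locally against a few in-memory records. This only illustrates the semantics. The `Article` model and its fields are hypothetical. A connector is expected to translate the expression into its store's own filter syntax rather than call the lambda once per record, so it may not accept every Python construct shown here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Article:
    """Hypothetical record type with the extra fields used in the filter examples."""
    id: str
    category: str
    status: str
    confidence_score: float
    tags: list[str] = field(default_factory=list)


articles = [
    Article("1", "AI", "active", 0.95, ["important"]),
    Article("2", "AI", "active", 0.70, ["draft"]),
    Article("3", "ML", "archived", 0.60, ["important"]),
]

filters: dict[str, Callable[[Article], bool]] = {
    # Old: filter.equal_to("category", "AI"); filter.equal_to("status", "active")
    "equality": lambda r: r.category == "AI" and r.status == "active",
    # Old: filter.any_tag_equal_to("tags", "important")
    "tag": lambda r: "important" in r.tags,
    # New capability: a range query
    "range": lambda r: 0.5 <= r.confidence_score <= 0.9,
}

for name, predicate in filters.items():
    print(name, [a.id for a in articles if predicate(a)])
# equality ['1', '2']
# tag ['1', '3']
# range ['2', '3']
```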

4. Easier search function creation

Before: creating a search function with VectorStoreTextSearch

from semantic_kernel.connectors.in_memory import InMemoryCollection
from semantic_kernel.data import VectorStoreTextSearch

collection = InMemoryCollection(collection_name='collection', record_type=MyRecord)
search = VectorStoreTextSearch.from_vectorized_search(vectorized_search=collection, embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small"))

search_function = search.create_search(
    function_name='search',
    ...
)

After: creating the search function directly

collection = InMemoryCollection(MyRecord)
# Create search function directly on collection
search_function = collection.create_search_function(
    function_name="search",
    search_type="vector",  # or "keyword_hybrid"
    top=10,
    vector_property_name="vector",  # Name of the vector field
)

# Add to kernel directly
kernel.add_function(plugin_name="memory", function=search_function)

5. Connector renames and import changes

Consolidated import paths

# Before: Scattered imports
from semantic_kernel.connectors.memory.azure_cognitive_search import AzureCognitiveSearchMemoryStore
from semantic_kernel.connectors.memory.chroma import ChromaMemoryStore
from semantic_kernel.connectors.memory.pinecone import PineconeMemoryStore
from semantic_kernel.connectors.memory.qdrant import QdrantMemoryStore

# After: Consolidated under connectors
from semantic_kernel.connectors.azure_ai_search import AzureAISearchStore
from semantic_kernel.connectors.chroma import ChromaVectorStore
from semantic_kernel.connectors.pinecone import PineconeVectorStore
from semantic_kernel.connectors.qdrant import QdrantVectorStore

# Alternative after: Consolidated with lazy loading:
from semantic_kernel.connectors.memory import (
    AzureAISearchStore,
    ChromaVectorStore,
    PineconeVectorStore,
    QdrantVectorStore,
    WeaviateVectorStore,
    RedisVectorStore
)

Connector class renames

Old name	New name
AzureCosmosDBforMongoDB*	CosmosMongo*
AzureCosmosDBForNoSQL*	CosmosNoSql*
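The `*` in the table stands for the rest of the class name, so each rename is a prefix substitution. The hypothetical helper below applies the substitutions mechanically and matches the casing in the table exactly. It does not check that the resulting class exists in your installed version, so confirm each new name against the connector module. The sample class names are illustrative input only.

```python
# Prefix renames from the table above (case-sensitive).
RENAMED_PREFIXES = {
    "AzureCosmosDBforMongoDB": "CosmosMongo",
    "AzureCosmosDBForNoSQL": "CosmosNoSql",
}


def new_class_name(old_name: str) -> str:
    """Hypothetical helper: swap a listed prefix; other names are returned unchanged."""
    for old_prefix, new_prefix in RENAMED_PREFIXES.items():
        if old_name.startswith(old_prefix):
            return new_prefix + old_name[len(old_prefix):]
    return old_name


print(new_class_name("AzureCosmosDBForNoSQLStore"))  # CosmosNoSqlStore
```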

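6. Text search enhancements and removal of the Bing connector

The Bing connector has been removed and the text search interface has been enhanced

The Bing text search connector has been removed. Migrate to another search provider: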

# Before: Bing Connector (REMOVED)
from semantic_kernel.connectors.search.bing import BingConnector

bing_search = BingConnector(api_key="your-bing-key")

# After: Use Brave Search or other providers
from semantic_kernel.connectors.brave import BraveSearch
# or
from semantic_kernel.connectors.search import BraveSearch

brave_search = BraveSearch()

# Create text search function
text_search_function = brave_search.create_search_function(
    function_name="web_search",
    query_parameter_name="query",
    description="Search the web for information"
)

kernel.add_function(plugin_name="search", function=text_search_function)

Improved search methods

Before: three separate search methods with different return types

from semantic_kernel.connectors.brave import BraveSearch
brave_search = BraveSearch()
# Before: Separate search methods
search_results: KernelSearchResult[str] = await brave_search.search(
    query="semantic kernel python",
    top=5,
)

search_results: KernelSearchResult[TextSearchResult] = await brave_search.get_text_search_results(
    query="semantic kernel python",
    top=5,
)

search_results: KernelSearchResult[BraveWebPage] = await brave_search.get_search_results(
    query="semantic kernel python",
    top=5,
)

After: a unified search method with an output type parameter

from semantic_kernel.data.text_search import SearchOptions
# Enhanced search results with metadata
search_results: KernelSearchResult[str] = await brave_search.search(
    query="semantic kernel python",
    output_type=str, # can also be TextSearchResult or anything else for search engine specific results, default is `str`
    top=5,
    filter=lambda result: result.country == "NL",  # Example filter
)

async for result in search_results.results:
    assert isinstance(result, str)  # or TextSearchResult if using that type
    print(f"Result: {result}")
    print(f"Metadata: {search_results.metadata}")

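7. Deprecation of the old memory stores

All old memory stores based on MemoryStoreBase have been moved to semantic_kernel.connectors.memory_stores and are now marked as deprecated. Most of them have an equivalent new implementation based on VectorStore and VectorStoreCollection, which you can find in semantic_kernel.connectors.memory.

The following connectors will be removed entirely: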

AstraDB
Milvus
Usearch

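If you need any of these, you have two options. You can take over the code from the deprecated module and the semantic_kernel.memory folder. Or you can implement your own vector store collection based on the new VectorStoreCollection class.

If GitHub feedback shows strong demand, we will consider bringing them back. For now they are not maintained and will be removed in the future.

Migrating from SemanticTextMemory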

# Before: SemanticTextMemory (DEPRECATED)
from semantic_kernel.memory import SemanticTextMemory
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbeddingGenerationService

embedding_service = OpenAITextEmbeddingGenerationService(ai_model_id="text-embedding-3-small")
memory = SemanticTextMemory(storage=vector_store, embeddings_generator=embedding_service)

# Store memory
await memory.save_information(collection="docs", text="Important information", id="doc1")

# Search memory  
results = await memory.search(collection="docs", query="important", limit=5)

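# After: Direct Vector Store Usage
from dataclasses import dataclass
from typing import Annotated

from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
from semantic_kernel.connectors.in_memory import InMemoryCollection
from semantic_kernel.data.vector import VectorStoreField, vectorstoremodel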

# Define data model
@vectorstoremodel
@dataclass
class MemoryRecord:
    id: Annotated[str, VectorStoreField('key')]
    text: Annotated[str, VectorStoreField('data', is_full_text_indexed=True)]
    embedding: Annotated[list[float] | str | None, VectorStoreField('vector', dimensions=1536, distance_function="cosine", embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small"))] = None

# Create vector store with integrated embeddings
collection = InMemoryCollection(
    record_type=MemoryRecord,
    embedding_generator=OpenAITextEmbedding(ai_model_id="text-embedding-3-small")  # Optional, if not set on the record type
)

# Store with automatic embedding generation
record = MemoryRecord(id="doc1", text="Important information", embedding='Important information')
await collection.upsert(record)

# Search with built-in function
search_function = collection.create_search_function(
    function_name="search_docs",
    search_type="vector"
)

Migrating the memory plugin

If you need a plugin that can save information, you can easily create one like this:

# Before: TextMemoryPlugin (DEPRECATED)
from semantic_kernel.core_plugins import TextMemoryPlugin

memory_plugin = TextMemoryPlugin(memory)
kernel.add_plugin(memory_plugin, "memory")

# After: Custom plugin using vector store search functions
from semantic_kernel.data.vector import VectorStoreCollection
from semantic_kernel.functions import kernel_function

class VectorMemoryPlugin:
    def __init__(self, collection: VectorStoreCollection):
        self.collection = collection
    
    @kernel_function(name="save")
    async def save_memory(self, text: str, key: str) -> str:
        record = MemoryRecord(id=key, text=text, embedding=text)
        await self.collection.upsert(record)
        return f"Saved to {self.collection.collection_name}"
    
    @kernel_function(name="search") 
    async def search_memory(self, query: str, limit: int = 5) -> str:
        results = await self.collection.search(
            query, top=limit, vector_property_name="embedding"
        )        
        return "\n".join([r.record.text async for r in results.results])

# Register the new plugin
memory_plugin = VectorMemoryPlugin(collection)
kernel.add_plugin(memory_plugin, "memory")

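Vector search migration checklist

Step 1: Update imports

[ ] Replace memory store imports with their vector store equivalents
[ ] Update field imports to use VectorStoreField
[ ] Remove Bing connector imports

Step 2: Update field definitions

[ ] Convert to the unified VectorStoreField class
[ ] Update property names (is_filterable → is_indexed)
[ ] Add integrated embedding generators to vector fields

Step 3: Update collection usage

[ ] Replace memory operations with vector store methods
[ ] Pass lists of records or keys to the CRUD methods where appropriate (the separate batch methods have been removed)
[ ] Adopt the new search function creation

Step 4: Update search implementations

[ ] Replace manual search functions with create_search_function
[ ] Update text search to use the new providers
[ ] Implement hybrid search where it helps
[ ] Migrate filtering from FilterClause to lambda expressions

Step 5: Remove deprecated code

[ ] Remove SemanticTextMemory usage
[ ] Remove TextMemoryPlugin dependencies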

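Performance and feature benefits

Performance improvements

Batch operations: upsert, get, and delete accept lists of records or keys, which improves throughput
Integrated embeddings: removes the separate embedding generation step
Optimized search: the built-in search functions are optimized for each store type

Feature enhancements

Hybrid search: combine vector and text search for better results
Advanced filtering: more expressive filter expressions and indexing

Developer experience

Simplified API: fewer classes and methods to learn
Consistent interface: a unified approach across all vector stores
Better documentation: clear examples and migration paths
Future-proof: aligned with the .NET SDK for consistent cross-platform development

Conclusion

The vector store updates described above are a significant improvement to the Semantic Kernel Python SDK. The new unified architecture delivers better performance, richer features, and a more intuitive developer experience. Migration requires updating imports and refactoring existing code. The gains in maintainability and functionality still make this upgrade strongly recommended.

For more migration help, see the updated samples in the samples/concepts/memory/ directory and the comprehensive API documentation.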

Last updated on 2025-07-08