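Index and query vector data in Azure Cosmos DB for NoSQL by using Java

Applies to: ✅ NoSQL

This article explains how to create vector data, index it, and then query the data in a container.

Before you use vector indexing and search, you must first enable vector search in Azure Cosmos DB for NoSQL. After you set up the Azure Cosmos DB container for vector search, you create a vector embedding policy. Next, you add vector indexes to the container indexing policy. Then you create a container with the vector indexes and the vector embedding policy. Finally, you run a vector search on the stored data.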

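Prerequisites

An existing Azure Cosmos DB for NoSQL account.
- If you don't have an Azure subscription, you can create a free account.
- If you have an existing Azure subscription, create a new Azure Cosmos DB for NoSQL account.
The latest version of the Azure Cosmos DB Java SDK.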

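Enable the feature

To enable vector search for Azure Cosmos DB for NoSQL, follow these steps:

Go to your Azure Cosmos DB for NoSQL resource page.
On the left pane, under [Settings], select [Features].
Select Vector Search for NoSQL API.
Read the description of the feature to confirm that you want to enable it.
Select [Enable] to turn on vector search in Azure Cosmos DB for NoSQL.

Tip

Alternatively, use the Azure CLI to update the capabilities of your account to support vector search for Azure Cosmos DB for NoSQL.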

az cosmosdb update \
     --resource-group <resource-group-name> \
     --name <account-name> \
     --capabilities EnableNoSQLVectorSearch

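The registration request is approved automatically, but it might take up to 15 minutes to take effect.

Understand the steps involved in vector search

The following steps assume that you know how to set up an Azure Cosmos DB for NoSQL account and create a database. The vector search feature currently isn't supported on existing containers, so you need to create a new container. When you create the container, you specify the container-level vector embedding policy and the vector indexing policy.

Let's take the example of creating a database for an internet-based bookstore. You want to store the title, author, ISBN, and description of each book. You also need to define the following two properties to contain vector embeddings: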

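The contentVector property contains text embeddings that are generated from the text content of the book. For example, you might concatenate the title, author, isbn, and description properties before you create the embedding.
The coverImageVector property is generated from images of the book's cover.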
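
As a minimal sketch of the concatenation step, the following plain-Java helper joins the four text fields into a single string that could then be sent to an embedding model. The class name BookEmbeddingInput, the method name build, and the newline separator are assumptions made for illustration. The article doesn't prescribe a format, and the call to an actual embedding model is left out.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of the Azure Cosmos DB SDK.
final class BookEmbeddingInput {

    private BookEmbeddingInput() {
    }

    // Joins the text fields into one string, skipping null or blank values.
    // The newline separator is an arbitrary choice for this sketch.
    static String build(String title, String author, String isbn, String description) {
        StringJoiner joiner = new StringJoiner("\n");
        for (String field : new String[] {title, author, isbn, description}) {
            if (field != null && !field.trim().isEmpty()) {
                joiner.add(field.trim());
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // The resulting text is what you would pass to your embedding model.
        System.out.println(build("book-title", "book-author", "book-isbn", "book-description"));
    }
}
```

Skipping empty fields keeps the input stable when a book has no description, but whether that suits your embedding model is a design choice to validate with your own data.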

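To perform a vector search, you:

Create and store vector embeddings for the fields on which you want to perform vector search.
Specify the vector embedding paths in the vector embedding policy.
Include any vector indexes that you want in the indexing policy for the container.

For the later sections of this article, consider the following structure for the items stored in the container: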

{
  "title": "book-title", 
  "author": "book-author", 
  "isbn": "book-isbn", 
  "description": "book-description", 
  "contentVector": [2, -1, 4, 3, 5, -2, 5, -7, 3, 1], 
  "coverImageVector": [0.33, -0.52, 0.45, -0.67, 0.89, -0.34, 0.86, -0.78] 
}

First, create the CosmosContainerProperties object.

CosmosContainerProperties collectionDefinition = new CosmosContainerProperties(UUID.randomUUID().toString(), "Partition_Key_Def");

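Create a vector embedding policy for your container

Now you need to define a container vector policy. This policy tells the Azure Cosmos DB query engine how to handle vector properties in the VectorDistance system function. If you choose to specify a vector indexing policy, this policy also provides the information that the vector indexing policy needs.

The container vector policy contains the following information: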

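Parameter	Description
`path`	The property path that contains vectors.
`datatype`	The type of the elements of the vector. The default is `Float32`.
`dimensions`	The length of each vector in the path. The default is `1536`.
`distanceFunction`	The metric used to compute distance/similarity. The default is `Cosine`.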
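
To make the `distanceFunction` choice more concrete, here is a plain-Java sketch of three common metrics: cosine similarity, dot product, and Euclidean distance. It illustrates the underlying math only. The class VectorMetrics is hypothetical, and nothing here is meant to describe how the service computes these values internally.

```java
// Hypothetical helper for illustration; not part of the Azure Cosmos DB SDK.
final class VectorMetrics {

    private VectorMetrics() {
    }

    // Sum of the element-wise products of the two vectors.
    static double dotProduct(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Dot product divided by the product of the vector magnitudes.
    // A value of 1.0 means that the vectors point in the same direction.
    static double cosineSimilarity(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dotProduct(a, b) / norms;
    }

    // Straight-line distance between the vectors; 0.0 means they are identical.
    static double euclideanDistance(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        System.out.println("dot product        = " + dotProduct(a, b));
        System.out.println("cosine similarity  = " + cosineSimilarity(a, b));
        System.out.println("euclidean distance = " + euclideanDistance(a, b));
    }
}
```

Note that the metrics disagree on direction: larger cosine and dot product values mean more similar vectors, while a larger Euclidean distance means less similar vectors.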

For the example with book details, the vector policy might look like the following example:

// Creating vector embedding policy
CosmosVectorEmbeddingPolicy cosmosVectorEmbeddingPolicy = new CosmosVectorEmbeddingPolicy();

CosmosVectorEmbedding embedding1 = new CosmosVectorEmbedding();
embedding1.setPath("/coverImageVector");
embedding1.setDataType(CosmosVectorDataType.FLOAT32);
embedding1.setDimensions(8L);
embedding1.setDistanceFunction(CosmosVectorDistanceFunction.COSINE);

CosmosVectorEmbedding embedding2 = new CosmosVectorEmbedding();
embedding2.setPath("/contentVector");
embedding2.setDataType(CosmosVectorDataType.FLOAT32);
embedding2.setDimensions(10L);
embedding2.setDistanceFunction(CosmosVectorDistanceFunction.DOT_PRODUCT);

cosmosVectorEmbeddingPolicy.setCosmosVectorEmbeddings(Arrays.asList(embedding1, embedding2));

collectionDefinition.setVectorEmbeddingPolicy(cosmosVectorEmbeddingPolicy);
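
Because each embedding path declares a fixed number of dimensions (8 for /coverImageVector and 10 for /contentVector in this example), it can help to check vectors in application code before you write an item. The following is a minimal sketch of such a check. The class VectorDimensionCheck and its map of expected sizes are hypothetical and simply mirror the values used in the policy for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side check; not part of the Azure Cosmos DB SDK.
final class VectorDimensionCheck {

    // Expected dimensions per vector path, mirroring the example embedding policy.
    private static final Map<String, Integer> EXPECTED_DIMENSIONS = new LinkedHashMap<>();

    static {
        EXPECTED_DIMENSIONS.put("/coverImageVector", 8);
        EXPECTED_DIMENSIONS.put("/contentVector", 10);
    }

    private VectorDimensionCheck() {
    }

    // Returns true only when the path is known and the vector has the declared length.
    static boolean matches(String path, float[] vector) {
        Integer expected = EXPECTED_DIMENSIONS.get(path);
        return expected != null && vector != null && vector.length == expected;
    }

    public static void main(String[] args) {
        float[] contentVector = {2, -1, 4, 3, 5, -2, 5, -7, 3, 1};
        System.out.println("contentVector is valid: " + matches("/contentVector", contentVector));
    }
}
```

A check like this catches a mismatch between the embedding model's output size and the policy early, before the item reaches the container.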

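Create a vector index in the indexing policy

After you decide on the vector embedding paths, you must add vector indexes to the indexing policy. Currently, the vector search feature for Azure Cosmos DB for NoSQL is supported only on new containers. You apply the vector policy when you create the container, and you can't modify the policy later. The indexing policy looks similar to the following example: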

IndexingPolicy indexingPolicy = new IndexingPolicy();
indexingPolicy.setIndexingMode(IndexingMode.CONSISTENT);
ExcludedPath excludedPath1 = new ExcludedPath("/coverImageVector/*");
ExcludedPath excludedPath2 = new ExcludedPath("/contentVector/*");
indexingPolicy.setExcludedPaths(Arrays.asList(excludedPath1, excludedPath2));

IncludedPath includedPath1 = new IncludedPath("/*");
indexingPolicy.setIncludedPaths(Collections.singletonList(includedPath1));

// Creating vector indexes
CosmosVectorIndexSpec cosmosVectorIndexSpec1 = new CosmosVectorIndexSpec();
cosmosVectorIndexSpec1.setPath("/coverImageVector");
cosmosVectorIndexSpec1.setType(CosmosVectorIndexType.QUANTIZED_FLAT.toString());

CosmosVectorIndexSpec cosmosVectorIndexSpec2 = new CosmosVectorIndexSpec();
cosmosVectorIndexSpec2.setPath("/contentVector");
cosmosVectorIndexSpec2.setType(CosmosVectorIndexType.DISK_ANN.toString());

indexingPolicy.setVectorIndexes(Arrays.asList(cosmosVectorIndexSpec1, cosmosVectorIndexSpec2));

collectionDefinition.setIndexingPolicy(indexingPolicy);

Finally, create the container with the container indexing policy and the vector index policy.

database.createContainer(collectionDefinition).block();

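Important

The vector path is added to the excludedPaths section of the indexing policy to ensure optimized performance for insertion. If you don't add the vector path to excludedPaths, vector insertions incur a higher request unit charge and higher latency.

Run a vector similarity search query

After you create a container with the vector policy that you want and insert vector data into the container, use the VectorDistance system function in a query to run a vector search.

Suppose that you want to search for books about food recipes by looking at the description. You first need to get the embedding for your query text. In this case, you might want to generate an embedding for the query text food recipe. After you have the embedding for your search query, you can use it in the VectorDistance function in the vector search query to get the items that are most similar to your query: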

SELECT TOP 10 c.title, VectorDistance(c.contentVector, [1,2,3,4,5,6,7,8,9,10]) AS SimilarityScore   
FROM c  
ORDER BY VectorDistance(c.contentVector, [1,2,3,4,5,6,7,8,9,10])

This query retrieves the book titles along with their similarity scores with respect to your query. Here's an example in Java:

// Build the query vector. In practice, this comes from your embedding model.
float[] embedding = new float[10];
for (int i = 0; i < 10; i++) {
    embedding[i] = i + 1;
}

// Pass the vector to the query as a parameter.
ArrayList<SqlParameter> paramList = new ArrayList<SqlParameter>();
paramList.add(new SqlParameter("@embedding", embedding));
SqlQuerySpec querySpec = new SqlQuerySpec(
    "SELECT TOP 10 c.title, VectorDistance(c.contentVector, @embedding) AS SimilarityScore "
        + "FROM c ORDER BY VectorDistance(c.contentVector, @embedding)",
    paramList);

// Read each result as a JsonNode (com.fasterxml.jackson.databind.JsonNode)
// so that no custom result class is required.
CosmosPagedIterable<JsonNode> results =
    container.queryItems(querySpec, new CosmosQueryRequestOptions(), JsonNode.class);

for (JsonNode book : results) {
    logger.info(String.format("Title: %s, similarity score: %s",
        book.get("title").asText(), book.get("SimilarityScore").asText()));
}
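
To illustrate what the TOP and ORDER BY VectorDistance clauses ask for, the following plain-Java sketch ranks a small in-memory list of books against a query vector and keeps the best matches. It is a brute-force illustration only, not a description of how the service evaluates the query. The LocalVectorSearch class, the nested Book type, and the choice of cosine similarity with most-similar-first ordering are all assumptions made for this example, and the sketch assumes non-zero vectors of equal length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory illustration; not part of the Azure Cosmos DB SDK.
final class LocalVectorSearch {

    // Minimal stand-in for a stored item with a title and a content vector.
    static final class Book {
        final String title;
        final float[] contentVector;

        Book(String title, float[] contentVector) {
            this.title = title;
            this.contentVector = contentVector;
        }
    }

    private LocalVectorSearch() {
    }

    // Cosine similarity; assumes non-zero vectors of equal length.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the titles of the k books most similar to the query, best match first.
    static List<String> topTitles(List<Book> books, float[] query, int k) {
        List<Book> sorted = new ArrayList<>(books);
        sorted.sort(Comparator
            .comparingDouble((Book b) -> cosineSimilarity(b.contentVector, query))
            .reversed());
        List<String> titles = new ArrayList<>();
        for (Book book : sorted.subList(0, Math.min(k, sorted.size()))) {
            titles.add(book.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("Cooking Basics", new float[] {1f, 0f}));
        books.add(new Book("Car Repair", new float[] {0f, 1f}));
        books.add(new Book("Kitchen Garage", new float[] {1f, 1f}));
        System.out.println(topTitles(books, new float[] {1f, 0.1f}, 2));
    }
}
```

Scanning every item like this costs time proportional to the number of items, which is the kind of work that a vector index is meant to reduce.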

Last updated on 2025-10-20