Map enriched output to fields in a search index in Azure AI Search

Article
09/03/2024

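This article explains how to set up output field mappings, which define a data path between the in-memory data generated during skillset processing and the target fields in a search index. During indexer execution, skill-generated information exists only in memory. To persist this information in a search index, you must tell the indexer where to send the data.

An output field mapping is defined in an indexer and has the following elements: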

"outputFieldMappings": [
  {
    "sourceFieldName": "/document/path-to-a-node-in-an-enriched-document",
    "targetFieldName": "some-search-field-in-an-index",
    "mappingFunction": null
  }
],

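In contrast with a fieldMappings definition, which maps a path between verbatim source fields and index fields, an outputFieldMappings definition maps in-memory enrichments to fields in a search index.

Prerequisites

Indexer, index, data source, and skillset.
Index fields must be simple or top-level fields. You can't output to a complex type, but if you have a complex type, you can use an output field definition to flatten parts of the complex type and send them to a collection in a search index.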

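When to use an output field mapping

Output field mappings are required if your indexer has an attached skillset that creates new information that you want in your index. Examples include:

Vectors from embedding skills
OCR text from image skills
Locations, organizations, or people from entity recognition skills

Output field mappings can also be used to:

Create multiple copies of your generated content (one-to-many output field mappings).
Flatten a source document's complex type. For example, assume the source documents have a complex type, such as a multipart address, and you want only the city. You can use an output field mapping to flatten the nested data structure and send the output to a string collection in your search index.

Output field mappings apply to search indexes only. If you're populating a knowledge store, use projections for data path configuration.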

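Define an output field mapping

Output field mappings are added to the outputFieldMappings array in an indexer definition, typically placed after the fieldMappings array. An output field mapping consists of three parts.

You can use the REST API or an Azure SDK to define output field mappings.

Tip

Indexers created by the Import data wizard include output field mappings generated by the wizard. If you need examples, run the wizard over your data source to see the output field mappings in the indexer.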

REST API

Use Create Indexer or Create or Update Indexer, or an equivalent method in an Azure SDK. Here's an example of an indexer definition.

{
   "name": "myindexer",
   "description": null,
   "dataSourceName": "mydatasource",
   "targetIndexName": "myindex",
   "schedule": { },
   "parameters": { },
   "fieldMappings": [],
   "outputFieldMappings": [],
   "disabled": false,
   "encryptionKey": { }
 }

Fill in the outputFieldMappings array to specify the mappings. A field mapping consists of three parts.

"outputFieldMappings": [
  {
    "sourceFieldName": "/document/path-to-a-node-in-an-enriched-document",
    "targetFieldName": "some-search-field-in-an-index",
    "mappingFunction": null
  }
]

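Property	Description
sourceFieldName	Required. Specifies a path to enriched content. An example might be `/document/content`. For path syntax and examples, see Reference enrichments in an Azure AI Search skillset.
targetFieldName	Optional. Specifies the search field that receives the enriched content. Target fields must be top-level simple fields or collections. It can't be a path to a subfield in a complex type. If you want to retrieve specific nodes in a complex structure, you can flatten individual nodes in memory, and then send the output to a string collection in your index.
mappingFunction	Optional. Adds extra processing provided by the mapping functions that indexers support. For enrichment nodes, encoding and decoding are the most commonly used functions.

The targetFieldName is always the name of a field in the search index.

The sourceFieldName is a path to a node in the enriched document. It's the output of a skill. The path always starts with /document, and if you're indexing from a blob, the second element of the path is /content. The third element is the value produced by the skill. For more information and examples, see Reference enrichments in an Azure AI Search skillset.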
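The rules above (source paths start with /document, targets are top-level fields and not complex types) can be checked locally before an indexer definition is submitted. The following Python sketch is only an illustration: the function name validate_output_field_mappings and the sample index fragment are hypothetical, and the service performs its own validation that might differ from these checks.

```python
def validate_output_field_mappings(mappings, index_definition):
    """Return a list of problems; an empty list means these basic checks passed."""
    # Only top-level index fields are valid targets, so index them by name.
    top_level = {f["name"]: f["type"] for f in index_definition["fields"]}
    problems = []
    for mapping in mappings:
        source = mapping.get("sourceFieldName", "")
        target = mapping.get("targetFieldName")
        # Paths into the enriched document are rooted at /document.
        if not (source == "/document" or source.startswith("/document/")):
            problems.append(f"{source!r}: source path must start with /document")
        if target is None:
            continue  # targetFieldName is described as optional
        if target not in top_level:
            problems.append(f"{target!r}: not a top-level field in the index")
        elif "Edm.ComplexType" in top_level[target]:
            problems.append(f"{target!r}: complex types can't be a target")
    return problems


# Hypothetical index fragment and mappings, for illustration only.
index_definition = {
    "fields": [
        {"name": "orgNames", "type": "Collection(Edm.String)"},
        {"name": "colors", "type": "Collection(Edm.ComplexType)"},
    ]
}
mappings = [
    {"sourceFieldName": "/document/content/organizations", "targetFieldName": "orgNames"},
    {"sourceFieldName": "document/content/sentiment", "targetFieldName": "colors"},
]
problems = validate_output_field_mappings(mappings, index_definition)
for problem in problems:
    print(problem)
```

In this sample, the first mapping passes and the second is reported twice: once for the missing leading slash and once for targeting a complex type.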

This example adds entities and sentiment labels extracted from a blob's content property to fields in a search index.

{
    "name": "myIndexer",
    "dataSourceName": "myDataSource",
    "targetIndexName": "myIndex",
    "skillsetName": "myFirstSkillSet",
    "fieldMappings": [],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/content/organizations/*/description",
            "targetFieldName": "descriptions",
            "mappingFunction": {
                "name": "base64Decode"
            }
        },
        {
            "sourceFieldName": "/document/content/organizations",
            "targetFieldName": "orgNames"
        },
        {
            "sourceFieldName": "/document/content/sentiment",
            "targetFieldName": "sentiment"
        }
    ]
}

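Assign any mapping functions needed to transform the content of a field before it's stored in the index. For enrichment nodes, encoding and decoding are the most commonly used functions.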
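As a rough illustration of submitting such a definition over REST, the following Python sketch builds a Create or Update Indexer request with the standard library but doesn't send it. The service name, the API key placeholder, the helper name build_indexer_request, and the api-version value are assumptions made for this example; check the REST reference for the version your service supports.

```python
import json
import urllib.request


def build_indexer_request(service_name, indexer_definition, api_key,
                          api_version="2024-07-01"):
    """Build (but don't send) a PUT request that creates or updates an indexer."""
    url = (
        f"https://{service_name}.search.windows.net/indexers/"
        f"{indexer_definition['name']}?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(indexer_definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )


# Indexer definition with two output field mappings, following the sample above.
indexer = {
    "name": "myIndexer",
    "dataSourceName": "myDataSource",
    "targetIndexName": "myIndex",
    "skillsetName": "myFirstSkillSet",
    "fieldMappings": [],
    "outputFieldMappings": [
        {"sourceFieldName": "/document/content/organizations", "targetFieldName": "orgNames"},
        {"sourceFieldName": "/document/content/sentiment", "targetFieldName": "sentiment"},
    ],
}

# "my-search-service" and the key are placeholders, not real values.
request = build_indexer_request("my-search-service", indexer, "<admin-api-key>")
print(request.get_method(), request.full_url)
# Calling urllib.request.urlopen(request) would send it; that step is omitted here.
```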

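.NET SDK (C#)

In the Azure SDK for .NET, use the FieldMapping class (as in the sample that follows), which provides SourceFieldName and TargetFieldName properties and an optional MappingFunction reference.

Specify output field mappings when you construct the indexer, or later by setting SearchIndexer.OutputFieldMappings directly. The following C# example sets the output field mappings when constructing an indexer.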

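// Requires: using Azure.Search.Documents.Indexes; using Azure.Search.Documents.Indexes.Models;
// Assumes dataSourceConnectionName, indexName, skillsetName, and indexerClient
// (a SearchIndexerClient) are defined elsewhere.
string indexerName = "cog-search-demo";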
SearchIndexer indexer = new SearchIndexer(
    indexerName,
    dataSourceConnectionName,
    indexName)
{
    // Field mappings omitted for this example (assume default mappings)
    OutputFieldMappings =
    {
        new FieldMapping("/document/content/organizations") { TargetFieldName = "orgNames" },
        new FieldMapping("/document/content/sentiment") { TargetFieldName = "sentiment" }
    },
    SkillsetName = skillsetName
};

await indexerClient.CreateIndexerAsync(indexer);

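One-to-many output field mapping

You can use an output field mapping to route a single source field to multiple fields in a search index. You might do this for comparison testing, or when you want fields with different attributes.

Assume a skillset that generates embeddings for a vector field, and an index that has multiple vector fields that vary by algorithm and compression settings. Within the indexer, map the embedding skill's output to each of the multiple vector fields in the search index.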

"outputFieldMappings": [
    { "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_hnsw" }, 
    { "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_eknn" },
    { "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_narrow" }, 
    { "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_no_stored" },
    { "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_scalar" }       
  ]
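When the list of target fields is long, an array like the one above can be generated instead of typed by hand. This is a minimal Python sketch; the helper name fan_out is hypothetical, and the field names are the ones from the sample.

```python
import json


def fan_out(source_path, target_fields):
    """Create one output field mapping per target, all reading the same source node."""
    return [
        {"sourceFieldName": source_path, "targetFieldName": target}
        for target in target_fields
    ]


output_field_mappings = fan_out(
    "/document/content/text_vector",
    ["vector_hnsw", "vector_eknn", "vector_narrow", "vector_no_stored", "vector_scalar"],
)
print(json.dumps({"outputFieldMappings": output_field_mappings}, indent=2))
```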

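The source field path is the skill output. In this example, the output is text_vector. The targetName property on a skill output is optional. If the skill output doesn't specify a target name, the node takes the output's name, embedding, so the full path is /document/content/embedding. The following skillset produces the output used in this example: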

{
  "name": "test-vector-size-ss",  
  "description": "Generate embeddings using AOAI",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "#1",
      "description": null,
      "context": "/document/content",
      "resourceUri": "https://my-demo-eastus.openai.azure.com",
      "apiKey": null,
      "deploymentId": "text-embedding-ada-002",
      "dimensions": 1536,
      "modelName": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "text_vector"
        }
      ],
      "authIdentity": null
    }
  ]
}
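The way a skill's context, output name, and optional targetName combine into a source path can be sketched in a few lines of Python. The helper name skill_output_paths is hypothetical, and the sketch covers only the simple case shown here (a context without wildcards); it assumes that a skill without an explicit context uses /document.

```python
def skill_output_paths(skill):
    """Return the enriched-document path of each output of a skill definition."""
    context = skill.get("context", "/document").rstrip("/")
    paths = []
    for output in skill["outputs"]:
        # targetName overrides the output's own name when it's provided.
        node_name = output.get("targetName") or output["name"]
        paths.append(f"{context}/{node_name}")
    return paths


skill = {
    "context": "/document/content",
    "outputs": [{"name": "embedding", "targetName": "text_vector"}],
}
print(skill_output_paths(skill))

# The same skill without a targetName falls back to the output name.
skill_without_target = {
    "context": "/document/content",
    "outputs": [{"name": "embedding"}],
}
print(skill_output_paths(skill_without_target))
```

The first path is the one to use as sourceFieldName in the one-to-many example above.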

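Flatten complex structures into a string collection

If your source data is composed of nested or hierarchical JSON, you can't use field mappings to set up the data paths. Instead, your search index must mirror the source data structure at each level for a full import.

This section walks you through an import process that produces a one-to-one reflection of a complex document on both the source and target sides. Next, it uses the same source document to illustrate how to retrieve individual nodes and flatten them into string collections.

Here's an example of a document in Azure Cosmos DB with nested JSON: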

{
   "palette":"primary colors",
   "colors":[
      {
         "name":"blue",
         "medium":[
            "acrylic",
            "oil",
            "pastel"
         ]
      },
      {
         "name":"red",
         "medium":[
            "acrylic",
            "pastel",
            "watercolor"
         ]
      },
      {
         "name":"yellow",
         "medium":[
            "acrylic",
            "watercolor"
         ]
      }
   ]
}

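If you want to fully index this source document, you create an index definition where the field names, levels, and types are reflected as a complex type. Because field mappings aren't supported for complex types in the search index, your index definition must mirror the source document.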

{
  "name": "my-test-index",
  "defaultScoringProfile": "",
  "fields": [
    { "name": "id", "type": "Edm.String", "searchable": false, "retrievable": true, "key": true},
    { "name": "palette", "type": "Edm.String", "searchable": true, "retrievable": true },
    { "name": "colors", "type": "Collection(Edm.ComplexType)",
      "fields": [
        {
          "name": "name",
          "type": "Edm.String",
          "searchable": true,
          "retrievable": true
        },
        {
          "name": "medium",
          "type": "Collection(Edm.String)",
          "searchable": true,
          "retrievable": true,
        }
      ]
    }
  ]
}

Here's a sample indexer definition that runs the import (notice that there are no field mappings and no skillset).

{
  "name": "my-test-indexer",
  "dataSourceName": "my-test-ds",
  "skillsetName": null,
  "targetIndexName": "my-test-index",

  "fieldMappings": [],
  "outputFieldMappings": []
}

The result is the following sample search document, which is similar to the original in Azure Cosmos DB.

{
  "value": [
    {
      "@search.score": 1,
      "id": "240a98f5-90c9-406b-a8c8-f50ff86f116c",
      "palette": "primary colors",
      "colors": [
        {
          "name": "blue",
          "medium": [
            "acrylic",
            "oil",
            "pastel"
          ]
        },
        {
          "name": "red",
          "medium": [
            "acrylic",
            "pastel",
            "watercolor"
          ]
        },
        {
          "name": "yellow",
          "medium": [
            "acrylic",
            "watercolor"
          ]
        }
      ]
    }
  ]
}

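An alternative rendering in a search index is to flatten individual nodes in the source's nested structure into a string collection in the search index.

To accomplish this task, you need an outputFieldMappings definition that maps an in-memory node to a string collection in the index. Although output field mappings primarily apply to skill outputs, you can also use them to address nodes after document cracking, the phase in which the indexer opens a source document and reads it into memory.

Here's a sample index definition that uses string collections to receive flattened output: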

{
  "name": "my-new-flattened-index",
  "defaultScoringProfile": "",
  "fields": [
    { "name": "id", "type": "Edm.String", "searchable": false, "retrievable": true, "key": true },
    { "name": "palette", "type": "Edm.String", "searchable": true, "retrievable": true },
    { "name": "color_names", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true },
    { "name": "color_mediums", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true}
  ]
}

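Here's a sample indexer definition that uses outputFieldMappings to associate the nested JSON with the string collection fields. Notice that the source field uses the path syntax for enrichment nodes, even though there's no skillset. Enriched documents are created in the system during document cracking, which means you can access nodes in each document tree as long as those nodes exist when the document is cracked.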

{
  "name": "my-test-indexer",
  "dataSourceName": "my-test-ds",
  "skillsetName": null,
  "targetIndexName": "my-new-flattened-index",
  "parameters": {  },
  "fieldMappings": [   ],
  "outputFieldMappings": [
    {
       "sourceFieldName": "/document/colors/*/name",
       "targetFieldName": "color_names"
    },
    {
       "sourceFieldName": "/document/colors/*/medium",
       "targetFieldName": "color_mediums"
    }
  ]
}

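Results from the previous definition are as follows. Simplifying the structure loses context in this case: there's no longer any association between a given color and the mediums it's available in. However, depending on your scenario, a result like the following one might be exactly what you need.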

{
  "value": [
    {
      "@search.score": 1,
      "id": "240a98f5-90c9-406b-a8c8-f50ff86f116c",
      "palette": "primary colors",
      "color_names": [
        "blue",
        "red",
        "yellow"
      ],
      "color_mediums": [
        "[\"acrylic\",\"oil\",\"pastel\"]",
        "[\"acrylic\",\"pastel\",\"watercolor\"]",
        "[\"acrylic\",\"watercolor\"]"
      ]
    }
  ]
}
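To make the flattening easier to reason about, the following Python sketch walks the sample source document with the same two paths and reproduces the result shown above, including the serialization of each medium array into a single string. It's a local simulation written for this example, not the indexer's actual implementation; the function names resolve and flatten_to_strings are hypothetical.

```python
import json


def resolve(node, segments):
    """Collect the values reached by following path segments; '*' visits every array item."""
    if not segments:
        return [node]
    head, rest = segments[0], segments[1:]
    if head == "*":
        values = []
        for item in node:
            values.extend(resolve(item, rest))
        return values
    return resolve(node[head], rest)


def flatten_to_strings(document, path):
    """Resolve a /document/... path and render each matched value as a string."""
    segments = path.strip("/").split("/")[1:]  # drop the leading "document" root
    return [
        value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
        for value in resolve(document, segments)
    ]


source_document = {
    "palette": "primary colors",
    "colors": [
        {"name": "blue", "medium": ["acrylic", "oil", "pastel"]},
        {"name": "red", "medium": ["acrylic", "pastel", "watercolor"]},
        {"name": "yellow", "medium": ["acrylic", "watercolor"]},
    ],
}

color_names = flatten_to_strings(source_document, "/document/colors/*/name")
color_mediums = flatten_to_strings(source_document, "/document/colors/*/medium")
print(color_names)
print(color_mediums)
```

Because each path is resolved independently, the two output lists carry no link between a color and its mediums, which is the loss of context described above.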
