Load data into a search index in Azure AI Search

10/31/2024

This article explains how to import documents into a predefined search index. In Azure AI Search, you create the search index first and import data as a second step. The exceptions are the import wizards in the Azure portal and indexer pipelines, which create and load an index in one workflow.

How data import works

A search service accepts JSON documents that conform to the index schema. It imports and indexes plain text and vectors in JSON, which are then used in full text search, vector search, hybrid search, and knowledge mining scenarios.

Plain text content is obtainable from alphanumeric fields in the external data source, metadata that's useful in search scenarios, or enriched content created by a skillset (skills can extract or infer textual descriptions from images and unstructured content).
Vector content is generated either by an external embedding model or by integrated vectorization, which uses Azure AI Search features that integrate with applied AI.

You can prepare these documents yourself, but if content resides in a supported data source, running an indexer or using an Import wizard can automate document retrieval, JSON serialization, and indexing.
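If you prepare documents yourself, a quick client-side check can catch fields that don't exist in the index before you send anything. The following Python sketch is only an illustration: the `HOTEL_FIELDS` dictionary is a hypothetical subset of a hotels index schema, `HotelId` is assumed to be the key field, and the function `check_document` is not part of any Azure SDK. The service still performs its own validation.

```python
import json

# Hypothetical subset of a hotels index schema (field name -> EDM type).
# The authoritative schema lives in the service; this dict is an assumption.
HOTEL_FIELDS = {
    "HotelId": "Edm.String",  # assumed key field
    "HotelName": "Edm.String",
    "Description": "Edm.String",
    "Category": "Edm.String",
    "Tags": "Collection(Edm.String)",
}
KEY_FIELD = "HotelId"


def check_document(doc: dict, fields: dict = HOTEL_FIELDS, key: str = KEY_FIELD) -> list:
    """Return a list of problems found; an empty list means no obvious mismatch."""
    problems = []
    key_value = doc.get(key)
    if not isinstance(key_value, str) or not key_value:
        problems.append(f"missing or non-string key field '{key}'")
    for name, value in doc.items():
        if name.startswith("@search."):
            continue  # annotations such as @search.action aren't index fields
        if name not in fields:
            problems.append(f"unknown field '{name}'")
        elif fields[name].startswith("Collection(") and value is not None and not isinstance(value, list):
            problems.append(f"field '{name}' must be a JSON array")
    return problems


doc = {"HotelId": "1111", "HotelName": "Stay-Kay City Hotel", "Tags": ["pool", "concierge"]}
print(check_document(doc))  # []
print(json.dumps(doc))      # the JSON text that goes into a request body
```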

Once data is indexed, the physical data structures of the index are locked in. For guidance on what can and can't be changed, see Update and rebuild an index.

Indexing isn't a background process. A search service balances indexing and query workloads, but if query latency is too high, you can either add capacity or schedule index loads for periods of low query activity.

For more information, see Data import strategies.

Use the Azure portal

In the Azure portal, use the import wizards to create and load indexes in a seamless workflow. If you want to load an existing index, choose an alternative approach.

Sign in to the Azure portal with your Azure account and find your search service.
On the Overview page, select Import data or Import and vectorize data on the command bar to create and populate a search index.

You can follow these links to review the workflow: Quickstart: Create an Azure AI Search index and Quickstart: Integrated vectorization.
After the wizard is finished, use Search Explorer to check for results.

Tip

The import wizards create and run indexers. If indexers are already defined, you can reset and run an indexer from the Azure portal, which is useful if you're adding fields incrementally. Reset forces the indexer to start over, picking up all fields from all source documents.
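Outside the portal, the same reset-and-run sequence can be issued against the indexer's `reset` and `run` REST endpoints. The following Python sketch only builds the two POST requests with the standard library and doesn't send them. The service name `my-search-service`, the indexer name `hotels-indexer`, and the `<admin-key>` value are hypothetical placeholders, and the helper `indexer_request` is illustrative rather than an SDK function.

```python
from urllib.request import Request

API_VERSION = "2024-07-01"


def indexer_request(service: str, indexer: str, operation: str, admin_key: str) -> Request:
    """Build (but don't send) a POST to the indexer's reset or run endpoint."""
    if operation not in ("reset", "run"):
        raise ValueError("operation must be 'reset' or 'run'")
    url = (f"https://{service}.search.windows.net/indexers/{indexer}/"
           f"{operation}?api-version={API_VERSION}")
    # An empty body is enough; the method is set explicitly to POST.
    return Request(url, data=b"", method="POST", headers={"api-key": admin_key})


# Reset first so the next run reprocesses every source document, then run.
for op in ("reset", "run"):
    req = indexer_request("my-search-service", "hotels-indexer", op, "<admin-key>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```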

Use the REST APIs

Documents - Index is the REST API for importing data into a search index. REST APIs are useful for initial proof-of-concept testing, where you can test indexing workflows without writing much code. The @search.action parameter determines whether a document is added in full or updated partially, with new or replacement values for specific fields.

Quickstart: Text search using REST explains the steps. The following example is a modified version of the quickstart example: it's been trimmed for brevity, and the first HotelId value has been altered to avoid overwriting an existing document.

Formulate a POST call specifying the index name, the "docs/index" endpoint, and a request body that includes the @search.action parameter.

POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/index?api-version=2024-07-01
Content-Type: application/json
api-key: [admin key]

{
    "value": [
        {
            "@search.action": "upload",
            "HotelId": "1111",
            "HotelName": "Stay-Kay City Hotel",
            "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Times Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.",
            "Category": "Boutique",
            "Tags": [ "pool", "air conditioning", "concierge" ]
        },
        {
            "@search.action": "mergeOrUpload",
            "HotelId": "2",
            "HotelName": "Old Century Hotel",
            "Description": "This description replaces the original one for this hotel. New and changed values overwrite the previous ones. In a collection field like Tags, be sure to provide the full list because values within the field itself aren't merged.",
            "Category": "Boutique",
            "Tags": [ "pool", "free wifi", "concierge", "my first new tag", "my second new tag" ]
        }
    ]
}

Set the @search.action parameter to upload to create or overwrite a document. Set it to merge or mergeOrUpload if you're targeting updates to specific fields within the document. The previous example uses upload for the new document (HotelId 1111) and mergeOrUpload for the existing one (HotelId 2).

Action	Effect
merge	Updates a document that already exists, and fails a document that can't be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type `Collection(Edm.String)`. For example, if a `tags` field starts with a value of `["budget"]` and you execute a merge with `["economy", "pool"]`, the final value of the `tags` field is `["economy", "pool"]`. It won't be `["budget", "economy", "pool"]`.
mergeOrUpload	Behaves like merge if the document exists, and upload if the document is new. This is the most common action for incremental updates.
upload	Similar to an "upsert" where the document is inserted if it's new, and replaced if it exists. Any field defined in the index but missing from the document is set to null.

Send the request.
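If you'd rather send the request from a script than from a REST client, the following Python sketch builds the same kind of Documents - Index call with the standard library. The service name `my-search-service` and `<admin-key>` are placeholders, the two documents are trimmed from the example, and the helpers `build_index_request` and `send` are illustrative names, not SDK functions. The `key`, `status`, and `statusCode` fields read in `send` are assumed from the per-document results that Documents - Index returns.

```python
import json
from urllib.request import Request, urlopen

API_VERSION = "2024-07-01"


def build_index_request(service: str, index: str, admin_key: str, actions: list) -> Request:
    """Build a Documents - Index POST; each document carries its own @search.action."""
    url = (f"https://{service}.search.windows.net/indexes/{index}/docs/index"
           f"?api-version={API_VERSION}")
    body = json.dumps({"value": actions}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json", "api-key": admin_key})


def send(req: Request) -> list:
    """Send the batch and return (key, status, statusCode) for each document."""
    with urlopen(req) as resp:
        results = json.loads(resp.read())["value"]
    return [(r["key"], r["status"], r["statusCode"]) for r in results]


docs = [
    {"@search.action": "upload", "HotelId": "1111", "HotelName": "Stay-Kay City Hotel",
     "Category": "Boutique", "Tags": ["pool", "air conditioning", "concierge"]},
    {"@search.action": "mergeOrUpload", "HotelId": "2", "HotelName": "Old Century Hotel",
     "Tags": ["pool", "free wifi", "concierge"]},
]
req = build_index_request("my-search-service", "hotels-sample-index", "<admin-key>", docs)
# send(req) performs the network call; check every per-document status, not just the HTTP code.
```

Checking each per-document status matters because a batch can partially succeed: some documents may be indexed while others fail.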

Look up the documents you just added as a validation step:

GET https://[service name].search.windows.net/indexes/hotels-sample-index/docs/1111?api-version=2024-07-01
api-key: [admin key]
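The same lookup can be scripted. This Python sketch builds the GET request for one document key and includes an optional `fetch` helper that parses the JSON response. The service name and `<api-key>` are placeholders, the helper names are hypothetical, and percent-encoding the key is a defensive choice rather than a documented requirement.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_VERSION = "2024-07-01"


def lookup_request(service: str, index: str, key: str, api_key: str) -> Request:
    """Build a GET that retrieves one document by its key value."""
    # Percent-encode the key so unexpected characters can't break the URL path.
    url = (f"https://{service}.search.windows.net/indexes/{index}/docs/"
           f"{quote(key, safe='')}?api-version={API_VERSION}")
    return Request(url, headers={"api-key": api_key})


def fetch(req: Request) -> dict:
    """Send the lookup and return the document as a dict (raises HTTPError on 404)."""
    with urlopen(req) as resp:
        return json.loads(resp.read())


req = lookup_request("my-search-service", "hotels-sample-index", "1111", "<api-key>")
print(req.full_url)
```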

When the document key is new, null becomes the value for any field that is unspecified in the document. For actions on an existing document, updated values replace the previous values. Any fields that weren't specified in a "merge" or "mergeOrUpload" are left intact in the search index.
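To make these rules concrete, the following Python sketch models the three actions on an in-memory dictionary. It's a local illustration of the documented behavior, not the service itself: the `SCHEMA` tuple, the `HotelId` key, and the function `apply_action` are hypothetical. It reproduces the `tags` example from the table, where a merge replaces a collection instead of appending to it.

```python
# Hypothetical schema (field names only) and key field for the local model.
SCHEMA = ("HotelId", "HotelName", "Description", "Category", "Tags")
KEY = "HotelId"


def apply_action(index: dict, doc: dict) -> None:
    """Mimic upload, merge, and mergeOrUpload on an in-memory dict of key -> document."""
    action = doc["@search.action"]
    values = {k: v for k, v in doc.items() if not k.startswith("@search.")}
    doc_key = values[KEY]
    exists = doc_key in index
    if action == "upload" or (action == "mergeOrUpload" and not exists):
        # Whole-document write: any schema field not supplied becomes None (null).
        index[doc_key] = {f: values.get(f) for f in SCHEMA}
    elif action in ("merge", "mergeOrUpload"):
        if not exists:
            raise KeyError(f"merge failed: document {doc_key!r} not found")
        # Field-level write: supplied fields are replaced, unspecified ones stay intact.
        # A collection such as Tags is replaced as a whole, never unioned.
        index[doc_key].update(values)
    else:
        raise ValueError(f"unsupported action: {action}")


index = {}
apply_action(index, {"@search.action": "upload", "HotelId": "3", "Category": "Budget", "Tags": ["budget"]})
apply_action(index, {"@search.action": "merge", "HotelId": "3", "Tags": ["economy", "pool"]})
print(index["3"]["Tags"])       # ['economy', 'pool']
print(index["3"]["Category"])   # Budget
print(index["3"]["HotelName"])  # None
```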

Use the Azure SDKs

Programmability is provided in the following Azure SDKs.

The Azure SDK for .NET provides the following APIs for simple and bulk document uploads into an index:

Several samples illustrate indexing in the context of both simple and large-scale indexing:

"Load an index" explains basic steps.
Azure.Search.Documents Samples - Indexing Documents from the Azure SDK team adds SearchIndexingBufferedSender.
Tutorial: Index any data couples batch indexing with testing strategies for determining an optimal batch size.
Be sure to check the azure-search-vector-samples repo for code examples showing how to index vector fields.
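For large loads, documents are sent in batches. In .NET, SearchIndexingBufferedSender is intended to handle buffering for you; the following Python sketch shows a manual equivalent that splits documents by count and approximate JSON size. The limits of 1,000 documents and 16 MB per request are assumptions based on the published service limits; verify them for your service before relying on them. The function `batches` is a hypothetical helper, and its size estimate ignores the small `{"value": [...]}` wrapper.

```python
import json
from typing import Iterable, Iterator

MAX_DOCS = 1000               # assumed per-batch document limit; verify against service limits
MAX_BYTES = 16 * 1024 * 1024  # assumed request payload limit; verify as well


def batches(docs: Iterable[dict], max_docs: int = MAX_DOCS,
            max_bytes: int = MAX_BYTES) -> Iterator[list]:
    """Yield lists of documents whose count and approximate JSON size stay under the limits."""
    batch, size = [], 0
    for doc in docs:
        doc_size = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before it would exceed either limit.
        # A single oversized document still gets its own batch (the service would reject it).
        if batch and (len(batch) >= max_docs or size + doc_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch


docs = [{"@search.action": "mergeOrUpload", "HotelId": str(i)} for i in range(2500)]
print([len(b) for b in batches(docs)])  # [1000, 1000, 500]
```

Each yielded batch becomes the `value` array of one Documents - Index request. Smaller batches can be worth testing too, as the tutorial above suggests.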
