Quickstart: Image search using Search Explorer in Azure portal

Important

Image vectors are supported in stable API versions, but the wizard and vectorizers are in public preview under the Supplemental Terms of Use. By default, the wizard targets the 2024-05-01-preview REST API.

Get started with image search by using the Import and vectorize data wizard in the Azure portal, and then use Search Explorer to run image-based queries.

You need three Azure resources and some sample image files to complete this walkthrough:

  • Azure Storage to store image files as blobs
  • Azure AI services multiservice account, used for image vectorization and Optical Character Recognition (OCR)
  • Azure AI Search for indexing and queries

Sample data consists of image files in the azure-search-sample-data repo, but you can use different images and still follow this walkthrough.

Prerequisites

  • An Azure subscription. Create one for free.

  • Azure AI services, a multiservice account, in a region that provides Azure AI Vision multimodal embeddings.

    Currently, those regions are: SwedenCentral, EastUS, NorthEurope, WestEurope, WestUS, SoutheastAsia, KoreaCentral, FranceCentral, AustraliaEast, WestUS2, SwitzerlandNorth, JapanEast. Check the documentation for an updated list.

  • Azure AI Search, on any tier, but in the same region as Azure AI services.

    The service tier determines how many blobs you can index. We used the free tier to create this walkthrough and limited the content to 10 JPG files.

  • Azure Storage, a standard performance (general-purpose v2) account. The access tier can be hot, cool, or cold.
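
The region requirements in the list above can be checked before you create anything. The following is a minimal sketch, not an official tool: the region set is copied from the list in this article and may be out of date, and the function names are hypothetical.

```python
# Regions copied from the prerequisites list; check the documentation for updates.
SUPPORTED_REGIONS = {
    "swedencentral", "eastus", "northeurope", "westeurope", "westus",
    "southeastasia", "koreacentral", "francecentral", "australiaeast",
    "westus2", "switzerlandnorth", "japaneast",
}


def normalize_region(name: str) -> str:
    """Lowercase a region name and drop spaces, so 'East US' matches 'EastUS'."""
    return name.replace(" ", "").lower()


def regions_are_compatible(search_region: str, ai_services_region: str) -> bool:
    """True when both services share one region that is in the supported set."""
    search = normalize_region(search_region)
    ai_services = normalize_region(ai_services_region)
    return search == ai_services and search in SUPPORTED_REGIONS
```

For example, `regions_are_compatible("East US", "eastus")` is true, while two services in different regions, or in the same unsupported region, fail the check.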

All of the above resources must have public access enabled so that the portal nodes can reach them. Otherwise, the wizard fails. After the wizard runs, you can enable firewalls and private endpoints on the individual integration components for security.

If private endpoints are already present and can't be disabled, the alternative is to run the end-to-end flow from a script or program on a virtual machine in the same virtual network as the private endpoint. Here's a Python code sample for integrated vectorization. The same GitHub repo has samples in other programming languages.

A free search service supports role-based access control on connections to Azure AI Search, but it doesn't support managed identities on outbound connections to Azure Storage or Azure AI Vision. This means you must use key-based authentication on free search service connections to other Azure services. For more secure connections, use basic tier or higher and configure a managed identity and role assignments to admit requests from Azure AI Search on other Azure services.
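
To make the difference concrete, here is a hedged sketch of the two connection string shapes a blob data source can use: a full-access string with an account key (the only option on the free tier) and a `ResourceId` string for managed identity connections on basic tier or higher. The helper names and all placeholder values are hypothetical; verify the formats against the current Azure AI Search documentation before relying on them.

```python
from typing import Optional


def requires_key_auth(tier: str) -> bool:
    """Per the paragraph above, free services need keys on outbound connections."""
    return tier.strip().lower() == "free"


def blob_connection_string(
    account_name: str,
    account_key: Optional[str] = None,
    subscription_id: Optional[str] = None,
    resource_group: Optional[str] = None,
) -> str:
    """Build a data source connection string for Azure Blob Storage."""
    if account_key:
        # Key-based authentication: the key travels in the connection string.
        return (
            "DefaultEndpointsProtocol=https;"
            f"AccountName={account_name};AccountKey={account_key};"
            "EndpointSuffix=core.windows.net"
        )
    if subscription_id and resource_group:
        # Managed identity: no secret, only the storage account's resource ID.
        return (
            f"ResourceId=/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Storage/storageAccounts/{account_name};"
        )
    raise ValueError("Provide an account key, or a subscription ID and resource group.")
```

The managed identity form only works after the search service has an identity and a role assignment on the storage account, as described above.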

Check for space

If you're starting with the free service, you're limited to three indexes, three data sources, three skillsets, and three indexers. Make sure you have room for extra items before you begin. This quickstart creates one of each object.
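
If you prefer to check from a script, the sketch below counts existing objects through the REST API's list operations and reports which object types have no room left. It assumes an admin API key and the 2024-05-01-preview API version; `count_objects` and `objects_without_room` are hypothetical helper names, and the limits are the free-tier values stated above.

```python
import json
import urllib.request

API_VERSION = "2024-05-01-preview"

# Free-tier limits as stated in this section, keyed by REST collection name.
FREE_TIER_LIMITS = {"indexes": 3, "datasources": 3, "skillsets": 3, "indexers": 3}


def count_objects(service_name: str, admin_key: str) -> dict:
    """Call the list operation for each object type and count the results."""
    counts = {}
    for kind in FREE_TIER_LIMITS:
        url = f"https://{service_name}.search.windows.net/{kind}?api-version={API_VERSION}"
        request = urllib.request.Request(url, headers={"api-key": admin_key})
        with urllib.request.urlopen(request) as response:
            counts[kind] = len(json.load(response)["value"])
    return counts


def objects_without_room(current_counts: dict, needed: int = 1) -> list:
    """Return the object types that can't fit `needed` more items."""
    return [
        kind
        for kind, limit in FREE_TIER_LIMITS.items()
        if current_counts.get(kind, 0) + needed > limit
    ]
```

Because this quickstart creates one of each object, an empty list from `objects_without_room(count_objects(...))` means you have enough room.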

Prepare sample data

  1. Download the unsplash-signs image folder to a local folder or find some images of your own. On a free search service, use fewer than 20 image files to stay under the free quota for enrichment processing.

  2. Sign in to the Azure portal with your Azure account, and go to your Azure Storage account.

  3. In the navigation pane, under Data Storage, select Containers.

  4. Create a new container and then upload the images.
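
The steps above can also be scripted. This is a sketch under stated assumptions, not part of the portal workflow: it assumes the `azure-storage-blob` package is installed, that the container already exists, and that the sample files use `.jpg` or `.jpeg` extensions. The function names and the cap of 19 files (reflecting "fewer than 20") are illustrative.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg"}
MAX_FREE_TIER_IMAGES = 19  # stay under 20 files on a free search service


def pick_images(folder, limit: int = MAX_FREE_TIER_IMAGES) -> list:
    """Return up to `limit` image files from `folder`, sorted by name."""
    candidates = sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    return candidates[:limit]


def upload_images(paths, connection_string: str, container_name: str) -> None:
    """Upload each file as a blob named after the file (requires azure-storage-blob)."""
    # Imported here so pick_images() works without the SDK installed.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(connection_string, container_name)
    for path in paths:
        with path.open("rb") as data:
            container.upload_blob(name=path.name, data=data, overwrite=True)
```

A typical call would be `upload_images(pick_images("unsplash-signs"), connection_string, "images")`, where the folder and container names are placeholders.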

Start the wizard

If your search service and Azure AI service are located in the same supported region and tenant, and if your Azure Storage blob container is using the default configuration, you're ready to start the wizard.

  1. Sign in to the Azure portal with your Azure account, and go to your Azure AI Search service.

  2. On the Overview page, select Import and vectorize data.

    Screenshot of the wizard command.

Connect to your data

The next step is to connect to a data source that provides the images.

  1. On the Connect to your data tab, select Azure Blob Storage.

  2. Specify the Azure subscription.

  3. For Azure Storage, select the account and container that provides the data. Use the default values for the remaining fields.

    Screenshot of the connect to your data page in the wizard.

  4. Select Next.

Vectorize your text

If raw content includes text, or if the skillset produces text, the wizard calls a text embedding model to generate vectors for that content. In this exercise, text is produced by the Optical Character Recognition (OCR) skill that you add in the next step.

Azure AI Vision provides text embeddings, so we'll use that resource for text vectorization.

  1. On the Vectorize your text page, select AI Vision vectorization. If it's not selectable, make sure Azure AI Search and your Azure AI multiservice account are together in a region that supports AI Vision multimodal APIs.

    Screenshot of the Vectorize your text page in the wizard.

  2. Select Next.

Vectorize and enrich your images

Use Azure AI Vision to generate a vector representation of the image files.

In this step, you can also apply AI to extract text from images. The wizard uses OCR from Azure AI services to recognize text in image files.

Two more outputs appear in the index when OCR is added to the workflow:

  • First, the "chunk" field is populated with an OCR-generated string of any text found in the image.
  • Second, the "text_vector" field is populated with an embedding that represents the "chunk" string.

The inclusion of plain text in the "chunk" field is useful if you want to use relevance features that operate on strings, such as semantic ranking and scoring profiles.

  1. On the Vectorize your images page, select the Vectorize images checkbox, and then select AI Vision vectorization.

  2. Select Use same AI service selected for text vectorization.

  3. In the enrichment section, select Extract text from images.

  4. Select Use same AI service selected for image vectorization.

    Screenshot of the Vectorize your images page in the wizard.

  5. Select Next.

Advanced settings

  1. Specify a run time schedule for the indexer. We recommend Once for this exercise, but for data sources where the underlying data is volatile, you can schedule indexing to pick up the changes.

    Screenshot of the Advanced settings page in the wizard.

  2. Select Next.
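
For reference, a recurring schedule is stored on the indexer as an ISO 8601 duration. The sketch below builds that fragment from a `timedelta`. It assumes the documented limits of five minutes to one day between runs; `indexer_schedule` is a hypothetical helper, and any seconds in the interval are truncated.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=5)  # assumed service minimum
MAX_INTERVAL = timedelta(days=1)     # assumed service maximum


def indexer_schedule(interval: timedelta) -> dict:
    """Return the 'schedule' fragment of an indexer definition."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("Interval must be between 5 minutes and 1 day.")
    if interval == MAX_INTERVAL:
        return {"schedule": {"interval": "P1D"}}
    total_minutes = int(interval.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    duration = "PT" + (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    return {"schedule": {"interval": duration}}
```

Choosing Once in the wizard corresponds to an indexer with no schedule at all, so this fragment is only relevant for volatile data sources.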

Run the wizard

  1. On Review and create, specify a prefix for the objects created when the wizard runs. The wizard creates multiple objects. A common prefix helps you stay organized.

    Screenshot of the Review and create page in the wizard.

  2. Select Create to run the wizard. This step creates the following objects:

    • An indexer that drives the indexing pipeline.

    • A data source connection to blob storage.

    • An index with vector fields, text fields, vectorizers, vector profiles, and vector algorithms. You can't modify the default index during the wizard workflow. Indexes conform to the 2024-05-01-preview REST API.

    • A skillset with the following five skills:

Check results

Search Explorer accepts text, vectors, and images as query inputs. You can drag an image into the search area or select one from a file browser. Search Explorer vectorizes your image and sends the vector as a query input to the search engine. Image vectorization assumes that your index has a vectorizer definition, which Import and vectorize data creates based on your embedding model inputs.
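
Outside the portal, a comparable query can be sent to the Search Documents REST API once you have a vector for the image. The following is a rough sketch, not what Search Explorer does internally: it assumes the image has already been vectorized elsewhere, that the image vector field is named `image_vector` (an assumption; check your index), and that you have a query API key. The helper names are hypothetical.

```python
import json
import urllib.request

API_VERSION = "2024-05-01-preview"


def build_vector_query(vector, field: str = "image_vector", k: int = 5) -> dict:
    """Build a request body that matches `vector` against one vector field."""
    return {
        "select": "chunk",  # return the OCR text rather than the raw vectors
        "vectorQueries": [
            {"kind": "vector", "vector": list(vector), "fields": field, "k": k}
        ],
    }


def run_query(service_name: str, index_name: str, api_key: str, body: dict) -> dict:
    """POST the query body to the index and return the parsed JSON response."""
    url = (
        f"https://{service_name}.search.windows.net/indexes/{index_name}"
        f"/docs/search?api-version={API_VERSION}"
    )
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The `k` value caps how many nearest neighbors the engine returns for the vector query.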

  1. In the Azure portal, under Search management > Indexes, select the index you created. An embedded Search Explorer is the first tab.

  2. Under View, select Image view.

    Screenshot of the query options button with image view.

  3. Drag an image from the local folder that contains the sample image files. Or, open the file browser to select a local image file.

  4. Select Search to run the query.

    Screenshot of search results.

    The top match should be the image you searched for. Because a vector search matches on similar vectors, the search engine returns any document that is sufficiently similar to the query input, up to k results. You can switch to JSON view for more advanced queries that include relevance tuning.

  5. Try other query options to compare search outcomes:

    • Hide vectors for more readable results (recommended).
    • Select a vector field to query over. The default is text vectors, but you can specify the image vector to exclude text vectors from query execution.
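
If you work with raw JSON results in your own code, you can get an effect similar to the Hide vectors option by dropping embedding fields before display. This is a heuristic sketch rather than the portal's implementation: it treats any non-empty list of numbers as a vector, and `hide_vectors` is a hypothetical helper.

```python
def _looks_like_vector(value) -> bool:
    """Heuristic: a non-empty list made up only of numbers (bools excluded)."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(item, (int, float)) and not isinstance(item, bool)
            for item in value
        )
    )


def hide_vectors(document: dict) -> dict:
    """Return a copy of a search result without its embedding fields."""
    return {
        name: value
        for name, value in document.items()
        if not _looks_like_vector(value)
    }
```

Applied to a result with "chunk" and "text_vector" fields, the helper keeps the OCR text in "chunk" and drops "text_vector".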

Clean up

This demo uses billable Azure resources. If the resources are no longer needed, delete them from your subscription to avoid charges.

Next steps

This quickstart introduced you to the Import and vectorize data wizard that creates all of the objects necessary for image search. If you want to explore each step in detail, try an integrated vectorization sample.