Azure Cognitive Search client library for Java - version 11.6.2

This is the Java client library for Azure Cognitive Search. Azure Cognitive Search is a search-as-a-service cloud solution that gives developers APIs and tools for adding a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.

The Azure Cognitive Search service is well suited for the following application scenarios:

  • Consolidate varied content types into a single searchable index. To populate an index, you can push JSON documents that contain your content, or if your data is already in Azure, create an indexer to pull in data automatically.

  • Attach skillsets to an indexer to create searchable content from images and large text documents. A skillset leverages AI from Cognitive Services for built-in OCR, entity recognition, key phrase extraction, language detection, text translation, and sentiment analysis. You can also add custom skills to integrate external processing of your content during data ingestion.

  • In a search client application, implement query logic and user experiences similar to commercial web search engines.

Use the Azure Cognitive Search client library to:

  • Submit queries for simple and advanced query forms that include fuzzy search, wildcard search, and regular expressions.
  • Implement filtered queries for faceted navigation, geospatial search, or to narrow results based on filter criteria.
  • Create and manage search indexes.
  • Upload and update documents in the search index.
  • Create and manage indexers that pull data from Azure into an index.
  • Create and manage skillsets that add AI enrichment to data ingestion.
  • Create and manage analyzers for advanced text analysis or multi-lingual content.
  • Optimize results through scoring profiles to factor in business logic or freshness.

Source code | Package (Maven) | API reference documentation | Product documentation | Samples

Getting started

Include the package

Include the BOM file

Include the azure-sdk-bom in your project to take a dependency on the General Availability (GA) version of the library. In the following snippet, replace the {bom_version_to_target} placeholder with the version number. To learn more about the BOM, see the Azure SDK BOM README.

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.azure</groupId>
            <artifactId>azure-sdk-bom</artifactId>
            <version>{bom_version_to_target}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Then include the direct dependency in the dependencies section without the version tag.

<dependencies>
  <dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-search-documents</artifactId>
  </dependency>
</dependencies>

Include direct dependency

If you want to take a dependency on a particular version of the library that is not present in the BOM, add the direct dependency to your project as follows.

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-search-documents</artifactId>
    <version>11.6.2</version>
</dependency>

Prerequisites

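You need an Azure Cognitive Search service. For example, the following Azure CLI command creates a free instance for getting started:

az search service create --name <mysearch> --resource-group <mysearch-rg> --sku free --location westus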

See choosing a pricing tier for more information about available options.

Authenticate the client

To interact with the Search service, you'll need to create an instance of the appropriate client class: SearchClient for searching indexed documents, SearchIndexClient for managing indexes, or SearchIndexerClient for crawling data sources and loading search documents into an index. To instantiate a client object, you'll need an endpoint and API key. Refer to the documentation for more information on the authentication approaches that the Search service supports.

Get an API Key

You can get the endpoint and an API key from the Search service in the Azure Portal. Refer to the documentation for instructions on how to get an API key.

Alternatively, you can use the following Azure CLI command to retrieve the API key from the Search service:

az search admin-key show --service-name <mysearch> --resource-group <mysearch-rg>

Note:

  • The example Azure CLI snippet above retrieves an admin key. This allows for easier access when exploring APIs, but it should be managed carefully.
  • There are two types of keys used to access your search service: admin (read-write) and query (read-only) keys. Restricting access and operations in client apps is essential to safeguarding the search assets on your service. Always use a query key rather than an admin key for any query originating from a client app.
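One way to keep keys out of source code is to read them from the environment at startup. The sketch below is an illustration only: the SearchSettings class and the SEARCH_QUERY_KEY variable name are assumptions made for this example and are not part of the SDK (SEARCH_ENDPOINT is the variable name used in the Azure Active Directory example in this README).

```java
import java.util.Map;

/** Hypothetical helper (not part of the SDK) that reads search settings from the environment. */
final class SearchSettings {
    private final String endpoint;
    private final String queryKey;

    private SearchSettings(String endpoint, String queryKey) {
        this.endpoint = endpoint;
        this.queryKey = queryKey;
    }

    /** Builds settings from a map of variables, failing fast when one is missing or blank. */
    static SearchSettings fromEnvironment(Map<String, String> env) {
        // SEARCH_QUERY_KEY is an assumed name; use whatever name your deployment defines.
        return new SearchSettings(require(env, "SEARCH_ENDPOINT"), require(env, "SEARCH_QUERY_KEY"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + name);
        }
        return value.trim();
    }

    String getEndpoint() {
        return endpoint;
    }

    String getQueryKey() {
        return queryKey;
    }

    public static void main(String[] args) {
        try {
            // System.getenv() exposes the process environment as a Map<String, String>.
            SearchSettings settings = fromEnvironment(System.getenv());
            System.out.println("Using endpoint " + settings.getEndpoint());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The loaded values can then be handed to the client builders described in the following sections, for example by wrapping the key in an AzureKeyCredential.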

The SDK provides three clients.

  • SearchIndexClient for CRUD operations on indexes and synonym maps.
  • SearchIndexerClient for CRUD operations on indexers, data sources, and skillsets.
  • SearchClient for all document operations.

Create a SearchIndexClient

To create a SearchIndexClient/SearchIndexAsyncClient, you will need the values of the Azure Cognitive Search service URL endpoint and admin key.

SearchIndexClient searchIndexClient = new SearchIndexClientBuilder()
    .endpoint(ENDPOINT)
    .credential(new AzureKeyCredential(API_KEY))
    .buildClient();

or

SearchIndexAsyncClient searchIndexAsyncClient = new SearchIndexClientBuilder()
    .endpoint(ENDPOINT)
    .credential(new AzureKeyCredential(API_KEY))
    .buildAsyncClient();

Create a SearchIndexerClient

To create a SearchIndexerClient/SearchIndexerAsyncClient, you will need the values of the Azure Cognitive Search service URL endpoint and admin key.

SearchIndexerClient searchIndexerClient = new SearchIndexerClientBuilder()
    .endpoint(ENDPOINT)
    .credential(new AzureKeyCredential(API_KEY))
    .buildClient();

or

SearchIndexerAsyncClient searchIndexerAsyncClient = new SearchIndexerClientBuilder()
    .endpoint(ENDPOINT)
    .credential(new AzureKeyCredential(API_KEY))
    .buildAsyncClient();

Create a SearchClient

Once you have the values of the Azure Cognitive Search service URL endpoint and admin key, you can create the SearchClient/SearchAsyncClient with an existing index name:

SearchClient searchClient = new SearchClientBuilder()
    .endpoint(ENDPOINT)
    .credential(new AzureKeyCredential(ADMIN_KEY))
    .indexName(INDEX_NAME)
    .buildClient();

or

SearchAsyncClient searchAsyncClient = new SearchClientBuilder()
    .endpoint(ENDPOINT)
    .credential(new AzureKeyCredential(ADMIN_KEY))
    .indexName(INDEX_NAME)
    .buildAsyncClient();

Create a client using Azure Active Directory authentication

You can also create a SearchClient, SearchIndexClient, or SearchIndexerClient using Azure Active Directory (AAD) authentication. Your user or service principal must be assigned the "Search Index Data Reader" role. Using the DefaultAzureCredential, you can authenticate a service using Managed Identity or a service principal, authenticate as a developer working on an application, and more, all without changing code. Refer to the documentation for instructions on how to connect to Azure Cognitive Search using Azure role-based access control (Azure RBAC).

Before you can use the DefaultAzureCredential, or any credential type from azure-identity, you'll first need to add the azure-identity package (com.azure:azure-identity) as a dependency of your project.

To use DefaultAzureCredential with a client ID and secret, you'll need to set the AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET environment variables; alternatively, you can pass those values to a ClientSecretCredential, which is also in azure-identity.

Make sure you import DefaultAzureCredential and its builder from the right package at the top of your source file:

import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;

Then you can create an instance of DefaultAzureCredential and pass it to a new instance of your client:

String indexName = "nycjobs";

// Get the service endpoint from the environment
String endpoint = Configuration.getGlobalConfiguration().get("SEARCH_ENDPOINT");
DefaultAzureCredential credential = new DefaultAzureCredentialBuilder().build();

// Create a client
SearchClient client = new SearchClientBuilder()
    .endpoint(endpoint)
    .indexName(indexName)
    .credential(credential)
    .buildClient();

Send your first search query

To get running with Azure Cognitive Search, first create an index following this guide. With an index created, you can use the following samples to begin using the SDK.

Key concepts

An Azure Cognitive Search service contains one or more indexes that provide persistent storage of searchable data in the form of JSON documents. (If you're new to search, you can make a very rough analogy between indexes and database tables.) The azure-search-documents client library exposes operations on these resources through three main client types: SearchClient, SearchIndexClient, and SearchIndexerClient.

Azure Cognitive Search provides two powerful features:

Semantic search enhances the quality of search results for text-based queries. By enabling Semantic Search on your search service, you can improve the relevance of search results in two ways:

  • It applies secondary ranking to the initial result set, promoting the most semantically relevant results to the top.
  • It extracts and returns captions and answers in the response, which can be displayed on a search page to enhance the user's search experience.

To learn more about Semantic Search, you can refer to the documentation.

Vector Search is an information retrieval technique that overcomes the limitations of traditional keyword-based search. Instead of relying solely on lexical analysis and matching individual query terms, Vector Search utilizes machine learning models to capture the contextual meaning of words and phrases. It represents documents and queries as vectors in a high-dimensional space called an embedding. By understanding the intent behind the query, Vector Search can deliver more relevant results that align with the user's requirements, even if the exact terms are not present in the document. Moreover, Vector Search can be applied to various types of content, including images and videos, not just text.
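To make the idea of comparing embeddings concrete, here is a minimal sketch of cosine similarity, one common way to measure how close two vectors are. It is an illustration only: the class name and the three-dimensional vectors are made up for this example, real embeddings come from a machine learning model and have far more dimensions, and in practice the service computes similarity itself rather than leaving it to client code.

```java
/** Illustrative only: cosine similarity between two embedding vectors. */
final class CosineSimilarityExample {

    /** Returns the cosine of the angle between a and b, a value in the range [-1, 1]. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up toy embeddings; they only illustrate "close" versus "far" directions.
        double[] query = {0.9, 0.1, 0.0};
        double[] similarDocument = {0.8, 0.2, 0.1};
        double[] unrelatedDocument = {0.0, 0.1, 0.9};

        System.out.printf("similar:   %.3f%n", cosineSimilarity(query, similarDocument));
        System.out.printf("unrelated: %.3f%n", cosineSimilarity(query, unrelatedDocument));
    }
}
```

A document whose vector points in nearly the same direction as the query vector scores close to 1, which is why a match can be found even when the query and the document share no exact terms.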

To learn how to index vector fields and perform vector search, refer to the sample.

For more information about Vector Search, including its concepts and usage, refer to the documentation.

Examples

The following examples all use a simple Hotel data set that you can import into your own index from the Azure portal. These are just a few of the basics - please check out our Samples for much more.

Querying

There are two ways to interact with the data returned from a search query.

Let's explore them with a search for a "luxury" hotel.

Use SearchDocument like a dictionary for search results

SearchDocument is the default type returned from queries when you don't provide your own. Here we perform the search, iterate over the results, and extract data using SearchDocument's map-style get method.

for (SearchResult searchResult : SEARCH_CLIENT.search("luxury")) {
    SearchDocument doc = searchResult.getDocument(SearchDocument.class);
    String id = (String) doc.get("hotelId");
    String name = (String) doc.get("hotelName");
    System.out.printf("This is hotelId %s, and this is hotel name %s.%n", id, name);
}

Use Java model class for search results

Define a Hotel class.

public class Hotel {
    private String id;
    private String name;

    public String getId() {
        return id;
    }

    public Hotel setId(String id) {
        this.id = id;
        return this;
    }

    public String getName() {
        return name;
    }

    public Hotel setName(String name) {
        this.name = name;
        return this;
    }
}

Use it in place of SearchDocument when querying.

for (SearchResult searchResult : SEARCH_CLIENT.search("luxury")) {
    Hotel doc = searchResult.getDocument(Hotel.class);
    String id = doc.getId();
    String name = doc.getName();
    System.out.printf("This is hotelId %s, and this is hotel name %s.%n", id, name);
}

It is recommended, when you know the schema of the search index, to create a Java model class.

Search Options

The SearchOptions provide powerful control over the behavior of our queries.

Let's search for the top 5 luxury hotels with a good rating.

SearchOptions options = new SearchOptions()
    .setFilter("rating ge 4")
    .setOrderBy("rating desc")
    .setTop(5);
SearchPagedIterable searchResultsIterable = SEARCH_CLIENT.search("luxury", options, Context.NONE);
// ...
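The filter passed to setFilter is an OData expression, so string values that come from user input need their single quotes escaped by doubling them. The helper below is a minimal sketch of that rule. The SearchFilters class and its method names are hypothetical and not part of the SDK; the field names are borrowed from the examples in this README.

```java
/** Hypothetical helper (not part of the SDK) for composing OData filter strings. */
final class SearchFilters {

    /** Wraps a value in single quotes, doubling any embedded single quote as OData requires. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds "field eq 'value'" with the value safely quoted. */
    static String stringEquals(String field, String value) {
        return field + " eq " + quote(value);
    }

    /** Builds "field ge minimum" for a numeric lower bound. */
    static String atLeast(String field, int minimum) {
        return field + " ge " + minimum;
    }

    /** Joins two filter clauses with a logical and, parenthesizing each side. */
    static String and(String left, String right) {
        return "(" + left + ") and (" + right + ")";
    }

    public static void main(String[] args) {
        String filter = and(atLeast("rating", 4), stringEquals("hotelName", "Traveler's Rest"));
        System.out.println(filter);
    }
}
```

The resulting string can be passed to SearchOptions.setFilter in place of the literal "rating ge 4" used above.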

Creating an index

You can use the SearchIndexClient to create a search index. Indexes can also define suggesters, lexical analyzers, and more.

There are multiple ways of preparing search fields for a search index. For basic needs, we provide a static helper method, buildSearchFields, in SearchIndexClient and SearchIndexAsyncClient, which converts a Java POJO class into a List<SearchField>. Three annotations, SimpleField, SearchableField, and FieldBuilderIgnore, configure the fields of the model class.

List<SearchField> searchFields = SearchIndexClient.buildSearchFields(Hotel.class, null);
SEARCH_INDEX_CLIENT.createIndex(new SearchIndex("index", searchFields));

For advanced scenarios, we can build search fields using SearchField directly.

List<SearchField> searchFieldList = new ArrayList<>();
searchFieldList.add(new SearchField("hotelId", SearchFieldDataType.STRING)
    .setKey(true)
    .setFilterable(true)
    .setSortable(true));

searchFieldList.add(new SearchField("hotelName", SearchFieldDataType.STRING)
    .setSearchable(true)
    .setFilterable(true)
    .setSortable(true));
searchFieldList.add(new SearchField("description", SearchFieldDataType.STRING)
    .setSearchable(true)
    .setAnalyzerName(LexicalAnalyzerName.EN_LUCENE));
searchFieldList.add(new SearchField("tags", SearchFieldDataType.collection(SearchFieldDataType.STRING))
    .setSearchable(true)
    .setFilterable(true)
    .setFacetable(true));
searchFieldList.add(new SearchField("address", SearchFieldDataType.COMPLEX)
    .setFields(new SearchField("streetAddress", SearchFieldDataType.STRING).setSearchable(true),
        new SearchField("city", SearchFieldDataType.STRING)
            .setSearchable(true)
            .setFilterable(true)
            .setFacetable(true)
            .setSortable(true),
        new SearchField("stateProvince", SearchFieldDataType.STRING)
            .setSearchable(true)
            .setFilterable(true)
            .setFacetable(true)
            .setSortable(true),
        new SearchField("country", SearchFieldDataType.STRING)
            .setSearchable(true)
            .setFilterable(true)
            .setFacetable(true)
            .setSortable(true),
        new SearchField("postalCode", SearchFieldDataType.STRING)
            .setSearchable(true)
            .setFilterable(true)
            .setFacetable(true)
            .setSortable(true)
    ));

// Prepare suggester.
SearchSuggester suggester = new SearchSuggester("sg", Collections.singletonList("hotelName"));
// Prepare SearchIndex with index name and search fields.
SearchIndex index = new SearchIndex("hotels").setFields(searchFieldList).setSuggesters(suggester);
// Create an index
SEARCH_INDEX_CLIENT.createIndex(index);

Retrieving a specific document from your index

In addition to querying for documents using keywords and optional filters, you can retrieve a specific document from your index if you already know the key. You could get the key from a query, for example, and want to show more information about it or navigate your customer to that document.

Hotel hotel = SEARCH_CLIENT.getDocument("1", Hotel.class);
System.out.printf("This is hotelId %s, and this is hotel name %s.%n", hotel.getId(), hotel.getName());

Adding documents to your index

You can Upload, Merge, MergeOrUpload, and Delete multiple documents from an index in a single batched request. There are a few special rules for merging to be aware of.

IndexDocumentsBatch<Hotel> batch = new IndexDocumentsBatch<>();
batch.addUploadActions(Collections.singletonList(new Hotel().setId("783").setName("Upload Inn")));
batch.addMergeActions(Collections.singletonList(new Hotel().setId("12").setName("Renovated Ranch")));
SEARCH_CLIENT.indexDocuments(batch);

The request will throw IndexBatchException by default if any of the individual actions fail, and you can use findFailedActionsToRetry to retry on failed documents. There's also a throwOnAnyError option, and you can set it to false to get a successful response with an IndexDocumentsResult for inspection.
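When there are many documents to index, one approach is to split them into fixed-size chunks and send one batched request per chunk. The sketch below shows only the chunking step in plain Java. The BatchPartitioner class is hypothetical and not part of the SDK, and the chunk size is a parameter you would choose based on the documented service limits for indexing requests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the SDK) that splits documents into batch-sized chunks. */
final class BatchPartitioner {

    /** Splits items into consecutive chunks of at most chunkSize elements, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the sublist view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> hotelIds = Arrays.asList("1", "2", "3", "4", "5");
        for (List<String> chunk : partition(hotelIds, 2)) {
            // In a real application, each chunk would populate one IndexDocumentsBatch.
            System.out.println("Would index: " + chunk);
        }
    }
}
```

Keeping chunks small also limits how many actions need to be retried when a single request fails.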

Async APIs

The examples so far have been using synchronous APIs, but we provide full support for async APIs as well. You'll need to use SearchAsyncClient.

SEARCH_ASYNC_CLIENT.search("luxury")
    .subscribe(result -> {
        Hotel hotel = result.getDocument(Hotel.class);
        System.out.printf("This is hotelId %s, and this is hotel name %s.%n", hotel.getId(), hotel.getName());
    });

Authenticate in a National Cloud

To authenticate in a National Cloud, you will need to make the following additions to your client configuration:

  • Set the AuthorityHost in the credential options or via the AZURE_AUTHORITY_HOST environment variable
  • Set the audience in SearchClientBuilder, SearchIndexClientBuilder, or SearchIndexerClientBuilder

// Create a SearchClient that will authenticate through AAD in the China national cloud.
SearchClient searchClient = new SearchClientBuilder()
    .endpoint(ENDPOINT)
    .indexName(INDEX_NAME)
    .credential(new DefaultAzureCredentialBuilder()
        .authorityHost(AzureAuthorityHosts.AZURE_CHINA)
        .build())
    .audience(SearchAudience.AZURE_CHINA)
    .buildClient();

Troubleshooting

See our troubleshooting guide for details on how to diagnose various failure scenarios.

General

When you interact with Azure Cognitive Search using this Java client library, errors returned by the service correspond to the same HTTP status codes returned for REST API requests. For example, the service will return a 404 error if you try to retrieve a document that doesn't exist in your index.

Handling Search Error Response

Any Search API operation that fails will throw an HttpResponseException with a helpful status code. Many of these errors are recoverable.

try {
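    Iterable<SearchResult> results = SEARCH_CLIENT.search("hotel");
    // Results are retrieved lazily, so iterate inside the try block to surface service errors.
    results.forEach(result -> System.out.println(result.getScore()));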
} catch (HttpResponseException ex) {
    // The exception contains the HTTP status code and the detailed message
    // returned from the search service
    HttpResponse response = ex.getResponse();
    System.out.println("Status Code: " + response.getStatusCode());
    System.out.println("Message: " + ex.getMessage());
}
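Which failures are worth retrying depends on the status code. The sketch below shows one possible classification in plain Java. The SearchErrors class is hypothetical and not part of the SDK, and the set of codes treated as transient (408, 429, 500, 502, 503, 504) is an assumption based on common HTTP practice, not a list taken from the service documentation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not part of the SDK) that classifies HTTP status codes. */
final class SearchErrors {
    // Assumption: status codes commonly treated as transient in HTTP clients.
    private static final Set<Integer> TRANSIENT_CODES =
        new HashSet<>(Arrays.asList(408, 429, 500, 502, 503, 504));

    /** Returns true when a later retry of the same request might succeed. */
    static boolean isTransient(int statusCode) {
        return TRANSIENT_CODES.contains(statusCode);
    }

    /** Maps a status code to a short, human-readable suggestion. */
    static String describe(int statusCode) {
        if (statusCode == 404) {
            return "Not found: check the index name or document key";
        }
        if (isTransient(statusCode)) {
            return "Transient: retry after a delay";
        }
        return "Not retryable: inspect the request";
    }

    public static void main(String[] args) {
        for (int code : new int[] {404, 429, 400}) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

In the catch block above, the value returned by ex.getResponse().getStatusCode() is what such a helper would classify.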

You can also easily enable console logging if you want to dig deeper into the requests you're making against the service.

Enabling Logging

Azure SDKs for Java provide a consistent logging story to aid in troubleshooting application errors and expedite their resolution. The logs capture the flow of an application before it reaches the terminal state, which helps locate the root issue. View the logging wiki for guidance about enabling logging.

Default HTTP Client

By default, a Netty-based HTTP client will be used. The HTTP clients wiki provides more information on configuring or changing the HTTP client.

Next steps

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
