Features of Azure Cognitive Search
Azure Cognitive Search provides a full-text search engine, persistent storage of search indexes, integrated AI used during indexing to extract more text and structure, and APIs and tools.
The following table summarizes features by category. For more information about how Cognitive Search compares with other search technologies, see Compare search options.
|Data sources||Search indexes can accept text from any source, provided it's submitted as a JSON document.
Indexers automate data import from supported data sources, extracting searchable content from primary data stores. Indexers handle JSON serialization for you, and most support some form of change and deletion detection. You can connect to a variety of data sources, including Azure SQL Database, Azure Cosmos DB, and Azure Blob storage.
|Hierarchical and nested data structures||Complex types and collections allow you to model virtually any type of JSON structure within a search index. One-to-many and many-to-many cardinality can be expressed natively through collections, complex types, and collections of complex types.|
|Linguistic analysis||Analyzers are components used for text processing during indexing and search operations. By default, you can use the general-purpose Standard Lucene analyzer, or override the default with a language analyzer, a custom analyzer that you configure, or another predefined analyzer that produces tokens in the format you require.
Language analyzers from Lucene or Microsoft are used to intelligently handle language-specific linguistics including verb tenses, gender, irregular plural nouns (for example, 'mouse' vs. 'mice'), word de-compounding, word-breaking (for languages with no spaces), and more.
Custom lexical analyzers are used for complex query forms such as phonetic matching and regular expressions.
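As a rough illustration of the indexing features above, the following sketch builds an index definition as a plain Python dictionary, in the JSON shape the service accepts. It combines a language analyzer, a simple collection, a complex type, and a collection of complex types. The index name, the field names, and the `field_paths` helper are hypothetical and exist only for this example; nothing here contacts a search service.

```python
import json

# Hypothetical index definition; every name below is illustrative only.
hotel_index = {
    "name": "hotels-sketch",
    "fields": [
        {"name": "hotelId", "type": "Edm.String", "key": True},
        # Override the default Standard Lucene analyzer with a language analyzer.
        {"name": "description", "type": "Edm.String",
         "searchable": True, "analyzer": "en.microsoft"},
        # A collection of simple values.
        {"name": "tags", "type": "Collection(Edm.String)",
         "searchable": True, "facetable": True},
        # A complex type (one nested object per document).
        {"name": "address", "type": "Edm.ComplexType", "fields": [
            {"name": "city", "type": "Edm.String", "filterable": True},
        ]},
        # A collection of complex types (one-to-many).
        {"name": "rooms", "type": "Collection(Edm.ComplexType)", "fields": [
            {"name": "type", "type": "Edm.String", "searchable": True},
            {"name": "baseRate", "type": "Edm.Double", "filterable": True},
        ]},
    ],
}


def field_paths(fields, prefix=""):
    """Return slash-separated paths for every leaf field in the schema."""
    paths = []
    for field in fields:
        path = prefix + field["name"]
        if "fields" in field:
            # Recurse into the sub-fields of a complex type.
            paths.extend(field_paths(field["fields"], path + "/"))
        else:
            paths.append(path)
    return paths


print(json.dumps(hotel_index, indent=2))
print(field_paths(hotel_index["fields"]))
```

The slash-separated paths that the helper prints (for example, `rooms/baseRate`) follow the same convention used to reference sub-fields of complex types in filter expressions.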
AI enrichment and knowledge mining
|AI processing during indexing||AI enrichment refers to embedded image and natural language processing in an indexer pipeline that extracts text and information from content that can't otherwise be indexed for full text search. AI processing is achieved by adding and combining skills in a skillset, which is then attached to an indexer. Skills can be either built-in skills from Microsoft, such as text translation or Optical Character Recognition (OCR), or custom skills that you provide.|
|Storing enriched content for analysis and consumption in non-search scenarios||Knowledge store is persistent storage of enriched content, intended for non-search scenarios like knowledge mining and data science processing. A knowledge store is defined in a skillset, but created in Azure Storage as objects or tabular rowsets.|
|Cached enrichments||Incremental enrichment (preview) refers to cached enrichments that can be reused during skillset execution. Caching is particularly valuable in skillsets that include OCR and image analysis, which are expensive to process.|
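To make the relationship between skills, skillsets, and indexers more concrete, here is a minimal sketch of a skillset that chains the built-in OCR and translation skills, plus an indexer definition that references it by name. Both are plain dictionaries in the JSON shape used by the service. The resource names, the `ocrText` target name, and the `enriched_paths` helper are assumptions made for this example, and the sketch omits the knowledge store and cache settings entirely.

```python
import json

# Hypothetical skillset: OCR output feeds the translation skill.
skillset = {
    "name": "ocr-translate-sketch",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "ocrText"}],
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
            "context": "/document/normalized_images/*",
            "defaultToLanguageCode": "en",
            # Consumes the node that the OCR skill produced above.
            "inputs": [{"name": "text", "source": "/document/normalized_images/*/ocrText"}],
            "outputs": [{"name": "translatedText", "targetName": "translatedText"}],
        },
    ],
}

# Hypothetical indexer: the skillset is attached through "skillsetName".
indexer = {
    "name": "docs-indexer-sketch",
    "dataSourceName": "docs-datasource-sketch",
    "targetIndexName": "docs-index-sketch",
    "skillsetName": skillset["name"],
    # Image skills need normalized images to be generated during document cracking.
    "parameters": {"configuration": {"imageAction": "generateNormalizedImages"}},
}


def enriched_paths(skillset_definition):
    """List the enrichment-tree nodes that each skill output creates."""
    return [
        f'{skill["context"]}/{output["targetName"]}'
        for skill in skillset_definition["skills"]
        for output in skill["outputs"]
    ]


print(json.dumps(indexer, indent=2))
print(enriched_paths(skillset))
```

Chaining works because each skill writes its output to a named node under its context, and a later skill can use that node as an input source.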
Query and user experience
|Free-form text search||Full-text search is a primary use case for most search-based apps. Queries can be formulated using a supported syntax.
Simple query syntax provides logical operators, phrase search operators, suffix operators, and precedence operators.
Full Lucene query syntax includes all operations in simple syntax, with extensions for fuzzy search, proximity search, term boosting, and regular expressions.
|Relevance||Simple scoring is a key benefit of Azure Cognitive Search. Scoring profiles are used to model relevance as a function of values in the documents themselves. For example, you might want newer products or discounted products to appear higher in the search results. You can also build scoring profiles using tags for personalized scoring based on customer search preferences you've tracked and stored separately.
Semantic search (preview) is a premium feature that reranks results based on semantic relevance to the query. Depending on your content and scenario, it can significantly improve search relevance with minimal configuration or effort.
|Geospatial search||Geospatial functions filter over and match on geographic coordinates. You can match on distance or by inclusion in a polygon shape.|
|Filters and facets||Faceted navigation is enabled through a single query parameter. Azure Cognitive Search returns a faceted navigation structure you can use as the code behind a categories list, for self-directed filtering (for example, to filter catalog items by price range or brand).
Filters can be used to incorporate faceted navigation into your application's UI, enhance query formulation, and filter based on user- or developer-specified criteria. Create filters using the OData syntax.
|User experience||Autocomplete can be enabled for type-ahead queries in a search bar.
Search suggestions also work from partial text inputs in a search bar, but the results are actual documents in your index rather than query terms.
Synonyms associate equivalent terms that implicitly expand the scope of a query, without the user having to provide the alternate terms.
Hit highlighting applies text formatting to a matching keyword in search results. You can choose which fields return highlighted snippets.
Sorting is offered for multiple fields via the index schema and then toggled at query time with a single search parameter.
Paging and throttling of search results are straightforward, given the fine-grained control that Azure Cognitive Search offers over results.
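The query features in this section map onto parameters of a single search request. The sketch below assembles a request body that uses full Lucene syntax (fuzzy, proximity, and boosted terms), an OData filter with a geospatial distance function, facets, hit highlighting, sorting, and paging. The field names (`rating`, `location`, `brand`, `price`, `description`), the coordinates, and the `build_search_request` helper are hypothetical; the sketch only builds the JSON body and sends nothing.

```python
import json


def build_search_request(text, page, page_size=10):
    """Build a request body for one page of a filtered, faceted query."""
    return {
        "search": text,
        # "full" selects the full Lucene query syntax instead of the simple syntax.
        "queryType": "full",
        "searchFields": "description",
        # OData filter: a rating threshold plus a 10 km radius around a point.
        "filter": (
            "rating ge 4 and "
            "geo.distance(location, geography'POINT(-122.13 47.67)') le 10"
        ),
        # Facets drive a categories list for self-directed filtering.
        "facets": ["brand,count:5", "price,interval:50"],
        "highlight": "description",
        "orderby": "price asc",
        "count": True,
        # Paging: "top" is the page size, "skip" is the offset.
        "top": page_size,
        "skip": page * page_size,
    }


# Fuzzy match on "luxury", proximity match on a phrase, and a boosted term.
lucene_query = 'luxury~1 AND "airport hotel"~3 AND pool^2'
body = build_search_request(lucene_query, page=2, page_size=25)
print(json.dumps(body, indent=2))
```

Keeping the page arithmetic in one helper makes it easier to change the page size without touching the rest of the query.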
|Data encryption||Microsoft-managed encryption-at-rest is built into the internal storage layer and is irrevocable.
Customer-managed encryption keys that you create and manage in Azure Key Vault can be used for supplemental encryption of indexes and synonym maps. For services created after August 1, 2020, CMK encryption extends to data on temporary disks, for full double encryption of indexed content.
|Endpoint protection||IP rules for inbound firewall support allow you to set up IP ranges from which the search service will accept requests.
Create a private endpoint using Azure Private Link to force all requests through a virtual network.
|Azure role-based access control||RBAC for data plane (preview) refers to the assignment of roles to users and groups in Azure Active Directory to control access to search content and operations.|
|Outbound security (indexers)||Data access through private endpoints allows an indexer to connect to Azure resources that are protected through Azure Private Link.
Data access using a trusted identity means that connection strings to external data sources can omit user names and passwords. When an indexer connects to the data source, the resource allows the connection if the search service was previously registered as a trusted service.
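The idea behind inbound IP rules can be illustrated with a small allow-list check. This is a conceptual sketch only, not the service's implementation: the `is_allowed` helper and the address ranges (taken from the reserved documentation blocks) are assumptions chosen for illustration.

```python
import ipaddress


def is_allowed(client_ip, allowed_ranges):
    """Return True if client_ip falls inside any allowed address or CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(
        # strict=False tolerates ranges written with host bits set.
        address in ipaddress.ip_network(allowed, strict=False)
        for allowed in allowed_ranges
    )


# Hypothetical rules: one CIDR range and one single address.
rules = ["203.0.113.0/24", "192.0.2.10"]
print(is_allowed("203.0.113.7", rules))
print(is_allowed("198.51.100.1", rules))
```

A single address is treated as a range containing exactly one host, so both forms can live in the same rule list.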
|Tools for prototyping and inspection||Add index is an index designer in the portal that you can use to create a basic schema consisting of attributed fields and a few other settings. After saving the index, you can populate it using an SDK or the REST API to provide the data.
Import data wizard creates indexes, indexers, skillsets, and data source definitions. If your data exists in Azure, this wizard can save you significant time and effort, especially for proof-of-concept investigation and exploration.
Search explorer is used to test queries and refine scoring profiles.
Create demo app is used to generate an HTML page that can be used to test the search experience.
Debug Sessions is a visual editor that lets you debug a skillset interactively. It shows you dependencies, output, and transformations.
|Monitoring and diagnostics||Enable monitoring features to go beyond the metrics-at-a-glance that are always visible in the portal. Metrics on queries per second, latency, and throttling are captured and reported in portal pages with no extra configuration required.|
|REST||Service REST API is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this API to retrieve system information and statistics.
Management REST API is for service creation and provisioning through Azure Resource Manager. You can also use this API to manage keys and capacity.
|Azure SDK for .NET||Azure.Search.Documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
Microsoft.Azure.Management.Search is for service creation and provisioning through Azure Resource Manager. You can also use this client library to manage keys and capacity.
|Azure SDK for Java||com.azure.search.documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
com.microsoft.azure.management.search is for service creation and provisioning through Azure Resource Manager. You can also use this client library to manage keys and capacity.
|Azure SDK for Python||azure-search-documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
azure-mgmt-search is for service creation and provisioning through Azure Resource Manager. You can also use this client library to manage keys and capacity.
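|Azure SDK for JavaScript||@azure/search-documents is for data plane operations, including all operations related to indexing, queries, and AI enrichment. You can also use this client library to retrieve system information and statistics.
@azure/arm-search is for service creation and provisioning through Azure Resource Manager. You can also use this client library to manage keys and capacity.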
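For readers who want to see what a data plane call looks like underneath the client libraries, the following sketch prepares a query request against the Service REST API using only the Python standard library. The service name, index name, key placeholder, and the `build_query_request` helper are hypothetical, and the API version shown is an assumption that you should check against the version your service supports. The request is constructed but deliberately not sent.

```python
import json
import urllib.request

# Assumed API version; substitute the version your service supports.
API_VERSION = "2020-06-30"


def build_query_request(service_name, index_name, api_key, body):
    """Prepare (but do not send) a POST request to the document search endpoint."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/search?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


request = build_query_request(
    "my-service", "hotels-sketch", "<query-key>", {"search": "pool", "top": 5}
)
print(request.get_method(), request.full_url)
# Sending requires a real service and key: urllib.request.urlopen(request)
```

The client libraries listed above wrap requests of this kind, adding typed models, retries, and authentication helpers.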