Security overview for Azure Cognitive Search
This article describes the security features in Azure Cognitive Search that protect data and operations.
Data flow (network traffic patterns)
A Cognitive Search service is hosted on Azure and is typically accessed by client applications over public network connections. While that pattern is predominant, it's not the only traffic pattern that you need to care about. Understanding all points of entry as well as outbound traffic is necessary background for securing your development and production environments.
Cognitive Search has three basic network traffic patterns:
- Inbound requests made by a client to the search service (the predominant pattern)
- Outbound requests issued by the search service to other services on Azure and elsewhere
- Internal service-to-service requests over the secure Microsoft backbone network
Inbound requests that target a search service endpoint consist of:
- Creating or managing indexes, indexers, data sources, skillsets, or synonym lists
- Running indexers and skillsets
- Querying an index
For inbound access to data and operations on your search service, you can implement a progression of security measures, starting with network security features. You can create either inbound rules in an IP firewall, or private endpoints that fully shield your search service from the public internet.
Independent of network security, all inbound requests must be authenticated. Key-based authentication is the default. Alternatively, you can use Azure Active Directory and role-based access control for data plane operations (currently in preview).
Outbound requests from a search service to other applications are typically made by indexers for text-based indexing and some aspects of AI enrichment. Outbound requests include both read and write operations.
Outbound requests are made by the search service on its own behalf, and on the behalf of an indexer or custom skill:
- Search connects to Azure Key Vault for a customer-managed key used to encrypt and decrypt sensitive data.
- Indexers connect to external data sources to read in data for indexing.
- Indexers write to Azure Storage when creating knowledge stores, persisting cached enrichments, and persisting debug sessions.
- Custom skills connect to an Azure function or app to run external code that's hosted off-service. The request for external processing is sent during skillset execution.
Outbound connections can be made using a resource's full access connection string that includes a key or a database login, or an Azure AD login (a managed identity) if you're using Azure Active Directory.
If your Azure resource is behind a firewall, you'll need to create rules that admit search service requests. For resources protected by Azure Private Link, you can create a shared private link that an indexer uses to make its connection.
Internal requests are secured and managed by Microsoft. You can't configure or control these connections. If you're locking down network access, no action on your part is required because internal traffic isn't customer-configurable.
Internal traffic consists of:
- Service-to-service calls for tasks like authentication and authorization through Azure Active Directory, resource logging sent to Azure Monitor, and private endpoint connections that utilize Azure Private Link.
- Requests made to Cognitive Services APIs for built-in skills.
Network security protects resources from unauthorized access or attack by applying controls to network traffic. Azure Cognitive Search supports networking features that can be your first line of defense against unauthorized access.
Inbound connection through IP firewalls
A search service is provisioned with a public endpoint that allows access using a public IP address. To restrict which traffic comes through the public endpoint, create an inbound firewall rule that admits requests from a specific IP address or a range of IP addresses. All client connections must be made through an allowed IP address, or the connection is denied.
You can use the portal to configure firewall access.
Alternatively, you can use the management REST APIs. Starting with API version 2020-03-13, with the IpRule parameter, you can restrict access to your service by identifying IP addresses, individually or in a range, that you want to grant access to your search service.
Inbound connection to a private endpoint (network isolation, no Internet traffic)
The private endpoint uses an IP address from the virtual network address space for connections to your search service. Network traffic between the client and the search service traverses over the virtual network and a private link on the Microsoft backbone network, eliminating exposure from the public internet. A VNET allows for secure communication among resources, with your on-premises network as well as the Internet.
While this solution is the most secure, using additional services is an added cost so be sure you have a clear understanding of the benefits before diving in. For more information about costs, see the pricing page. For more information about how these components work together, watch this video. Coverage of the private endpoint option starts at 5:48 into the video. For instructions on how to set up the endpoint, see Create a Private Endpoint for Azure Cognitive Search.
Once a request is admitted, it must still undergo authentication and authorization that determines whether the request is permitted. Cognitive Search supports two approaches:
Key-based authentication is performed on the request (not the calling app or user) through an API key, where the key is a string composed of randomly generated numbers and letters that prove the request is from a trustworthy source. Keys are required on every request. Submission of a valid key is considered proof the request originates from a trusted entity.
Azure AD authentication (preview) establishes the caller (and not the request) as the authenticated identity. An Azure role assignment determines the allowed operation.
Outbound requests made by an indexer are subject to the authentication protocols supported by the external service. A search service can be made a trusted service on Azure, connecting to other services using a system or user-assigned managed identity. For more information, see Set up an indexer connection to a data source using a managed identity.
Cognitive Search provides different authorization models for content management and service management.
Authorization for content management
If you're using key-based authentication, authorization on content operations is conferred through the type of API key on the request:
Admin key (allows read-write access for create-read-update-delete operations on the search service), created when the service is provisioned
Query key (allows read-only access to the documents collection of an index), created as-needed and are designed for client applications that issue queries
In application code, you specify the endpoint and an API key to allow access to content and options. An endpoint might be the service itself, the indexes collection, a specific index, a documents collection, or a specific document. When chained together, the endpoint, the operation (for example, a create or update request) and the permission level (full or read-only rights based on the key) constitute the security formula that protects content and operations.
If you're using Azure AD authentication, use role assignments instead of API keys to establish who and what can read and write to your search service.
Controlling access to indexes
In Azure Cognitive Search, an individual index is generally not a securable object. As noted previously for key-based authentication, access to an index will include read or write permissions based on which API key you provide on the request, along with the context of an operation. In a query request, there's no concept of joining indexes or accessing multiple indexes simultaneously so all requests target a single index by definition. As such, construction of the query request itself (a key plus a single target index) defines the security boundary.
However, if you're using Azure roles, you can set permissions on individual indexes as long as it's done programmatically.
For key-based authentication scenarios, administrator and developer access to indexes is undifferentiated: both need write access to create, delete, and update the objects managed by the service. Anyone with an admin key to your service can read, modify, or delete any index in the same service. For protection against accidental or malicious deletion of indexes, your in-house source control for code assets is the solution for reversing an unwanted index deletion or modification. Azure Cognitive Search has failover within the cluster to ensure availability, but it doesn't store or execute your proprietary code used to create or load indexes.
For multitenancy solutions requiring security boundaries at the index level, such solutions typically include a middle tier, which customers use to handle index isolation. For more information about the multitenant use case, see Design patterns for multitenant SaaS applications and Azure Cognitive Search.
Controlling access to documents
If you require granular, per-user control over search results, you can build security filters on your queries, returning documents associated with a given security identity.
Conceptually equivalent to "row-level security", authorization to content within the index isn't natively supported using predefined roles or role assignments that map to entities in Azure Active Directory. Any user permissions on data in external systems, such as Azure Cosmos DB, don't transfer with that data as its being indexed by Cognitive Search.
Workarounds for solutions that require "row-level security" include creating a field in the data source that represents a security group or user identity, and then using filters in Cognitive Search to selectively trims search results of documents and content based on identities. The following table describes two approaches for trimming search results of unauthorized content.
|Security trimming based on identity filters||Documents the basic workflow for implementing user identity access control. It covers adding security identifiers to an index, and then explains filtering against that field to trim results of prohibited content.|
|Security trimming based on Azure Active Directory identities||This article expands on the previous article, providing steps for retrieving identities from Azure Active Directory (Azure AD), one of the free services in the Azure cloud platform.|
Authorization for Service Management
Service Management operations are authorized through Azure role-based access control (Azure RBAC). Azure RBAC is an authorization system built on Azure Resource Manager for provisioning of Azure resources.
In Azure Cognitive Search, Resource Manager is used to create or delete the service, manage API keys, and scale the service. As such, Azure role assignments will determine who can perform those tasks, regardless of whether they're using the portal, PowerShell, or the Management REST APIs.
Three basic roles are defined for search service administration. The role assignments can be made using any supported methodology (portal, PowerShell, and so forth) and are honored service-wide. The Owner and Contributor roles can perform a variety of administration functions. You can assign the Reader role to users who only view essential information.
Using Azure-wide mechanisms, you can lock a subscription or resource to prevent accidental or unauthorized deletion of your search service by users with admin rights. For more information, see Lock resources to prevent unexpected deletion.
Azure Cognitive Search won't store data outside of your specified region without your authorization. Specifically, the following features write to an Azure Storage resource: enrichment cache, debug session, knowledge store. The storage account is one that you provide, and it could be in any region.
If both the storage account and the search service are in the same region, network traffic between search and storage uses a private IP address and occurs over the Microsoft backbone network. Because private IP addresses are used, you can't configure IP firewalls or a private endpoint for network security. Instead, use the trusted service exception as an alternative when both services are in the same region.
At the storage layer, data encryption is built in for all service-managed content saved to disk, including indexes, synonym maps, and the definitions of indexers, data sources, and skillsets. Optionally, you can add customer-managed keys (CMK) for supplemental encryption of indexed content. For services created after August 1 2020, CMK encryption extends to data on temporary disks, for full "double encryption" of indexed content.
Data in transit
In Azure Cognitive Search, encryption starts with connections and transmissions, and extends to content stored on disk. For search services on the public internet, Azure Cognitive Search listens on HTTPS port 443. All client-to-service connections use TLS 1.2 encryption. Earlier versions (1.0 or 1.1) aren't supported.
Data at rest
For data handled internally by the search service, the following table describes the data encryption models. Some features, such as knowledge store, incremental enrichment, and indexer-based indexing, read from or write to data structures in other Azure Services. Those services have their own levels of encryption support separate from Azure Cognitive Search.
|server-side encryption||Microsoft-managed keys||None (built-in)||None, available on all tiers, in all regions, for content created after January 24 2018.||Content (indexes and synonym maps) and definitions (indexers, data sources, skillsets)|
|server-side encryption||customer-managed keys||Azure Key Vault||Available on billable tiers, in all regions, for content created after January 2019.||Content (indexes and synonym maps) on data disks|
|server-side double encryption||customer-managed keys||Azure Key Vault||Available on billable tiers, in selected regions, on search services after August 1 2020.||Content (indexes and synonym maps) on data disks and temporary disks|
Service-managed encryption is a Microsoft-internal operation, based on Azure Storage Service Encryption, using 256-bit AES encryption. It occurs automatically on all indexing, including on incremental updates to indexes that aren't fully encrypted (created before January 2018).
Customer-managed keys (CMK)
Customer-managed keys require an additional billable service, Azure Key Vault, which can be in a different region, but under the same subscription, as Azure Cognitive Search. Enabling CMK encryption will increase index size and degrade query performance. Based on observations to date, you can expect to see an increase of 30%-60% in query times, although actual performance will vary depending on the index definition and types of queries. Because of this performance impact, we recommend that you only enable this feature on indexes that really require it. For more information, see Configure customer-managed encryption keys in Azure Cognitive Search.
In Azure Cognitive Search, double encryption is an extension of CMK. It's understood to be two-fold encryption (once by CMK, and again by service-managed keys), and comprehensive in scope, encompassing long-term storage that is written to a data disk, and short-term storage written to temporary disks. Double encryption is implemented in services created after specific dates. For more information, see Double encryption.
Manage API keys
Reliance on API key-based authentication means that you should have a plan for regenerating the admin key at regular intervals, per Azure security best practices. There are a maximum of two admin keys per search service. For more information about securing and managing API keys, see Create and manage api-keys.
Activity and resource logs
Cognitive Search doesn't log user identities so you can't refer to logs for information about a specific user. However, the service does log create-read-update-delete operations, which you might be able to correlate with other logs to understand the agency of specific actions.
Using alerts and the logging infrastructure in Azure, you can pick up on query volume spikes or other actions that deviate from expected workloads. For more information about setting up logs, see Collect and analyze log data and Monitor query requests.
Certifications and compliance
Azure Cognitive Search participates in regular audits, and has been certified against many global, regional, and industry-specific standards for both the public cloud and Azure Government. For the complete list, download the Microsoft Azure Compliance Offerings whitepaper from the official Audit reports page.
For compliance, you can use Azure Policy to implement the high-security best practices of Microsoft cloud security benchmark. The Microsoft cloud security benchmark is a collection of security recommendations, codified into security controls that map to key actions you should take to mitigate threats to services and data. There are currently 12 security controls, including Network Security, Logging and Monitoring, and Data Protection.
Azure Policy is a capability built into Azure that helps you manage compliance for multiple standards, including those of Microsoft cloud security benchmark. For well-known benchmarks, Azure Policy provides built-in definitions that provide both criteria and an actionable response that addresses non-compliance.
For Azure Cognitive Search, there's currently one built-in definition. It's for resource logging. With this built-in, you can assign a policy that identifies any search service that is missing resource logging, and then turns it on. For more information, see Azure Policy Regulatory Compliance controls for Azure Cognitive Search.
Watch this video
Watch this fast-paced video for an overview of the security architecture and each feature category.
Submit and view feedback for