Azure AI Search performance benchmarks

2024-04-22

Important

These benchmarks apply to search services created before April 3, 2024 on deployments that run on older infrastructure. The benchmarks also apply to nonvector workloads only. Updates are pending for services and workloads on the new limits.

Performance benchmarks are useful for estimating potential performance under similar configurations. Actual performance depends on a variety of factors, including the size of your search service and the types of queries you're sending.

To help you estimate the size of search service needed for your workload, we ran several benchmarks to document the performance for different search services and configurations.

To cover a range of different use cases, we ran benchmarks for two main scenarios:

E-commerce search - This benchmark emulates a real e-commerce scenario and is based on the Nordic e-commerce company CDON.
Document search - This scenario consists of keyword search over full-text documents from Semantic Scholar and emulates a typical document search solution.

Although these scenarios reflect different use cases, every workload is different, so we recommend performance testing your own workload. We've published a performance testing solution using JMeter so you can run similar tests against your own service.

Testing methodology

To benchmark Azure AI Search's performance, we ran tests for both scenarios at different tiers and replica/partition combinations, using the following methodology:

The test begins at X queries per second (QPS) and runs for 180 seconds. X was usually 5 or 10 QPS.
QPS then increases by X and the test runs for another 180 seconds.
Every 180 seconds, QPS increases by X until average latency rises above 1000 ms or fewer than 99% of queries succeed.
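
The ramp procedure above can be sketched in Python as follows. This is a minimal illustration, not the code used for the published benchmarks: `run_ramp`, `StepResult`, and the `measure_step` callback are hypothetical names, and the example uses a synthetic latency model instead of sending queries to a real search service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    qps: int                # target queries per second for this step
    avg_latency_ms: float   # average latency observed during the step
    success_rate: float     # fraction of queries that succeeded (0.0 to 1.0)


def run_ramp(
    measure_step: Callable[[int, int], StepResult],
    step_qps: int,
    step_seconds: int = 180,
    max_latency_ms: float = 1000.0,
    min_success_rate: float = 0.99,
    max_steps: int = 1000,
) -> List[StepResult]:
    """Return the results of every step that met both thresholds."""
    passed: List[StepResult] = []
    qps = step_qps  # the test starts at X QPS
    for _ in range(max_steps):
        result = measure_step(qps, step_seconds)
        # Stop when average latency exceeds the limit or too few queries succeed.
        if (result.avg_latency_ms > max_latency_ms
                or result.success_rate < min_success_rate):
            break
        passed.append(result)
        qps += step_qps  # increase the load by X for the next step
    return passed


def synthetic_step(qps: int, seconds: int) -> StepResult:
    # Hypothetical stand-in for a real load generator:
    # latency grows linearly with load and every query succeeds.
    return StepResult(qps=qps, avg_latency_ms=40.0 + 10.0 * qps, success_rate=1.0)


steps = run_ramp(synthetic_step, step_qps=5)
print(f"steps passed: {len(steps)}, highest passing QPS: {steps[-1].qps}")
```

In a real test, `measure_step` would drive a load generator such as JMeter for `step_seconds` and report the observed averages. The `max_steps` cap is an added safeguard so the loop always terminates.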

The following graph gives a visual example of what the test's query load looks like:

Example test

Each scenario used at least 10,000 unique queries to avoid tests being overly skewed by caching.

Important

These tests only include query workloads. If you expect to have a high volume of indexing operations, be sure to factor that into your estimation and performance testing. Sample code for simulating indexing can be found in this tutorial.

Definitions

Maximum QPS - The highest QPS achieved in a test where at least 99% of queries completed successfully without throttling and average latency stayed under 1000 ms.
Percentage of max QPS - A percentage of the maximum QPS achieved for a particular test. For example, if a given test reached a maximum of 100 QPS, 20% of max QPS would be 20 QPS.
Latency - The server's latency for a query; these numbers don't include round-trip time (RTT). Values are in milliseconds (ms).
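
As a small worked example of the "percentage of max QPS" definition, the following sketch converts the three usage levels reported in the latency tables (20%, 50%, and 80%) into absolute query rates. The function name `qps_at_load` is hypothetical, and the maximum of 100 QPS is the illustrative figure from the definition, not a measured result.

```python
def qps_at_load(max_qps: float, percent: float) -> float:
    """QPS that corresponds to a percentage of the measured maximum QPS."""
    return max_qps * percent / 100.0


max_qps = 100.0  # illustrative maximum taken from the definition above
for percent in (20, 50, 80):
    print(f"{percent}% of max QPS = {qps_at_load(max_qps, percent):.0f} QPS")
```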

Testing disclaimer

The code we used to run these benchmarks is available in the azure-search-performance-testing repository. We observed slightly lower QPS levels with the JMeter performance testing solution than in the benchmarks, which can be attributed to differences in the style of the tests. This underscores the importance of making your performance tests as similar to your production workload as possible.

Important

These benchmarks in no way guarantee a certain level of performance from your service but can give you an idea of the performance you can expect based on your scenario.

If you have any questions or concerns, reach out to us at azuresearch_contact@microsoft.com.

Benchmark 1: E-commerce search

This benchmark was created in partnership with CDON, the Nordic region's largest online marketplace, with operations in Sweden, Finland, Norway, and Denmark. Through its 1,500 merchants, CDON offers a wide assortment of over 8 million products. In 2020, CDON had over 120 million visitors and 2 million active customers. You can learn more about CDON's use of Azure AI Search in this article.

To run these tests, we used a snapshot of CDON's production search index and thousands of unique queries from their website.

Scenario Details

Document Count: 6,000,000
Index Size: 20 GB
Index Schema: a wide index with 250 fields total, 25 searchable fields, and 200 facetable/filterable fields
Query Types: full text search queries including facets, filters, ordering, and scoring profiles

S1 Performance

Queries per second

The following chart shows the highest query load a service could handle for an extended period of time in terms of queries per second (QPS).

Highest maintainable QPS ecommerce s1

Query latency

Query latency varies based on the load of the service, and services under higher stress have a higher average query latency. The following table shows the average latency and the 25th, 75th, 90th, 95th, and 99th percentiles of query latency for three different usage levels.

Percentage of max QPS	Average latency	25%	75%	90%	95%	99%
20%	104 ms	35 ms	115 ms	177 ms	257 ms	738 ms
50%	140 ms	47 ms	144 ms	241 ms	400 ms	1175 ms
80%	239 ms	77 ms	248 ms	466 ms	763 ms	1752 ms
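
To illustrate how a row like those in the preceding table could be derived from raw measurements, the following sketch summarizes a list of per-query latencies. The benchmark doesn't state which percentile method was used, so the nearest-rank method here is an assumption, and `percentile`, `summarize`, and the sample data are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples_ms: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Average latency plus the percentile columns used in the latency tables."""
    row = {"average": mean(samples_ms)}
    for p in (25, 75, 90, 95, 99):
        row[f"p{p}"] = percentile(samples_ms, p)
    return row


# Synthetic latencies of 1 ms through 100 ms, for illustration only.
samples = [float(ms) for ms in range(1, 101)]
print(summarize(samples))
```

Real latency distributions are typically skewed, which is why the 99th percentile in the tables is several times larger than the average.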

S2 Performance

Queries per second

The following chart shows the highest query load a service could handle for an extended period of time in terms of queries per second (QPS).

Highest maintainable QPS ecommerce s2

Query latency

Percentage of max QPS	Average latency	25%	75%	90%	95%	99%
20%	56 ms	21 ms	68 ms	106 ms	132 ms	210 ms
50%	71 ms	26 ms	83 ms	132 ms	177 ms	329 ms
80%	140 ms	47 ms	153 ms	293 ms	452 ms	924 ms

S3 Performance

Queries per second

The following chart shows the highest query load a service could handle for an extended period of time in terms of queries per second (QPS).

Highest maintainable QPS ecommerce s3

In this case, adding a second partition significantly increases the maximum QPS, but adding a third partition provides diminishing returns. The smaller improvement is likely because all of the data is already being pulled into the S3's active memory with just two partitions.

Query latency

Percentage of max QPS	Average latency	25%	75%	90%	95%	99%
20%	50 ms	20 ms	64 ms	83 ms	98 ms	160 ms
50%	62 ms	24 ms	80 ms	107 ms	130 ms	253 ms
80%	115 ms	38 ms	121 ms	218 ms	352 ms	828 ms