Indexing JSON documents in Sql Server 2016

JSON indexes in SQL Server

One important thing in every database is indexing. Indexes can speed-up your filter or sort operations. Without indexes, Sql Server would need to perform full table scan every time you query data.

Sql Server 2016 do not have some customized JSON index. However, you can use standard indexes on JSON documents. In this post we will see how you can use these indexes to optimize your queries.

Indexing JSON properties

In many cases you would filter results from a table by some JSON property or order results using some value in a JSON document.

An example, we can imagine that we have “Info” column in the AdventureWorks SalesOrderHeader table that contains various information about sales order (e.g. information about customer, sales person, shipping/billing addresses, etc.). We want to use this column to filter sales orders for some customer. The query that we might want to optimize using index is shown in the following code:

 SELECT SalesOrderNumber, OrderDate,
 JSON_VALUE(Info, '$.Customer.Name') AS CustomerName
 FROM Sales.SalesOrderHeader 
 WHERE JSON_VALUE(Info, '$.Customer.Name') = N'Aaron Campbell'
 

If you want to speed-up your filters or order by clauses applied on some property in JSON document, you can use the same indexes that you are using on any other columns. However, you cannot directly reference properties in the JSON documents. You need to create “virtual column” that will return values that will be used for filtering and create index on that column (in our example column that will expose customer name stored in the $.Customer.Name path in JSON documents). An example of computed column that will be used for indexing and index on that column is shown in the following code:

 ALTER TABLE Sales.SalesOrderHeader
 ADD vCustomerName AS JSON_VALUE(Info, '$.Customer.Name')
 
 CREATE INDEX idx_soh_json_CustomerName
 ON Sales.SalesOrderHeader(vCustomerName)
 

Note that column is not persisted. It does not occupy additional space in the table and it is just computed when index needs to be rebuilt.

It is important that you create computed column with the same expression as the one that is used in the query, i.e. JSON_VALUE(Info, '$.Customer.Name') in the query above.

You don’t need to rewrite your queries - if you use expressions with JSON_VALUE function, Sql Server will see that there is an equivalent computed column with the same expression and apply index if possible. Execution plan for the query above might look like to one on the following figure:

Instead of the full table scan, Sql Server uses index seek into non-clustered index and finds rows that satisfy conditions. Then it uses Key Lookup in the SalesOrderHeader table to fetch other columns that are used in the query (i.e. SalesOrderNumber and OrderDate in our case).

Note that you can avoid this additional lookop in the table if you add required columns in the index. You could add these columns in the JSON index as standard included columns:

 CREATE INDEX idx_soh_json_CustomerName
 ON Sales.SalesOrderHeader(vCustomerName)
 INCLUDE (SalesOrderNumber, OrderDate)

In this case Sql Server would not read additional data from the SalesOrderHeader table because everything is included in the non-clustered JSON index. This is might be a good way to combine JSON and column data in the queries and create optimal indexes fro your workload.

Collation aware indexes

The important feature of JSON indexes is the fact that they are collation aware. Result of JSON_VALUE function is a text that inherits collation from input expression. Therefore, values in the index will be ordered using the collation rules defined in the source columns.

To demonstrate this, we can create a simple collection table with primary key and json content:

 CREATE TABLE JsonCollection
 ( id int identity constraint PK_JSON_ID primary key,
 json nvarchar(max) COLLATE Serbian_Cyrillic_100_CI_AI
 CONSTRAINT [Content should be formatted as JSON]
 CHECK (ISJSON(json)>0)
 )
 

In this case we have set Serbian Cyrillic collation on the JSON column. Now we can populate table and create index on name property:

 INSERT INTO JsonCollection
 VALUES (N'{"name":"Иво","surname":"Андрић"}'),
 (N'{"name":"Андрија","surname":"Герић"}'),
 (N'{"name":"Владе","surname":"Дивац"}'),
 (N'{"name":"Новак","surname":"Ђоковић"}'),
 (N'{"name":"Предраг","surname":"Стојаковић"}'),
 (N'{"name":"Михајло","surname":"Пупин"}'),
 (N'{"name":"Борислав","surname":"Станковић"}'),
 (N'{"name":"Владимир","surname":"Грбић"}'),
 (N'{"name":"Жарко","surname":"Паспаљ"}'),
 (N'{"name":"Дејан","surname":"Бодирога"}'),
 (N'{"name":"Ђорђе","surname":"Вајферт"}'),
 (N'{"name":"Горан","surname":"Бреговић"}'),
 (N'{"name":"Милутин","surname":"Миланковић"}'),
 (N'{"name":"Никола","surname":"Тесла"}')
 
 GO
 ALTER TABLE JsonCollection
 ADD vName AS JSON_VALUE(json, '$.name')
 
 CREATE INDEX idx_name
 ON JsonCollection(vName)
 

We have created standard index on computed column vName that represents the value from the $.name property. In Serbian Cyrillic code page, order of letters is ‘А’,’Б’,’В’,’Г’,’Д’,’Ђ’,’Е’, etc. Since the result of JSON_VALUE function inherits collation from the source column, order in the index will be compliant with Serbian Cyrillic rules. Now if we want to query this collection and sort results by name:

 SELECT JSON_VALUE(json, '$.name'), * FROM JsonCollection
 ORDER BY JSON_VALUE(json, '$.name')
 

If you look at the actual execution plan, you will see that it uses sorted values from non-clustered index:

 

We don’t have Sort operator although we have ORDER BY clause. JSON index is already ordered according to the Serbian Cyrillic rules so Sql Server can use non clustered index where results are already sorted.

However, if we change collation of the order by expression (e.g. put COLLATE French_100_CI_AS_SC after JSON_VALUE function), we will get different query execution plan:

Since the order of values in the index is not compliant to the French collation rules, Sql Server cannot use it to order results. Therefore, it will add separate Sort operator that will sort results using French collation rules.