Performance of COMPRESS and DECOMPRESS on nvarchar(max) in SQL Server

G P 41 Reputation points
2021-02-19T21:16:21.577+00:00

This is on Microsoft SQL Server 2019

I have a table in the following form

create table tab1 (date_id int not null, i int not null, json_field nvarchar(max) null)
go
The size of json_field is over 8 KB per row. I am planning to store multiple billions of rows in this table, so its size is ballooning.

So I am considering

create table tab2 (date_id int not null, i int not null, json_field varbinary(max) null)

When I compress the json column during ingestion, the load rates are poor. But I am even more concerned about reading the data back: I have a requirement to perform aggregations over a few of the columns inside the json structure.
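
For reference, the compressed ingestion path looks roughly like this (a simplified sketch; the @-parameters are illustrative). COMPRESS gzips its input and returns varbinary(max):

-- illustrative parameters: @date_id int, @i int, @json nvarchar(max)
insert into tab2 (date_id, i, json_field)
values (@date_id, @i, compress(@json));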

1) The size of tab2 (with the json compressed into varbinary(max)) is about 1/4 of the size of tab1 (with nvarchar(max), uncompressed).
2) CPU utilization is about 4x higher when I store the compressed varbinary(max) and DECOMPRESS it at query time.

-- tab1: json stored as uncompressed nvarchar(max)
select g.i
      ,sum(jcol1)
      ,sum(jcol2)
      ,sum(jcol3)
from   tab1 g
outer apply openjson(g.json_field)
       with (jcol1 float '$.jcol1'
            ,jcol2 float '$.jcol2'
            ,jcol3 float '$.jcol3')
where  g.date_id = 20210101
group by g.i

-- tab2: json stored compressed, decompressed at query time
-- decompress returns varbinary(max), so cast back to nvarchar(max) for openjson
select g.i
      ,sum(jcol1)
      ,sum(jcol2)
      ,sum(jcol3)
from   (select date_id, i,
               json_field = cast(decompress(json_field) as nvarchar(max))
        from   tab2) g
outer apply openjson(g.json_field)
       with (jcol1 float '$.jcol1'
            ,jcol2 float '$.jcol2'
            ,jcol3 float '$.jcol3')
where  g.date_id = 20210101
group by g.i

Are there any ways to improve performance? Any help is greatly appreciated.

Thanks!


Accepted answer
  1. Erland Sommarskog 101.9K Reputation points MVP
    2021-02-19T22:21:47.267+00:00

    Assuming that you know beforehand which columns to aggregate on, I would consider loading the data into an nvarchar(MAX) column, so that the load is quick. Then I would have a background task - an Agent job, or maybe something Service Broker based - that takes a couple of rows at a time, extracts the fields you need to aggregate on and loads these into a table; this routine also compresses the data.
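
    A minimal sketch of that background step, assuming a hypothetical side table agg_values, a processed flag and a json_compressed varbinary(max) column added to tab1, and that (date_id, i) identifies a row (all of these names are illustrative, not part of the original schema):

    -- illustrative only: pick a small batch of rows that have not been processed yet
    declare @batch table (date_id int, i int);

    insert into @batch (date_id, i)
    select top (1000) date_id, i
    from   tab1
    where  processed = 0;

    -- extract the columns needed for aggregation into a narrow side table
    insert into agg_values (date_id, i, jcol1, jcol2, jcol3)
    select t.date_id, t.i, j.jcol1, j.jcol2, j.jcol3
    from   tab1 t
    join   @batch b on b.date_id = t.date_id and b.i = t.i
    cross apply openjson(t.json_field)
           with (jcol1 float '$.jcol1'
                ,jcol2 float '$.jcol2'
                ,jcol3 float '$.jcol3') j;

    -- compress the json and mark the batch as done
    update t
    set    json_compressed = compress(t.json_field),
           json_field      = null,
           processed       = 1
    from   tab1 t
    join   @batch b on b.date_id = t.date_id and b.i = t.i;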

    An alternative is to read the json into the varbinary column directly, but uncompressed, with a flag column indicating whether a row has been compressed yet, and have the background job work on that column.
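
    For that variant, the table and the background job could look something like this (the is_compressed flag and the @-parameters are illustrative):

    create table tab2 (date_id       int            not null,
                       i             int            not null,
                       json_field    varbinary(max) null,
                       is_compressed bit            not null default 0);

    -- fast load: store the json bytes as-is, uncompressed
    insert into tab2 (date_id, i, json_field, is_compressed)
    values (@date_id, @i, cast(@json as varbinary(max)), 0);

    -- background job: compress rows that have not been compressed yet
    update top (1000) tab2
    set    json_field    = compress(json_field),
           is_compressed = 1
    where  is_compressed = 0;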

    Of course, this approach assumes that you don't need those aggregations on the spot, but can wait a while until the background job has completed.

