Performance of COMPRESS and DECOMPRESS on nvarchar(max) in SQL Server

G P 41 Reputation points
2021-02-19T21:16:21.577+00:00

This is on Microsoft SQL Server 2019

I have a table in the following form

create table tab1 (date_id int not null, i int not null, json_field nvarchar(max) null)
go
The size of json_field is over 8 KB per row. I am planning to store multiple billions of rows in this table, so its size is ballooning.

So I am considering

create table tab2 (date_id int not null, i int not null, json_field varbinary(max) null)

When I compress the json column during ingestion, the load rates are poor. But I am even more concerned about reading the data back: I have a requirement to perform aggregations over a few of the columns inside the json structure.
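
For reference, the compressed ingestion path looks roughly like this (a simplified sketch; the @-parameters are illustrative). COMPRESS gzips its input and returns varbinary(max):

-- illustrative parameters: @date_id int, @i int, @json nvarchar(max)
insert into tab2 (date_id, i, json_field)
values (@date_id, @i, compress(@json));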

1) The size of tab2 (with the json compressed into varbinary(max)) is about 1/4 of the size of tab1 (with nvarchar(max), uncompressed).
2) CPU utilization is about 4x higher when I store the compressed varbinary(max) and DECOMPRESS it at query time.

-- tab1: json stored as uncompressed nvarchar(max)
select g.i
      ,sum(jcol1)
      ,sum(jcol2)
      ,sum(jcol3)
from   tab1 g
outer apply openjson(g.json_field)
       with (jcol1 float '$.jcol1'
            ,jcol2 float '$.jcol2'
            ,jcol3 float '$.jcol3')
where  g.date_id = 20210101
group by g.i

-- tab2: json stored compressed, decompressed at query time
-- decompress returns varbinary(max), so cast back to nvarchar(max) for openjson
select g.i
      ,sum(jcol1)
      ,sum(jcol2)
      ,sum(jcol3)
from   (select date_id, i,
               json_field = cast(decompress(json_field) as nvarchar(max))
        from   tab2) g
outer apply openjson(g.json_field)
       with (jcol1 float '$.jcol1'
            ,jcol2 float '$.jcol2'
            ,jcol3 float '$.jcol3')
where  g.date_id = 20210101
group by g.i

Are there any ways to improve performance? Any help is greatly appreciated.

Thanks!


Accepted answer
  1. Erland Sommarskog 101.9K Reputation points MVP
    2021-02-19T22:21:47.267+00:00

    Assuming that you know beforehand which columns to aggregate on, I would consider loading the data into an nvarchar(MAX) column, so that the load is quick. Then I would have a background task - an Agent job, or maybe something Service Broker based - that takes a couple of rows at a time, extracts the fields you need to aggregate on and loads these into a table; this routine also compresses the data.
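
    A minimal sketch of that background step, assuming a hypothetical side table agg_values, a processed flag and a json_compressed varbinary(max) column added to tab1, and that (date_id, i) identifies a row (all of these names are illustrative, not part of the original schema):

    -- illustrative only: pick a small batch of rows that have not been processed yet
    declare @batch table (date_id int, i int);

    insert into @batch (date_id, i)
    select top (1000) date_id, i
    from   tab1
    where  processed = 0;

    -- extract the columns needed for aggregation into a narrow side table
    insert into agg_values (date_id, i, jcol1, jcol2, jcol3)
    select t.date_id, t.i, j.jcol1, j.jcol2, j.jcol3
    from   tab1 t
    join   @batch b on b.date_id = t.date_id and b.i = t.i
    cross apply openjson(t.json_field)
           with (jcol1 float '$.jcol1'
                ,jcol2 float '$.jcol2'
                ,jcol3 float '$.jcol3') j;

    -- compress the json and mark the batch as done
    update t
    set    json_compressed = compress(t.json_field),
           json_field      = null,
           processed       = 1
    from   tab1 t
    join   @batch b on b.date_id = t.date_id and b.i = t.i;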

    An alternative is to read the json into the varbinary column directly, but uncompressed, with a flag column indicating whether a row has been compressed yet, and have the background job work on that column.
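
    For that variant, the table and the background job could look something like this (the is_compressed flag and the @-parameters are illustrative):

    create table tab2 (date_id       int            not null,
                       i             int            not null,
                       json_field    varbinary(max) null,
                       is_compressed bit            not null default 0);

    -- fast load: store the json bytes as-is, uncompressed
    insert into tab2 (date_id, i, json_field, is_compressed)
    values (@date_id, @i, cast(@json as varbinary(max)), 0);

    -- background job: compress rows that have not been compressed yet
    update top (1000) tab2
    set    json_field    = compress(json_field),
           is_compressed = 1
    where  is_compressed = 0;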

    Of course, this approach assumes that you don't need those aggregations on the spot, but can wait a while until the background job has completed.

