How to use GZIP compression while inserting multiple rows at the same time

Andreas ss 726 Reputation points
2020-11-10T00:18:20.907+00:00

Hello!

  • I use: SQL Server Enterprise Core edition

I am trying to understand how to compress a SQL Server table. I have looked at the GZIP algorithm, which would be interesting to test as it offers very good compression. This is my first time working with compression, so I am not exactly sure how to do this correctly. The link below shows that a compressed table can take up only 6% of the original space, which is very good compression.
https://www.mssqltips.com/sqlservertip/5709/using-compress-and-decompress-in-sql-server-to-save-disk-space/
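
For reference, here is a minimal sketch of the COMPRESS/DECOMPRESS pair that tip is built around (the value and variable names here are just illustrative):

DECLARE @original NVARCHAR(MAX) = N'Some long text that is worth compressing...';
-- COMPRESS gzips the value into a VARBINARY(MAX)
DECLARE @compressed VARBINARY(MAX) = COMPRESS(@original);
-- DECOMPRESS reverses it; the result must be cast back to the original type
SELECT CAST(DECOMPRESS(@compressed) AS NVARCHAR(MAX)) AS roundtripped;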

I will describe what I do now, step by step, without compression.

Step 1: I create this table

CREATE TABLE [dbo].[table123] (
    [_DateTime]    SMALLDATETIME DEFAULT (getdate()) NOT NULL,
    [_DayNr]       TINYINT       DEFAULT ((0)) NOT NULL,
    [_CategoryNbr] TINYINT       DEFAULT ((0)) NOT NULL,
    [_FeatureNbr]  SMALLINT      DEFAULT ((-1)) NOT NULL,
    [_Value]       FLOAT (53)    NULL,
    [_Bool]        BIT           NULL,
    CONSTRAINT [PK_table123] PRIMARY KEY CLUSTERED ([_DayNr] ASC, [_DateTime] ASC, [_CategoryNbr] ASC, [_FeatureNbr] ASC),
    CONSTRAINT [UC_table123] UNIQUE NONCLUSTERED ([_FeatureNbr] ASC, [_DateTime] ASC)
);

Step 2: I now insert many rows at a time, usually about 100 rows at a time, to save overhead and make the inserts faster and more efficient.

INSERT INTO table123 (
    _DateTime,
    _DayNr,
    _CategoryNbr,
    _FeatureNbr,
    _Value,
    _Bool
)
VALUES
    ('2010-08-01 17:00:00', 1, 1, 1, 12.4578941564, 1),
    ('2010-08-02 17:00:00', 2, 1, 1, 13.4578941564, 1),
    ('2010-08-03 17:00:00', 3, 1, 1, 14.4578941564, 1),
    ('2010-08-04 17:00:00', 4, 1, 1, 15.4578941564, 1);
    

Many questions at the same time now:
1. Am I thinking correctly about saving overhead/time by inserting multiple rows at a time like this?
2. My big question: how can I apply the GZIP compression algorithm to all those 4 rows (with all column values) while I insert them into the table like above? At the same time, I wonder whether the GZIP algorithm offers the best compression, or one of the best, purely in terms of taking up as little disk storage as possible. If not, please feel free to suggest a better compression method for this table.
3. I believe it must be possible to query/extract the values without decompressing the data first, if such a thing is possible (I am thinking that perhaps not all compression methods can do that).

Thank you!


Accepted answer
  1. tibor_karaszi@hotmail.com 4,316 Reputation points
    2020-11-11T09:07:27.003+00:00

    A tip is that queries with high selectivity (that return relatively few rows) are likely not to perform as well with a columnstore index as with a suitable "traditional" index (row index).

    There are no seeks in a columnstore index. If you are lucky, you get rowgroup elimination.

    The pointer from a row index points to the rowgroup where the row lives in the clustered columnstore index. Remember that the clustered columnstore index is the data. So after finding each row in the row index, SQL Server has to scan about 1,000,000 rows in the columnstore index to find the one you were looking for. For each row.

    Unless the row index covers the query, that is (all the columns that the query refers to are in the row index).
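
    As an illustrative sketch (the index name and column choice are just assumptions based on the table in the question), a covering row index could look like this:

    -- Queries on _FeatureNbr/_DateTime that only need _Value can seek here
    -- and never touch the ~1,000,000-row rowgroups in the columnstore
    CREATE NONCLUSTERED INDEX ix_feature_cover
        ON dbo.table123 (_FeatureNbr, _DateTime)
        INCLUDE (_Value);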

    There is much, much more to columnstore indexes than first meets the eye. Make sure you understand the architecture etc. for columnstore indexes before deciding on them.

    1 person found this answer helpful.

4 additional answers

  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2020-11-10T23:11:07.64+00:00

    Here is your table with a clustered columnstore index:

    CREATE TABLE [dbo].[table123] (
    [_DateTime] SMALLDATETIME DEFAULT (getdate()) NOT NULL,
    [_DayNr] TINYINT DEFAULT ((0)) NOT NULL,
    [_CategoryNbr] TINYINT DEFAULT ((0)) NOT NULL,
    [_FeatureNbr] SMALLINT DEFAULT ((-1)) NOT NULL,
    [_Value] FLOAT (53) NULL,
    [_Bool] BIT NULL,
    CONSTRAINT [PK_table123] PRIMARY KEY NONCLUSTERED ([_DayNr] ASC, [_DateTime] ASC, [_CategoryNbr] ASC, [_FeatureNbr] ASC),
    CONSTRAINT [UC_table123] UNIQUE NONCLUSTERED ([_FeatureNbr] ASC, [_DateTime] ASC),
    INDEX colstore_ix CLUSTERED COLUMNSTORE
    );

    I don't have the syntax for COLUMNSTORE_ARCHIVE around, but as Tibor says, you should first get acquainted with columnstore to determine whether it is for you.
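
    For reference, a sketch of that syntax based on the documented DATA_COMPRESSION option (using the index name from the table above; verify against your SQL Server version):

    ALTER INDEX colstore_ix ON dbo.table123
        REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);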

    1 person found this answer helpful.

  2. tibor_karaszi@hotmail.com 4,316 Reputation points
    2020-11-10T07:54:22.27+00:00

    GZIP compression is suitable when you have one column which can hold lots of data, typically an nvarchar(max). In your case you only have tiny columns, so this kind of compression (which works at the column value level) isn't suitable for you.

    Possibly Data Compression is a better choice: https://learn.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-ver15
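
    A minimal sketch of what that could look like on your table (PAGE is just one example choice; ROW is the lighter alternative):

    -- Rebuild the table, applying page compression to its clustered index
    ALTER TABLE dbo.table123 REBUILD WITH (DATA_COMPRESSION = PAGE);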

    Or, perhaps even having a clustered columnstore index on the table, but that is pretty much aimed at data analysis rather than OLTP types of work.


  3. CathyJi-MSFT 22,396 Reputation points Microsoft External Staff
    2020-11-10T09:47:54.603+00:00

    Hi @Andreas ss ,

    The COMPRESS function compresses data using the GZIP algorithm. It is most suitable for compressing portions of the data when archiving old data for long-term storage. Data compressed using the COMPRESS function cannot be indexed. Please refer to COMPRESS (Transact-SQL).

    As TiborKaraszi mentioned, you can try to use Data Compression, please refer to the blog How to use SQL Server Data Compression to Save Space.

    Best regards,
    Cathy



  4. tibor_karaszi@hotmail.com 4,316 Reputation points
    2020-11-10T15:53:20.76+00:00

    To put it bluntly, you shouldn't use columnstore unless you are pretty familiar with columnstore. I'm talking about the architecture behind columnstore, the advantages and the disadvantages. It is mainly designed for data warehousing, and I wouldn't recommend using it unless you are pretty confident in what you are doing.

    1: Instead of creating a regular clustered index, you create a clustered columnstore index.

    2: The code you posted doesn't do columnstore_archive compression, it does data compression (page). Anyhow, you can specify what compression you want in the CREATE (table/index) statement and the data will be compressed from the start. Note, however, that page compression only kicks in when a page is full and is targeted for more rows. And for columnstore, SQL Server batches rows in groups of 1,000,000. I.e., you do 1,000,000 inserts, which under the covers are stored uncompressed, and then after 1,000,000 rows they are compressed into a compressed rowgroup.
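
    To check whether row or page compression pays off before committing, there is a documented estimation procedure; here is a sketch against the table from the question:

    -- Estimate the space saving of PAGE compression on dbo.table123
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'dbo',
        @object_name = 'table123',
        @index_id = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';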

    There are lots and lots to say about columnstore. I suggest you prepare to spend some time reading up on it. Two starting points are:
    http://www.sqlservercentral.com/stairway/121631/
    http://www.nikoport.com/columnstore/

