SQL Server 2019 Developer Edition: Collation: Latin1_General_100_CI_AS_SC_UTF8: Varchar is taking 2 bytes for ASCII Characters. For example the upper case letter 'A'

Aswin 21 Reputation points
2022-10-17T11:22:27.177+00:00

Hi,
Problem as described in Question.
Steps followed:

Step 1:
SELECT name, collation_name
FROM sys.databases
WHERE name = N'MyOptionsTest2';

251050-image.png

CREATE TABLE SCTEST4 (ID INT,
VARCHARTEST VARCHAR(30),
NVARCHARTEST NVARCHAR(60),
);
GO
INSERT INTO SCTEST4 VALUES(1,'A', N'A')

251072-image.png

SELECT *,LEN(NVARCHARTEST) as LENVC, DATALENGTH(NVARCHARTEST) as DATALENVC,ASCII(VARCHARTEST) as ASCIIVARCHAR, UNICODE(VARCHARTEST) as UNICODEVARCHAR, LEN(NVARCHARTEST) as LENNVC, DATALENGTH(NVARCHARTEST) as DATALENNVC, ASCII(NVARCHARTEST) as ASCIINVARCHAR, UNICODE(NVARCHARTEST) as UNICODENVARCHAR from SCTEST4

251101-image.png

Just to Cross verify the collation on the column VARCHARTEST:

SELECT
name,
collation_name
FROM sys.columns
WHERE name = N'VARCHARTEST';

251102-image.png

It appears VARCHAR takes 2 bytes for ASCII characters too, ? (This is of course UTF-8 collation which seem to be introduced in SQL Server 2019)
Can anyone help answer this ?

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,165 questions
SQL Server Integration Services
SQL Server Integration Services
A Microsoft platform for building enterprise-level data integration and data transformations solutions.
2,615 questions
0 comments No comments
{count} votes

Accepted answer
  1. Viorel 118.5K Reputation points
    2022-10-17T13:56:35.73+00:00

    Maybe you should test the DATALENGTH(VARCHARTEST) instead of DATALENGTH(NVARCHARTEST).

    1 person found this answer helpful.

4 additional answers

Sort by: Most helpful
  1. Olaf Helper 45,371 Reputation points
    2022-10-17T12:50:41.717+00:00

    What have you expected by a UTF-8 codepage?
    ASCII always uses 1 byte, while UTF-8 stores up to 4 bytes, see UTF-8 support => Storage

    1 person found this answer helpful.

  2. CathyJi-MSFT 22,331 Reputation points Microsoft Vendor
    2022-10-18T02:03:50.58+00:00

    Hi @Aswin ,

    Quote from MS document.

    With UTF-8 encoding, characters in the ASCII range (000000–00007F) require 1 byte, code points 000080–0007FF require 2 bytes, code points 000800–00FFFF require 3 bytes, and code points 0010000–0010FFFF require 4 bytes.

    251384-capture.png


    If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    1 person found this answer helpful.
    0 comments No comments

  3. Aswin 21 Reputation points
    2022-10-18T08:11:58.49+00:00

    Thank You, this helps


  4. Aswin 21 Reputation points
    2022-10-18T08:19:53.737+00:00

    Everything was fine but there was small typo ,mistake in the below:

    /*******************
    SELECT *,LEN(NVARCHARTEST) as LENVC, DATALENGTH(NVARCHARTEST) as DATALENVC,ASCII(VARCHARTEST) as ASCIIVARCHAR, UNICODE(VARCHARTEST) as UNICODEVARCHAR, LEN(NVARCHARTEST) as LENNVC, DATALENGTH(NVARCHARTEST) as DATALENNVC, ASCII(NVARCHARTEST) as ASCIINVARCHAR, UNICODE(NVARCHARTEST) as UNICODENVARCHAR from SCTEST4
    *****************/

    I was looking at DATALENGTH(NVARCHARTEST) instead of DATALENGTH(VARCHARTEST) -- The second attribute in above query

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.