Snowflake V2 Connector shows all data types as String in Sink

Question

Ivan Miliovsky 25

Hello All,

I had to switch from the Snowflake Legacy connector to the Snowflake V2 connector, and I found some very strange behavior. In the Sink activity, the connector interprets the data types of all Snowflake columns as String, which causes data truncation when the pipeline inserts new records into the target table. In the dataset, all column data types are correct, but when I use the dataset in the sink, every column is String. The issue shows up mainly when inserting timestamp data. For instance, if the Snowflake table has a column of type TIMESTAMP_NTZ(9) and you pass a timestamp value to the sink, the millisecond fraction is cut off: passing '2024-04-30 10:00:33.357' results in '2024-04-30 10:00:33.000' being inserted in Snowflake.

I think this is a bug in how the Snowflake V2 connector interprets data types. If support has a workaround or an idea of how this should be fixed, please share it.
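
For readers who want to confirm the same symptom on their own table, a minimal check in Snowflake SQL might look like the sketch below. The table name `my_target_table` and the column name `event_ts` are hypothetical and not from the original post. Rows that legitimately have no fractional seconds will also be listed, so compare against the source values.

```sql
-- Hypothetical names: my_target_table, event_ts (a TIMESTAMP_NTZ(9) column).
-- Lists recent rows whose fractional seconds are zero, which is what
-- truncated rows look like after the load.
SELECT event_ts,
       DATE_PART(nanosecond, event_ts) AS fractional_ns
FROM my_target_table
WHERE DATE_PART(nanosecond, event_ts) = 0
ORDER BY event_ts DESC
LIMIT 20;
```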

Vinodh247 34,661 Reputation points MVP Volunteer Moderator

2024-07-17T09:39:49.56+00:00
Consider any of the following and see if it works:

Set TRUNCATECOLUMNS = TRUE in your copy options. This truncates strings to the maximum width defined for each column.

Keep in mind that a string longer than 16,777,216 characters will still generate an error. In such cases, you can handle errors using ON_ERROR = CONTINUE.

Manually convert the timestamp data in your pipeline to avoid truncation, or use custom SQL expressions to handle timestamp conversions.
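
As a rough illustration of the two copy options mentioned above, this is how they would appear in a plain Snowflake COPY INTO statement. The stage `@my_stage`, the table `my_target_table`, and the CSV file format are assumptions made for the sketch; in ADF the same options would be passed through the sink's copy settings instead. Note that TRUNCATECOLUMNS concerns strings that exceed the target column length, not timestamp precision.

```sql
-- Hypothetical stage, path, table, and file format; adjust to your environment.
COPY INTO my_target_table
  FROM @my_stage/incoming/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  TRUNCATECOLUMNS = TRUE   -- truncate over-length strings instead of raising an error
  ON_ERROR = CONTINUE;     -- skip rows that fail to load and continue with the rest
```
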
Chandra Boorla 14,585 Reputation points Microsoft External Staff Moderator

2024-07-19T12:34:37.5933333+00:00

Hi @Ivan Miliovsky

Thanks for the question and for using the MS Q&A platform.

Apologies for the inconvenience you are facing here.

As a workaround, try using a Script activity to insert the data into Snowflake with a SQL script, which gives you more control over the insertion process. By using Snowflake's TO_TIMESTAMP_NTZ function, you can explicitly specify the format of the timestamp value, including millisecond precision.

Here's an example of how you can use the TO_TIMESTAMP_NTZ function in a SQL script:

```sql
INSERT INTO mytable (timestamp_column)
SELECT TO_TIMESTAMP_NTZ('2024-04-30 10:00:33.357', 'YYYY-MM-DD HH24:MI:SS.FF3');
```

This script uses the TO_TIMESTAMP_NTZ function to parse the timestamp value '2024-04-30 10:00:33.357' with millisecond precision. The 'YYYY-MM-DD HH24:MI:SS.FF3' format specifier indicates that the input string is in the format 'year-month-day hour:minute:second.millisecond'.

By using this approach, you can ensure that the timestamp value is inserted into Snowflake with the correct precision. Make sure to adjust the format specifier to match the format of your timestamp values.

Using a Script activity with a SQL script can provide more flexibility and control over the data insertion process. You can also use this approach to perform additional data transformations or validations before inserting the data into Snowflake.

Additionally, please log a feedback item regarding the Snowflake V2 connector on the ADF feedback channel:

https://feedback.azure.com/d365community/forum/1219ec2d-6c26-ec11-b6e6-000d3a4f032c

Please share the feedback link once it is posted so that we can pass the information to the respective product team. All feedback logged in this forum is actively monitored and reviewed by the respective engineering team and will be considered for future implementations.

I hope this information helps. Please let us know if you have any further queries.

Answer 1

Mok Max 80

I am not sure whether anyone else is facing the same issue using ADF with Snowflake. We raised a support ticket with MS, and they acknowledged this to be a V2 driver bug. Currently, our workaround is to ingest everything into Snowflake as strings and then use a stored procedure to change the data types.
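
A minimal sketch of that kind of workaround, assuming a hypothetical staging table `staging_events` whose columns are all VARCHAR and a typed target table `events`. None of these names, nor the procedure itself, come from the original post. The TRY_ variants return NULL instead of failing on values that do not parse, which may or may not be what you want.

```sql
-- Hypothetical objects: staging_events (all VARCHAR columns) and events (typed columns).
CREATE OR REPLACE PROCEDURE convert_staged_events()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
  -- Cast the string columns to their real types while copying into the target table.
  INSERT INTO events (id, event_ts)
  SELECT TRY_TO_NUMBER(id_raw),
         TRY_TO_TIMESTAMP_NTZ(event_ts_raw, 'YYYY-MM-DD HH24:MI:SS.FF3')
  FROM staging_events;
  RETURN 'done';
END;
$$;

-- Run the conversion after the pipeline has loaded the staging table.
CALL convert_staged_events();
```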

Answer 2

Ivan Miliovsky 25

Thanks for the answer, Vinodh247, but none of those workarounds work. The only way to bypass this timestamp issue is to use a schemaless dataset and parameterize it. Otherwise, the Snowflake V2 connector does not properly recognize the columns' data types in the Sink activity.

Chandra Boorla 14,585 Reputation points Microsoft External Staff Moderator

2024-07-23T11:44:59.93+00:00

Hi @Ivan Miliovsky

Glad to know your issue has been resolved. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution in case you'd like to accept the answer.

Chandra Boorla 14,585 Reputation points Microsoft External Staff Moderator

2024-07-24T15:07:45.1566667+00:00

Hi @Ivan Miliovsky

Just checking in to see if the answer below helped. If it answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

Answer 3

Hi @Ivan Miliovsky

I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others," I'll repost your solution in case you'd like to accept the answer.

Issue: I had to switch from the Snowflake Legacy connector to the Snowflake V2 connector, and I found some very strange behavior. In the Sink activity, the connector interprets the data types of all Snowflake columns as String, which causes data truncation when the pipeline inserts new records into the target table. In the dataset, all column data types are correct, but when I use the dataset in the sink, every column is String. The issue shows up mainly when inserting timestamp data. For instance, if the Snowflake table has a column of type TIMESTAMP_NTZ(9) and you pass a timestamp value to the sink, the millisecond fraction is cut off: passing '2024-04-30 10:00:33.357' results in '2024-04-30 10:00:33.000' being inserted in Snowflake.

I think this is a bug in how the Snowflake V2 connector interprets data types. If support has a workaround or an idea of how this should be fixed, please share it.

Solution: The only way to bypass this timestamp issue is to use a schemaless dataset and parameterize it. Otherwise, the Snowflake V2 connector does not properly recognize the columns' data types in the Sink activity.

If I missed anything, please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

Hope this helps. If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.
