Synapse can't transfer data from parquet file to Azure SQL Database table

Lily 1 Reputation point
2024-04-26T10:37:45.7633333+00:00

Hi!

The pipeline in Synapse (workspace) transfers data from parquet files to a database using a copy activity. Everything has been working fine for months, but today it simply times out without a specific error. It seems the problem lies in transforming the read data and writing it to the database, as source-sink combinations like parquet-parquet, db-db, txt-db, and txt-parquet transfer data normally.

Test file contains only 1 row with the following values:
Timestamp Level OperationName OperationItem Message

On the sink side, auto create table is enabled and synapse does create it using parquet header data, but doesn't transfer the data to the table. I also tried to create another data source for parquet but it doesn't work either.

Synapse has no issue reading the same file in .txt format and writing the data to the database table.

Thank you in advance!
Kr

Azure SQL Database
Azure Storage Accounts
Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
2,765 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,476 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Lily 1 Reputation point
    2024-04-26T11:29:07.5333333+00:00

    Datatype of all the columns is nvarchar(max) - so it wasn't it I guess.
    After 4 hours we finally found the issue - seems like auto create option makes some kind of distraction for Synapse.
    We found the following error in diagnostic settings logs (not visible in synapse):
    { "PartitionId": "0", "LogicalServerName": "...", "SubscriptionId": "...", "ResourceGroup": "...", "time": "2024-04-26T10:31:21.1730408Z", "resourceId": "...", "category": "Errors", "operationName": "ErrorEvent", "properties": {"ElasticPoolName":"","DatabaseName":"...","query_hash":"0","query_plan_hash":"0","message":"Invalid object name 'dbo.noviparquettest1'.","error_number":208,"severity":16,"user_defined":false,"state":1}, "location": "centralus"}.
    So we disabled auto create option and now it works fine. :/ :/

    Thank you anyway!
    Kr


  2. Smaran Thoomu 10,875 Reputation points Microsoft Vendor
    2024-04-26T17:51:11.3333333+00:00

    Hi @Lily

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others ", I'll repost your solution in case you'd like to accept the answer .

    Issue: The pipeline in Synapse (workspace) transfers data from parquet files to a database using a copy activity. Everything has been working fine for months, but today it simply times out without a specific error. It seems the problem lies in transforming the read data and writing it to the database, as source-sink combinations like parquet-parquet, db-db, txt-db, and txt-parquet transfer data normally.

    Test file contains only 1 row with the following values: Timestamp Level OperationName OperationItem Message

    On the sink side, auto create table is enabled and synapse does create it using parquet header data, but doesn't transfer the data to the table. I also tried to create another data source for parquet but it doesn't work either.

    Synapse has no issue reading the same file in .txt format and writing the data to the database table.

    Solution: Datatype of all the columns is nvarchar(max) - so it wasn't it I guess. After 4 hours we finally found the issue - seems like auto create option makes some kind of distraction for Synapse. We found the following error in diagnostic settings logs (not visible in synapse): { "PartitionId": "0", "LogicalServerName": "...", "SubscriptionId": "...", "ResourceGroup": "...", "time": "2024-04-26T10:31:21.1730408Z", "resourceId": "...", "category": "Errors", "operationName": "ErrorEvent", "properties": {"ElasticPoolName":"","DatabaseName":"...","query_hash":"0","query_plan_hash":"0","message":"Invalid object name 'dbo.noviparquettest1'.","error_number":208,"severity":16,"user_defined":false,"state":1}, "location": "centralus"}. So we disabled auto create option and now it works fine.

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    I hope this helps!

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.

    0 comments No comments