ADF Copy data activity fails with Polybase option when trying to insert date to datetimeoffset column

Siniša Kaurić 21 Reputation points
2022-11-28T18:14:09.967+00:00

Hello,
I'm having problems using copy data activity and plcing copy method to PolyBase.
The problem is with the date column.
I have parquet file in ADLS and want to load data into sqldw table.
In that file there is a date column that is defined as INT96 and values look like: 2022-10-17T19:20:16.247Z
In destination table that column is defined as datetimeoffset(7).
When I execute the pipeline, copy activity fails with the following message:
"ErrorCode=PolybaseOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse. Operation: 'Polybase operation'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: MalformedInputException: Input length = 1,Source=.Net SqlClient Data Provider,SqlErrorNumber=107090,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=107090,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: MalformedInputException: Input length = 1,},],'"

When I change the destination column to datetime2 - everything works fine. But that is not an option.

I've tried searching and finding the solution but I can't find one. I'm suspecting it has something to do with the value format.
How can I solve this problem?

EDIT:
How does the process look like:

  1. I have date values in my sqldw table with datetimeoffset datatype (2022-01-03 12:19:43.0766667 +00:00)
  2. Then I'm exporting that data to ADLS parquet format via COPY activity. The parquet file type for that column is INT96 and the value stored there is 2022-01-03T12:19:43.077Z
  3. When I want to pull that data back to the same table with Polybase option - it fails on that same column with the error mentioned above.
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,535 questions
0 comments No comments
{count} votes

Accepted answer
  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2022-12-02T01:02:12.443+00:00

    Hello @Siniša Kaurić ,
    Thanks for the question and using MS Q&A platform.
    As we understand the ask here is timestamp to datetimeoffset , please do let us know if its not accurate.
    First of all let me thank you for putting the ask very clearly , i can see that you have put some extra time in that , whcih really helped .
    The Copy activity is literaly used to move data from source to sink , with very minimal transformation . In this case you will need to convert the data types . If you use data flow it will easy to achieve this .

    The process which you mentioned Step#1 says that you copy the file , since the data type in SQL is datetimeoffset , so ADF smartly changes to INT96 , hence the issues . I suggest you to use the query option and convert the column to varchar and all should work out .

    create table datetimeoffset_Paraquet
    (
    datetimeoffset_col1 datetimeoffset

    )

    INSERT INTO datetimeoffset_Paraquet values ('2022-01-03 12:19:43.0766667 +00:00')

    266340-image.png

    266316-image.png

    266381-image.png

    Please do let me if you have any queries.
    Thanks
    Himanshu


    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
      • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.
    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Siniša Kaurić 21 Reputation points
    2022-12-03T11:32:59.31+00:00

    Thank you @HimanshuSinha-msft , first for clarifying, second for suggesting a workaround.
    That is a good suggestion and I will try to use that approach.
    But now I'm facing a different issue:
    If I change the first step process and instead of a table (Source option) use a stored procedure to create select statement then I get the following error:
    "ErrorCode=ParquetInvalidColumnName,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The column name is invalid. Column name cannot contain these character:[,;{}()\n\t=],Source=Microsoft.DataTransfer.Common,'".

    I am searching for the solution for that but with no luck.
    What I want to do is to generate select statements for all my synapse tables and then move that data to ADLS. I've created a custom stored procedure that generates SELECT statement with column names for all tables.
    This is what the SQL lookls like:

    WITH c AS (  
     SELECT  
      TABLE_NAME  
     ,CASE  
      WHEN DATA_TYPE = 'datetimeoffset'  
      THEN COLUMN_NAME + '=CONVERT(VARCHAR(40),' + COLUMN_NAME + ')'  
      ELSE COLUMN_NAME + '=' + COLUMN_NAME  
      END as COLUMN_NAME2  
     FROM INFORMATION_SCHEMA.COLUMNS  
     WHERE TABLE_NAME = @table_name  
       AND TABLE_SCHEMA = @table_schema  
       )  
    SELECT   
      'SELECT '  
     + TRIM(STRING_AGG(COLUMN_NAME2, ','))  
     + ' FROM '  
     + TRIM(@table_schema)+TRIM('.') + TRIM(@table_name)  
    FROM c  
    GROUP BY TABLE_NAME  
    

    So, I want to dynamically pass the select statement for every table in synapse and if the column is of datetimeoffset type - convert it to varchar.
    I tried every possible solution to get it to work but with no luck. Replace function, trimming, trimming every part of code, but still receive the error above when executing copy activity.

    For the parameters of COPY activity:
    Source:
    Use query: Stored procedure, and passing two parameters: table_schema and table_name. (Copy activity is inside FOR EACH and tables are picked up with lookup activity before for each.)
    Sink:
    266797-image.png
    266776-image.png

    I dont have column mapping set up because we have multiple different tables in synapse.

    0 comments No comments

  2. Siniša Kaurić 21 Reputation points
    2022-12-06T15:35:39.653+00:00

    Turns out this was my mistake - COPY activity can not use output of stored procedure, so I've added a lookup prior to copy activity that executes stored procedure and then use that output (SELECT statement) as a source query.
    Hope this helps anyone who comes across the same issue :)

    0 comments No comments