Reading using sqlanalytics connector in spark using Notebooks - Synapse

Sivagnana Sundaram, Krithiga 31 Reputation points
2022-03-30T21:32:01.34+00:00

I am reading a table in a Synapse notebook using the sqlanalytics connector. When Spark encounters an empty string in a column, it tries to convert it to None/NULL.

I get an error when the column is NOT NULL:

Column ordinal: 7, Expected data type: NVARCHAR(50) collate SQL_Latin1_General_CP1_CI_AS NOT NULL

I am trying to fill the column with a default value so that I can read the DataFrame, but the value is not converted.

Is there any workaround for this?

Azure Synapse Analytics

2 answers

  1. Sivagnana Sundaram, Krithiga 31 Reputation points
    2022-04-01T13:34:27.627+00:00

    CREATE TABLE [dbo].[DimCustomer2] (
    [CustomerKey] INT NOT NULL,
    [GeographyKey] INT NULL,
    [CustomerAlternateKey] nvarchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
    )

    INSERT INTO [dbo].[DimCustomer2] VALUES (1, 1, '')



  2. AnnuKumari-MSFT 33,816 Reputation points Microsoft Employee
    2022-05-09T05:28:41.89+00:00

    Hi @Sivagnana Sundaram, Krithiga ,

    We got a response from the product team on the above issue. Kindly have a look:

    "Yes, this is the current behavior of Synapse/PolyBase: an empty string is treated as NULL on export. The upcoming Gen3 DW (dedicated SQL pool), which will preview in late 2022, will address this. Since the fix is non-trivial, there are no plans to fix it for the current Gen2.

    However, there are a couple of options to get unblocked here:

    1. When doing the CETAS export, convert empty strings to a unique placeholder value, or even a single space. The round trip will then work and differentiate NULL from empty fields.

        CREATE EXTERNAL TABLE [dbo].[abc]
        WITH (DATA_SOURCE = [SQLAnalyticsConnectorDataSourceTest1], LOCATION = N'/abc', FILE_FORMAT = [SQLAnalyticsConnectorDefaultFileFormat798e8ca0543342a6b43e0787ab2a7db1])
        AS
        SELECT
        -- ELSE keeps non-empty values; without it they would become NULL
        CASE WHEN name = '' THEN ' ' ELSE name END AS name
        FROM "dbo"."abc1"
    

    2. Use CSV instead of Parquet and specify a STRING_DELIMITER. This natively supports NULL/empty-string differentiation without any workarounds. "
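
To illustrate the round trip described in option 1, here is a minimal sketch in plain Python. It is not the connector's API: `PLACEHOLDER`, `encode_empty`, and `decode_empty` are hypothetical names, and the sketch assumes a single space never occurs as a real value in the column. In a notebook, the decode step would be applied to the DataFrame column after the read.

```python
from typing import Optional

# Assumption: a single space never appears as genuine data in this column.
PLACEHOLDER = " "


def encode_empty(value: Optional[str], placeholder: str = PLACEHOLDER) -> Optional[str]:
    """Export side: replace '' with the placeholder; None and other values pass through."""
    return placeholder if value == "" else value


def decode_empty(value: Optional[str], placeholder: str = PLACEHOLDER) -> Optional[str]:
    """Read side: turn the placeholder back into ''; None stays None."""
    return "" if value == placeholder else value


# Three cases: empty string, NULL, and an ordinary value.
rows = [{"name": ""}, {"name": None}, {"name": "Contoso"}]
exported = [{"name": encode_empty(r["name"])} for r in rows]
restored = [{"name": decode_empty(r["name"])} for r in exported]
```

Because the empty string and NULL map to different exported values, they remain distinguishable after the read. If a single space can be legitimate data, a more distinctive placeholder would be needed.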

    Hope this helps. Please let us know if you have any further queries.

