How to read uniqueidentifier into Spark notebook in Synapse Analytics Studio?

Kristo Raun 16 Reputation points
2021-01-27T08:06:57.087+00:00

We are looking into a possibility to move some notebooks from Databricks to Synapse Studio. However, it seems that it is impossible in Synapse Spark notebooks to query tables with uniqueidentifiers?

Example table:

CREATE TABLE dbo.test
(id uniqueidentifier null
, ts datetimeoffset not null
,data float null
) 

Create a spark dataframe in notebook:
val df = spark.read.synapsesql("testdb.dbo.test")
display(df)

The display() command returns error:
Error : com.microsoft.spark.sqlanalytics.exception.SQLAnalyticsConnectorException: com.microsoft.sqlserver.jdbc.SQLServerException: Columns with UniqueIdentifier types are not supported in external tables.

I thought about creating a view where the uniqueidentifier would be a varchar, but views are not supported by synapsesql().

Do I have any other way to get this table into Spark notebook for some processing, or should we just forget about Synapse studio for now?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,457 questions
{count} votes

4 answers

Sort by: Most helpful
  1. Kristo Raun 16 Reputation points
    2021-02-01T09:38:56.77+00:00

    Hi @HimanshuSinha-msft , Thanks for your comment, but I find it hard to agree with. On the link you provided, it also does not list datetimeoffset(7) as a supported data type, but still when using synapsesql() it loads the datetimeoffset(7) column as a Spark string type, AND it displays the dataframe without errors. Also, uniqueidentifier is mapped to string, but then it gives the error as in my original post. Please see images added to this post. ![62388-image-7.png][1] ![62464-image-9.png][2] It seems to me that correctly fetching uniqueidentifier by synapsesql() is just not yet implemented, or it is implemented erroneously. If the first is true (not yet implemented), I would strongly recommend to implement in the near future. Otherwise, what is the benefit of synapsesql() if you cannot use it for some data types? Also, uniqueidentifier is supported by JDBC and other competitive platforms (eg Databricks). It is strange that synapsesql() doesn't do it, because based on the image here: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export I understand that synapsesql() is also using jdbc. Unfortunately, Synapse Studio is not useable for our use case then at the moment. Let me know if I misunderstood something. [1]: /api/attachments/62388-image-7.png?platform=QnA [2]: /api/attachments/62464-image-9.png?platform=QnA

    1 person found this answer helpful.
    0 comments No comments

  2. Andrew Fogarty 6 Reputation points Microsoft Employee
    2021-06-09T15:45:57.94+00:00

    This is still an issue and a recommended work around would be great.

    1 person found this answer helpful.
    0 comments No comments

  3. Dilkush Patel 22 Reputation points
    2022-10-08T21:47:19.227+00:00

    So I'm trying to reach Synapse DW table with uniqueidentifier table in pyspark dataframe and then cover to pandas in databricks but it fails with

    com.microsoft.sqlserver.jdbc.SQLServerException: Columns with UniqueIdentifier types are not supported in external tables. [ErrorCode = 102050] [SQLState = S0001]

    Is this same issue?

    If so any workaround available/ This is kind of very surprise limitation as I'm sure uniqueidentifier would be very widely used datatype.

    0 comments No comments

  4. Thanapon Chaijunla 0 Reputation points
    2023-02-02T09:33:19.8133333+00:00

    I got the same issue, do you have any update on this?

    0 comments No comments