How to read uniqueidentifier into Spark notebook in Synapse Analytics Studio?

Question

We are looking into a possibility to move some notebooks from Databricks to Synapse Studio. However, it seems that it is impossible in Synapse Spark notebooks to query tables with uniqueidentifiers?

Example table:

CREATE TABLE dbo.test
(id uniqueidentifier null
, ts datetimeoffset not null
,data float null
)

Create a spark dataframe in notebook:
val df = spark.read.synapsesql("testdb.dbo.test")
display(df)

The display() command returns error:
Error : com.microsoft.spark.sqlanalytics.exception.SQLAnalyticsConnectorException: com.microsoft.sqlserver.jdbc.SQLServerException: Columns with UniqueIdentifier types are not supported in external tables.

I thought about creating a view where the uniqueidentifier would be a varchar, but views are not supported by synapsesql().

Do I have any other way to get this table into Spark notebook for some processing, or should we just forget about Synapse studio for now?

Answer

Hi @HimanshuSinha-msft , Thanks for your comment, but I find it hard to agree with. On the link you provided, it also does not list datetimeoffset(7) as a supported data type, but still when using synapsesql() it loads the datetimeoffset(7) column as a Spark string type, AND it displays the dataframe without errors. Also, uniqueidentifier is mapped to string, but then it gives the error as in my original post. Please see images added to this post. ![62388-image-7.png][1] ![62464-image-9.png][2] It seems to me that correctly fetching uniqueidentifier by synapsesql() is just not yet implemented, or it is implemented erroneously. If the first is true (not yet implemented), I would strongly recommend to implement in the near future. Otherwise, what is the benefit of synapsesql() if you cannot use it for some data types? Also, uniqueidentifier is supported by JDBC and other competitive platforms (eg Databricks). It is strange that synapsesql() doesn't do it, because based on the image here: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export I understand that synapsesql() is also using jdbc. Unfortunately, Synapse Studio is not useable for our use case then at the moment. Let me know if I misunderstood something. [1]: /api/attachments/62388-image-7.png?platform=QnA [2]: /api/attachments/62464-image-9.png?platform=QnA

Answer

This is still an issue and a recommended work around would be great.

Answer

So I'm trying to reach Synapse DW table with uniqueidentifier table in pyspark dataframe and then cover to pandas in databricks but it fails with

com.microsoft.sqlserver.jdbc.SQLServerException: Columns with UniqueIdentifier types are not supported in external tables. [ErrorCode = 102050] [SQLState = S0001]

Is this same issue?

If so any workaround available/ This is kind of very surprise limitation as I'm sure uniqueidentifier would be very widely used datatype.

Answer

I got the same issue, do you have any update on this?

Share via

How to read uniqueidentifier into Spark notebook in Synapse Analytics Studio?

4 answers