Azure Synapse Delta Table with Varchar(max)

Axel 0 Reputation points
2024-09-04T17:02:08.1366667+00:00

Hi all,

I did save a Dataframe as Delta-format with pyspark and created a managed table in a Lake Database, to access it aswell with SQL Script and the Serverless-Endpoint

raw_stream\
    .writeStream\
    .format("delta")\
    .outputMode("append") \
    .option("checkpointLocation", bronze_path_chk_path)\
    .start(bronze_table_path)

spark.sql(f'CREATE TABLE IF NOT EXISTS bronze_layer.{entity} USING DELTA LOCATION \'{bronze_table_path}/\'')

As some columns, which are nested JSON, have a lenght of more then 10000, I receive an error when querying over the SQL Serverless

Error: Column 'JSON_column' of type 'NVARCHAR(4000)' is not compatible with external data type 'JSON string. (underlying parquet nested/repeatable column must be read as VARCHAR or CHAR)'.

Primary Question : How can I define the columntype as varchar(max) from within a notebook? So the table can be directly queried.

Secondary Question: If from within a notebook is not possible, how to set it up in a Lake Database? Serverless Database is not an option, as I need the managed Delta Tables.

One more hint, the following Query returns proper results within SQL on the Lake Database


SELECT 
    column1,
    column1,
   
      JSON_VALUE(JSON_column, '$.property1') AS property1,
      JSON_VALUE(JSON_column, '$.property2') AS property2,

FROM
    OPENROWSET(
        BULK 'https://***.dfs.core.windows.net/bronze/entity/',
        FORMAT = 'DELTA'
    )  
    WITH
    (
		column1,
		column2,
        JSON_column VARCHAR(MAX)
   
    ) query

Thanks

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,841 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Axel 0 Reputation points
    2024-09-05T06:14:09.0966667+00:00

    Thank You for the response, but that does not work. For the spark.sql execution throws an error on the type "varchar(max)"

    ParseException: 
    [PARSE_SYNTAX_ERROR] Syntax error at or near 'MAX'.(line 1, pos 81)
    
    == SQL ==
    ALTER TABLE bronze_layer.rntity ALTER COLUMN JSON_column TYPE VARCHAR(MAX)
    
    

    If I use "string" instead, it works fine, but as the column is allready in that type it changes nothing. A query from SQL continues to fail.

    For the second command, I only reveive "Query completed with errors.", but no further details


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.