How to change the data type of the SQL Endpoint of a delta table?

Matthias Wenger 5 Reputation points
2023-12-13T08:34:22.6166667+00:00

I have created a Delta table with this function:

def get_parquet_and_create_delta(table_name, lake_db):
    df = spark.read.load(f'abfss://xxx.parquet', format='parquet')
    delta_table_path = f'abfss://xxx'
    df.write.mode('error').format("delta").save(delta_table_path)
    spark.sql("CREATE TABLE {0}.{1} USING DELTA LOCATION '{2}'".format(lake_db, table_name, delta_table_path))

In my Lake DB in the UI a table is created. All the text columns have the data type VARCHAR(8000). When I use the SQL pool to query the table I get the error:

String or binary data would be truncated while reading column of type 'VARCHAR(8000)'. Check ANSI_WARNINGS option.

In Spark SQL I can run the same queries without an issue. When I run df.describe in the Spark notebook, all the columns with text data have data type 'string'.
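A quick Spark SQL check can confirm whether any values actually exceed 8000 characters (a sketch; lake_db.my_table and myColumn are placeholder names, not the real ones):

```sql
-- Count rows whose text value would not fit in VARCHAR(8000).
-- lake_db.my_table and myColumn are placeholders.
SELECT COUNT(*) AS too_long
FROM lake_db.my_table
WHERE LENGTH(myColumn) > 8000
```

If the count is zero, the error comes from the default type mapping rather than the data itself.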

How can I alter the data types of the SQL Endpoint?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Pandey, Kinjal 0 Reputation points Student Ambassador
    2023-12-13T09:00:12.03+00:00

    To change the data type of a column in a Delta table for use with an SQL endpoint, you need to alter the table schema. Since you are working with a Delta table in Spark, this is done with Spark SQL. Here's a step-by-step guide:

    Read the Existing Table Schema: You first need to understand the current schema of your Delta table. You can do this by reading the table into a DataFrame and printing its schema.

    # Read the Delta table into a DataFrame
    df = spark.read.format("delta").load(delta_table_path)

    # Print the current schema
    df.printSchema()

    Identify the Columns to Alter: From the schema, identify the columns whose data types need to be changed. In your case, these would be columns currently set to VARCHAR(8000) that are causing the truncation error.

    Alter the Table Schema: Use Spark SQL to alter the table schema. You can change a column's data type with the ALTER TABLE command. For example, to change a column myColumn to STRING type, you would do:

    # Alter the column data type
    spark.sql(f"ALTER TABLE {lake_db}.{table_name} ALTER COLUMN myColumn TYPE STRING")

    Repeat this for each column you need to change, replacing STRING with the data type that suits your needs, such as VARCHAR(255).

    Test the Changes: After altering the table, it's important to test and make sure the changes are reflected and that the SQL pool queries now work without the previous error.

    Handle Data Truncation Issues: If changing the data type does not resolve the truncation issue, you may need to look into the data itself. There might be instances where the actual data length exceeds the maximum limit for the data type you have set.

    Data Type Considerations: When altering data types, be mindful of the implications. Changing to a type with a smaller limit may lead to data loss if existing data exceeds that limit. Conversely, increasing the limit might resolve truncation issues but could have performance implications.


  2. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2024-01-09T22:20:32.5533333+00:00

    @Matthias Wenger Thanks for using this forum and posting your question.

    You can overcome this issue by using a WITH clause to set the type and length of the problematic columns in your SQL pool queries.

    For example as below:

    SELECT *
    FROM OPENROWSET(
        BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        C1 int,
        C2 varchar(20),
        C3 varchar(max)
    ) AS rows
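    If you query the same data repeatedly, one option (a sketch; dbo.MyParquetView is a placeholder name) is to wrap the OPENROWSET call in a view so the explicit column types live in one place:

    ```sql
    -- Sketch: a view over OPENROWSET with explicit types.
    -- dbo.MyParquetView is a placeholder name.
    CREATE VIEW dbo.MyParquetView AS
    SELECT *
    FROM OPENROWSET(
        BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        C1 int,
        C2 varchar(20),
        C3 varchar(max)
    ) AS rows;
    ```

    Downstream queries can then simply SELECT from the view without repeating the type list.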
    
    
    

    You may also try turning the warning off if you just want to look at the data without defining the data types:

    SET ANSI_WARNINGS OFF
    SELECT TOP 100 * FROM
    OPENROWSET(BULK '', FORMAT = 'DELTA') AS [result]
    SET ANSI_WARNINGS ON
    

    Hope this info helps.


    Please don’t forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

