How to change the data type of the SQL Endpoint of a delta table?

Matthias Wenger 5 Reputation points
2023-12-13T08:34:22.6166667+00:00

I have created a Delta table with this function:

def get_parquet_and_create_delta(table_name, lake_db):
    df = spark.read.load(f'abfss://xxx.parquet', format='parquet')
    delta_table_path = f'abfss://xxx'
    df.write.mode('error').format("delta").save(delta_table_path)
    spark.sql("CREATE TABLE {0}.{1} USING DELTA LOCATION '{2}'".format(lake_db, table_name, delta_table_path))

In my Lake DB in the UI a table is created. All the text columns have the data type VARCHAR(8000). When I use the SQL pool to query the table I get the error:

String or binary data would be truncated while reading column of type 'VARCHAR(8000)'. Check ANSI_WARNINGS option.

In Spark SQL I can run the same queries without an issue. When I run df.describe in the Spark notebook, all the columns with text data have data type 'string'.
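A quick Spark SQL check can confirm whether any values actually exceed 8000 characters (a sketch; lake_db.my_table and myColumn are placeholder names, not the real ones):

```sql
-- Count rows whose text value would not fit in VARCHAR(8000).
-- lake_db.my_table and myColumn are placeholders.
SELECT COUNT(*) AS too_long
FROM lake_db.my_table
WHERE LENGTH(myColumn) > 8000
```

If the count is zero, the error comes from the default type mapping rather than the data itself.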

How can I alter the data types of the SQL Endpoint?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Pandey, Kinjal 0 Reputation points Student Ambassador
    2023-12-13T09:00:12.03+00:00

    To change the data type of a column in a Delta table for use with an SQL endpoint, you need to alter the table schema. Since you are working with a Delta table in Spark, this is done with Spark SQL. Here's a step-by-step guide:

    Read the Existing Table Schema: You first need to understand the current schema of your Delta table. You can do this by reading the table into a DataFrame and printing its schema.

    # Read the Delta table into a DataFrame
    df = spark.read.format("delta").load(delta_table_path)

    # Print the current schema
    df.printSchema()

    Identify the Columns to Alter: From the schema, identify the columns whose data types need to be changed. In your case, these would be columns currently set to VARCHAR(8000) that are causing the truncation error.

    Alter the Table Schema: Use Spark SQL to alter the table schema. You can change a column's data type with the ALTER TABLE command. For example, to change a column myColumn to STRING type, you would do:

    # Alter the column data type
    spark.sql(f"ALTER TABLE {lake_db}.{table_name} ALTER COLUMN myColumn TYPE STRING")

    Repeat this for each column you need to change, replacing STRING with the data type that suits your needs, such as VARCHAR(255).

    Test the Changes: After altering the table, it's important to test and make sure the changes are reflected and that the SQL pool queries now work without the previous error.

    Handle Data Truncation Issues: If changing the data type does not resolve the truncation issue, you may need to look into the data itself. There might be instances where the actual data length exceeds the maximum limit for the data type you have set.

    Data Type Considerations: When altering data types, be mindful of the implications. Changing to a type with a smaller limit may lead to data loss if existing data exceeds that limit. Conversely, increasing the limit might resolve truncation issues but could have performance implications.


  2. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2024-01-09T22:20:32.5533333+00:00

    @Matthias Wenger Thanks for using this forum and posting your question.

    You can overcome this issue by using a WITH clause to set the type and length of the problematic columns in your SQL pool queries.

    For example as below:

    SELECT *
    FROM OPENROWSET(
        BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        C1 int,
        C2 varchar(20),
        C3 varchar(max)
    ) AS rows
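    If you query the same data repeatedly, one option (a sketch; dbo.MyParquetView is a placeholder name) is to wrap the OPENROWSET call in a view so the explicit column types live in one place:

    ```sql
    -- Sketch: a view over OPENROWSET with explicit types.
    -- dbo.MyParquetView is a placeholder name.
    CREATE VIEW dbo.MyParquetView AS
    SELECT *
    FROM OPENROWSET(
        BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        C1 int,
        C2 varchar(20),
        C3 varchar(max)
    ) AS rows;
    ```

    Downstream queries can then simply SELECT from the view without repeating the type list.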
    
    
    

    You may also try turning the warning off if you just want to look at the data without defining the data types:

    SET ANSI_WARNINGS OFF
    SELECT TOP 100 * FROM
    OPENROWSET(BULK '', FORMAT = 'DELTA') AS [result]
    SET ANSI_WARNINGS ON
    

    Hope this info helps.


    Please don’t forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

