databricks - change column datatype

arkiboys 9,706 Reputation points
2023-09-05T06:32:00.6633333+00:00

hello,

in databricks, how is it possible in sql to

1- make a column accept nulls?

2- change a column's datatype from string to date?

thank you

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-09-05T19:37:51.4866667+00:00

    Hello arkiboys,

    My understanding is you want to change a Databricks Delta table column's datatype from string to date, and make the column accept nulls, using SQL.

    Databricks SQL's ALTER TABLE ... ALTER COLUMN can change properties such as nullability and comments, but it cannot change a column's datatype in place.

    However, you can achieve this with PySpark by reading the data into a DataFrame, casting the column, and writing it back to the Delta table.

    Column datatype from string to date:

    Ex:

    I have a table employee_delta with empno string.

    spark.read.table("employee_delta").printSchema()

    Output:

    root
     |-- empno: string (nullable = true)

    To change the column from string to date, use the below syntax:

    df.withColumn("columnName", col("columnName").cast("newDataType")) # replaces the column with its cast to the new datatype

    from pyspark.sql.functions import col

    # Cast empno to date and overwrite the table; overwriteSchema is
    # required because the column's datatype changes
    spark.read.table("employee_delta") \
        .withColumn("empno", col("empno").cast("date")) \
        .write.format("delta") \
        .mode("overwrite") \
        .option("overwriteSchema", True) \
        .saveAsTable("employee_delta")

    Then you can check the datatype using this command:

    spark.read.table("employee_delta").printSchema()
    Output:

    root
     |-- empno: date (nullable = true)


    To explicitly make a column accept nulls

    If the "empno" column is originally a string, and you want empty strings to be stored as null values while keeping the column a string, you can use the nullif function, which returns null when its two arguments are equal, as in the below PySpark code.

    from pyspark.sql.functions import expr

    # Read the Delta table
    delta_df = spark.read.table("employee_delta")

    # Convert empty strings in empno to null values
    delta_df = delta_df.withColumn("empno", expr("nullif(empno, '')"))

    # Write the DataFrame back to the Delta table
    delta_df.write.format("delta") \
        .mode("overwrite") \
        .option("overwriteSchema", True) \
        .saveAsTable("employee_delta")

    I hope this helps.

    Please see the below Stack Overflow and Microsoft Q&A threads where this was discussed:

    https://stackoverflow.com/questions/67769181/change-the-datatype-of-a-column-in-delta-table

    https://learn.microsoft.com/en-us/answers/questions/813039/how-to-change-the-column-datatype-of-a-delta-table

    Another blog post:

    https://www.projectpro.io/recipes/change-column-type-delta-table

    If this answers your question, please consider accepting the answer by hitting Accept answer and up-voting, as it helps the community find answers to similar questions.

