databricks - change column datatype

arkiboys 9,706 Reputation points
2023-09-05T06:32:00.6633333+00:00

hello,

in databricks, how is it possible in sql to

1- make a column accept nulls?

2- change a column's datatype from string to date?

thank you

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-09-05T19:37:51.4866667+00:00

    Hello arkiboys,

    My understanding is you want to change a Databricks Delta table column's datatype from string to date, and make the column accept nulls, using SQL.

    Databricks SQL's ALTER TABLE ... ALTER COLUMN can change properties such as nullability and comments, but it cannot change a column's datatype in place.

    However, you can achieve this with PySpark by reading the data into a DataFrame, casting the column, and writing it back to the Delta table.

    Column datatype from string to date:

    Ex:

    I have a table employee_delta with empno string.

    spark.read.table("employee_delta").printSchema()

    Output:

    root
     |-- empno: string (nullable = true)

    To change the column from string to date, use the below syntax:

    df.withColumn("columnName", col("columnName").cast("newDataType")) # replaces the column with its cast to the new datatype

    from pyspark.sql.functions import col

    # Cast empno to date and overwrite the table; overwriteSchema is
    # required because the column's datatype changes
    spark.read.table("employee_delta") \
        .withColumn("empno", col("empno").cast("date")) \
        .write.format("delta") \
        .mode("overwrite") \
        .option("overwriteSchema", True) \
        .saveAsTable("employee_delta")

    Then you can check the datatype using this command:

    spark.read.table("employee_delta").printSchema()
    Output:

    root
     |-- empno: date (nullable = true)


    To explicitly make a column accept nulls

    If the "empno" column is originally a string, and you want empty strings to be stored as null values while keeping the column a string, you can use the nullif function, which returns null when its two arguments are equal, as in the below PySpark code.

    from pyspark.sql.functions import expr

    # Read the Delta table
    delta_df = spark.read.table("employee_delta")

    # Convert empty strings in empno to null values
    delta_df = delta_df.withColumn("empno", expr("nullif(empno, '')"))

    # Write the DataFrame back to the Delta table
    delta_df.write.format("delta") \
        .mode("overwrite") \
        .option("overwriteSchema", True) \
        .saveAsTable("employee_delta")

    I hope this helps.

    Please see the below Stack Overflow and Microsoft Q&A threads where this was discussed:

    https://stackoverflow.com/questions/67769181/change-the-datatype-of-a-column-in-delta-table

    https://learn.microsoft.com/en-us/answers/questions/813039/how-to-change-the-column-datatype-of-a-delta-table

    Another blog post:

    https://www.projectpro.io/recipes/change-column-type-delta-table

    If this answers your question, please consider accepting the answer by hitting Accept answer and up-voting, as it helps the community find answers to similar questions.

