delta table datatype

arkiboys 9,706 Reputation points
2023-09-22T04:18:13.4633333+00:00

hello,

how is it possible to correct the column data types of delta table?

for example, when I create and populate the delta tables, the columns are of type string. now I would like to alter these column according to the type of data in columns, i.e. int, decimal, etc.

how can I do this?

notice that this is how I create the delta tables and populate at the same time.

df.write.format("delta").mode("overwrite").saveAsTable("db_name.tblname")

Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,514 questions
0 comments No comments
{count} votes

Accepted answer
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-09-22T15:24:22.67+00:00

    You may need to rewrite the table, so start by reading the existing table into a Spark datagrame, cast the columns to the desired data types and then write the df back to a new delta table with the desired schema (here is an example) :

    import pyspark.sql.functions as F
    df = spark.read.table("db_name.tblname")
    
    df = df.withColumn("int_column", F.col("int_column").cast("int"))
    df = df.withColumn("decimal_column", F.col("decimal_column").cast("decimal(10,2)"))
    
    df.write.format("delta").mode("overwrite").saveAsTable("db_name.new_tblname")
    

    Or, if you are familiar with Delta Lake column mapping, it can allow you to read and write data to a Delta table with a different schema than the table's actual schema :

    # Enable column mapping for the table
    spark.sql("ALTER TABLE db_name.tblname SET TBLPROPERTIES ('delta.columnMapping.enabled' = 'true')")
    
    # Create a new Spark DataFrame with the desired schema
    new_df = spark.createDataFrame([], ["int_column"])
    
    # Write the DataFrame to the Delta table using the column mapping configuration
    new_df.write.format("delta").mode("overwrite") \
        .option("delta.columnMapping.int_column", "string") \
        .saveAsTable("db_name.tblname")
    
    
    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.