Performing union of two dataframes

Madhu 1 Reputation point
2022-11-28T05:28:11.787+00:00

I am trying to perform a union of two DataFrames. When a column has the same data type in both DataFrames, the union works, but when a column has one data type in df1 and a different data type in df2, it fails. I currently have to maintain a separate Databricks notebook that casts the columns to matching data types. Is there a feasible solution that lets everything run in a single notebook? Thanks

@PRADEEPCHEEKATLA-MSFT Your help is appreciated

Thanks

.NET
Microsoft Technologies based on the .NET software framework.
3,367 questions
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
1,916 questions

2 answers

  1. PRADEEPCHEEKATLA-MSFT 76,836 Reputation points Microsoft Employee
    2022-11-29T09:09:08.27+00:00

    Hello @Madhu ,

    Thanks for the question and for using the MS Q&A platform.

    DataFrame union(): the union() method of a DataFrame merges two DataFrames that have the same structure/schema. Columns are matched by position, not by name. If the schemas are not the same, it returns an error.

    For more details, refer to PySpark Union and UnionAll Explained

    Hope this helps. Please let us know if you have any further queries.


  2. yak 1 Reputation point
    2022-11-29T09:21:54.723+00:00

    Can you include the cast operation before your union statement?

    df1.withColumn('yourcolumn', df1['yourcolumn'].cast('df2datatype')).union(df2)

    You could also write a function that maps the data types in one DataFrame to the other, so that you don't have to hard-code each cast as in the suggestion above: loop through the columns and update their data types.
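A minimal sketch of that second suggestion, assuming both DataFrames have the same column names and that every value in df1 can actually be cast to the corresponding type in df2. The helper name `align_types` is hypothetical (it is not part of PySpark); it only uses the standard DataFrame members `schema`, `columns`, `select`, and `Column.cast`.

```python
def align_types(source, target):
    """Cast the columns of `source` to the data types used by `target`.

    Hypothetical helper, not a PySpark API. Assumes `source` contains every
    column name found in `target`. Columns are returned in `target`'s order,
    because union() matches columns by position rather than by name.
    """
    # Map each column name in the target DataFrame to its data type.
    target_types = {f.name: f.dataType for f in target.schema.fields}

    # Fail early if a column needed for the union is absent from the source.
    missing = set(target_types) - set(source.columns)
    if missing:
        raise ValueError(f"source is missing columns: {sorted(missing)}")

    # Loop through the columns and cast each one to the target's type.
    return source.select(
        *[source[name].cast(dtype).alias(name)
          for name, dtype in target_types.items()]
    )


# Usage, with df1 and df2 being the two DataFrames from the question:
# result = align_types(df1, df2).union(df2)
```

Because the cast and the union are ordinary DataFrame calls, they can sit in the same notebook cell as the rest of the pipeline. Note that in Spark's default (non-ANSI) mode a value that cannot be cast becomes null rather than raising an error, so it may be worth checking the result for unexpected nulls.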
