How to unpivot columns in a PySpark DataFrame into multiple columns using a Synapse notebook

Heta Desai 357 Reputation points
2022-10-26T21:01:46.937+00:00

Hi,

I want to unpivot columns in a PySpark DataFrame. I have 3 groups of columns, and on that basis I need to unpivot those columns and generate 6 new columns.

Here is the example:

Id  Key  Code  Label1  Label2  Label3  Rate1  Rate2  Rate3  CancelRate1  CancelRate2  CancelRate3
1   K1   c1    1       0       3       0.00   1.00   0.00   1.00         0.00         0.00

expected output:

Id  Key  Code  LabelName  LabelId  RateName  Rate  CancelRateName  CancelRate
1   K1   c1    Label1     1        Rate1     0.00  CancelRate1     1.00
1   K1   c1    Label2     0        Rate2     1.00  CancelRate2     0.00
1   K1   c1    Label3     3        Rate3     0.00  CancelRate3     0.00

Please suggest a solution for this. Once the columns are unpivoted, I need to perform an aggregation and then pivot the columns again. As the DataFrame is too large, I cannot use the pandas library.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. HimanshuSinha-msft 19,486 Reputation points Microsoft Employee Moderator
    2022-10-28T01:18:31.55+00:00

    Hello @Anonymous ,
    Thanks for the question and for using the MS Q&A platform.
    As we understand it, the ask here is to unpivot the DataFrame; please let us know if that is not accurate.
    I understand that you are unpivoting some data and then pivoting it back. I am sure you understand the data better, but I would request that you review the logic again, as unpivoting and then re-pivoting the data does not usually make much sense.
    Unpivot is not natively supported in Spark, so you will have to use the stack function. In a stack expression you can only pass one set of value columns, so you will have to create three DataFrames, one each for Label, Rate, and CancelRate, and then join the three DataFrames to get the output you want. I am sharing the code for implementing the stack function.

    import pyspark
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    # Create the Spark session (in a Synapse notebook one is already available as `spark`)
    spark = SparkSession.builder.appName("unpivot").getOrCreate()

    data = [(1,"K1","c1",1,0,3,0.00,1.00,0.00,1.00,0.00,0.00)]

    columns = ["Id","Key","Code","Label1","Label2","Label3","Rate1","Rate2","Rate3","CancelRate1","CancelRate2","CancelRate3"]
    df = spark.createDataFrame(data=data, schema=columns)
    df.printSchema()
    df.show(truncate=False)

    # stack(n, name1, value1, ...) separates the name/value pairs into n rows
    unpivotExpr1 = "stack(3, 'Label1',Label1, 'Label2',Label2, 'Label3',Label3) as (LabelName,Total)"
    unpivotExpr2 = "stack(3, 'Rate1',Rate1, 'Rate2',Rate2, 'Rate3',Rate3) as (RateName,Total)"

    unPivotDF = df.select("Id","Key","Code", expr(unpivotExpr1))
    dflabel = unPivotDF.withColumnRenamed('Total','LabelId')
    dflabel.show()

    unPivotDF = df.select("Id","Key","Code", expr(unpivotExpr2))
    dfrate = unPivotDF.withColumnRenamed('Total','Rate')
    dfrate.show()


    Please do let me know if you have any queries.
    Thanks
    Himanshu


