Generate new rows based on date series in databricks delta table

Question

Generate new rows based on date series in databricks delta table

NIKHIL KUMAR 126

Have a table which has below sample data as

Table A:
Id, start date, end date
1, 2022-10-10, 2022-10-12

I want a new column with individual date of the series as below in the output as below:

Id, start date, end date, NewDerivedCol
1, 2022-10-10, 2022-10-12, 2022-10-10
1,2022-10-10,2022-10-12, 2022-10-11
1,2022-10-10,2022-10-12, 2022-10-12

Accepted answer

0 additional answers

Your answer

Answer 1

Hi @NIKHIL KUMAR ,

Thanks for posting this question in Microsoft Q&A platform and for using Azure Services.

Regarding the above Input dataset and required Output Dataset with New Column as Date Series, we can use below code in Databricks using Pyspark SQL functions:

from pyspark.sql import functions as F  
df = df \  
  .withColumn('StartDate', F.col('StartDate').cast('date')) \  
  .withColumn('EndDate', F.col('EndDate').cast('date'))  
  
display(df)  
  
df.withColumn('NewDerivedCol', F.explode(F.expr('sequence(StartDate, EndDate, interval 1 day)')))\  
  .show()

Here we have used Sequence function which generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.
Supported types are byte, short, integer, long, date, timestamp.
It is used to create an array containing all dates between StartDate and EndDate.

This array can then be exploded using explode function which returns a new row for each element in the given array.

Reference Links: pyspark.sql.functions.explode.html
pyspark.sql.functions.sequence.html

Hope this will help. Please let us know if any further queries.

------------------------------

Please don't forget to click on or upvote button whenever the information provided helps you.
Original posters help the community find answers faster by identifying the correct answer. Here is how
Want a reminder to come back and check responses? Here is how to subscribe to a notification

NIKHIL KUMAR 126 Reputation points

2022-10-22T04:46:22.067+00:00

Thank you @SSingh-MSFT , this perfectly works for the dataset I have in real time.

Share via

Generate new rows based on date series in databricks delta table

0 additional answers

Your answer