create unique id in pyspark

Question

Hi Expert,
how we can create unique key in table creatoin in databricks pysparrk
like 1,2,3, auto integration column in databricks
id,Name
1 test,
2 test2
3 test3
Regards

Answer

Hi @Shambhu Rai ,

Thanks for posting question in Microsoft Q&A forum and for using Azure Services.

As I understand your question, you want to create an id column as key with incremental numbers during table creation in databricks.

To accomplish this, we can use Generate Always As Identity while table creation:

CREATE OR REPLACE TABLE demo (  
  id BIGINT GENERATED ALWAYS AS IDENTITY,  
  product_type STRING,  
  sales BIGINT  
);

If the table already exists and we want to add surrogate key column, then we can make use of sql function monotonically_increasing_id or could use analytical function row_number as shown below:

from pyspark.sql.functions import monotonically_increasing_id  
  
df1 = df.withColumn( "ID", monotonically_increasing_id())  
  
display(df1)

df.createOrReplaceTempView('v_view')  
df = spark.sql("""  
SELECT   
    row_number() OVER (  
        PARTITION BY ''   
        ORDER BY ''   
    ) as id,  
    *  
FROM   
    v_view  
""")  
display(df)

Hope this will help. Please let us know if any further queries.

------------------------------

Please don't forget to click on or upvote button whenever the information provided helps you.
Original posters help the community find answers faster by identifying the correct answer. Here is how
Want a reminder to come back and check responses? Here is how to subscribe to a notification

create unique id in pyspark

1 answer