Big data analytics (Azure Databricks)

Rakesh Kumar 45 Reputation points
2023-11-29T12:50:18.8766667+00:00

I have a table:

| ID | Name | LastModifiedTime |
|---|---|---|
| 1 | Rakesh | 2023-10-30 00:09:00 |
| 2 | Manoj | 2023-10-29 00:34:00 |
| 3 | Anil | 2023-10-28 00:32:00 |
| 4 | Puneet | 2023-10-29 00:35:00 |
| 5 | Sanchal | 2023-10-29 00:57:00 |
| 6 | AbhishekKumar | 2023-11-29 01:09:00 |
| 7 | Rahul | 2023-11-29 00:35:00 |
| 8 | Ram | 2023-11-29 00:35:00 |
| 9 | Shyam | 2023-10-29 00:35:00 |
| 10 | Sanjay | 2023-10-29 00:35:00 |
| 11 | Deepak | 2023-11-29 00:35:00 |
| 12 | Nitin | 2023-11-29 00:35:00 |

Using Databricks, I need to store this data in Azure Data Lake, in folders like this:

2023 folder

 **10** -> data for month 10 is stored in this folder; the file should be named table.csv

 **11** -> data for month 11 is stored in this folder; the file should be named table.csv

  **:**

If the files are already present, they should be overwritten.

The files should be in CSV format.

If data for a new year or month arrives, it should be handled the same way.

Azure Data Lake Storage

An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.

Azure Blob Storage

An Azure service that stores unstructured data in the cloud as blobs.

Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.


1 answer

  1. ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator
    2023-11-30T16:48:23.7833333+00:00

    Hi Rakesh Kumar,

    Thank you for posting your query on the Microsoft Q&A platform.

    You can use partitionBy to achieve this. When writing the data to files, call partitionBy on the DataFrame writer with the columns to partition by, and Spark creates one folder per distinct value.

    Please check the video below for a better idea.

    partitionBy function in PySpark

    Hope this helps. Please let me know if you have any further queries.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.
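
One caveat with partitionBy: Spark names the folders `year=2023/month=10` and the files `part-*.csv`, not `2023/10/table.csv`. A minimal sketch of a post-processing step that copies such output into the requested layout is below. It assumes the DataFrame was given `year` and `month` columns (for example with `year()` and `month()` from `pyspark.sql.functions`), that it was written with something like `df.repartition("year", "month").write.mode("overwrite").partitionBy("year", "month").option("header", True).csv(...)` so that each month folder should hold a single part file, and that both locations are reachable as local paths on the driver. The function name `flatten_partitions` and its parameters are hypothetical, not part of any Spark or Databricks API.

```python
import shutil
from pathlib import Path


def flatten_partitions(spark_out: Path, target: Path, file_name: str = "table.csv") -> list:
    """Copy year=YYYY/month=M/part-*.csv to target/YYYY/M/<file_name>.

    Existing target files are overwritten. Assumes each month folder holds
    exactly one part file; raises ValueError otherwise.
    """
    written = []
    for year_dir in sorted(spark_out.glob("year=*")):
        # Folder names look like "year=2023"; keep only the value.
        year = year_dir.name.split("=", 1)[1]
        for month_dir in sorted(year_dir.glob("month=*")):
            month = month_dir.name.split("=", 1)[1]
            # Marker files such as _SUCCESS do not match this pattern.
            parts = sorted(month_dir.glob("part-*.csv"))
            if len(parts) != 1:
                raise ValueError(
                    f"expected one part file in {month_dir}, found {len(parts)}"
                )
            dest = target / year / month / file_name
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(parts[0], dest)  # replaces dest if it already exists
            written.append(dest)
    return written
```

For the sample table, calling `flatten_partitions(Path("spark_out"), Path("lake"))` on such output would produce `lake/2023/10/table.csv` and `lake/2023/11/table.csv`, replacing any files already there. New years and months need no code change, because the folders are discovered from whatever Spark wrote.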
