Thank you for your query on the Microsoft Q&A platform.
It seems that you want to prevent the creation of the _SUCCESS file and also give the target file a specific name.
Creating transactional files such as _SUCCESS, _committed_<TID>, and _metadata is Spark's default behavior.
You can consider the below steps to remove the generated transactional files and give the target file a specific name:
- Use coalesce(1) to write a single partition file into a temp folder.
- Loop through all the files in that folder, filtering for the .csv file and ignoring the transactional files.
- Copy only the .csv file to the destination folder with the specified file name.
- Remove the temp folder with recurse set to True.
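The steps above can be sketched with the Python standard library. This is a minimal sketch, assuming the temp folder is on a locally accessible (or mounted) path; on DBFS you would use the dbutils.fs equivalents instead. The helper name promote_single_csv and the paths are hypothetical:

```python
import glob
import os
import shutil

def promote_single_csv(temp_dir: str, target_path: str) -> None:
    """Copy the single part-*.csv file out of temp_dir to target_path,
    skipping transactional files (_SUCCESS, _committed_*, _started_*),
    then remove temp_dir recursively."""
    # Keep only real .csv part files, ignoring any file starting with "_".
    csv_files = [
        f for f in glob.glob(os.path.join(temp_dir, "*.csv"))
        if not os.path.basename(f).startswith("_")
    ]
    if len(csv_files) != 1:
        raise RuntimeError(f"expected exactly one CSV part file, found {len(csv_files)}")
    # Copy the part file to the destination with the desired name.
    shutil.copy(csv_files[0], target_path)
    # Remove the temp folder (equivalent to recurse=True).
    shutil.rmtree(temp_dir)

# Typical usage after a Spark write such as:
#   df.coalesce(1).write.option("header", True).csv("/tmp/staging")
# promote_single_csv("/tmp/staging", "/data/output/report.csv")
```

On Databricks, the copy and delete would be dbutils.fs.cp and dbutils.fs.rm(path, recurse=True); the filtering logic stays the same.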
Relevant resources: How to Write Dataframe as single file with specific name in PySpark
Alternatively, you can try the below solution:
- We can disable the transaction log files of the Spark Parquet write by setting spark.sql.sources.commitProtocolClass = org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol. This disables the _committed_<TID> and _started_<TID> files, but the _SUCCESS, _common_metadata, and _metadata files are still generated.
- We can disable the _common_metadata and _metadata files using parquet.enable.summary-metadata=false.
- We can also disable the _SUCCESS file using mapreduce.fileoutputcommitter.marksuccessfuljobs=false.
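The three settings above can be applied together as in the sketch below. This is a hedged configuration fragment, assuming spark is an existing SparkSession (as provided by Databricks notebooks); depending on the cluster, the Hadoop-level properties may need to be set on sparkContext.hadoopConfiguration() instead:

```python
# Sketch: disable Spark's transactional marker files (assumes `spark` exists).
spark.conf.set(
    "spark.sql.sources.commitProtocolClass",
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
)  # suppresses _committed_<TID> and _started_<TID> files
spark.conf.set("parquet.enable.summary-metadata", "false")  # suppresses _common_metadata / _metadata
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")  # suppresses _SUCCESS

# Subsequent Parquet writes should then produce only the data files,
# e.g. df.write.mode("overwrite").parquet("/tmp/output")  # hypothetical path
```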
Related documentation: https://community.databricks.com/t5/data-engineering/how-do-i-prevent-success-and-committed-files-in-my-write-output/td-p/28690
Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.