databricks - save to .parquet

arkiboys 9,686 Reputation points
2022-03-01T07:30:48.047+00:00

Hello,
Using PySpark, I run a select and then save the result to .parquet.
The problem is that it saves the .parquet file as well as other files such as _committed, _SUCCESS, etc.
Question:
How can I change the PySpark code so that it saves only the .parquet file and no other files?
Thanks

df = spark.sql('select * from viewName limit 100')
df.write.parquet('dbfs:/mnt/temp/foldername', mode='overwrite')

Azure Synapse Analytics
Azure Databricks

Accepted answer
  1. Costa , Ana 76 Reputation points
    2022-03-31T13:00:41.8+00:00

    I have the same problem.
    For anybody looking for a quick fix in the meantime, I use this after creating a file with Python:


    # Copy the single part file out of the output directory, delete the directory, then move the copy to the final name.
    target = f"{path}{fileName}.parquet"
    nameFile = [x.name for x in dbutils.fs.ls(target) if x.name.endswith('.parquet')][0]
    dbutils.fs.cp(f"{target}/{nameFile}", f"{target}.tmp")
    dbutils.fs.rm(target, recurse=True)
    dbutils.fs.mv(f"{target}.tmp", target)
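
The workaround above can also be wrapped in a small helper, roughly like the sketch below. This is only an illustration, not an official Databricks recipe: `keep_only_part_file` is a hypothetical name, and `fs` is assumed to behave like `dbutils.fs` (with `ls`, `cp`, `rm` and `mv`). It also assumes the DataFrame was written as a single part file, for example with `df.coalesce(1)`, and it raises an error otherwise.

```python
def keep_only_part_file(fs, out_dir):
    """Replace the Spark output directory `out_dir` with the one part file inside it.

    `fs` is assumed to expose ls/cp/rm/mv like `dbutils.fs` (hypothetical helper).
    """
    out_dir = out_dir.rstrip("/")
    # Data files are named part-*.parquet; marker files such as _SUCCESS start with "_".
    parts = [
        f.name
        for f in fs.ls(out_dir)
        if f.name.startswith("part-") and f.name.endswith(".parquet")
    ]
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly one part file in {out_dir}, found {len(parts)}"
        )
    tmp = out_dir + ".tmp"
    fs.cp(f"{out_dir}/{parts[0]}", tmp)  # copy the data file out of the directory first
    fs.rm(out_dir, recurse=True)         # then remove the directory and its marker files
    fs.mv(tmp, out_dir)                  # finally give the file the requested name
    return out_dir
```

On Databricks this would be called as `keep_only_part_file(dbutils.fs, 'dbfs:/mnt/temp/foldername')` after `df.coalesce(1).write.parquet('dbfs:/mnt/temp/foldername', mode='overwrite')`. Copying to a temporary name before deleting the directory avoids removing the data together with the marker files.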
