Hello @Richards, Sam (DG-STL-HQ),
Welcome to the MS Q&A platform.
To consolidate a Delta table into a single Parquet file that reflects the table's latest version, you can use Apache Spark with Delta Lake.
- Load the Delta table into a Spark DataFrame:
df = spark.read.format("delta").load(delta_table_path)
df.show()
- Get the latest version of the Delta table:
from delta.tables import DeltaTable
delta_table = DeltaTable.forPath(spark, delta_table_path)
df = delta_table.toDF()
df.show()
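- Confirm which version you are reading. The DataFrame above already reflects the latest version of the table, and the data itself has no version column to filter on, so look up the version number in the table history and, if you want to be explicit, pin the read to it:
latest_version = delta_table.history(1).select("version").collect()[0][0]
df = spark.read.format("delta").option("versionAsOf", latest_version).load(delta_table_path)
df.show()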
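- Write out the DataFrame as a single Parquet file. coalesce(1) brings the data into one partition so that Spark writes one part file inside the output folder (parquet_output_path is a placeholder for your destination folder):
df.coalesce(1).write.parquet(parquet_output_path, mode="overwrite")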
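Spark writes a folder containing a part-*.parquet file plus marker files such as _SUCCESS, not a file with a name of your choosing. If you need one file with a specific name, a minimal sketch along these lines could move the part file afterwards. It assumes the output folder is reachable through ordinary local file APIs (for example a local disk or a mounted path); the function name and its arguments are hypothetical, not part of Spark or Delta Lake.

```python
import glob
import os
import shutil


def promote_single_part_file(output_dir: str, target_file: str) -> str:
    """Move the only part-*.parquet file in output_dir to target_file.

    Hypothetical helper: assumes output_dir is on a filesystem that the
    standard library can access directly.
    """
    # Spark names its data files part-<n>-<uuid>...parquet
    part_files = glob.glob(os.path.join(output_dir, "part-*.parquet"))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file in {output_dir}, "
            f"found {len(part_files)}"
        )
    # Create the destination folder if it does not exist yet
    os.makedirs(os.path.dirname(os.path.abspath(target_file)), exist_ok=True)
    shutil.move(part_files[0], target_file)
    return target_file
```

The helper raises an error if it finds zero or several part files, which would indicate that the write did not go through a single partition. On cloud storage paths that are not mounted locally, a storage-specific file utility would be needed instead.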
If you have plain Parquet files (not in Delta Lake format), you can use the Apache Spark Python script below to convert the Parquet files in a folder, in place, into a single Delta table.
%%pyspark
from delta.tables import DeltaTable
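# The identifier must have the form parquet.`<path>`, with the folder path wrapped in backticks
deltaTable = DeltaTable.convertToDelta(spark, f"parquet.`{delta_table_path}`")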
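If the script may run more than once against the same folder, it can help to check first whether the folder is already a Delta table. The following is a minimal sketch, assuming the delta-spark package is available on the cluster; parquet_identifier and convert_if_needed are hypothetical helper names, while DeltaTable.isDeltaTable, DeltaTable.forPath and DeltaTable.convertToDelta are the Delta Lake Python API calls.

```python
def parquet_identifier(path: str) -> str:
    """Build the parquet.`<path>` identifier string used by convertToDelta."""
    if "`" in path:
        # A backtick inside the path would break the quoted identifier
        raise ValueError("path must not contain backticks")
    return f"parquet.`{path}`"


def convert_if_needed(spark, path: str):
    """Return a DeltaTable for path, converting the Parquet folder if required."""
    # Imported lazily so this module loads even where delta-spark is absent
    from delta.tables import DeltaTable

    if DeltaTable.isDeltaTable(spark, path):
        # Already converted: just open the existing table
        return DeltaTable.forPath(spark, path)
    # Plain Parquet folder: convert it in place
    return DeltaTable.convertToDelta(spark, parquet_identifier(path))
```

In a notebook you would call convert_if_needed(spark, delta_table_path). For a partitioned Parquet folder, convertToDelta also needs the partition schema as a third argument, which this sketch leaves out.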
I hope this helps. Please let us know if you have any further questions.