Ingest several types of CSVs with Databricks Auto Loader

Lima, Leonardo 1

I'm trying to load several types of CSV files using Auto Loader. It currently merges every CSV that I drop into one big Parquet table; what I want is a separate Parquet table for each type of schema/CSV file.

What I currently have:

# Stream the files: wait for a file to be dropped into sourcePath
spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "csv") \
    .option("delimiter", "~|~") \
    .option("cloudFiles.inferColumnTypes", "true") \
    .option("cloudFiles.schemaLocation", pathCheckpoint) \
    .load(sourcePath) \
    .writeStream \
    .format("delta") \
    .option("mergeSchema", "true") \
    .option("checkpointLocation", pathCheckpoint) \
    .start(pathResult)

What I want: a separate table for each type of CSV file.
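
One way to get a table per CSV type might be to run one Auto Loader stream per type instead of a single stream for everything. The sketch below is a minimal, untested outline and not a confirmed solution from this thread. It assumes that each CSV type can be recognised by a file-name pattern and that the `pathGlobFilter` option is used to select it. `CsvType`, `plan_streams`, `start_stream`, and the example type names, patterns and root paths are all hypothetical; `spark` and `sourcePath` are the ones from the code above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsvType:
    """One kind of CSV file and the locations its stream uses."""
    name: str
    glob: str        # file-name pattern identifying this type (assumption)
    checkpoint: str  # per-type checkpoint and schema location
    target: str      # per-type output table path


def plan_streams(type_globs, checkpoint_root, result_root):
    """Give every CSV type its own checkpoint path and output path."""
    return [
        CsvType(
            name=name,
            glob=glob,
            checkpoint=f"{checkpoint_root.rstrip('/')}/{name}",
            target=f"{result_root.rstrip('/')}/{name}",
        )
        for name, glob in sorted(type_globs.items())
    ]


def start_stream(spark, source_path, csv_type):
    """Start one Auto Loader stream that only reads files of one type."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("delimiter", "~|~")
        .option("cloudFiles.inferColumnTypes", "true")
        # Schema tracking is kept separate per type so schemas never mix.
        .option("cloudFiles.schemaLocation", csv_type.checkpoint)
        # Assumption: the type is recognisable from the file name.
        .option("pathGlobFilter", csv_type.glob)
        .load(source_path)
        .writeStream.format("delta")
        .option("mergeSchema", "true")
        .option("checkpointLocation", csv_type.checkpoint)
        .start(csv_type.target)
    )


# Hypothetical type names, patterns and root paths; replace with real ones.
plans = plan_streams(
    {"customers": "customers_*.csv", "orders": "orders_*.csv"},
    checkpoint_root="/mnt/checkpoints",
    result_root="/mnt/result",
)

# On Databricks, where `spark` and `sourcePath` exist:
# queries = [start_stream(spark, sourcePath, plan) for plan in plans]
```

Each stream keeps its own checkpoint and schema location, so a schema change in one CSV type would not affect the tables of the others. If the types cannot be told apart by file name, this approach would not apply as written.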

Saurabh Sharma 23,751 Reputation points Microsoft Employee

2021-10-16T04:44:58.127+00:00

Hi @Lima, Leonardo,

Thanks for using Microsoft Q&A!
Are you trying to load files from Azure Data Lake Gen2?

Thanks
Saurabh
Lima, Leonardo 1 Reputation point

2021-10-18T08:10:35.517+00:00
updated comment
Lima, Leonardo 1 Reputation point

2021-10-18T08:17:00.357+00:00

Hello Saurabh,

Yes, the files are on Azure Data Lake Gen2. The files load fine; the problem is the output that Auto Loader produces and its format.

Regards,

Leo

Note: the same question is posted on Stack Overflow, but it has no answers yet: https://stackoverflow.com/questions/69572265/ingest-several-types-of-csvs-with-databricks-auto-loader
Saurabh Sharma 23,751 Reputation points Microsoft Employee

2021-10-23T00:47:09.937+00:00

@Lima, Leonardo I do not think this is possible through Auto Loader, but I am checking internally whether there is any way to do it. I will get back to you.

Thanks
Saurabh