LimaLeonardo-6934 asked SaurabhSharma-msft commented

Ingest several types of CSVs with Databricks Auto Loader

I'm trying to load several types of CSV files using Auto Loader. It currently merges every CSV I drop into one big Parquet table; what I want is a separate Parquet table for each schema/CSV file type.

What I currently have:

Streaming files, waiting for a file to be dropped:

# Auto Loader stream: picks up CSV files as they land in sourcePath
spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "csv") \
    .option("delimiter", "~|~") \
    .option("cloudFiles.inferColumnTypes", "true") \
    .option("cloudFiles.schemaLocation", pathCheckpoint) \
    .load(sourcePath) \
    .writeStream \
    .format("delta") \
    .option("mergeSchema", "true") \
    .option("checkpointLocation", pathCheckpoint) \
    .start(targetPath)  # targetPath is a placeholder: the original post is cut off before the sink call

What I want:

stackquestion1.png (32.2 KiB)
stackquestion2.png (28.2 KiB)

Hi @limaleonardo-6934,

Thanks for using Microsoft Q&A!
Are you trying to load files from Azure Data Lake Gen2?



Hello Saurabh,

Yes, the files are on Azure Data Lake Gen2. The files load fine; the problem is the output produced by Auto Loader and the output format.

Note: the same question is posted on Stack Overflow, but there are no answers yet.


@limaleonardo-6934 I do not think this is possible through Auto Loader, but I am checking internally whether there is any way to do it. I will get back to you.
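
A possible workaround, not confirmed anywhere in this thread, is to run one Auto Loader stream per CSV type instead of a single stream. This is only a rough sketch. It assumes the CSV types can be told apart by file name, so each stream can be restricted with the `pathGlobFilter` option. The type names, glob patterns, root paths, and the helpers `CsvTypeConfig`, `build_configs`, and `start_stream` are all made up for illustration. Each stream gets its own schema location, checkpoint, and target so that schemas are never merged across types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsvTypeConfig:
    """Paths for one CSV type; every value here is illustrative."""
    name: str
    glob: str             # file-name pattern that selects this CSV type
    schema_location: str  # where Auto Loader stores the inferred schema
    checkpoint: str       # streaming checkpoint used by this type only
    target: str           # Delta table path used by this type only


def build_configs(type_globs, checkpoint_root, target_root):
    """Give every CSV type its own schema location, checkpoint and target."""
    cp_root = checkpoint_root.rstrip("/")
    tg_root = target_root.rstrip("/")
    return [
        CsvTypeConfig(
            name=name,
            glob=glob,
            schema_location=f"{cp_root}/{name}/schema",
            checkpoint=f"{cp_root}/{name}/checkpoint",
            target=f"{tg_root}/{name}",
        )
        for name, glob in sorted(type_globs.items())
    ]


def start_stream(spark, source_path, cfg):
    """Start one Auto Loader stream restricted to a single CSV type."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("delimiter", "~|~")
        .option("cloudFiles.inferColumnTypes", "true")
        .option("cloudFiles.schemaLocation", cfg.schema_location)
        .option("pathGlobFilter", cfg.glob)  # only pick up files of this type
        .load(source_path)
        .writeStream.format("delta")
        .option("mergeSchema", "true")
        .option("checkpointLocation", cfg.checkpoint)
        .start(cfg.target)
    )


# Hypothetical CSV types and paths; adjust to the real file naming scheme.
configs = build_configs(
    {"customers": "customers_*.csv", "orders": "orders_*.csv"},
    checkpoint_root="/mnt/checkpoints/",
    target_root="/mnt/delta",
)

# On a Databricks cluster, where `spark` and `sourcePath` already exist:
# queries = [start_stream(spark, sourcePath, cfg) for cfg in configs]
```

If the file types live in separate subfolders rather than sharing a naming pattern, the same idea should work by passing a different `source_path` per type and dropping the glob filter.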



0 Answers