question

LimaLeonardo-6934 asked · SaurabhSharma-msft commented

Ingest several types of CSV's with Databricks Auto Loader

I'm trying to load several types of CSV files using Auto Loader. It currently merges every CSV I drop into one big parquet table; what I want instead is a separate parquet table for each schema/CSV type.

What I currently have:
(attachment: stackquestion1.png — current behavior)

Streaming files / waiting for a file to be dropped:

# Auto Loader: watch sourcePath for new CSV files and append them all to one Delta table
spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", "csv") \
.option("delimiter", "~|~") \
.option("cloudFiles.inferColumnTypes", "true") \
.option("cloudFiles.schemaLocation", pathCheckpoint) \
.load(sourcePath) \
.writeStream \
.format("delta") \
.option("mergeSchema", "true") \
.option("checkpointLocation", pathCheckpoint) \
.start(pathResult)


What I want:
(attachment: stackquestion2.png — desired output)


Tags: azure-databricks, dotnet-ml-big-data

Hi @limaleonardo-6934,

Thanks for using Microsoft Q&A!
Are you trying to load files from Azure Data Lake Gen2?

Thanks
Saurabh


Hello Saurabh,

Yes, the files are on Azure Data Lake Gen2. The files load fine; the problem is the output produced by Auto Loader and its format.

Regards,

Leo

Note: the same question is posted on Stack Overflow, but there are no answers yet: https://stackoverflow.com/questions/69572265/ingest-several-types-of-csvs-with-databricks-auto-loader


@limaleonardo-6934 I do not think this is possible through Auto Loader directly, but I am checking internally whether there is any way we could do that. I will get back to you.

Thanks
Saurabh


0 Answers