Hi all, I need to import a parquet file with SSIS. I read several forums and it seems it is not possible. I don't need Azure-related answers, since I already know it is possible with ADF.
Does anyone know how to do this with SSIS? I am even interested in programmatic approaches (C# or .NET) through SSIS.
Any ideas will be much appreciated. Thank you all!
You can use an SSIS Script Task to process a parquet file.
The link below shows example C# code to convert a parquet file to CSV:
https://stackoverflow.com/questions/62094616/how-to-convert-parquet-file-to-csv-using-net-core
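The linked answer uses the Cinchoo ETL (ChoETL) library. A minimal sketch of that approach, assuming the ChoETL and ChoETL.Parquet NuGet packages are installed and using placeholder file paths:

```csharp
using ChoETL;

// Convert a Parquet file to CSV with Cinchoo ETL.
// Requires the ChoETL and ChoETL.Parquet NuGet packages.
// File paths below are placeholders - substitute your own.
using (var parquetReader = new ChoParquetReader(@"C:\data\input.parquet"))
using (var csvWriter = new ChoCSVWriter(@"C:\data\output.csv").WithFirstLineHeader())
{
    // Streams records from the Parquet reader straight into the CSV writer.
    csvWriter.Write(parquetReader);
}
```

Inside an SSIS Script Task you would put this in Main() and reference the ChoETL assemblies; the resulting CSV can then be consumed by a regular Flat File Source in a dataflow.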
Hi anonymous user,
I did not find a good way to import parquet files into SSIS without ADF.
You may refer to the link IgorGelin-0063 provided to see if it is useful.
Regards,
Zoe
If the answer is helpful, please click "Accept Answer" and upvote it.
Thank you very much @Igor Gelin and @ZoeHui-MSFT for your kind replies. I will try to follow @Igor Gelin 's guidance and get back to this conversation with an answer. If anyone else has some insights on this matter, please continue providing answers. Thank you both again!
Hi @Igor Gelin , I went through the advised documentation; it is about converting parquet files into CSV files using the Cinchoo ETL library. I read Cinchoo ETL's documentation and it doesn't seem to work with SQL: it converts JSON to CSV or Parquet to CSV. I would need a way of loading Parquet files into SQL Server tables through SSIS. I apologize if I am misunderstanding how to use the Cinchoo ETL framework.
Thanks again!
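For what it's worth, ChoETL does not have to stop at CSV: its readers can materialize a DataTable, which SqlBulkCopy can push into SQL Server directly. A hedged sketch, assuming the ChoETL/ChoETL.Parquet packages, a hypothetical connection string, a hypothetical target table `dbo.ParquetStaging`, and Parquet columns that match the table's columns:

```csharp
using System.Data;
using System.Data.SqlClient;
using ChoETL;

// Load a Parquet file into a SQL Server table via SqlBulkCopy,
// skipping the intermediate CSV step. The file path, connection
// string and table name are placeholders.
using (var parquetReader = new ChoParquetReader(@"C:\data\input.parquet"))
{
    // Materializes the Parquet records as an in-memory DataTable.
    DataTable table = parquetReader.AsDataTable();

    using (var connection = new SqlConnection("Server=.;Database=Staging;Integrated Security=true"))
    {
        connection.Open();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.ParquetStaging";
            bulkCopy.WriteToServer(table);   // assumes column order/types match the target table
        }
    }
}
```

Note that AsDataTable() loads the whole file into memory, so for very large Parquet files you would want to batch the reads instead.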
The reason ADF supports Parquet is that the engine is based upon Spark, which uses Parquet as its intermediate storage format. It does so because Parquet supports partitioning and is designed for use on the HDFS file system which will distribute 256MB blocks of data to different processing nodes for parallel processing. Since these 256MB blocks represent compressed data, the underlying raw size of this data is likely to be 1-2.5GB per block.
Therefore you should ask yourself whether the raw data you hold in Parquet files is large enough to justify the Parquet format.
If the parquet files are not several multiples of 256MB in size, then it is likely that the file format is inappropriate for the volume of data. In this case, consider converting the data to a supported format before using SSIS. As a rule, SSIS can usually process 50,000-100,000 rows per second for a single non-blocking dataflow with a startup time of 2-3 seconds. So you should be able to estimate how long an SSIS package should take to process the number of rows you have per file.
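As a rough illustration of that estimate, here is a back-of-envelope calculation using the quoted 50,000-100,000 rows/s range and 2-3 second startup time (the row count is a hypothetical example, not a measurement):

```csharp
using System;

// Back-of-envelope SSIS runtime estimate, using the figures quoted above.
long rows = 10_000_000;          // hypothetical row count per file
double rowsPerSecond = 75_000;   // midpoint of the quoted 50k-100k rows/s range
double startupSeconds = 2.5;     // midpoint of the quoted 2-3 s startup time

double totalSeconds = startupSeconds + rows / rowsPerSecond;
Console.WriteLine($"Estimated duration: {totalSeconds:F0} s (~{totalSeconds / 60:F1} min)");
// 10M rows at ~75k rows/s comes out to roughly 136 s, a little over 2 minutes
```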
Another option is to either write a custom SSIS source component or purchase a third-party parquet file source.
By comparison, ADF may take 30-60 seconds to start up and is really suited to files of 1GB+ in size, since it processes large parquet files in parallel.