The simplest way to export from SQL Server to local Parquet files is with FastBCP.
It is really fast.
The command below exports your table in chunks, which is faster both for the export and for a later reimport.
* Prefer the first column of a clustered index as the distributekeycolumn (provided the column has enough distinct values, of course).
* If your table's id is an identity column, switch from the Ntile method to RangeId.
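To apply the first tip, you can look up the leading column of the clustered index in the catalog views before choosing the key. This is a minimal sketch, not part of FastBCP: it assumes the SqlServer PowerShell module (which provides Invoke-Sqlcmd) is installed, and it reuses the placeholder server, database, and table names from the commands in this answer.

```powershell
# Sketch only: find the first key column of the clustered index on dbo.mytable.
# Assumes the SqlServer module is installed; server, database, and table are placeholders.
$query = @'
SELECT c.name AS leading_clustered_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.mytable')
  AND i.type = 1          -- 1 = clustered rowstore index
  AND ic.key_ordinal = 1; -- first key column
'@

Invoke-Sqlcmd -ServerInstance "myserver,1433" -Database "mydatabase" -Query $query
```

A `SELECT COUNT(DISTINCT ...)` on the returned column then tells you whether it has enough distinct values to split the export on.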
.\FastBCP.exe `
--connectiontype "mssql" `
--server "myserver,1433" `
--trusted `
--database "mydatabase" `
--query "SELECT * FROM dbo.mytable WHERE mycond = 1" `
--fileoutput "myfile.parquet" `
--directory "d:\out\{sourceschema}\{sourcetable}" `
--parallelmethod "Ntile" `
--distributekeycolumn "myid" `
--merge false `
--license "C:\MyFreeTrialLicense.lic"
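Because the export is chunked and --merge is false, the output directory should end up holding several Parquet files rather than one. A quick way to check what was written is sketched here with standard PowerShell cmdlets only. The path is a placeholder I chose for illustration, so point it at the directory your export actually used.

```powershell
# Sketch only: count the exported Parquet files and report their total size.
# $outDir is a placeholder; replace it with the directory the export wrote to.
$outDir = "d:\out\dbo\mytable"

Get-ChildItem -Path $outDir -Filter "*.parquet" |
    Measure-Object -Property Length -Sum |
    Select-Object Count, @{ Name = "TotalMB"; Expression = { [math]::Round($_.Sum / 1MB, 1) } }
```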
If you prefer to generate one file per month, switch to the Timepartition method:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "myserver,1433" `
--trusted `
--database "mydatabase" `
--query "SELECT * FROM dbo.mytable WHERE mycond = 1" `
--fileoutput "myfile.parquet" `
--directory "d:\out\{sourceschema}\{sourcetable}" `
--parallelmethod "Timepartition" `
--distributekeycolumn "(mydatecolumn,year,month)" `
--merge false `
--license "C:\MyFreeTrialLicense.lic"
For small tables:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "myserver,1433" `
--trusted `
--database "mydatabase" `
--sourceschema "dbo" `
--sourcetable "mysmalltable" `
--fileoutput "{sourcetable}.parquet" `
--directory "d:\out\{sourceschema}\{sourcetable}" `
--parallelmethod "None" `
--merge false `
--license "C:\MyFreeTrialLicense.lic"
Note: this also works on Linux.