I have a requirement to export the data from a SQL Server table to Hadoop. The export will be done daily on an incremental basis and whatever data was pulled over to hadoop needs to be deleted from SQL server table.
I am trying to SSIS to accomplish this requirement using the below 2 options:
1.Export the data to an excel file on a local drive and then use the 'SSIS Hadoop File System task' to copy over the file to Hadoop. 2. Write the sql server data directly to hadoop using 'HDFS File Destination'.
I have no luck using both of these tasks.
With SSIS Hadoop File System task, I simply get a 'File failed to copy' error. With HDFS File Destination task, I get '400 Bad Request error'.
The Hadoop connection manager Test connection works fine.
The SSIS server is a Windows server and Hadoop is on Unix/Linux.
Could this be because of the OS issue?
Can anyone help me how to fulfill this requirement please?
The user tried to use 'Sqoop' to pull the data from SQL Server. But he says its very slow to pull the data. Daily we are looking at around 40 million rows to be exported to hadoop. Also the data that was exported needs to be deleted from the SQL Server table.