question

SQLSatyam20-7049 asked · Monalv-msft edited

SSIS Hadoop File System Task / HDFS File Destination issue

I have a requirement to export the data from a SQL Server table to Hadoop. The export will run daily on an incremental basis, and whatever data has been pulled over to Hadoop needs to be deleted from the SQL Server table.

I am trying to use SSIS to accomplish this requirement with the two options below:

1. Export the data to an Excel file on a local drive, then use the SSIS Hadoop File System Task to copy the file over to Hadoop.
2. Write the SQL Server data directly to Hadoop using the HDFS File Destination.

I have had no luck with either of these options.

With the Hadoop File System Task, I simply get a 'File failed to copy' error. With the HDFS File Destination, I get a '400 Bad Request' error.

The Hadoop Connection Manager's Test Connection works fine.

The SSIS server is a Windows server and Hadoop is on Unix/Linux.

Could this be caused by the difference in operating systems?

Can anyone please help me with how to fulfill this requirement?

The user tried 'Sqoop' to pull the data from SQL Server, but he says it is very slow. Daily we are looking at around 40 million rows to be exported to Hadoop. Also the data that was exported needs to be deleted from the SQL Server table.

sql-server-integration-services

1 Answer

Monalv-msft answered · Monalv-msft edited

Hi @SQLSatyam20-7049,

I have a requirement to export the data from a SQL Server table to Hadoop.

Why not use an OLE DB Source and an HDFS File Destination directly in a Data Flow Task to export the data from the SQL Server table to Hadoop?
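For the daily incremental pull, the SQL command in that OLE DB Source can be limited to the rows that have not been exported yet. The following is only a minimal sketch; dbo.SourceTable, its Id column, and the dbo.ExportWatermark table are hypothetical placeholders for whatever watermark scheme you actually use (BatchMaxId would be captured by an Execute SQL Task at the start of the run):

```sql
-- Hypothetical source query for the OLE DB Source (placeholder names).
-- dbo.ExportWatermark is assumed to hold:
--   LastExportedId : highest Id already shipped to Hadoop
--   BatchMaxId     : upper bound captured at the start of this run
SELECT s.Id,
       s.Col1,
       s.Col2
FROM dbo.SourceTable AS s
CROSS JOIN dbo.ExportWatermark AS w
WHERE s.Id >  w.LastExportedId
  AND s.Id <= w.BatchMaxId;
```

Capturing the upper bound before the export starts means the later delete can target exactly the rows that were written to HDFS, even if new rows arrive while the package is running.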

Also the data that was exported needs to be deleted from the SQL Server table.

After the Data Flow Task, we can drag an Execute SQL Task onto the Control Flow to delete the exported data from the SQL Server table.
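As a sketch of what that Execute SQL Task might contain (same hypothetical watermark table and placeholder names as above), deleting in batches keeps lock and transaction-log pressure manageable at the roughly 40 million rows per day mentioned in the question:

```sql
-- Hypothetical cleanup statement for the Execute SQL Task (placeholder names).
-- Deletes only the window that was just exported, in 100k-row batches.
DECLARE @Deleted INT = 1;

WHILE @Deleted > 0
BEGIN
    DELETE TOP (100000) s
    FROM dbo.SourceTable AS s
    CROSS JOIN dbo.ExportWatermark AS w
    WHERE s.Id >  w.LastExportedId
      AND s.Id <= w.BatchMaxId;

    SET @Deleted = @@ROWCOUNT;
END;

-- Advance the watermark so the next daily run starts after this window.
UPDATE dbo.ExportWatermark
SET LastExportedId = BatchMaxId;
```

Connecting this task to the Data Flow Task with a success precedence constraint ensures the rows are removed only after the HDFS load has completed.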

Best regards,
Mona



Hi, I have tried using the HDFS File Destination in a Data Flow Task, but as I said earlier, I get a '400 Bad Request' error when the Data Flow Task runs to copy the data to HDFS.
The Hadoop Connection Manager's Test Connection works fine, but the Data Flow Task fails with the 400 Bad Request error.


Hi @SQLSatyam20-7049,

Please check whether you have configured the Hadoop Connection Manager correctly.

Hope the following links will be helpful:
1. Hadoop Connection Manager
2. Importing and Exporting data using SSIS Hadoop components

Best regards,
Mona
