Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS)

SQL Server Technical Article

Published: October 2012

Authors: Benjamin Guinebertière (Microsoft France), Philippe Beraud (Microsoft France), Rémi Olivier (Microsoft France)

Technical Reviewers/Contributors: Carla Sabotta (Microsoft Corporation), Steve Howard (Microsoft Corporation), Debarchan Sarkar (Microsoft India GTSC), Jennifer Hubbard (Microsoft Corporation).

Summary: With the explosion of data, the open source Apache™ Hadoop™ Framework is gaining traction thanks to the large ecosystem that has grown up around its core components, the Hadoop Distributed File System (HDFS™) and Hadoop MapReduce. Having SQL Server work with Hadoop™ is becoming increasingly important because the two are complementary. For instance, petabytes of unstructured data can be stored in Hadoop but take hours to query, whereas terabytes of structured data can be stored in the SQL Server platform and queried in seconds. This creates a need to transfer data between Hadoop and SQL Server.

This white paper explores how SQL Server Integration Services (SSIS), the SQL Server Extract, Transform, and Load (ETL) tool, can be used to automate the execution of Hadoop and non-Hadoop jobs and to manage data transfers between Hadoop and other sources and destinations.


To review the document, please download the Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS) Word document.
