IBM DataStage to Azure Databricks Migration

Sourav 100 Reputation points
2024-03-05T09:45:52.6166667+00:00

Hello Team,

We would like to move from on-premises IBM DataStage to Azure Databricks for our ETL activities.

  1. How can we migrate the existing IBM DataStage scripts to Azure Databricks? I understand that the existing scripts are Linux or shell scripts. Can we convert them to run in Azure Databricks?
  2. What considerations or checklist items do we need for a migration to Azure Databricks?
  3. Is this something that can be done, or that you have seen done before?

Any inputs and suggestions on this will be highly appreciated, thanks!

Regards,

Sourav


2 answers

  1. ShaikMaheer-MSFT 38,441 Reputation points Microsoft Employee
    2024-03-06T09:01:36.9766667+00:00

    Hi Sourav,

    Thank you for posting your query on the Microsoft Q&A platform.

    Unfortunately, there’s no direct connector or tool to automatically convert DataStage jobs to Databricks notebooks. You’ll need to manually rewrite the logic in Python (PySpark) or Scala (Spark) based on your existing DataStage scripts.
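
    As a rough illustration of what that rewrite looks like, below is a minimal PySpark sketch of a typical DataStage-style flow (read a delimited file, transform it, load the result). The paths, column names, and table name are hypothetical placeholders, not the output of any conversion tool:

    ```python
    # Minimal PySpark sketch of a DataStage-style job:
    # sequential-file read -> transformer -> load to a Delta table.
    # All paths, column names, and table names are hypothetical examples.
    from pyspark.sql import functions as F

    # In a Databricks notebook, `spark` is provided as the SparkSession.
    orders = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/mnt/landing/orders.csv")  # stands in for a Sequential File stage
    )

    # Equivalent of a Transformer stage: derive columns and filter rows.
    cleaned = (
        orders
        .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("amount") > 0)
    )

    # Equivalent of the target/load stage: write to a managed Delta table.
    cleaned.write.mode("overwrite").saveAsTable("etl.orders_clean")
    ```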

    Please let me know if you have any further queries.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.

    1 person found this answer helpful.

  2. Amira Bedhiafi 23,171 Reputation points
    2024-03-05T12:40:46.4366667+00:00

    DataStage scripts, if based on Linux or shell scripting, may require conversion or rewriting to be compatible with Databricks' supported languages (Scala, Python, SQL, R).

    You may need to use Databricks notebooks to replicate the logic of your DataStage jobs. Python is a popular choice for its simplicity and readability.
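
    As a small, hedged illustration of that kind of rewrite, here is how a shell-style filter-and-sum step over a delimited file might map onto a PySpark notebook cell; the file path and column names are invented for the example:

    ```python
    # Hypothetical rewrite of a shell pipeline such as:
    #   grep ',ACTIVE,' accounts.csv | awk -F',' '{sum += $3} END {print sum}'
    # The file path and column names are placeholders.
    from pyspark.sql import functions as F

    accounts = (
        spark.read
        .option("header", "true")
        .csv("/mnt/landing/accounts.csv")
    )

    total = (
        accounts
        .filter(F.col("status") == "ACTIVE")  # replaces the grep
        .agg(F.sum(F.col("balance").cast("double")).alias("total_balance"))  # replaces the awk sum
    )

    total.show()
    ```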

    For data connectivity, verify that Azure Databricks can connect to all of your data sources and targets, and confirm that an equivalent connector exists for each one.
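
    One simple way to smoke-test connectivity from a notebook is to read a small slice of a source table over JDBC. In this sketch the host, database, table, and secret-scope names are all hypothetical:

    ```python
    # Hypothetical connectivity check: read one row from an on-prem SQL Server
    # source over JDBC. Host, database, table, and secret names are placeholders;
    # credentials are assumed to be stored in a Databricks secret scope.
    jdbc_url = "jdbc:sqlserver://onprem-host:1433;databaseName=SALES"

    probe = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.CUSTOMER")
        .option("user", dbutils.secrets.get("etl-scope", "sql-user"))
        .option("password", dbutils.secrets.get("etl-scope", "sql-password"))
        .load()
    )

    print(probe.limit(1).count())  # succeeds only if the connection works
    ```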

    Plan the migration of your existing datasets to cloud storage that Azure Databricks can access, such as Azure Data Lake Storage Gen2.
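
    For example, once datasets have landed in Azure Data Lake Storage Gen2, Databricks can read them directly over the abfss protocol. The storage account, container, and path below are made up, and authentication (for example a service principal configured on the cluster) is assumed to be in place:

    ```python
    # Hypothetical read of a migrated dataset from ADLS Gen2.
    # Storage account, container, and path are placeholders.
    path = "abfss://raw@mystorageacct.dfs.core.windows.net/datastage/customers/"

    customers = spark.read.parquet(path)
    customers.printSchema()
    ```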

    When it comes to rebuilding the ETL workflows, you need to recreate the data transformation logic in Databricks notebooks. You can take advantage of Spark's distributed processing for better performance, as in the sketch below.
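
    As a sketch of what rebuilt logic can look like, the join-and-aggregate below is distributed across the cluster by Spark automatically, with no explicit parallelism code; the table and column names are placeholders:

    ```python
    from pyspark.sql import functions as F

    # Hypothetical rebuild of a DataStage join + aggregate flow.
    # Table and column names are placeholders.
    orders = spark.table("etl.orders_clean")
    customers = spark.table("etl.customers")

    daily_revenue = (
        orders.join(customers, "customer_id")
        .groupBy("order_date", "region")
        .agg(
            F.sum("amount").alias("revenue"),
            F.countDistinct("customer_id").alias("active_customers"),
        )
    )

    daily_revenue.write.mode("overwrite").saveAsTable("etl.daily_revenue")
    ```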

    What should you consider?

    • Compatibility of Data Formats
    • Scalability and Performance
    • Security and Compliance
    • Cost Management
    • Skills and Training
    • Monitoring and Maintenance

    Has this been done before?

    Yes, many organizations have migrated their on-premises ETL workflows to cloud-based solutions like Azure Databricks. While each migration is unique due to specific business requirements and technical complexities, leveraging Azure Databricks for ETL tasks has become increasingly common due to its scalability, performance, and flexibility. Engaging with a partner experienced in such migrations or consulting with Microsoft Azure's support team can provide tailored advice and best practices.

