IBM DataStage to Azure Databricks Migration

Sourav 100 Reputation points
2024-03-05T09:45:52.6166667+00:00

Hello Team,

We would like to move from on-premises IBM DataStage to Azure Databricks for our ETL activities.

  1. How can we migrate the existing IBM DataStage scripts to Azure Databricks? I understand that the existing scripts are Linux or shell scripts. Can we convert them to run in Azure Databricks?
  2. What considerations or checklist items do we need for a migration to Azure Databricks?
  3. Is this something that can be done, or that you have seen done before?

Any inputs and suggestions on this will be highly appreciated, thanks!

Regards,

Sourav


2 answers

  1. ShaikMaheer-MSFT 38,441 Reputation points Microsoft Employee
    2024-03-06T09:01:36.9766667+00:00

    Hi Sourav,

    Thank you for posting your query on the Microsoft Q&A platform.

    Unfortunately, there’s no direct connector or tool to automatically convert DataStage jobs to Databricks notebooks. You’ll need to manually rewrite the logic in Python (PySpark) or Scala (Spark) based on your existing DataStage scripts.
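
    As a rough illustration of what that rewrite looks like, below is a minimal PySpark sketch of a typical DataStage-style flow (read a delimited file, transform it, load the result). The paths, column names, and table name are hypothetical placeholders, not the output of any conversion tool:

    ```python
    # Minimal PySpark sketch of a DataStage-style job:
    # sequential-file read -> transformer -> load to a Delta table.
    # All paths, column names, and table names are hypothetical examples.
    from pyspark.sql import functions as F

    # In a Databricks notebook, `spark` is provided as the SparkSession.
    orders = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/mnt/landing/orders.csv")  # stands in for a Sequential File stage
    )

    # Equivalent of a Transformer stage: derive columns and filter rows.
    cleaned = (
        orders
        .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("amount") > 0)
    )

    # Equivalent of the target/load stage: write to a managed Delta table.
    cleaned.write.mode("overwrite").saveAsTable("etl.orders_clean")
    ```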

    Please let me know if you have any further queries.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.

    1 person found this answer helpful.

  2. Amira Bedhiafi 23,171 Reputation points
    2024-03-05T12:40:46.4366667+00:00

    DataStage scripts, if based on Linux or shell scripting, may require conversion or rewriting to be compatible with Databricks' supported languages (Scala, Python, SQL, R).

    You may need to use Databricks notebooks to replicate the logic of your DataStage jobs. Python is a popular choice for its simplicity and readability.
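
    As a small, hedged illustration of that kind of rewrite, here is how a shell-style filter-and-sum step over a delimited file might map onto a PySpark notebook cell; the file path and column names are invented for the example:

    ```python
    # Hypothetical rewrite of a shell pipeline such as:
    #   grep ',ACTIVE,' accounts.csv | awk -F',' '{sum += $3} END {print sum}'
    # The file path and column names are placeholders.
    from pyspark.sql import functions as F

    accounts = (
        spark.read
        .option("header", "true")
        .csv("/mnt/landing/accounts.csv")
    )

    total = (
        accounts
        .filter(F.col("status") == "ACTIVE")  # replaces the grep
        .agg(F.sum(F.col("balance").cast("double")).alias("total_balance"))  # replaces the awk sum
    )

    total.show()
    ```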

    For data connectivity, verify that Azure Databricks can connect to all of your data sources and targets, and confirm that an equivalent connector exists for each one.
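
    One simple way to smoke-test connectivity from a notebook is to read a small slice of a source table over JDBC. In this sketch the host, database, table, and secret-scope names are all hypothetical:

    ```python
    # Hypothetical connectivity check: read one row from an on-prem SQL Server
    # source over JDBC. Host, database, table, and secret names are placeholders;
    # credentials are assumed to be stored in a Databricks secret scope.
    jdbc_url = "jdbc:sqlserver://onprem-host:1433;databaseName=SALES"

    probe = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.CUSTOMER")
        .option("user", dbutils.secrets.get("etl-scope", "sql-user"))
        .option("password", dbutils.secrets.get("etl-scope", "sql-password"))
        .load()
    )

    print(probe.limit(1).count())  # succeeds only if the connection works
    ```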

    Plan the migration of your existing datasets to cloud storage that Azure Databricks can access, such as Azure Data Lake Storage Gen2.
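
    For example, once datasets have landed in Azure Data Lake Storage Gen2, Databricks can read them directly over the abfss protocol. The storage account, container, and path below are made up, and authentication (for example a service principal configured on the cluster) is assumed to be in place:

    ```python
    # Hypothetical read of a migrated dataset from ADLS Gen2.
    # Storage account, container, and path are placeholders.
    path = "abfss://raw@mystorageacct.dfs.core.windows.net/datastage/customers/"

    customers = spark.read.parquet(path)
    customers.printSchema()
    ```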

    When it comes to rebuilding the ETL workflows, you need to recreate the data transformation logic in Databricks notebooks. You can take advantage of Spark's distributed processing for better performance, as in the sketch below.
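
    As a sketch of what rebuilt logic can look like, the join-and-aggregate below is distributed across the cluster by Spark automatically, with no explicit parallelism code; the table and column names are placeholders:

    ```python
    from pyspark.sql import functions as F

    # Hypothetical rebuild of a DataStage join + aggregate flow.
    # Table and column names are placeholders.
    orders = spark.table("etl.orders_clean")
    customers = spark.table("etl.customers")

    daily_revenue = (
        orders.join(customers, "customer_id")
        .groupBy("order_date", "region")
        .agg(
            F.sum("amount").alias("revenue"),
            F.countDistinct("customer_id").alias("active_customers"),
        )
    )

    daily_revenue.write.mode("overwrite").saveAsTable("etl.daily_revenue")
    ```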

    What should you consider?

    • Compatibility of Data Formats
    • Scalability and Performance
    • Security and Compliance
    • Cost Management
    • Skills and Training
    • Monitoring and Maintenance

    Has this been done before?

    Yes, many organizations have migrated their on-premises ETL workflows to cloud-based solutions like Azure Databricks. While each migration is unique due to specific business requirements and technical complexities, leveraging Azure Databricks for ETL tasks has become increasingly common due to its scalability, performance, and flexibility. Engaging with a partner experienced in such migrations or consulting with Microsoft Azure's support team can provide tailored advice and best practices.

