Azure Databricks

0 answers

Efficient Data Migration of Databricks from One Region to Another

How can Databricks be migrated from East US 2 to Central US efficiently? The migration includes moving the existing dev and prod setup, which consists of workspace jobs, notebooks, scripts, catalog tables, volumes, and Azure Data Factory (ADF)…

asked

Deepa M 0

1 answer

Looking for insights on enabling Databricks Automatic Provisioning

We currently have a SCIM provisioning connector set up to synchronize identities from Entra ID to Unity Catalog. We’re now considering enabling Databricks Automatic Provisioning but want to fully understand the potential impact on our environment before…

asked

Rameez Ali 61

edited a comment

Smaran Thoomu 24,750 Microsoft External Staff Moderator

2 answers

How to disable serverless cluster in Azure Databricks?

How to disable serverless cluster in Azure Databricks? Mail to Support team asking paid subscription

asked

Akash Nidavani 0

edited an answer

PRADEEPCHEEKATLA 90,661 Moderator

2 answers

How to set up databricks-bundle in Azure Databricks UI

Hello Mr./Ms., I'm a newbie with Databricks and databricks-bundle for CI/CD. I have 3 questions. By chance, I saw this picture: I tried to look for documents or videos related to Databricks Bundles in the UI (like in the image above), including how…

asked

Viet Tran 70

answered

PRADEEPCHEEKATLA 90,661 Moderator

1 answer

Eight Hundred Kafka Topics Processing by DBR

We are working on a large-scale Change Data Capture (CDC) implementation where: The source system is IBM DB2. IBM InfoSphere CDC pushes changes to Kafka, with each Kafka topic representing one DB2 table. There are 800 Kafka topics in total, please note…

asked

Janice Chi 160

accepted

Janice Chi 160

1 answer

CDC pipeline schema handling

We completed historical migration from DB2 to Azure SQL Hyperscale and ADLS Gen2 (Delta format, partitioned). Now building Catch-Up CDC pipelines using Kafka (via IBM CDC), ADF (orchestration), and Databricks (Delta processing). CDC data is merged with…

asked

Janice Chi 160

accepted

Janice Chi 160

1 answer

Essential Data Cleaning steps to Succeed Recon

In our project, we are migrating 80TB of production-grade data from DB2 to Azure SQL Hyperscale using ADF and Databricks. While the primary transformation is data type conversion, what essential data cleaning steps should be performed to ensure…

asked

Janice Chi 160

accepted

Janice Chi 160

1 answer

Integrity checks via Databricks

In our pipeline, the source data is coming from DB2 production systems and is assumed to be highly reliable. During transformation in Databricks, we are already performing column-level checks (e.g., presence of mandatory fields, null validation,…

asked

Janice Chi 160

accepted

Janice Chi 160

1 answer

Best Practice for Schema Enforcement and Type Casting in Databricks Ingestion Pipeline

In our data pipeline, I'm considering importing source data into Databricks in raw string format (regardless of original data types), performing column-level validation, and only then applying explicit type casting to desired data types before writing to…

asked

Janice Chi 160

accepted

Janice Chi 160

1 answer

Handling DATETIME2 compatibility issue in Databricks during Hyperscale type alignment

In our project, we are transforming data in Azure Databricks coming from source systems (DB2 via CDC or snapshots) and storing it temporarily in Delta Lake. We later load this data into Azure SQL Hyperscale. To align with Hyperscale’s expected schema, we…

asked

Janice Chi 160

accepted

Janice Chi 160

1 answer

Guidance on Real-Time Streaming with Reconciliation and Domain Cutover Handling – Kafka → Databricks → Azure SQL Hyperscale

We are currently designing a real-time ingestion and reconciliation architecture using Databricks Structured Streaming that reads IBM DB2 CDC messages via Kafka and writes them to Azure SQL Hyperscale. The context is a healthcare data migration…

asked

Janice Chi 160

accepted

Janice Chi 160

1 answer

Hash Storage and Reconciliation Scope in Azure SQL Hyperscale for Initial Load

In our one-time migration architecture (DB2 → ADLS → Hyperscale), we plan to perform row-level SHA hash reconciliation using Databricks. For this: Do you recommend storing the hash values in a separate audit table in Azure SQL Hyperscale instead of…

asked

Janice Chi 160

commented

Smaran Thoomu 24,750 Microsoft External Staff Moderator

0 answers

Design Guidance for High-Throughput Kafka to Azure Streaming Pipeline using Databricks Structured Streaming

As part of our 80TB+ data migration project from IBM DB2 (on-prem) to Azure SQL Hyperscale, we are now entering the real-time streaming CDC phase. We have ~800 Kafka topics (from IBM CDC engine) grouped under 4 business domains. Our architecture…

asked

Janice Chi 160

commented

Smaran Thoomu 24,750 Microsoft External Staff Moderator

1 answer

Unable to create cluster. "Compute Terminated" is the error message.

Unable to create an all-purpose compute cluster. I made multiple attempts with different accounts, cluster configurations, and regions. But I am unable to create a cluster.

asked

Kapil Chogga 0

answered

Alex Burlachenko 11,035

1 answer

How to connect on-prem Hadoop Hive to Azure Databricks

asked

Prasad Sandu 0

commented

Smaran Thoomu 24,750 Microsoft External Staff Moderator

1 answer

Unable to access account console

We are currently unable to access the Databricks Account Console (https://accounts.azuredatabricks.net) for our Azure Databricks deployment. Attempts to log in using our Azure AD Global Administrator account redirect us to the workspace instead of the…

asked

Chaithanya Chowdhary 0

commented

Smaran Thoomu 24,750 Microsoft External Staff Moderator

1 answer

How to manage storage and access configuration for ADLS Gen2 in DB2 to Azure SQL migration via Databricks and ADF

We are working on a healthcare data migration project where ~80TB of data from IBM DB2 (on-prem, snapshot via FlashCopy) is being moved to Azure SQL Hyperscale using ADF, ADLS Gen2, and Azure Databricks. The architecture follows a modular pattern for…

asked

Janice Chi 160

commented

Smaran Thoomu 24,750 Microsoft External Staff Moderator

1 answer

Azure Databricks Cluster

Hi, while creating an Azure Databricks cluster, I am getting an error that the node type Standard_DS3_v2 is not available in the region. Please help. I am not able to see any region when I run an Azure bash command to find the region which has Standard_DS3_v2…

asked

Rishi Rithi 20

accepted

Rishi Rithi 20

2 answers

How to implement retry logic in Azure Databricks for a failed microbatch in Structured Streaming (without replaying from Kafka)?

We are ingesting CDC data from Kafka into Azure Databricks using Structured Streaming. In rare cases, a microbatch fails due to reconciliation logic (e.g., row count or hash mismatch). We would like to reprocess only the failed batch (e.g., write the…

asked

Gabriel25 545

accepted

Gabriel25 545

2 answers

Best practices for secure and performant JDBC access from Azure Databricks to on-prem Hadoop Hive over VPN

We have a VNet-injected Azure Databricks workspace connected to on-prem Hadoop via VPN Gateway. We want to read from Hive tables using the JDBC driver in Databricks notebooks. While the connection works, performance is slow and we're unsure about the…

asked

Vikranth-2626 160

accepted

Vikranth-2626 160

2,540 questions with Azure Databricks tags
