2,540 questions with Azure Databricks tags

Sort by: Updated
0 answers

Efficient Data Migration of Databricks from One Region to Another

How can Databricks be migrated from US East 2 to US Central efficiently? The migration includes moving the existing dev and prod setup, which consists of workspace jobs, notebooks, scripts, catalog tables, volumes, and Azure Data Factory (ADF)…

Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,540 questions
asked 2025-07-10T08:27:33.7066667+00:00
Deepa M 0 Reputation points
1 answer

Looking for insights on enabling Databricks Automatic Provisioning

We currently have a SCIM provisioning connector set up to synchronize identities from Entra ID to Unity Catalog. We’re now considering enabling Databricks Automatic Provisioning but want to fully understand the potential impact on our environment before…

asked 2025-07-09T16:59:30.3433333+00:00
Rameez Ali 61 Reputation points
edited a comment 2025-07-09T23:01:02.7333333+00:00
Smaran Thoomu 24,750 Reputation points Microsoft External Staff Moderator
2 answers

How to disable serverless cluster in Azure Databricks?

How do I disable the serverless cluster in Azure Databricks? Mailing the Support team asks for a paid subscription.

asked 2025-07-06T05:16:08.8466667+00:00
Akash Nidavani 0 Reputation points
edited an answer 2025-07-09T21:37:48.3466667+00:00
PRADEEPCHEEKATLA 90,661 Reputation points Moderator
2 answers One of the answers was accepted by the question author.

How to set up databricks-bundle in Azure Databricks UI

Hello, I'm a newbie with Databricks and databricks-bundle for CI/CD. I have 3 questions. By chance, I saw this picture: I tried to look for documents or videos related to Databricks Bundles in the UI (like in the image above), including how…

asked 2025-07-07T14:35:28.33+00:00
Viet Tran 70 Reputation points
answered 2025-07-09T20:49:08.3133333+00:00
PRADEEPCHEEKATLA 90,661 Reputation points Moderator
1 answer One of the answers was accepted by the question author.

Processing Eight Hundred Kafka Topics with DBR

We are working on a large-scale Change Data Capture (CDC) implementation where: The source system is IBM DB2. IBM InfoSphere CDC pushes changes to Kafka, with each Kafka topic representing one DB2 table. There are 800 Kafka topics in total, please note…

asked 2025-06-11T07:06:44.0833333+00:00
Janice Chi 160 Reputation points
accepted 2025-07-09T17:06:46.4166667+00:00
Janice Chi 160 Reputation points
1 answer One of the answers was accepted by the question author.

CDC pipeline schema handling

We completed historical migration from DB2 to Azure SQL Hyperscale and ADLS Gen2 (Delta format, partitioned). Now building Catch-Up CDC pipelines using Kafka (via IBM CDC), ADF (orchestration), and Databricks (Delta processing). CDC data is merged with…

asked 2025-06-12T13:24:09.4333333+00:00
Janice Chi 160 Reputation points
accepted 2025-07-09T17:06:35.2733333+00:00
Janice Chi 160 Reputation points
1 answer One of the answers was accepted by the question author.

Essential Data Cleaning Steps for Successful Reconciliation

In our project, we are migrating 80TB of production-grade data from DB2 to Azure SQL Hyperscale using ADF and Databricks. While the primary transformation is data type conversion, what essential data cleaning steps should be performed to ensure…

asked 2025-06-12T14:17:19.49+00:00
Janice Chi 160 Reputation points
accepted 2025-07-09T17:06:26.1033333+00:00
Janice Chi 160 Reputation points
1 answer One of the answers was accepted by the question author.

Integrity checks via Databricks

In our pipeline, the source data is coming from DB2 production systems and is assumed to be highly reliable. During transformation in Databricks, we are already performing column-level checks (e.g., presence of mandatory fields, null validation,…

asked 2025-06-13T11:26:04.89+00:00
Janice Chi 160 Reputation points
accepted 2025-07-09T17:06:12.57+00:00
Janice Chi 160 Reputation points
1 answer One of the answers was accepted by the question author.

Best Practice for Schema Enforcement and Type Casting in Databricks Ingestion Pipeline

In our data pipeline, I'm considering importing source data into Databricks in raw string format (regardless of original data types), performing column-level validation, and only then applying explicit type casting to desired data types before writing to…

asked 2025-06-13T13:11:47.9+00:00
Janice Chi 160 Reputation points
accepted 2025-07-09T17:06:03.36+00:00
Janice Chi 160 Reputation points
1 answer One of the answers was accepted by the question author.

Handling DATETIME2 compatibility issue in Databricks during Hyperscale type alignment

In our project, we are transforming data in Azure Databricks coming from source systems (DB2 via CDC or snapshots) and storing it temporarily in Delta Lake. We later load this data into Azure SQL Hyperscale. To align with Hyperscale’s expected schema, we…

asked 2025-06-13T13:19:48.26+00:00
Janice Chi 160 Reputation points
accepted 2025-07-09T17:05:46.8966667+00:00
Janice Chi 160 Reputation points
1 answer One of the answers was accepted by the question author.

Guidance on Real-Time Streaming with Reconciliation and Domain Cutover Handling – Kafka → Databricks → Azure SQL Hyperscale

We are currently designing a real-time ingestion and reconciliation architecture using Databricks Structured Streaming that reads IBM DB2 CDC messages via Kafka and writes them to Azure SQL Hyperscale. The context is a healthcare data migration…

asked 2025-06-27T05:41:37.3766667+00:00
Janice Chi 160 Reputation points
accepted 2025-07-09T14:33:37.09+00:00
Janice Chi 160 Reputation points
1 answer

Hash Storage and Reconciliation Scope in Azure SQL Hyperscale for Initial Load

In our one-time migration architecture (DB2 → ADLS → Hyperscale), we plan to perform row-level SHA hash reconciliation using Databricks. For this: Do you recommend storing the hash values in a separate audit table in Azure SQL Hyperscale instead of…

asked 2025-07-09T07:56:36.1166667+00:00
Janice Chi 160 Reputation points
commented 2025-07-09T14:14:31.9466667+00:00
Smaran Thoomu 24,750 Reputation points Microsoft External Staff Moderator
0 answers

Design Guidance for High-Throughput Kafka to Azure Streaming Pipeline using Databricks Structured Streaming

As part of our 80TB+ data migration project from IBM DB2 (on-prem) to Azure SQL Hyperscale, we are now entering the real-time streaming CDC phase. We have ~800 Kafka topics (from IBM CDC engine) grouped under 4 business domains. Our architecture…

asked 2025-07-09T12:26:10.3033333+00:00
Janice Chi 160 Reputation points
commented 2025-07-09T14:00:10.7766667+00:00
Smaran Thoomu 24,750 Reputation points Microsoft External Staff Moderator
1 answer

Unable to create cluster. "Compute Terminated" is the error message.

Unable to create an all-purpose compute cluster. I made multiple attempts with different accounts, cluster configurations, and regions. But I am unable to create a cluster.

asked 2025-07-09T08:31:04.0766667+00:00
Kapil Chogga 0 Reputation points
answered 2025-07-09T08:42:26.14+00:00
Alex Burlachenko 11,035 Reputation points
1 answer

How to connect on-prem Hadoop Hive to Azure Databricks

How to connect on-prem Hadoop Hive to Azure Databricks

asked 2025-07-03T19:27:41.45+00:00
Prasad Sandu 0 Reputation points
commented 2025-07-09T07:54:23.27+00:00
Smaran Thoomu 24,750 Reputation points Microsoft External Staff Moderator
1 answer

Unable to access account console

We are currently unable to access the Databricks Account Console (https://accounts.azuredatabricks.net) for our Azure Databricks deployment. Attempts to log in using our Azure AD Global Administrator account redirect us to the workspace instead of the…

asked 2025-07-04T23:51:53.9833333+00:00
Chaithanya Chowdhary 0 Reputation points
commented 2025-07-09T07:47:12.6466667+00:00
Smaran Thoomu 24,750 Reputation points Microsoft External Staff Moderator
1 answer

How to manage storage and access configuration for ADLS Gen2 in DB2 to Azure SQL migration via Databricks and ADF

We are working on a healthcare data migration project where ~80TB of data from IBM DB2 (on-prem, snapshot via FlashCopy) is being moved to Azure SQL Hyperscale using ADF, ADLS Gen2, and Azure Databricks. The architecture follows a modular pattern for…

asked 2025-07-05T12:40:04.9+00:00
Janice Chi 160 Reputation points
commented 2025-07-09T06:32:35.1966667+00:00
Smaran Thoomu 24,750 Reputation points Microsoft External Staff Moderator
1 answer One of the answers was accepted by the question author.

Azure Databricks Cluster

Hi, while creating an Azure Databricks cluster I am getting an error that the node Standard_DS3_v2 is not available in the region. Please help. I am not able to see any region when I run an Azure bash command to find the region which has Standard_DS3_v2…

asked 2025-07-07T17:25:51.37+00:00
Rishi Rithi 20 Reputation points
accepted 2025-07-08T16:12:56.2566667+00:00
Rishi Rithi 20 Reputation points
2 answers One of the answers was accepted by the question author.

How to implement retry logic in Azure Databricks for a failed microbatch in Structured Streaming (without replaying from Kafka)?

We are ingesting CDC data from Kafka into Azure Databricks using Structured Streaming. In rare cases, a microbatch fails due to reconciliation logic (e.g., row count or hash mismatch). We would like to reprocess only the failed batch (e.g., write the…

asked 2025-07-07T09:31:20.03+00:00
Gabriel25 545 Reputation points
accepted 2025-07-08T11:34:11.8866667+00:00
Gabriel25 545 Reputation points
2 answers One of the answers was accepted by the question author.

Best practices for secure and performant JDBC access from Azure Databricks to on-prem Hadoop Hive over VPN

We have a VNet-injected Azure Databricks workspace connected to on-prem Hadoop via VPN Gateway. We want to read from Hive tables using the JDBC driver in Databricks notebooks. While the connection works, performance is slow and we're unsure about the…

asked 2025-07-07T10:28:55.65+00:00
Vikranth-2626 160 Reputation points
accepted 2025-07-07T14:59:09.1233333+00:00
Vikranth-2626 160 Reputation points