2,542 questions with Azure Databricks tags

1 answer

How to Fix Null Job IDs and Null Metadata in the Databricks Usage Dashboard

In Databricks, when viewing the Usage Dashboard, there is a job with a 'null' job ID. Also, the system.billing.usage table appears to contain null metadata. We'd like to be able to further break down the jobs and usage to see how…

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,542 questions
asked 2025-06-17T14:07:08.44+00:00
Ivan Canseco 0 Reputation points
commented 2025-06-23T01:29:27.5466667+00:00
J N S S Kasyap 4,085 Reputation points Microsoft External Staff Moderator
1 answer. One of the answers was accepted by the question author.

The VM size you are specifying is not available. [details] QuotaExceeded: Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: CentralIndia, Cur

I'm trying to create a cluster in Azure Databricks. I checked different regions and different memory and core sizes, but all I see are SKU errors or (The VM size you are specifying is not available. [details] QuotaExceeded: Operation could not be…

asked 2025-06-19T19:08:59.48+00:00
Rathnasree Kalkura 20 Reputation points
accepted 2025-06-20T12:59:58.6933333+00:00
Rathnasree Kalkura 20 Reputation points
1 answer

Guidance on Partition-Level Retry Strategy for Catch-Up CDC Ingestion and Reconciliation

In our ongoing healthcare data migration project, we are ingesting data from IBM DB2 (via IBM InfoSphere CDC) into Kafka (on GCP), and then further processing it through Databricks into Azure SQL Hyperscale. Here’s the specific situation for which we…

asked 2025-06-17T11:41:34.7166667+00:00
Janice Chi 320 Reputation points
commented 2025-06-20T08:47:50+00:00
J N S S Kasyap 4,085 Reputation points Microsoft External Staff Moderator
1 answer

My Databricks workspace (adb-3634739564378269.9.azuredatabricks.net) is not loading after login.

My Databricks workspace (adb-3634739564378269.9.azuredatabricks.net) is not loading after login. Login succeeds and redirects to the workspace, but it shows a blank screen or freezes. Error from browser DevTools: ChunkLoadError: Loading chunk 46442…

asked 2025-06-17T21:18:26.5566667+00:00
Dimpy Rathee 0 Reputation points
edited an answer 2025-06-20T07:53:26.74+00:00
Krupal Bandari 770 Reputation points Microsoft External Staff Moderator
1 answer

Databricks automate

Hello, I have scheduled automated Databricks emails for data quality (DQ) checks on 15-20 datasets, each of which sends a DQ report at a scheduled time. Since all these emails go to the business as separate mailers, the business wants them combined into a single email. How to do…

asked 2025-06-16T14:44:54.45+00:00
Samy A 0 Reputation points
commented 2025-06-19T18:24:35.1766667+00:00
Chandra Boorla 14,680 Reputation points Microsoft External Staff Moderator
1 answer

ADF vs Databricks for Load in ETL into Hyperscale

We are working on a highly sensitive healthcare data migration project involving: Source: IBM DB2 (on-prem) with partitioned tables (up to 18 TB in size). CDC: IBM InfoSphere CDC → Kafka Topics (on GCP). Target: Azure SQL Hyperscale. There are two…

asked 2025-06-18T07:28:38.9333333+00:00
Janice Chi 320 Reputation points
commented 2025-06-19T10:05:02.3433333+00:00
Smaran Thoomu 25,005 Reputation points Microsoft External Staff Moderator
2 answers

When to Use MERGE INTO vs APPLY CHANGES INTO in Databricks CDC Pipelines

Background: In our CDC pipeline, we use Databricks to process Kafka CDC data (I/U/D events) into Delta tables. We’re evaluating whether to continue using MERGE INTO or shift to APPLY CHANGES INTO. ❓ Questions for Microsoft: When should we prefer APPLY…

asked 2025-06-14T12:13:23.58+00:00
Janice Chi 320 Reputation points
commented 2025-06-19T07:58:44.44+00:00
Shraddha Pore 525 Reputation points Microsoft External Staff Moderator
1 answer

CDC Merge Hyperscale Options

In our current project, we have already completed a historical load of ~80 TB into Azure SQL Hyperscale, and the table content in Hyperscale is in sync with our "branch" Delta Lake table in Databricks. For catch-up CDC ingestion, incremental…

asked 2025-06-17T16:08:22.0066667+00:00
Janice Chi 320 Reputation points
answered 2025-06-17T17:04:52.92+00:00
Chandra Boorla 14,680 Reputation points Microsoft External Staff Moderator
1 answer

Kafka CDC merge ordering

In our CatchUp architecture, we are consuming CDC data from IBM InfoSphere CDC (FirstWare) into Kafka topics. For a given primary key, it is possible that we get a sequence of operations like Insert followed by one or more Updates. These events are…

asked 2025-06-17T12:11:00.79+00:00
Janice Chi 320 Reputation points
answered 2025-06-17T13:37:37.71+00:00
J N S S Kasyap 4,085 Reputation points Microsoft External Staff Moderator
1 answer

Best Practices for Reconciling Kafka CDC Operations (I/U/D) in Azure Databricks During Catch-Up

Background: In our healthcare-sensitive project, we are performing large-scale historical migration from IBM DB2 to Azure. The catch-up phase handles all CDC changes (Insert/Update/Delete) that occurred after the historical snapshot. These changes are…

asked 2025-06-17T11:58:10.3033333+00:00
Janice Chi 320 Reputation points
answered 2025-06-17T12:44:45.3066667+00:00
J N S S Kasyap 4,085 Reputation points Microsoft External Staff Moderator
1 answer

Auto Loader in detail

Hello, my task is to provide a cost estimate for Auto Loader that is as accurate as possible. Please advise how to do that. Thanks

asked 2025-06-16T11:01:28.8366667+00:00
Samy A 0 Reputation points
commented 2025-06-17T09:29:10.7533333+00:00
Venkat Reddy Navari 3,630 Reputation points Microsoft External Staff Moderator
1 answer

Retry and Failure Handling Strategy for CDC Merge Pipeline from Kafka to Databricks and Hyperscale

In our CDC ingestion architecture, we are processing incremental changes (3,000-30,000 events/sec) across 800 topics for 800 tables from IBM DB2, using Kafka topics (via IBM InfoSphere CDC), with the following two stages: Kafka to Databricks Silver Layer: We…

asked 2025-06-14T18:28:28.0433333+00:00
Janice Chi 320 Reputation points
commented 2025-06-17T02:08:35.77+00:00
Smaran Thoomu 25,005 Reputation points Microsoft External Staff Moderator
1 answer

Kafka Partitioning vs DB Partitions

We are working on a large-scale CDC ingestion pipeline after completing a one-time historical migration, in which we have already imported 80 TB of data via ADF to the bronze layer, where: Source: IBM DB2 (on-prem) CDC Tool: IBM InfoSphere CDC publishes to…

asked 2025-06-11T12:44:08.54+00:00
Janice Chi 320 Reputation points
commented 2025-06-16T15:07:39.7266667+00:00
J N S S Kasyap 4,085 Reputation points Microsoft External Staff Moderator
1 answer

Guidance on Connecting Azure Databricks to External Kafka Cluster (GCP-Hosted) for Structured Streaming Ingestion

We are implementing a real-time ingestion pipeline where Azure Databricks (in our tenant) consumes CDC data directly from a Kafka cluster hosted on GCP (external to Azure). The Kafka topics are populated by IBM InfoSphere CDC and are available in…

asked 2025-06-10T15:58:44.08+00:00
Janice Chi 320 Reputation points
commented 2025-06-16T10:47:12.4566667+00:00
Shraddha Pore 525 Reputation points Microsoft External Staff Moderator
1 answer

Best Practices for Handling Kafka Load Spikes in Structured Streaming Without Autoscaling

❓Question for Microsoft/Databricks Team: We are working on a stateful real-time CDC ingestion pipeline using Azure Databricks Structured Streaming, where: Kafka (CDC topics from on-prem DB2 via IBM CDC) is our source. Azure Databricks reads these topics…

asked 2025-06-15T11:45:19.4633333+00:00
Janice Chi 320 Reputation points
answered 2025-06-16T01:30:37.5533333+00:00
Smaran Thoomu 25,005 Reputation points Microsoft External Staff Moderator
1 answer

Impact of Kafka Partition Size on Databricks Streaming Performance When Writing to Azure SQL Hyperscale

In our project, we are using Databricks (not ADF) for both catch-up and real-time CDC ingestion from Kafka topics and writing the output directly to Azure SQL Hyperscale via JDBC. Some of our source Kafka topics (originating from DB2 CDC) may have large…

asked 2025-06-10T16:08:12.71+00:00
Janice Chi 320 Reputation points
commented 2025-06-16T01:23:08.9133333+00:00
Smaran Thoomu 25,005 Reputation points Microsoft External Staff Moderator
1 answer

Hash calculation strategy for data type mismatches

In our current project, we are migrating data from an on-premises IBM DB2 system to Azure SQL Hyperscale, using Azure Databricks for transformation and reconciliation. This includes both batch and CDC-based pipelines. Our project requirement is not just…

asked 2025-06-14T13:08:58.0566667+00:00
Janice Chi 320 Reputation points
answered 2025-06-16T00:57:47.9833333+00:00
Smaran Thoomu 25,005 Reputation points Microsoft External Staff Moderator
2 answers

Unable to create or bring up the cluster on Azure Databricks - Failed to perform resource identity operation

Hi, we set up an Azure Databricks service along with supporting services, and it was working fine until the changes below were made to the Azure subscription. Details of changes: --> the subscription was moved to a different directory. After this change,…

asked 2025-06-06T07:12:48.59+00:00
Sandeep Jidagi 0 Reputation points
commented 2025-06-13T16:04:15.3+00:00
Shraddha Pore 525 Reputation points Microsoft External Staff Moderator
3 answers

I can't get Databricks to talk to my storage account. Error 403

I can't get Databricks to mount my Data Lake Storage. I get a 403 error no matter what I do.

asked 2025-06-09T17:46:54.1166667+00:00
Dev, Roger (RIS-HBE) 0 Reputation points
commented 2025-06-13T10:35:54.21+00:00
Pritam Kabiraj 315 Reputation points Microsoft External Staff Moderator
1 answer

I am not able to create a cluster. I am trying to create a single-node cluster. However, when I try to select a node type, all options seem disabled. I also get the message "cluster cannot be created because no node is enabled for this subscription".

I am trying to create a single-node cluster using my free trial account. I selected West India as my region while creating the resource group. When I try to create an "all purpose cluster" and then select a node type, all VM…

asked 2025-06-11T04:42:29.6+00:00
bikash hota 0 Reputation points
edited an answer 2025-06-13T01:27:04.44+00:00
Krupal Bandari 770 Reputation points Microsoft External Staff Moderator