Azure Databricks

1 answer

How to Fix Null Job IDs and Null Metadata from Databricks Usage Dashboard

In Databricks, when viewing the Usage Dashboard, there is a job with a 'null' job ID. Also, when looking into the system.billing.usage table, there seems to be null metadata. We'd like to be able to further break down the jobs and usage to see how…

asked

Ivan Canseco 0

commented

J N S S Kasyap 4,085 Microsoft External Staff Moderator

1 answer

The VM size you are specifying is not available. [details] QuotaExceeded: Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: CentralIndia, Cur

I'm trying to create a cluster in Azure Databricks. I have tried different regions with different memory and core sizes, but all I see are SKU errors or (The VM size you are specifying is not available. [details] QuotaExceeded: Operation could not be…

asked

Rathnasree Kalkura 20

accepted

Rathnasree Kalkura 20

1 answer

Guidance on Partition-Level Retry Strategy for Catch-Up CDC Ingestion and Reconciliation

In our ongoing healthcare data migration project, we are ingesting data from IBM DB2 (via IBM InfoSphere CDC) into Kafka (on GCP), and then further processing it through Databricks into Azure SQL Hyperscale. Here’s the specific situation for which we…

asked

Janice Chi 320

commented

J N S S Kasyap 4,085 Microsoft External Staff Moderator

1 answer

My Databricks workspace (adb-3634739564378269.9.azuredatabricks.net) is not loading after login.

My Databricks workspace (adb-3634739564378269.9.azuredatabricks.net) is not loading after login. Login succeeds and redirects to the workspace, but it shows a blank screen or freezes. Error from browser DevTools: ChunkLoadError: Loading chunk 46442…

asked

Dimpy Rathee 0

edited an answer

Krupal Bandari 770 Microsoft External Staff Moderator

1 answer

Databricks automate

Hello, I have scheduled Databricks automated emails for data quality checks on 15-20 datasets, each of which sends a DQ report at a scheduled time. Since all these emails go to the business as separate mailers, the business wants them combined into a single email. How to do…

asked

Samy A 0

commented

Chandra Boorla 14,680 Microsoft External Staff Moderator

1 answer

ADF vs Databricks for Load in ETL into Hyperscale

We are working on a highly sensitive healthcare data migration project involving: Source: IBM DB2 (on-prem) with partitioned tables (up to 18 TB in size). CDC: IBM InfoSphere CDC → Kafka Topics (on GCP). Target: Azure SQL Hyperscale. There are two…

asked

Janice Chi 320

commented

Smaran Thoomu 25,005 Microsoft External Staff Moderator

2 answers

When to Use MERGE INTO vs APPLY CHANGES INTO in Databricks CDC Pipelines

Background: In our CDC pipeline, we use Databricks to process Kafka CDC data (I/U/D events) into Delta tables. We’re evaluating whether to continue using MERGE INTO or shift to APPLY CHANGES INTO. ❓ Questions for Microsoft: When should we prefer APPLY…

asked

Janice Chi 320

commented

Shraddha Pore 525 Microsoft External Staff Moderator

1 answer

CDC Merge Hyperscale Options

In our current project, we have already completed a historical load of ~80 TB into Azure SQL Hyperscale, and the table content in Hyperscale is in sync with our "branch" Delta Lake table in Databricks. For catch-up CDC ingestion, incremental…

asked

Janice Chi 320

answered

Chandra Boorla 14,680 Microsoft External Staff Moderator

1 answer

Kafka CDC merge ordering

In our CatchUp architecture, we are consuming CDC data from IBM InfoSphere CDC (FirstWare) into Kafka topics. For a given primary key, it is possible that we get a sequence of operations like Insert followed by one or more Updates. These events are…

asked

Janice Chi 320

answered

J N S S Kasyap 4,085 Microsoft External Staff Moderator

1 answer

Best Practices for Reconciling Kafka CDC Operations (I/U/D) in Azure Databricks During Catch-Up

Background: In our healthcare-sensitive project, we are performing large-scale historical migration from IBM DB2 to Azure. The catch-up phase handles all CDC changes (Insert/Update/Delete) that occurred after the historical snapshot. These changes are…

asked

Janice Chi 320

answered

J N S S Kasyap 4,085 Microsoft External Staff Moderator

1 answer

Auto loader in detail

Hello, my task is to provide a cost estimate for Auto Loader, and it should be as close to accurate as possible. Please advise how to do that. Thanks

asked

Samy A 0

commented

Venkat Reddy Navari 3,630 Microsoft External Staff Moderator

1 answer

Retry and Failure Handling Strategy for CDC Merge Pipeline from Kafka to Databricks and Hyperscale

In our CDC ingestion architecture, we are processing incremental changes (3,000-30,000 events/sec), 800 topics for 800 tables, from IBM DB2 using Kafka topics (via IBM InfoSphere CDC), with the following two stages: Kafka to Databricks Silver Layer: We…

asked

Janice Chi 320

commented

Smaran Thoomu 25,005 Microsoft External Staff Moderator

1 answer

Kafka Partitionings vs DB partitions

We are working on a large-scale CDC ingestion pipeline after completing a one-time historical migration, in which we have already imported 80 TB of data via ADF to the bronze layer, where: Source: IBM DB2 (on-prem) CDC Tool: IBM InfoSphere CDC publishes to…

asked

Janice Chi 320

commented

J N S S Kasyap 4,085 Microsoft External Staff Moderator

1 answer

Guidance on Connecting Azure Databricks to External Kafka Cluster (GCP-Hosted) for Structured Streaming Ingestion

We are implementing a real-time ingestion pipeline where Azure Databricks (in our tenant) consumes CDC data directly from a Kafka cluster hosted on GCP (external to Azure). The Kafka topics are populated by IBM InfoSphere CDC and are available in…

asked

Janice Chi 320

commented

Shraddha Pore 525 Microsoft External Staff Moderator

1 answer

Best Practices for Handling Kafka Load Spikes in Structured Streaming Without Autoscaling

❓Question for Microsoft/Databricks Team: We are working on a stateful real-time CDC ingestion pipeline using Azure Databricks Structured Streaming, where: Kafka (CDC topics from on-prem DB2 via IBM CDC) is our source. Azure Databricks reads these topics…

asked

Janice Chi 320

answered

Smaran Thoomu 25,005 Microsoft External Staff Moderator

1 answer

Impact of Kafka Partition Size on Databricks Streaming Performance When Writing to Azure SQL Hyperscale

In our project, we are using Databricks (not ADF) for both catch-up and real-time CDC ingestion from Kafka topics and writing the output directly to Azure SQL Hyperscale via JDBC. Some of our source Kafka topics (originating from DB2 CDC) may have large…

asked

Janice Chi 320

commented

Smaran Thoomu 25,005 Microsoft External Staff Moderator

1 answer

Hash calculation strategy for datatypes mismatch

In our current project, we are migrating data from an on-premises IBM DB2 system to Azure SQL Hyperscale, using Azure Databricks for transformation and reconciliation. This includes both batch and CDC-based pipelines. Our project requirement is not just…

asked

Janice Chi 320

answered

Smaran Thoomu 25,005 Microsoft External Staff Moderator

2 answers

Unable to create or bring up the cluster on azure databricks. - Failed to perform resource identity operation

Hi, we have set up an Azure Databricks service along with supporting services, and it was working fine until the changes below were performed on the Azure subscription. Details of changes: --> the subscription was moved to a different directory. Post this change,…

asked

Sandeep Jidagi 0

commented

Shraddha Pore 525 Microsoft External Staff Moderator

3 answers

I can't get databricks to talk to my storage account. Error 403

I can't get Databricks to mount my Data Lake storage. I get error 403 no matter what I do.

asked

Dev, Roger (RIS-HBE) 0

commented

Pritam Kabiraj 315 Microsoft External Staff Moderator

1 answer

I am not able to create a cluster. I am trying to create a single node cluster. However, when I try to select a node type, all seem disabled. Also, I am getting a message that "cluster cannot be created because no node is enabled for this subscription".

I am trying to create a single node cluster using my free trial account. I selected West India as my region while creating the resource group. When I am trying to create "all purpose cluster" and then I am trying to select node type, all VM…

asked

bikash hota 0

edited an answer

Krupal Bandari 770 Microsoft External Staff Moderator

2,542 questions with Azure Databricks tags
