How to get real-time data from a SQL Server running on a Self-Hosted VM?

Agrawal, Pranjal 0 Reputation points
2025-06-03T08:25:46.22+00:00

I have a SQL server running on a VM (which is Self-hosted and not managed by any cloud). Database and table which I want to use have CDC enabled on them. I want to have those tables data into KQL DB as real-time only. No batch or incremental load.

I tried below ways already and are ruled out,

EventStream - Came to know it only supports VM hosted on Azure or AWS or GCP. (MS Document)

CDC in ADF - But Self hosted IR aren't supported over there.

Dataflow in ADF - Linked service with self-hosted integration runtime is not supported in data flow.

There must be something which I can use to have real-time on a SQL Server running on a Self-hosted VM.

I'm open to options, but real-time only.

Azure Event Hubs
Azure Event Hubs
An Azure real-time data ingestion service.
719 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator
    2025-06-03T09:40:48.6333333+00:00

    @Agrawal, Pranjal

    Thanks for detailing your scenario clearly, you're right that many Microsoft-native tools like EventStream and ADF CDC have limitations with SQL Server hosted on self-managed VMs.

    Requirement:

    Real-time data ingestion from a SQL Server (with CDC enabled) hosted on a self-managed VM into a KQL database (Azure Data Explorer), with no batch or incremental loads.

    What you've correctly ruled out:

    EventStream – only supports cloud-hosted VMs (Azure, AWS, GCP)

    ADF CDC & Data Flows – don't support self-hosted IR for these use cases

    Recommended Approach:

    Since native integrations are limited in your case, a custom-built streaming pipeline is the best option. Here are steps:

    Option: User's image CDC is already enabled, good start!

    Build a lightweight service/app (C#, Python, etc.) on the same VM that:

    • Polls the CDC tables for new changes
    • Converts the changes into a stream-friendly format (e.g., JSON)
    • Sends them to Azure Event Hub

    In Azure Data Explorer, set up a data connection to Event Hub to ingest the data in near real-time using a mapping table.

    This approach gives you a low-latency, cloud-friendly, real-time pipeline from your on-prem SQL Server to ADX.

    Critical optimizations:

    Change Tracking - Use cdc.fn_cdc_get_net_changes over get_all_changes for deduplication.
    Batching - Send events in batches (100-500ms buffer) to Event Hub.
    Checkpointing - Store LSNs in Redis or local file with atomic updates.
    Schema Handling - Include metadata (__op for insert/update/delete).

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    Thank you.

    1 person found this answer helpful.

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.