Thanks for detailing your scenario clearly, you're right that many Microsoft-native tools like EventStream and ADF CDC have limitations with SQL Server hosted on self-managed VMs.
Requirement:
Real-time data ingestion from a SQL Server (with CDC enabled) hosted on a self-managed VM into a KQL database (Azure Data Explorer), with no batch or incremental loads.
What you've correctly ruled out:
EventStream – only supports cloud-hosted VMs (Azure, AWS, GCP)
ADF CDC & Data Flows – don't support self-hosted IR for these use cases
Recommended Approach:
Since native integrations are limited in your case, a custom-built streaming pipeline is the best option. Here are steps:
Option:
CDC is already enabled, good start!
Build a lightweight service/app (C#, Python, etc.) on the same VM that:
- Polls the CDC tables for new changes
- Converts the changes into a stream-friendly format (e.g., JSON)
- Sends them to Azure Event Hub
In Azure Data Explorer, set up a data connection to Event Hub to ingest the data in near real-time using a mapping table.
This approach gives you a low-latency, cloud-friendly, real-time pipeline from your on-prem SQL Server to ADX.
Critical optimizations:
Change Tracking - Use cdc.fn_cdc_get_net_changes
over get_all_changes
for deduplication.
Batching - Send events in batches (100-500ms buffer) to Event Hub.
Checkpointing - Store LSNs in Redis or local file with atomic updates.
Schema Handling - Include metadata (__op
for insert/update/delete).
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
Thank you.