Unexpected Continuous Ingestion into Azure Data Explorer After Stopping Python Service

Mohsen Akhavan 831 Reputation points
2025-05-12T18:31:39.4533333+00:00

Hi,

Today I encountered a strange issue with Azure Data Explorer (ADX) and would appreciate some guidance.

Scenario:

I created a Python function that receives data and ingests it directly into a table in ADX. Everything worked fine. Then, using NSSM, I registered the Python function as a Windows service so it runs continuously in the background.

The issue started when I stopped the service to update the function. I noticed that the number of records in the ADX table was still increasing — meaning ingestion was still happening, even though the service was stopped.

🔹 Note: I use a service principal for authentication between the Python function and ADX.

My Investigations:

  1. I turned off the entire VM (where the Python function was running), and ran this query:
.show commands 
|limit 10 

Here’s a sample output:

"ClientActivityId": DM.IngestionExecutor;cb2bf413-bfcb-4815-b2ce-26faf0ca6179;1,
"CommandType": DataIngestPull,
"Text": .ingest-from-storage into Test-Table with (authorizationContext='******',format='json')<|'******',
"Database": Database-Test,
"StartedOn": 2025-05-09T21:38:54.3240997Z,
"LastUpdatedOn": 2025-05-09T21:38:54.6090388Z,
"Duration": 00:00:00.2849391,
"State": Completed,
"RootActivityId": 8ed1a503-0f5c-4974-815c-35fa33ad8d02,
"User": AAD app id=ebddf400XXXXXXXXdce7,
"FailureReason": ,
"Application": Kusto.WinSvc.DM.Svc,
"Principal": aadapp=ebddf400XXXXXXXXdce7;c790a59XXXXXXXXd039ea1,
"TotalCpu": 00:00:00.0156250,
"ResourcesUtilization": {
	"CacheStatistics": {
		"Shards": {
			"Hot": {
				"HitBytes": 0,
				"MissBytes": 0,
				"RetrieveBytes": 0
			},
			"Cold": {
				"HitBytes": 0,
				"MissBytes": 0,
				"RetrieveBytes": 0
			},
			"BypassBytes": 0
		}
	},
	"TotalCpu": "00:00:00.0156250",
	"MemoryPeak": 21268144,
	"ScannedExtentsStatistics": {
		"MinDataScannedTime": null,
		"MaxDataScannedTime": null,
		"TotalExtentsCount": 0,
		"ScannedExtentsCount": 0,
		"TotalRowsCount": 0,
		"ScannedRowsCount": 0
	}
},
"ClientRequestProperties": {
	"SecurityTokenPresent": false,
	"AuthorizationScheme": null,
	"ServiceToServiceAuthHeaderPresent": false,
	"RequestHostName": "https://XXXXXXXX.location.kusto.windows.net:443/",
	"LocalClusterName": "https://XXXXXXXX.location.kusto.windows.net/",
	"OriginClusterName": "https://XXXXXXXX.location.kusto.windows.net/",
	"Options": {
		"request_impersonation_disabled": true,
		"request_callout_disabled": true,
		"api_version": "v1",
		"version": "2024-12-12T00:00:00.0000000Z",
		"request_app_name": "Kusto.WinSvc.DM.Svc",
		"traceparent": "00-ba8a91615ba04fe493afa9e4642df64c-07424ade780759bc-01",
		"tracestate": "crid=DM.IngestionExecutor;cb2bf413-bxxxxaf0ca6179;1,raid=d554ce0d-b1fxxx8272c9f",
		"servertimeout": 6000000000,
		"servertimeoutorigin": "Gateway",
		"command_enable_reroute": true,
		"query_datascope": 1,
		"query_fanout_nodes_percent": 100,
		"query_fanout_threads_percent": 100,
		"maxmemoryconsumptionperiterator": 7515957248,
		"max_memory_consumption_per_query_per_node": 7515957248,
		"truncationmaxsize": "9223372036854775807",
		"truncationmaxrecords": "9223372036854775807"
	}
},
"WorkloadGroup": internal,
"VirtualCluster": 

I'm confused about why ingestion is still happening. The "Principal" and "User" values in this output are correct: they belong to the service principal I created earlier. However, the "Text" field shows ingestion from storage, and I didn't configure ingestion from storage in my code, so I don't understand why .ingest-from-storage is being used here.

  2. I dropped the table (Test-Table) and ran:

.show ingestion failures
| where FailedOn > ago(4h)
| limit 100

Sample output:

"OperationId": 3ac63100-3827-45ed-9415-e1ea481d2a52,
"Database": Database-Test,
"Table": Test-Table,
"FailedOn": 2025-05-09T20:39:01.6778256Z,
"IngestionSourcePath": https://XXXXXXXXorm01.blob.core.windows.net/20250507-ingestdata-e5c334ee145d4b4-0/Database-Test__Test-Table__66c3ebdXXXXXXXX0f04f9d__df_2403391118400_1746645752_339d65c7-bba9-4919-8d85-057ccf590238.json.gz,
"Details": Table 'Test-Table' in database 'Database-Test' could not be found.,
"FailureKind": Permanent,
"RootActivityId": 7a7a75d0-597d-4fb5-8a85-af04c067656b,
"OperationKind": DataIngestPull,
"OriginatesFromUpdatePolicy": 0,
"ErrorCode": BadRequest_TableNotExist,
"Principal": aadapp=d38bXXXXXXXXc8165;33e0192XXXXXXXXe33d,
"ShouldRetry": 0,
"User": AAD app id=d38bXXXXXXXXc8165,
"IngestionProperties": Format=Json, Mapping=[ToStringEmpty], ValidationPolicy=[Options=ValidateCsvInputConstantColumns, Implications=BestEffort, IsDetailedErrorReportingEnabled=False], CreationTime=[null], IgnoreFirstRecord=False, IgnoreLastRecordIfInvalid=False,
"NumberOfSources": 1

Key points from this output:

  • Details: Table 'Test-Table' could not be found
  • IngestionSourcePath: A blob storage path that does not exist in my tenant
  • User and Principal: IDs that I don't recognize and that are not in my Entra ID tenant
  • OperationKind: DataIngestPull

I also ran this to check failed commands:

.show commands
| where StartedOn > ago(1d)
| where State == "Failed"
| limit 100

I saw similar results, including ingestion commands using .ingest-from-storage, with unknown blob paths and unfamiliar AAD app IDs.

Questions:

  1. Why is ADX still ingesting data even after I stopped and shut down the VM running the Python service?
  2. What triggers this .ingest-from-storage ingestion when I didn’t configure it?
  3. Is there any ingestion queue or buffer in Azure Data Explorer (ADX)? If so, how can I clear or manage it?
  4. How can I trace or stop any automatic ingestion tied to a service principal or storage I don’t recognise?
Azure Data Explorer

Accepted answer
  1. Chandra Boorla 14,585 Reputation points Microsoft External Staff Moderator
    2025-05-12T19:45:05.6766667+00:00

    @Mohsen Akhavan

    It sounds like you're seeing unexpected ingestion behavior in Azure Data Explorer (ADX). Here's what might be happening.

    Root Cause Analysis:

    Unexpected Data Ingestion:

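    • The ingestion runs through an .ingest-from-storage command, which means the engine pulled the data from a blob in storage rather than receiving it inline from your Python function. In your output, Application is Kusto.WinSvc.DM.Svc and ClientActivityId starts with DM.IngestionExecutor, so the command was issued by the cluster's Data Management service.
    • This pattern is typical of queued ingestion (for example, the Python SDK's QueuedIngestClient stages data in blobs and queues it for the service) and of data connections such as Event Hub, IoT Hub, or Event Grid. An update policy is unlikely to be the cause here, because the failure record you posted shows OriginatesFromUpdatePolicy as 0.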
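
    As a rough illustration of the check described above, the sketch below tests whether a row exported from .show commands looks like it was run by the cluster's Data Management service. The function name and the "direct" row are hypothetical, and reading these two fields this way is an interpretation of the row posted in the question, not an official API.

```python
def issued_by_data_management(record: dict) -> bool:
    """Return True when a `.show commands` row looks like it was run by the
    cluster's Data Management (DM) service rather than sent by a client."""
    application = record.get("Application", "")
    activity_id = record.get("ClientActivityId", "")
    return (
        application == "Kusto.WinSvc.DM.Svc"
        or activity_id.startswith("DM.IngestionExecutor")
    )


# Values copied from the sample row in the question.
sample = {
    "ClientActivityId": "DM.IngestionExecutor;cb2bf413-bfcb-4815-b2ce-26faf0ca6179;1",
    "CommandType": "DataIngestPull",
    "Application": "Kusto.WinSvc.DM.Svc",
}

# Hypothetical row for a command sent straight from a client script.
direct = {
    "ClientActivityId": "my-script;1234",
    "CommandType": "DataIngestPull",
    "Application": "my-script",
}

print(issued_by_data_management(sample))
print(issued_by_data_management(direct))
```

    A row that matches tells you the data had already been handed to the service before the command ran, so stopping the sender would not have prevented it.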

    Unrecognized AAD App IDs:

    • The ingestion is logged with a service principal (AAD App ID) that you don’t recognize. This suggests that another application or process has permission to push data to your ADX table.
    • It’s also possible that there is a data connection configured at the database or cluster level.

    Why is ADX still ingesting data even after I stopped and shut down the VM running the Python service?

    The continuous ingestion into your Azure Data Explorer (ADX) table is likely due to a configured automated ingestion method that is independent of your Python service, such as:

    • Update Policies - If this table is the target of an update policy, rows are added to it whenever data is ingested into the policy's source table, even if your Python service is stopped.
    • Data Connections (Event Hub, Blob Storage, IoT Hub) - These connections can continue ingesting data as long as data is arriving in the source.
    • Event Grid Subscriptions - If your ADX is subscribed to an Event Grid topic, it can trigger ingestion when new events arrive.

    Recommendation -

    Run this query to list any active Update Policies on your table:

    .show table ['Test-Table'] policy update
    

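    List any configured data connections. These are Azure resources rather than Kusto objects, so check them in the Azure Portal rather than with a control command: open the cluster, select the database, and review its Data connections page for Event Hub, IoT Hub, and Event Grid (Blob Storage) connections.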
    

    Note - Check for any Event Grid Subscriptions associated with your ADX cluster in the Azure Portal.

    What triggers this .ingest-from-storage ingestion when I didn’t configure it?

    The .ingest-from-storage command means the engine is pulling data from a blob in a storage account. This is typically triggered by:

    • An Ingestion Data Connection: The database might have a data connection (Event Hub, IoT Hub, or Event Grid on Blob Storage) that targets this table.
    • Queued Ingestion: If your Python function uses the SDK's queued ingestion client, it does not write to the table directly. The data is staged in blobs and queued, and the service ingests it later with .ingest-from-storage, so data submitted before you stopped the service can still arrive afterwards.
    • External Application or Service: Another application that has ingestion permission on the database may be sending data.

    Recommendation:

    Review any Event Grid (Blob Storage) data connections on the database's Data connections page in the Azure Portal.
    

    Check the details of your Update Policies:

    .show table ['Test-Table'] policy update
    

    Note - If you see any unfamiliar blob paths, check the associated storage account’s access policies in the Azure Portal.

    Is there any ingestion queue or buffer in Azure Data Explorer (ADX)? If so, how can I clear or manage it?

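    Yes. With queued (batched) ingestion and with data connections, data is first staged in storage and placed on an ingestion queue, and the cluster's Data Management service ingests it in batches according to the table's ingestion batching policy. Data that was queued before you stopped the service can therefore still be ingested afterwards.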
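
    To make the effect of such a queue concrete, here is a deliberately simplified toy model in plain Python. It does not call ADX, and the class and method names are invented for illustration. It only shows why a table's row count can keep growing after the sender has stopped: batches that were already submitted are still drained later.

```python
from collections import deque


class ToyQueuedIngestion:
    """Toy stand-in for a queued ingestion pipeline (not the real service)."""

    def __init__(self) -> None:
        self.pending = deque()  # batches submitted but not yet ingested
        self.table_rows = 0     # rows visible in the target table

    def submit(self, row_count: int) -> None:
        """The sender hands a batch to the queue and returns immediately."""
        self.pending.append(row_count)

    def drain_one_batch(self) -> None:
        """The service side ingests the oldest pending batch, if any."""
        if self.pending:
            self.table_rows += self.pending.popleft()


pipeline = ToyQueuedIngestion()

# The sender submits five batches of 100 rows each, then is stopped.
for _ in range(5):
    pipeline.submit(100)
pipeline.drain_one_batch()
rows_when_sender_stopped = pipeline.table_rows

# No further submit() calls happen, but the backlog still drains.
while pipeline.pending:
    pipeline.drain_one_batch()
rows_after_drain = pipeline.table_rows

print(rows_when_sender_stopped, rows_after_drain)  # prints "100 500"
```

    In this model the only way to stop the growth is to wait for the backlog to empty or to make the ingestion fail, which matches what you observed after dropping the table.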

    Recommendation:

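    Watch the ingestion commands that the service is still executing (the queue itself is internal to the service, so this shows its effect rather than its contents):

    .show commands
    | where StartedOn > ago(1h)
    | where CommandType == "DataIngestPull"
    | summarize count() by bin(StartedOn, 5m)
    

    A running operation that supports cancellation can be stopped with the command below, but this does not remove items that are still waiting in the queue:

    .cancel operation <OperationId>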
    

    If your table has an update policy or a data connection, consider pausing it temporarily to stop further ingestion.

    How can I trace or stop any automatic ingestion tied to a service principal or storage I don’t recognise?

    To trace and stop automatic ingestion from an unknown service principal or storage:

    Step 1 - Identify Unknown Service Principals:

    • Use this query to list all recent ingestion commands and identify the unknown service principal (AAD App ID):
    .show commands 
    | where CommandType == "DataIngestPull" or CommandType == "DataIngestPush"
    | order by StartedOn desc
    
    • Look at the "User" and "Principal" columns.
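    • Go to Microsoft Entra ID > App registrations (or Enterprise applications) in the Azure Portal and search for the App ID.
    • If it is unauthorized, remove its role assignment on the database (.show database ['Database-Test'] principals lists the current ones), and disable or delete the app registration if it belongs to your tenant.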
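
    If you export the result of the query above (for example as a list of dictionaries), a short script can count commands per principal and highlight the ones you don't recognize. This is a minimal sketch: the function names are hypothetical, the masked IDs are copied from the question, and the "aadapp=<app id>;<tenant id>" layout is assumed from the rows posted there.

```python
from collections import Counter


def app_id_from_principal(principal: str) -> str:
    """Extract the app ID from a value shaped like 'aadapp=<app id>;<tenant id>'."""
    _, _, rest = principal.partition("=")
    return rest.split(";")[0]


def count_unknown_app_ids(records, known_app_ids):
    """Count commands per app ID, skipping the app IDs you already trust."""
    counts = Counter()
    for record in records:
        app_id = app_id_from_principal(record.get("Principal", ""))
        if app_id not in known_app_ids:
            counts[app_id] += 1
    return counts


# Masked values copied from the question; the row list itself is illustrative.
records = [
    {"Principal": "aadapp=ebddf400XXXXXXXXdce7;c790a59XXXXXXXXd039ea1"},
    {"Principal": "aadapp=d38bXXXXXXXXc8165;33e0192XXXXXXXXe33d"},
    {"Principal": "aadapp=d38bXXXXXXXXc8165;33e0192XXXXXXXXe33d"},
]
known = {"ebddf400XXXXXXXXdce7"}

unknown = count_unknown_app_ids(records, known)
print(unknown)
```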

    Step 2 - Review Active Data Connections:

    • Review all data connections for the cluster's databases in the Azure Portal (open the database and select Data connections).
    
    • If you see any unknown connections, delete or disable them.

    Step 3 - Investigate Unknown Storage Accounts:

    If ingestion is happening from a blob storage path you do not recognize:

    • Go to Azure Portal > Storage Accounts.
    • Review the Access Control (IAM) and Access Keys.
    • Revoke any unauthorized access.
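
    Before searching the portal, it can help to split the IngestionSourcePath into its storage account, container, and blob name so you know exactly what to look for. The sketch below uses only the Python standard library and relies on the usual https://<account>.blob.core.windows.net/<container>/<blob> layout; the function name is hypothetical and the URL is the masked one from the question.

```python
from urllib.parse import urlparse


def split_blob_url(url: str):
    """Split a blob URL into (storage account, container, blob name)."""
    parsed = urlparse(url)
    # netloc keeps the original casing; the account is the first host label.
    account = parsed.netloc.split(".")[0]
    container, _, blob = parsed.path.lstrip("/").partition("/")
    return account, container, blob


# Masked IngestionSourcePath copied from the question.
source_path = (
    "https://XXXXXXXXorm01.blob.core.windows.net/"
    "20250507-ingestdata-e5c334ee145d4b4-0/"
    "Database-Test__Test-Table__66c3ebdXXXXXXXX0f04f9d__df_2403391118400_"
    "1746645752_339d65c7-bba9-4919-8d85-057ccf590238.json.gz"
)

account, container, blob = split_blob_url(source_path)
print(account)
print(container)
print(blob)
```

    The blob name in the question begins with the database and table names, which may hint that it was staged by an ingestion pipeline targeting that table. Treat that as an inference to verify, not a confirmed fact.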

    Step 4 - Monitor for Future Ingestion:

    • Set up an alert in Azure Monitor to notify you of any unexpected ingestion activities.
    • Regularly monitor your ADX ingestion logs using:
    .show commands 
    | where CommandType == "DataIngestPull" or CommandType == "DataIngestPush"
    | order by StartedOn desc 
    | limit 50
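
    To check exported results against the moment you stopped the service, a small helper can list the commands that started afterwards. This is a sketch under assumptions: the function names are hypothetical, the stop time and the second row are made-up illustration values, and only the first StartedOn value is taken from the question.

```python
from datetime import datetime, timezone


def parse_kusto_timestamp(text: str) -> datetime:
    """Parse a UTC timestamp such as 2025-05-09T21:38:54.3240997Z.

    Kusto prints seven fractional digits; datetime keeps six, so the
    fraction is trimmed (or padded) to microseconds.
    """
    main, _, fraction = text.rstrip("Z").partition(".")
    micro = (fraction + "000000")[:6]
    parsed = datetime.strptime(f"{main}.{micro}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def started_after(records, cutoff: datetime):
    """Return the rows whose StartedOn is later than the cutoff."""
    return [r for r in records if parse_kusto_timestamp(r["StartedOn"]) > cutoff]


records = [
    {"StartedOn": "2025-05-09T21:38:54.3240997Z"},  # from the question
    {"StartedOn": "2025-05-09T20:00:00.0000000Z"},  # hypothetical earlier row
]

# Hypothetical time at which the Windows service was stopped.
service_stopped = datetime(2025, 5, 9, 21, 0, 0, tzinfo=timezone.utc)

late_commands = started_after(records, service_stopped)
print(len(late_commands))  # prints 1
```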
    

    I hope this information helps. Please do let us know if you have any further queries.

    Thank you.

