Best practices for spreading ingestion load to Azure Data Explorer when using ADF Copy activity

Suzana Pratljacic 0 Reputation points Microsoft Employee
2026-03-23T10:40:04.1366667+00:00

We have an Azure Data Factory (ADF) pipeline that ingests a large daily dataset from storage into Azure Data Explorer (Kusto).

Current behavior:

  • The daily dataset becomes available once per day
  • The dataset contains approximately 95 million rows
  • Once available, ADF ingests the entire dataset into Kusto almost immediately
  • The ingestion completes in under one minute
  • This creates a short but very large ingestion spike on the Kusto cluster
  • The spike is highly consistent and occurs at roughly the same time every day

While ingesting the full dataset once per day is expected, we would like to understand whether it is possible to reduce or smooth the ingestion load, for example:

  • Spreading ingestion over a longer window (e.g. 1–2 hours instead of ~1 minute)

Questions:

  1. Are there recommended patterns or best practices for smoothing large, bursty ingestions into Azure Data Explorer?
  2. Can Azure Data Factory be configured to intentionally slow down or batch ingestion into Kusto?
  3. Are there Kusto‑side features (ingestion batching, policies, or limits) that help handle this scenario?

Any guidance or references would be appreciated.

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


2 answers

  1. Manoj Kumar Boyini 12,245 Reputation points Microsoft External Staff Moderator
    2026-03-23T14:09:15.8866667+00:00

    Hi Suzana Pratljacic,

    What you’re observing is expected behavior when using Azure Data Factory (ADF) Copy activity with Azure Data Explorer (ADX). The integration is designed for high-throughput bulk ingestion, so when a large dataset (like your ~95M rows) becomes available, it is ingested as quickly as possible, resulting in a short but intense ingestion spike.

    Currently, there is no native mechanism in ADF or ADX to intentionally throttle or rate-limit a single ingestion job. Controls such as Data Integration Units (DIUs) in ADF do not apply to ADX ingestion behavior, and ADX itself is optimized to ingest data as fast as possible, not to spread ingestion over time.

    To smooth ingestion over a longer window (for example, 1–2 hours), the recommended approach is to reshape the ingestion pattern upstream rather than trying to slow down the ingestion engine.

    The most effective pattern is to partition the dataset into smaller chunks and ingest them in a controlled sequence. Instead of a single large Copy activity, split the data into multiple smaller files or logical partitions (for example, by range, shard, or time slice), and process them using a ForEach activity in ADF. By limiting concurrency (for example, 1–2 parallel executions) and optionally introducing small delays between batches, you can distribute ingestion over the desired time window.
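
    A minimal sketch of what that pattern could look like when defined through the azure-mgmt-datafactory Python SDK is shown below. It assumes the daily dataset has already been split into partitions that can be passed in as a pipeline parameter; every name (IngestDailyPartitions, SourcePartitionBlob, AdxTargetTable) and the 5-minute pause are illustrative placeholders, not part of your existing pipeline.

    ```python
    # Sketch only: a sequential ForEach that copies one partition at a time into ADX,
    # with an optional pause between partitions to stretch ingestion over a window.
    # All names and the 300-second delay below are assumptions for illustration.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineResource, ForEachActivity, CopyActivity, WaitActivity,
        ActivityDependency, DelimitedTextSource, AzureDataExplorerSink,
        DatasetReference, Expression, ParameterSpecification,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    copy_partition = CopyActivity(
        name="CopyPartitionToADX",
        inputs=[DatasetReference(reference_name="SourcePartitionBlob", type="DatasetReference")],
        outputs=[DatasetReference(reference_name="AdxTargetTable", type="DatasetReference")],
        source=DelimitedTextSource(),
        sink=AzureDataExplorerSink(),
    )

    # Pause after each successful partition copy so the total load is spread out.
    pause = WaitActivity(
        name="WaitBetweenPartitions",
        wait_time_in_seconds=300,
        depends_on=[ActivityDependency(activity="CopyPartitionToADX",
                                       dependency_conditions=["Succeeded"])],
    )

    for_each = ForEachActivity(
        name="ForEachPartition",
        items=Expression(value="@pipeline().parameters.partitionList"),
        is_sequential=True,  # or is_sequential=False with a small batch_count (1-2) for limited parallelism
        activities=[copy_partition, pause],
    )

    pipeline = PipelineResource(
        parameters={"partitionList": ParameterSpecification(type="Array")},
        activities=[for_each],
    )

    adf_client.pipelines.create_or_update("<resource-group>", "<factory-name>",
                                          "IngestDailyPartitions", pipeline)
    ```

    The same structure can of course be authored directly in the ADF Studio UI or the pipeline JSON; the SDK form is only used here to make the moving parts (sequential ForEach, Copy, Wait) explicit.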

    If your current ingestion is based on a single large file, splitting it into multiple smaller files upstream is particularly important. ADX handles multiple smaller ingestion units more flexibly, which allows better control over how ingestion load is distributed.

    It is also important to note that ADX ingestion batching policies are designed to optimize ingestion of many small files and improve efficiency, but they do not provide a way to control or smooth ingestion rate for large batch loads.
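
    For completeness, this is the kind of setting the batching policy exposes. A sketch using the Python azure-kusto-data client follows; the cluster, database, and table names are placeholders, and as noted above these values govern how queued small ingestions are sealed into batches rather than rate-limiting a single bulk load.

    ```python
    # Sketch only: inspect/adjust a table's IngestionBatching policy with the Python
    # azure-kusto-data client. Cluster, database, and table names are placeholders.
    # This policy shapes how queued small ingestions are sealed into batches; it does
    # not rate-limit a single bulk load.
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://<cluster>.<region>.kusto.windows.net")
    client = KustoClient(kcsb)

    set_policy = """.alter table MyTable policy ingestionbatching
    '{"MaximumBatchingTimeSpan": "00:10:00", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'"""
    client.execute_mgmt("MyDatabase", set_policy)

    show_policy = ".show table MyTable policy ingestionbatching"
    print(client.execute_mgmt("MyDatabase", show_policy).primary_results[0])
    ```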

    Azure Data Explorer is designed to handle bursty ingestion patterns, and smoothing ingestion requires controlling how data is fed into the system (partitioning and orchestration) rather than relying on service-side throttling.
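
    Once the pipeline has been reshaped, you can confirm how the load is actually distributed by bucketing rows by their ingestion time. A sketch with the Python azure-kusto-data client is below; table and cluster names are placeholders, and ingestion_time() relies on the IngestionTime policy, which is enabled by default.

    ```python
    # Sketch only: bucket rows by ingestion time to see how the daily load is spread
    # once the pipeline has been reshaped. Cluster, database, and table names are
    # placeholders; ingestion_time() relies on the IngestionTime policy (on by default).
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://<cluster>.<region>.kusto.windows.net")
    client = KustoClient(kcsb)

    query = """
    MyTable
    | where ingestion_time() > ago(1d)
    | summarize rows = count() by ingest_bin = bin(ingestion_time(), 5m)
    | order by ingest_bin asc
    """
    for row in client.execute("MyDatabase", query).primary_results[0]:
        print(row["ingest_bin"], row["rows"])
    ```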

    References:
    https://learn.microsoft.com/en-us/azure/data-explorer/data-factory-integration
    https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-overview

    Hope this helps. Please let us know if you have any questions or concerns.


  2. Q&A Assist
    2026-03-23T10:40:29.8666667+00:00
    AI-generated content. This answer was created with AI from Q&A Assist.

    Azure Data Explorer and Azure Data Factory already support queued, high‑throughput ingestion, and the described behavior (a large batch finishing quickly) is expected. The available guidance in the context focuses on choosing the right ingestion path and tuning performance rather than explicitly throttling ingestion, but it does indicate where the control points are.

    1. Patterns for smoothing large, bursty ingestions

    From the ingestion overview, Azure Data Explorer supports both one‑time and continuous ingestion, and multiple ingestion methods with different characteristics: ADF, Event Grid, Event Hub, IoT Hub, Spark connector, and the Get data experience.

    Key points from the comparison and ADF integration guidance:

    • ADF Copy → ADX is a queued ingestion path with built‑in load balancing, retries, and error handling, and is designed for high‑performance bulk loads.
    • The estimated throughput for Azure Data Explorer as an ADF sink is 11–13 MBps per pipeline, and ADX itself can ingest up to ~1 GBps when used with ADF in optimal conditions.
    • Performance is influenced by:
      • Components’ geographical proximity (same region for ADF IR, source, and ADX cluster)
      • Number of DIUs in ADF
      • Cluster size/SKU in ADX
      • Parallelism and partitioning in ADF

    These knobs are primarily about maximizing throughput, but they are also the levers for reducing throughput (and thus smoothing the spike) by doing the opposite of the performance‑tuning recommendations, for example (see the sketch after this list):

    • Reducing parallelism (fewer concurrent copies/partitions).
    • Reducing DIUs so that fewer ADF VMs work in parallel.
    • Running multiple smaller copy activities in sequence instead of one large, highly parallel copy.
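
    As a concrete illustration of these three bullets, the same knobs can be set explicitly on a Copy activity. The sketch below uses the azure-mgmt-datafactory Python SDK; the dataset names and the specific values are placeholder assumptions that would need tuning against the real data volume and target window.

    ```python
    # Sketch only: a Copy activity with its throughput knobs turned down instead of up.
    # Dataset names and the chosen values are illustrative assumptions.
    from azure.mgmt.datafactory.models import (
        CopyActivity, DatasetReference, DelimitedTextSource, AzureDataExplorerSink,
    )

    slow_copy = CopyActivity(
        name="CopySliceToADX",
        inputs=[DatasetReference(reference_name="SourceSliceBlob", type="DatasetReference")],
        outputs=[DatasetReference(reference_name="AdxTargetTable", type="DatasetReference")],
        source=DelimitedTextSource(),
        sink=AzureDataExplorerSink(),
        data_integration_units=2,  # fewer DIUs -> less ADF compute driving the copy
        parallel_copies=1,         # one copy thread instead of many concurrent partitions
    )
    ```
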
    2. Slowing down or batching ingestion from ADF

    The context does not describe an explicit “throttle” setting in ADF for Azure Data Explorer, but it does describe how ADF Copy to ADX works and what affects its speed:

    • ADF reads data from the source, converts it to a tabular format, uploads it to Azure Blob storage in chunks, and ADX then ingests those blobs through its queued ingestion pipeline.
    • Ingestions are queued and managed by ADX, which provides high availability and efficient handling of many small ingestion units.

    To intentionally slow or batch ingestion using ADF, the following patterns are implied by the performance guidance (a scheduling sketch follows the list):

    • Control parallelism at the pipeline level:
      • Instead of one Copy activity over the entire dataset, partition the source (e.g., by file, date, or ID range) and use a ForEach loop with a limited degree of parallelism or even sequential execution.
      • This reduces the number of concurrent ingestion operations and stretches the total ingestion time.
    • Adjust DIUs:
      • Fewer DIUs mean fewer ADF compute resources, which lowers throughput and extends ingestion duration.
    • Use multiple scheduled runs:
      • If the daily data can be logically split (for example, multiple files), schedule several ADF runs over a 1–2 hour window, each ingesting a subset of the data.
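
    A sketch of the "multiple scheduled runs" idea, again with the azure-mgmt-datafactory Python SDK: three daily triggers spaced 40 minutes apart, each passing a different pre-split folder to the same pipeline. All names, times, and the subsetPath parameter are illustrative assumptions, not values from the original question.

    ```python
    # Sketch only: three daily schedule triggers, 40 minutes apart, each passing a
    # different pre-split folder to the same pipeline. Names, times, and the
    # subsetPath parameter are illustrative assumptions.
    from datetime import datetime, timezone
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence, RecurrenceSchedule,
        TriggerPipelineReference, PipelineReference,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    for i, offset_minutes in enumerate([0, 40, 80]):  # 06:00, 06:40, 07:20 UTC
        recurrence = ScheduleTriggerRecurrence(
            frequency="Day",
            interval=1,
            start_time=datetime(2026, 3, 24, tzinfo=timezone.utc),
            time_zone="UTC",
            schedule=RecurrenceSchedule(hours=[6 + offset_minutes // 60],
                                        minutes=[offset_minutes % 60]),
        )
        trigger = ScheduleTrigger(
            recurrence=recurrence,
            pipelines=[TriggerPipelineReference(
                pipeline_reference=PipelineReference(reference_name="IngestDailySubset",
                                                     type="PipelineReference"),
                parameters={"subsetPath": f"daily/part-{i}/"},
            )],
        )
        # Triggers are created in a stopped state; they still have to be started.
        adf_client.triggers.create_or_update("<resource-group>", "<factory-name>",
                                             f"DailySubsetTrigger{i}",
                                             TriggerResource(properties=trigger))
    ```
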
    3. Kusto‑side features for handling the scenario

    The ingestion overview and ADF integration docs highlight that Azure Data Explorer:

    • Performs ingestion via queued ingestion when using ADF Copy, with internal batching, indexing, encoding, and compression.
    • Enforces a 64 MB limit per record (single string or entire record), but does not expose a user‑configurable throttle for overall ingestion rate in the provided context.

    Relevant points:

    • When ADX is the sink for ADF Copy, it is designed to handle high‑throughput ingestion, and the recommended tuning is around cluster size/SKU and co‑location of components to achieve the desired performance.
    • For very large historical migrations, the context suggests using LightIngest or other managed pipelines (Event Grid, Event Hub, ADF) and scaling the ADX cluster up during migration, then scaling down afterward. This pattern can also be applied to daily spikes: scale the cluster appropriately for the expected ingestion window.

    Within the provided material there is no explicit ADX‑side setting to “slow down” ingestion from ADF; instead, the recommended approach is to:

    • Use ADF configuration (parallelism, DIUs, partitioning) to shape the ingestion profile.
    • Use cluster sizing and standard ADX queued ingestion behavior to absorb the load safely.

    Summary of practical options based on the context

    • To smooth the spike over 1–2 hours:
      • Partition the daily dataset and orchestrate multiple smaller ADF Copy activities with controlled parallelism.
      • Reduce DIUs and avoid aggressive partitioning that maximizes throughput.
      • Ensure all components are in the same region and size the ADX cluster appropriately for the chosen ingestion window.
    • There is no documented ADX‑side “throttle” in the context; smoothing is primarily achieved via ADF pipeline design and cluster sizing, while ADX’s queued ingestion handles reliability and batching internally.

