Hi @Janice Chi
What is Microsoft’s recommended approach for dynamically estimating the size of each DB2 partition (e.g., row count or data volume) at runtime within ADF pipelines?
Since ADF’s Get Metadata activity doesn't support structured sources like IBM DB2, Microsoft recommends using SQL-based metadata queries as part of a metadata-driven ingestion framework.
1. Use a Lookup or Stored Procedure activity in ADF to run a partition-specific SQL query that counts rows at runtime:
SELECT COUNT(*) AS row_count
FROM schema.table
WHERE partition_column BETWEEN @Start AND @End;
This gives a runtime row count for that specific partition, which serves as a dynamic estimate of its data volume.
2. Write the results to a control table
Store partition_id, row_count, and size_category (e.g., SMALL, MEDIUM, LARGE) in a metadata/control table (see the sketch after this list). This enables classification and downstream orchestration.
3. Use ADF control flow to act on size
- Use Set Variable to assign size category based on row count.
- Use Switch activity to trigger Copy activities with different DIU/parallelism settings.
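For reference, here is a minimal sketch of such a control table and the write-back logic, assuming the control table lives in Azure SQL Database; the table name, column names, and the 1M/10M thresholds are illustrative, not prescriptive:

```sql
-- Hypothetical control table; adjust names, types, and bounds to your standards.
CREATE TABLE dbo.ingestion_control (
    partition_id     INT          NOT NULL PRIMARY KEY,
    partition_start  VARCHAR(50)  NOT NULL,   -- lower bound passed to the DB2 query
    partition_end    VARCHAR(50)  NOT NULL,   -- upper bound passed to the DB2 query
    row_count        BIGINT       NULL,
    size_category    VARCHAR(10)  NULL,       -- SMALL / MEDIUM / LARGE
    last_updated     DATETIME2    NULL
);

-- Written back per partition (e.g., via a Stored Procedure activity)
-- after the Lookup returns the COUNT(*) result.
UPDATE dbo.ingestion_control
SET row_count     = @row_count,
    size_category = CASE
                        WHEN @row_count < 1000000  THEN 'SMALL'
                        WHEN @row_count < 10000000 THEN 'MEDIUM'
                        ELSE 'LARGE'
                    END,
    last_updated  = SYSUTCDATETIME()
WHERE partition_id = @partition_id;
```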
In short, Microsoft recommends using SQL queries via a Lookup or Stored Procedure activity to estimate the row count per partition, then storing and acting on that metadata within ADF control flow logic.
Can we use a Lookup + Script/Stored Proc + Set Variable strategy to update the control table with the size category and then use a switch activity to trigger the appropriate copy activity?
Yes, this is a recommended pattern and is commonly used in metadata-driven pipelines.
Pattern:
- Lookup/Stored Procedure activity → run COUNT(*) or other size-estimation logic.
- Set Variable activity → assign a size category (SMALL, MEDIUM, LARGE) based on thresholds.
- Switch activity → route flow to different Copy activities based on size.
This gives you dynamic, runtime branching.
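One variation, shown below as a sketch: have the Lookup return the size category directly from DB2, so the Set Variable step stays trivial and the Switch can route on the returned value. The thresholds, schema, and column names are assumptions:

```sql
-- Run against DB2 via the Lookup activity; @Start/@End are the partition bounds.
SELECT
    COUNT(*) AS row_count,
    CASE
        WHEN COUNT(*) < 1000000  THEN 'SMALL'
        WHEN COUNT(*) < 10000000 THEN 'MEDIUM'
        ELSE 'LARGE'
    END AS size_category
FROM schema.table
WHERE partition_column BETWEEN @Start AND @End;
```

The Switch expression can then reference something like @activity('LookupPartitionSize').output.firstRow.SIZE_CATEGORY (the activity name is hypothetical, and the column-name casing returned by DB2 may differ in your environment).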
Are there any built-in ADF activities or recommended patterns for classifying database partitions by size before ingestion, given that Get Metadata doesn’t work for structured sources?
There is no built-in ADF activity that classifies database partitions by size the way Get Metadata does for file-based sources. However:
Recommended approach:
- Implement partitioning logic in a control table.
- Use custom SQL logic + Lookup activity to dynamically classify partitions.
- Rely on metadata-driven pipeline design for scalability and maintainability.
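As a concrete (assumed) example of that metadata-driven design: a Lookup at the start of the pipeline can read the classified partitions from the control table, and a ForEach can fan out over them, passing the bounds and size category to the inner copy logic. The table and column names match the earlier sketch and are illustrative:

```sql
-- Read by a Lookup activity at pipeline start; the result set feeds a ForEach.
SELECT
    partition_id,
    partition_start,
    partition_end,
    size_category
FROM dbo.ingestion_control
WHERE size_category IS NOT NULL      -- only partitions that have been classified
ORDER BY partition_id;
```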
Suggestions for tuning copy performance per partition size
Any suggestions for tuning copy performance per size category (e.g., SMALL=2 DIUs, MEDIUM=8, LARGE=16+)?
| Partition Size | Recommended DIUs | Parallelism | Tips |
|---|---|---|---|
| SMALL (<1M rows) | 2 | 2 | Inline copy is OK |
| MEDIUM (1–10M rows) | 8 | 4–6 | Enable staged copy |
| LARGE (>10M rows) | 16+ | 8–16 | Use high concurrency + batch reads |
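If you prefer these settings to be data-driven rather than hard-coded into three separate Copy activities, one option is to keep them in a small config table and join it into the partition Lookup. This is a sketch under the assumption that you parameterize the Copy activity's DIU and degree-of-parallelism settings with dynamic content; all names below are hypothetical:

```sql
-- Hypothetical per-category tuning table; values mirror the guidance above.
CREATE TABLE dbo.copy_tuning (
    size_category   VARCHAR(10) NOT NULL PRIMARY KEY,
    diu             INT         NOT NULL,
    parallel_copies INT         NOT NULL
);

INSERT INTO dbo.copy_tuning (size_category, diu, parallel_copies)
VALUES ('SMALL', 2, 2), ('MEDIUM', 8, 6), ('LARGE', 16, 16);

-- Joined into the partition Lookup so each ForEach iteration carries its own settings.
SELECT c.partition_id, c.partition_start, c.partition_end,
       c.size_category, t.diu, t.parallel_copies
FROM dbo.ingestion_control c
JOIN dbo.copy_tuning t ON t.size_category = c.size_category;
```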
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.