Hi @DJ, welcome to the Microsoft Q&A Platform! Thank you for asking your question here.
- Instrumentation
- Turn on Application Insights and enable Storage metrics, including ThrottledRequests. Collect CPU/memory usage, response times, dependency calls, and exceptions; a small telemetry sketch follows.
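As one way to surface throttling in your own telemetry, here is a minimal sketch that emits a custom metric whenever the Table SDK reports a 429. It assumes the in-process model, where the Functions Application Insights integration registers `TelemetryConfiguration` for dependency injection; the metric name `TableThrottleCount` is illustrative, not a built-in metric.

```csharp
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

public class StorageTelemetry
{
    private readonly TelemetryClient _telemetry;

    // TelemetryConfiguration is registered by the Functions App Insights integration.
    public StorageTelemetry(TelemetryConfiguration configuration)
    {
        _telemetry = new TelemetryClient(configuration);
    }

    // Call this from the catch block that handles a 429 from the Table SDK.
    public void TrackThrottle(string partitionKey)
    {
        _telemetry.GetMetric("TableThrottleCount").TrackValue(1);
        _telemetry.TrackTrace($"Throttled on partition {partitionKey}");
    }
}
```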
- Code fixes
- Make `TableClient` (or the storage client) static and reuse it across invocations.
- Add a bounded `SemaphoreSlim` (value determined from the test above).
- Add robust retry logic (exponential backoff) for transient 429s and 5xx. Use SDK retry policies rather than homemade ones when possible. A combined sketch follows this list.
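Here is a minimal sketch of those three fixes together, using the v12 `Azure.Data.Tables` SDK. The app setting name `StorageConnectionString`, the table name `MyTable`, and the semaphore value of 32 are illustrative placeholders; tune the semaphore from your load test.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Core;
using Azure.Data.Tables;

public static class TableAccess
{
    // One shared client per process: avoids socket exhaustion from per-invocation clients.
    private static readonly TableClient Client = new TableClient(
        Environment.GetEnvironmentVariable("StorageConnectionString"), // illustrative setting name
        "MyTable",
        new TableClientOptions
        {
            // Let the SDK retry transient 429/5xx with exponential backoff.
            Retry =
            {
                Mode = RetryMode.Exponential,
                MaxRetries = 5,
                Delay = TimeSpan.FromSeconds(1),
                MaxDelay = TimeSpan.FromSeconds(30)
            }
        });

    // Bound concurrency so one instance cannot flood the storage account.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(32);

    public static async Task UpsertAsync(ITableEntity entity)
    {
        await Gate.WaitAsync();
        try
        {
            await Client.UpsertEntityAsync(entity);
        }
        finally
        {
            Gate.Release();
        }
    }
}
```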
- If you need more throughput than a single instance can provide
- Option A — Queue fan-out: push per-partition work onto a queue and let many function instances process it in parallel (no single-instance overload). Good for high throughput and elastic scale.
- Option B — Durable Functions fan-out/fan-in: an orchestrator calls an activity for each partition, and Durable manages scaling and retries. Use this if you want checkpointing and easier orchestration and can accept a small latency overhead (see the sketch after this list).
- Option C — Upgrade to Premium: more CPU/memory per instance and predictable scaling if you need heavier per-instance concurrency.
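A minimal sketch of the Option B fan-out/fan-in pattern with in-process Durable Functions; the function names `ProcessAllPartitions`, `GetPartitionKeys`, and `ProcessPartition` are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class PartitionFanOut
{
    [FunctionName("ProcessAllPartitions")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var partitionKeys = await context.CallActivityAsync<List<string>>("GetPartitionKeys", null);

        // Fan out: schedule one activity per partition; the runtime spreads them across instances.
        var tasks = partitionKeys.Select(pk => context.CallActivityAsync("ProcessPartition", pk));

        // Fan in: wait for all partitions; Durable checkpoints progress and replays on failure.
        await Task.WhenAll(tasks);
    }

    [FunctionName("GetPartitionKeys")]
    public static Task<List<string>> GetPartitionKeys([ActivityTrigger] object input)
    {
        // Replace with your own partition discovery (table scan, known list, etc.).
        return Task.FromResult(new List<string> { "p1", "p2" });
    }

    [FunctionName("ProcessPartition")]
    public static Task ProcessPartition([ActivityTrigger] string partitionKey)
    {
        // Query and process the rows for this partition here.
        return Task.CompletedTask;
    }
}
```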
- Data model & Storage changes
- If partitions are uneven (hot partitions), redesign the partitioning scheme to distribute load; a partition-key sketch follows. Consider multiple storage accounts if a single account is the bottleneck.
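One common way to break up a hot partition is to derive a bucket suffix from the row key, so rows that used to share one partition spread across several physical partitions. A minimal sketch; the bucket count of 16 is an illustrative tuning knob, and MD5 is used only because it is a stable hash (string `GetHashCode` is randomized per process in .NET).

```csharp
using System.Security.Cryptography;
using System.Text;

public static class PartitionKeys
{
    private const int BucketCount = 16; // illustrative; tune to your load

    // Stable mapping: the same row key always lands in the same bucket.
    public static string SpreadPartitionKey(string logicalKey, string rowKey)
    {
        using var md5 = MD5.Create();
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(rowKey));
        int bucket = hash[0] % BucketCount;
        return $"{logicalKey}-{bucket:D2}";
    }
}
```

The trade-off is that reads for one logical key now have to fan out across all buckets, so this suits write-heavy hot partitions more than read-heavy ones.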
If you are still facing the issue, please share the below information in a private message.
- Functions runtime version and language (.NET version / Functions runtime v3 or v4.x).
- Which Table SDK version (v2 vs v12) and whether clients are being reused.
- Average rows per partition, and average row size (KB).
- Observed App Insights/Functions metrics: CPU% and memory during current runs, and any 429 or socket errors.
- Whether low latency per request is more important than throughput (latency-sensitive vs throughput batch).
- Whether moving to Premium or adding queues/Durable Functions is acceptable architecturally and cost-wise.