Hi Hongbo Jiao (CSI Interfusion Inc),
Currently, when a pipeline job is published to a Batch Endpoint, the instance_count and process_count_per_node parameters specified inside a step (such as a parallel step using a PromptFlow component) are not respected at runtime. Instead, the Batch Endpoint runs the job with a default instance_count of 1, and there is no supported way to override this when invoking the endpoint.
This behavior is by design. Unlike standalone parallel jobs or pipeline jobs executed directly, Batch Endpoints do not currently support configuring compute resource scaling (e.g., instance_count) from the job definition or via endpoint invocation.
If scaling is required for your workload (e.g., increasing the number of nodes to speed up processing), we recommend running the parallel job or pipeline job directly, outside of the Batch Endpoint context, where these resource parameters will be honored.
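As a rough illustration of running the pipeline directly, here is a minimal sketch using the Azure ML Python SDK v2 (azure-ai-ml). It assumes you have a pipeline YAML file whose parallel step already sets the resource parameters you want. The function name submit_pipeline_directly and its arguments are hypothetical names chosen for this example, not part of the SDK, so please adapt them to your environment.

```python
from typing import Any


def submit_pipeline_directly(
    yaml_path: str,
    subscription_id: str,
    resource_group: str,
    workspace: str,
) -> Any:
    """Submit a pipeline job straight to the workspace, bypassing the Batch Endpoint.

    Assumption: the YAML at `yaml_path` defines the parallel step together with
    the resource settings (e.g. instance_count) that should apply to the run.
    """
    # Imported lazily so this module can be loaded even where the SDK
    # (azure-ai-ml, azure-identity) is not installed.
    from azure.ai.ml import MLClient, load_job
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace,
    )

    # Load the pipeline definition from YAML and submit it as a regular job.
    pipeline_job = load_job(source=yaml_path)
    submitted = ml_client.jobs.create_or_update(pipeline_job)

    # Stream logs until the job finishes (blocks the caller).
    ml_client.jobs.stream(submitted.name)
    return submitted
```

You would call it with your own values, for example submit_pipeline_directly("pipeline.yml", "<subscription-id>", "<resource-group>", "<workspace>"), where all four arguments are placeholders.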
We understand this limitation may impact certain use cases, and we encourage you to request support for custom scaling in Batch Endpoints through the Azure feedback channels.