Hi Duv,
Thank you for your questions.
From what you described, this delay seems to be related to the availability of the underlying infrastructure, as workloads that require GPU resources usually take longer to provision and start.
Because the K80 GPUs are scheduled for retirement, this may be aggravating the issue:
https://learn.microsoft.com/en-us/azure/container-instances/container-instances-resource-and-quota-limits#gpu-resources-preview
If this is the issue, and your workloads are pending while the infrastructure is being provisioned, there are no other logs (besides the ones you mentioned) that you can gather.
If you want to follow up on this, I encourage you to open a Support Request so that one of those deployments can be investigated in more detail.
As an alternative, as the documentation mentions, you can use AKS to run these jobs. Since you only need them in short bursts, consider a GPU node pool that autoscales with the demand of your workloads, so you only pay for GPU nodes while jobs are running.
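As a rough illustration of that autoscaling node pool, here is a minimal sketch that builds a request body for the AKS Agent Pools "Create Or Update" REST call. It assumes the REST property names (`mode`, `vmSize`, `count`, `enableAutoScaling`, `minCount`, `maxCount`, `nodeTaints`). The helper name `gpu_autoscale_pool_body`, the VM size `Standard_NC6s_v3`, the min/max counts, and the `sku=gpu:NoSchedule` taint are illustrative choices only, so please check which GPU SKUs are available in your region and subscription before using anything like this.

```python
import json


def gpu_autoscale_pool_body(vm_size: str = "Standard_NC6s_v3",
                            min_count: int = 0,
                            max_count: int = 3) -> dict:
    """Build a (hypothetical) AKS agent pool body for a bursty GPU user pool.

    Assumption: vm_size is an example GPU SKU; pick one available in your region.
    """
    # The autoscaler needs 0 <= min_count <= max_count, and at least one node allowed.
    if min_count < 0 or max_count < max(min_count, 1):
        raise ValueError("require 0 <= min_count <= max_count and max_count >= 1")
    return {
        "properties": {
            "mode": "User",                 # user pools (not system pools) can scale down to 0 nodes
            "vmSize": vm_size,
            "count": max(min_count, 1),     # initial size, kept inside [minCount, maxCount]
            "enableAutoScaling": True,      # let the cluster autoscaler add/remove GPU nodes
            "minCount": min_count,
            "maxCount": max_count,
            # Illustrative taint so only pods that tolerate it land on the GPU nodes.
            "nodeTaints": ["sku=gpu:NoSchedule"],
        }
    }


if __name__ == "__main__":
    print(json.dumps(gpu_autoscale_pool_body(), indent=2))
```

With `minCount` set to 0, the pool can shrink to no nodes between bursts. The GPU job pods would then need a matching toleration and a GPU resource request so that the autoscaler brings nodes back when jobs are submitted.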