Custom Vision model with a budget of 3 hours has been training for 5 days.

Charles Thomas 40 Reputation points
2024-04-08T21:37:02.9866667+00:00

Several models with datasets of around 1400-1500 images and a budget of 3 hours have been training for 5 days. Is it possible to find out if something is stuck or do we simply need to wait longer?

Using the standard tier.

Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.

Accepted answer
  1. navba-MSFT 24,910 Reputation points Microsoft Employee
    2024-04-23T04:48:58.51+00:00

    @Charles Thomas @TimboBaggins Apologies for the late reply. We appreciate your patience on this.

    I have an update from the product owners on this. Here is the root cause analysis and the mitigation taken to fix the issue.

    Cause:

    The retry logic had a bug: in some cases, when a training job failed, the retry counter was reset, so the job was retried indefinitely.

    Mitigation: 

    1. We have fixed the retry counter to catch that kind of failure and stop retrying.
    2. A hotfix has been deployed.
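
To make the cause and the fix above concrete, here is a minimal Python sketch of this class of bug. It is not the Custom Vision service code, which is not shown in this thread. Every name in it (`run_job`, `ResettingError`, `RetriesExhausted`, `SafetyCapReached`, `max_retries`, `safety_cap`) is hypothetical and chosen only for illustration. The `buggy=True` path resets the counter on one kind of failure, so the retry limit is never reached; the default path counts that failure like any other and stops.

```python
class ResettingError(Exception):
    """Hypothetical failure kind that (in the buggy path) resets the counter."""


class RetriesExhausted(Exception):
    """Raised when the retry limit is reached; carries the attempt count."""

    def __init__(self, attempts):
        super().__init__(f"gave up after {attempts} attempts")
        self.attempts = attempts


class SafetyCapReached(Exception):
    """Raised only so the buggy demo terminates instead of looping forever."""

    def __init__(self, attempts):
        super().__init__(f"still retrying after {attempts} attempts")
        self.attempts = attempts


def run_job(job, max_retries=3, buggy=False, safety_cap=50):
    """Call job() until it succeeds or the retry limit is exceeded.

    Returns the number of attempts made on success.
    With buggy=True, a ResettingError resets the retry counter, which
    imitates the described bug: the limit is never reached.
    """
    retries = 0
    attempts = 0
    while True:
        attempts += 1
        try:
            job()
            return attempts
        except ResettingError:
            if buggy:
                retries = 0  # the bug: counter reset on this failure kind
            else:
                retries += 1  # the fix: count it like any other failure
        if retries > max_retries:
            raise RetriesExhausted(attempts)
        if attempts >= safety_cap:
            # Demo guard only; a real runaway job would have no such cap.
            raise SafetyCapReached(attempts)


def always_fail():
    """Hypothetical training job that fails on every attempt."""
    raise ResettingError("training job failed")
```

With `always_fail`, the fixed path raises `RetriesExhausted` after one initial attempt plus `max_retries` retries, while the buggy path keeps going until the demo's `safety_cap` stops it. That matches the symptom in the question: a job that runs far beyond its training budget without ever finishing or failing.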

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

