GPT-4o fine-tuning job just keeps running

Quan Nguyen 20 Reputation points
2025-04-26T18:42:31.8166667+00:00

I am following this fine-tuning tutorial: https://learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/fine-tune?tabs=command-line.

I checked the log and found that after "Finetuning started" was logged, the job stayed in that state for hours (more than 8 hours). The training data is from the tutorial, so there are only 10 samples. What is happening?

I tried fine-tuning with the following settings, but could not fix the issue:

  • Location: north central us
  • Base model: gpt-4o-mini-2024-07-18
  • Method of Customization: Supervised
  • Dataset: Copied from tutorial
  • Task parameters: Batch size:4, Learning rate multiplier: 0.1, Number of epochs:1, Seed: 42

Here is the log:

  • Apr 26, 2025 2:34 PM: status : Training started.
  • Apr 26, 2025 2:34 PM: status : Finetuning started.
  • Apr 26, 2025 2:34 PM: status : Data Import started.
  • Apr 26, 2025 2:23 PM: status : Preprocessing completed for file training file.
  • Apr 26, 2025 2:21 PM: status : Preprocessing running for file training file.
  • Apr 26, 2025 2:21 PM: status : Training may be less effective due to low specified learning rate multiplier 0.01. Recommended learning rate multiplier is 1.
  • Apr 26, 2025 2:21 PM: status : Job enqueued. Waiting for jobs ahead to complete.
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2025-04-26T23:02:09.62+00:00

    Hello Quan Nguyen,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your GPT-4o fine-tuning job just keeps running.

    There have been reports of internal errors causing fine-tuning jobs to fail (https://community.openai.com/t/the-job-failed-due-to-an-internal-error-fine-tuning-gpt4o-mini/1042181 and https://community.openai.com/t/chatgpt-4o-mini-fine-tuning-fails-internal-error/1061245). This might be the underlying issue in your case. However, based on the information provided, I suggest the following troubleshooting steps as first aid:

    1. The log warns about a low learning rate multiplier (0.01), but you explicitly set it to 0.1. This discrepancy suggests either a parameter override in your code or portal configuration, or a UI/API bug misreporting the value. What to do:
      • Double-check your API request body or portal settings for typos (for example, learning_rate_multiplier=0.01 instead of 0.1).
      • If using Python, ensure no conflicting defaults are applied:
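                # Example API call (openai>=1.0 Python SDK against Azure OpenAI)
                import os
                from openai import AzureOpenAI

                client = AzureOpenAI(
                    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
                    api_key=os.environ["AZURE_OPENAI_API_KEY"],
                    api_version="2024-10-21",  # assumed; use a version your resource supports
                )
                response = client.fine_tuning.jobs.create(
                    training_file="file-abc123",  # placeholder file ID
                    model="gpt-4o-mini-2024-07-18",
                    hyperparameters={
                        "batch_size": 4,
                        "learning_rate_multiplier": 0.1,  # Verify this line
                        "n_epochs": 1,
                    },
                )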
         
      
    2. Even if you followed the tutorial, hidden formatting errors (such as a UTF-8 BOM or invalid JSONL line breaks) can stall preprocessing, so validate the training file before resubmitting.
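
A minimal sketch of such a file check, using only the Python standard library. The function name `check_jsonl` is hypothetical, and the last check assumes the chat-format training data used in the tutorial, where every line is a JSON object containing a `messages` list. It flags a UTF-8 BOM, blank lines, lines that are not valid JSON, and lines without `messages`; it does not replace the service's own validation.

```python
import json


def check_jsonl(path):
    """Return a list of human-readable problems found in a JSONL file."""
    problems = []
    with open(path, "rb") as f:
        raw = f.read()

    # A UTF-8 byte order mark at the start of the file is invisible in most editors.
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[3:]

    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"file is not valid UTF-8 ({exc.reason})")
        return problems

    # Split on "\n" so that each physical line is checked on its own.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is fine

    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        # Assumption: chat-format data, one object with a "messages" list per line.
        if not isinstance(record, dict) or not isinstance(record.get("messages"), list):
            problems.append(f"line {number}: missing 'messages' list")
    return problems
```

Calling `check_jsonl` on the training file and getting an empty list back means none of these specific problems were found, which makes a data-format cause less likely.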

    More troubleshooting:

    1. You can confirm whether the problem is data-specific or platform-related by submitting a job using Azure’s example dataset - https://github.com/Azure/azure-openai-samples/blob/main/fine-tuning/10-examples.jsonl to rule out issues with your custom data.
    2. You can also switch regions to avoid queues, because jobs in North Central US or East US 2 might face high demand. Try the West US 3 or Sweden Central regions, which have newer capacity.
    3. You can engage Azure Support for OpenAI - https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/get-support and/or Priority Customer Support - https://learn.microsoft.com/en-us/azure/azure-portal/supportability/priority-community-support and share the full activity log and job ID.
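
Before opening a support ticket, it can help to capture the job's status, the learning rate multiplier the service reports, and the most recent events in one place. The sketch below assumes the `openai` Python package (version 1.0 or later), where `client.fine_tuning.jobs.retrieve` and `client.fine_tuning.jobs.list_events` are available; the helper name `summarize_job` and its parameters are hypothetical, and `client` is passed in so the function itself needs no network access to define.

```python
def summarize_job(client, job_id, requested_lr=0.1, max_events=10):
    """Collect status, reported learning rate, and recent event messages for a job.

    `client` is expected to expose the openai>=1.0 fine-tuning interface
    (for example an AzureOpenAI instance); `requested_lr` is the value
    you believe you submitted.
    """
    job = client.fine_tuning.jobs.retrieve(job_id)
    events = client.fine_tuning.jobs.list_events(
        fine_tuning_job_id=job_id, limit=max_events
    )

    reported_lr = job.hyperparameters.learning_rate_multiplier
    return {
        "job_id": job.id,
        "status": job.status,
        "reported_lr": reported_lr,
        # True when the service reports a different value than the one requested.
        "lr_mismatch": reported_lr != requested_lr,
        # Only the first page of events; .data avoids automatic pagination.
        "events": [event.message for event in events.data],
    }
```

With a real `AzureOpenAI` client and the job ID shown in the portal, the returned dictionary can be pasted into the support request alongside the activity log. If `lr_mismatch` is true, that supports the suspicion that the submitted value was overridden or misreported.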

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.

    1 person found this answer helpful.

0 additional answers
