GPT-4o Finetuning just keep running

Question

Quan Nguyen 20

I am following this fine-tuning tutorial: https://learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/fine-tune?tabs=command-line.

I checked the log and found that after "Finetuning started" was logged, the job stayed in that state for hours (more than 8 hours). The training data comes from the tutorial, so there are only 10 samples. What is happening?

I tried all of the following, but none of it fixed the issue:

Tried both the North Central US and East US 2 locations
Created fine-tuning jobs for both gpt-4o and gpt-4o-mini
Tried both the Python API and the Azure AI Studio portal
Tried canceling and resubmitting the job several times
Tried increasing to 200 samples for both training and validation data (following this question: https://learn.microsoft.com/en-us/answers/questions/2260568/fine-tuning-job-stuck-in-training-status-for-over)

I ran the fine-tuning job with these settings:

Location: north central us
Base model: gpt-4o-mini-2024-07-18
Method of Customization: Supervised
Dataset: Copied from tutorial
Task parameters: Batch size: 4, Learning rate multiplier: 0.1, Number of epochs: 1, Seed: 42

Here is the log:

Apr 26, 2025 2:34 PM: status : Training started.
Apr 26, 2025 2:34 PM: status : Finetuning started.
Apr 26, 2025 2:34 PM: status : Data Import started.
Apr 26, 2025 2:23 PM: status : Preprocessing completed for file training file.
Apr 26, 2025 2:21 PM: status : Preprocessing running for file training file.
Apr 26, 2025 2:21 PM: status : Training may be less effective due to low specified learning rate multiplier 0.01. Recommended learning rate multiplier is 1.
Apr 26, 2025 2:21 PM: status : Job enqueued. Waiting for jobs ahead to complete.

Accepted answer

Answer 1

Hello Quan Nguyen,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that your GPT-4o fine-tuning job keeps running without making progress.

There have been reports of internal errors causing fine-tuning jobs to fail (https://community.openai.com/t/the-job-failed-due-to-an-internal-error-fine-tuning-gpt4o-mini/1042181 and https://community.openai.com/t/chatgpt-4o-mini-fine-tuning-fails-internal-error/1061245). This might be the underlying issue in your case. However, based on the information provided, I suggest the following troubleshooting steps as a first aid:

The log warns about a low learning rate multiplier (0.01), but you explicitly set it to 0.1. This discrepancy suggests either a parameter override in your code or portal configuration, or a UI/API bug misreporting the value. What to do:
- Double-check your API request body or portal settings for typos (for example, learning_rate_multiplier=0.01 instead of 0.1).
- If using Python, ensure no conflicting defaults are applied:
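```
# Example API call snippet (openai 1.x Python SDK with an Azure OpenAI resource)
import os
from openai import AzureOpenAI

# Endpoint, key, and API version are placeholders; use your own resource's values
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "batch_size": 4,
        "learning_rate_multiplier": 0.1,  # Verify this line
        "n_epochs": 1,
    },
)
print(response.id, response.hyperparameters)
```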
- If the issue persists, set the value to 1.0 (Azure’s recommended value, as the log message states) to bypass potential bugs: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning#hyperparameters
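
One way to check for this discrepancy is to compare the values you submitted with the values the service recorded for the job. The following is only a minimal sketch, assuming the `openai` 1.x Python package and an already configured `AzureOpenAI` client; the helper names `diff_hyperparameters` and `check_job` are hypothetical and not part of any SDK.

```python
def diff_hyperparameters(requested, reported):
    """Return {name: (requested, reported)} for every value that differs."""
    mismatches = {}
    for name, wanted in requested.items():
        actual = reported.get(name)
        if actual != wanted:
            mismatches[name] = (wanted, actual)
    return mismatches


def check_job(client, job_id, requested):
    """Fetch a job with an openai 1.x client and compare its hyperparameters.

    `client` is assumed to be an AzureOpenAI instance; `job_id` is your own job ID.
    """
    job = client.fine_tuning.jobs.retrieve(job_id)
    # model_dump() converts the hyperparameters object into a plain dict
    reported = job.hyperparameters.model_dump() if job.hyperparameters else {}
    return diff_hyperparameters(requested, reported)


if __name__ == "__main__":
    requested = {"batch_size": 4, "learning_rate_multiplier": 0.1, "n_epochs": 1}
    # Offline illustration using the value from the log warning
    reported = {"batch_size": 4, "learning_rate_multiplier": 0.01, "n_epochs": 1}
    print(diff_hyperparameters(requested, reported))
```

An empty result means the service recorded exactly what you requested; any entry points to the parameter that was overridden or misreported.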

Even if you followed the tutorial, hidden formatting errors (for example, a BOM in a UTF-8 file or invalid JSONL line breaks) can stall preprocessing. To rule this out:

- Validate your JSONL file with jq in bash: jq '.' your_data.jsonl (the command fails if the file contains invalid JSON).
- Remove the BOM using PowerShell (this is critical for Azure compatibility): Get-Content input.jsonl | Set-Content -Encoding utf8NoBOM output.jsonl (the utf8NoBOM encoding name requires PowerShell 6 or later).
- Check this link for reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prepare-dataset#data-format-guidelines
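
If jq or PowerShell is not available, the same checks can be approximated in Python. This is a rough sketch rather than an official validator: the function name `check_jsonl` is hypothetical, and the check for a top-level "messages" key assumes the chat-style training format used in the tutorial.

```python
import json

UTF8_BOM = b"\xef\xbb\xbf"


def check_jsonl(path):
    """Return a list of problems found in a chat-format JSONL training file."""
    problems = []
    with open(path, "rb") as handle:
        raw = handle.read()
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 byte order mark")
        raw = raw[len(UTF8_BOM):]
    try:
        lines = raw.decode("utf-8").split("\n")
    except UnicodeDecodeError as error:
        return problems + [f"file is not valid UTF-8 ({error.reason})"]
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is fine
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: invalid JSON ({error.msg})")
            continue
        # Assumption: each record is an object with a "messages" list
        if not isinstance(record, dict) or "messages" not in record:
            problems.append(f"line {number}: missing 'messages' key")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_jsonl(sys.argv[1]):
        print(problem)
```

Run it as a script with the path to your training file as the only argument; no output means none of these particular problems were found.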

More troubleshooting:

- You can confirm whether the problem is data-specific or platform-related by submitting a job that uses Azure’s example dataset (https://github.com/Azure/azure-openai-samples/blob/main/fine-tuning/10-examples.jsonl) to rule out issues with your custom data.
- You can also switch regions to avoid queues, because jobs in North Central US and East US 2 might face high demand. Try West US 3 or Sweden Central, which have newer capacity.
- You can engage Azure Support for OpenAI (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/get-support) and/or Priority Customer Support (https://learn.microsoft.com/en-us/azure/azure-portal/supportability/priority-community-support), and share the full activity log and job ID.
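
To gather the job ID and activity log for a support request, something like the following could work. It is a sketch under assumptions: it presumes the `openai` 1.x Python package and a configured `AzureOpenAI` client, and the helper names `summarize_events` and `collect_report` are hypothetical.

```python
from datetime import datetime, timezone


def summarize_events(job_id, status, events):
    """Build a plain-text report from (unix_timestamp, message) pairs."""
    lines = [f"Job ID: {job_id}", f"Status: {status}"]
    for created_at, message in sorted(events):  # oldest event first
        stamp = datetime.fromtimestamp(created_at, tz=timezone.utc)
        lines.append(f"{stamp:%Y-%m-%d %H:%M:%S} UTC  {message}")
    return "\n".join(lines)


def collect_report(client, job_id, limit=50):
    """Fetch status and recent events with an openai 1.x client.

    `client` is assumed to be an AzureOpenAI instance; `job_id` is your own job ID.
    """
    job = client.fine_tuning.jobs.retrieve(job_id)
    page = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=limit)
    events = [(event.created_at, event.message) for event in page.data]
    return summarize_events(job.id, job.status, events)
```

Calling `print(collect_report(client, "<your job ID>"))` would then produce text you can paste into the support ticket.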

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.

kothapally Snigdha 3,020 Reputation points Microsoft External Staff Moderator

2025-04-29T06:08:28.6833333+00:00

Hello Quan Nguyen,

Did you get a chance to check whether the above response was helpful?

Thank you!
