Hello Quan Nguyen,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that your GPT-4o fine-tuning job just keeps running.
There have been reports of internal errors causing fine-tuning jobs to fail - https://community.openai.com/t/the-job-failed-due-to-an-internal-error-fine-tuning-gpt4o-mini/1042181 and https://community.openai.com/t/chatgpt-4o-mini-fine-tuning-fails-internal-error/1061245. This might be the underlying issue in your case. However, based on the information you provided, I suggest the following troubleshooting steps as first aid:
- The log warns about a low learning rate multiplier (`0.01`), but you explicitly set it to `0.1`. This discrepancy suggests a parameter override in your code/portal configuration and/or a UI/API bug misreporting the value. What to do:
  - Double-check your API request body or portal settings for typos (such as `learning_rate_multiplier=0.01` instead of `0.1`).
  - If using Python, ensure no conflicting defaults are applied:
```python
# Example API call snippet (openai Python SDK < 1.0)
import openai

response = openai.FineTuningJob.create(
    training_file="file-abc123",
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "batch_size": 4,
        "learning_rate_multiplier": 0.1,  # Verify this line
        "n_epochs": 1,
    },
)
```
  - If the issue persists, force the value to `1.0` (Azure's recommended default) to bypass potential bugs - https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning#hyperparameters. You can also confirm which values the service actually recorded for your job, as shown in the sketch below.
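To see which hyperparameters the service actually recorded, and whether the job is still producing events, you can retrieve the job and list its event stream. A minimal sketch, assuming the same pre-1.0 `openai` SDK as above; the job ID is a placeholder:

```python
# Minimal sketch: inspect a fine-tuning job's recorded hyperparameters
# and recent events (openai Python SDK < 1.0; the job ID is a placeholder).
import openai

job_id = "ftjob-abc123"  # replace with your real job ID

job = openai.FineTuningJob.retrieve(job_id)
print("Status:", job["status"])
print("Hyperparameters:", job["hyperparameters"])  # check learning_rate_multiplier here

# Recent events show whether preprocessing/training is actually advancing
events = openai.FineTuningJob.list_events(id=job_id, limit=10)
for event in events["data"]:
    print(event["created_at"], event["message"])
```

If the recorded hyperparameters show `0.01` even though you submitted `0.1`, that is concrete evidence of an override or a service-side bug worth attaching to a support ticket.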
- Even if you followed the tutorial, hidden formatting errors (such as a BOM in UTF-8 or invalid JSONL line breaks) can stall preprocessing. So, you should:
  - Use `jq` to validate your JSONL file with the following bash command; it will fail on the first line that is not valid JSON:

```bash
jq '.' your_data.jsonl
```

  - Remove the BOM using PowerShell (this is critical for Azure compatibility):
```powershell
Get-Content input.jsonl | Set-Content -Encoding utf8NoBOM output.jsonl
```

(Note that the `utf8NoBOM` encoding requires PowerShell 7 or later.)
  - Check this link out for the data format guidelines: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prepare-dataset#data-format-guidelines. A cross-platform Python alternative to the two commands above is sketched below.
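If `jq` or PowerShell 7 is not available, both checks can be combined in plain Python. This is a minimal sketch (the file name is a placeholder); it flags a UTF-8 BOM and reports any lines that are not valid JSON:

```python
# Minimal sketch: detect a UTF-8 BOM and invalid JSONL lines.
# The file name is a placeholder - point it at your training file.
import json

path = "your_data.jsonl"

# A UTF-8 BOM is the three bytes EF BB BF at the very start of the file
with open(path, "rb") as f:
    if f.read(3) == b"\xef\xbb\xbf":
        print("Warning: file starts with a UTF-8 BOM - strip it before uploading.")

with open(path, encoding="utf-8-sig") as f:  # utf-8-sig tolerates the BOM while parsing
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            print(f"Line {lineno}: empty line (invalid in JSONL)")
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"Line {lineno}: invalid JSON - {exc}")
```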
More troubleshooting:
- You can confirm whether the problem is data-specific or platform-related by submitting a job with Azure's example dataset - https://github.com/Azure/azure-openai-samples/blob/main/fine-tuning/10-examples.jsonl - to rule out issues with your custom data (see the sketch after this list).
- You can also switch regions to avoid queues, because jobs in `North Central US` / `East US 2` might face high demand. Try `West US 3` or `Sweden Central`, which have newer capacity.
- You can engage Azure Support for OpenAI - https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/get-support - and/or Priority Customer Support - https://learn.microsoft.com/en-us/azure/azure-portal/supportability/priority-community-support - and share the full activity log and job ID.
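For the control test with the example dataset, something along these lines could work - a minimal sketch, again assuming the pre-1.0 `openai` SDK configured for Azure OpenAI (the endpoint, key, API version, and file name are placeholders for your own values):

```python
# Minimal sketch: control test with Azure's example dataset
# (openai Python SDK < 1.0 configured for Azure OpenAI;
# the endpoint, key, and API version below are placeholders).
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
openai.api_version = "2023-12-01-preview"
openai.api_key = "YOUR-API-KEY"

# Upload the example dataset downloaded from the GitHub link above
training_file = openai.File.create(
    file=open("10-examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Submit the job with the same hyperparameters you used for your own data
job = openai.FineTuningJob.create(
    training_file=training_file["id"],
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"batch_size": 4, "learning_rate_multiplier": 0.1, "n_epochs": 1},
)
print("Job ID:", job["id"])
```

If this control job completes while your own file stalls, the problem is almost certainly in your data; if it also hangs, that points to the platform or region and is worth escalating to support.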
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close out the thread here by upvoting and accepting the answer if it is helpful.