Fine Tuning - Json File

Irina Sopas 80 Reputation points
2025-04-12T12:24:32.04+00:00

Hello. I am trying to Fine Tuning my OpenAi Model.

I would like to know how many json files I can add. I need to put all the information in one json file or I can have it divided by themes in various Json files.

Best regards,

IS

Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
3,933 questions
{count} votes

Accepted answer
  1. Suhas M 75 Reputation points
    2025-04-12T13:14:33.4766667+00:00

    Hello IS! Great to hear you’re fine-tuning your model with Azure OpenAI Service.

    You can split your data into multiple JSONL files.

    Each file must follow the required format, and during the fine-tuning upload and training process, Azure OpenAI supports multiple files as long as they are properly formatted.

    Format Reminder:

    Each file must be in .jsonl (JSON Lines) format, meaning:

    {"prompt": "input text", "completion": "desired response"}
    

    Multiple Files – Best Practice:

    You can and often should organize your training data by theme into different .jsonl files (e.g., customer_service.jsonl, technical_docs.jsonl, sales_pitch.jsonl). This helps you:

    • Maintain your data more easily

    Debug or update specific topics later

    Ensure better control over how different data segments influence the model

    Then, when creating the fine-tuned model, you can upload and use them together.

    Notes:

    You can combine files during upload or before fine-tuning depending on the method you're using (e.g., CLI, API).

    Azure has some size and token limits (e.g., each file max 100 MB and total dataset should remain within token limits).

    Be mindful of balance and duplication across datasets to avoid model bias.

    Would you like help with how to format or combine your files, or a command example for uploading in Azure?Hello IS! Great to hear you’re fine-tuning your model with Azure OpenAI Service.

    You can split your data into multiple JSONL files.

    Each file must follow the required format, and during the fine-tuning upload and training process, Azure OpenAI supports multiple files as long as they are properly formatted.

    Format Reminder:

    Each file must be in .jsonl (JSON Lines) format, meaning:

    {"prompt": "input text", "completion": "desired response"}
    

    🗂 Multiple Files – Best Practice:

    You can and often should organize your training data by theme into different .jsonl files (e.g., customer_service.jsonl, technical_docs.jsonl, sales_pitch.jsonl). This helps you:

    Maintain your data more easily

    Debug or update specific topics later

    Ensure better control over how different data segments influence the model

    Then, when creating the fine-tuned model, you can upload and use them together.

    Notes:

    • You can combine files during upload or before fine-tuning depending on the method you're using (e.g., CLI, API).
    • Azure has some size and token limits (e.g., each file max 100 MB and total dataset should remain within token limits).
    • Be mindful of balance and duplication across datasets to avoid model bias.

1 additional answer

Sort by: Most helpful
  1. Irina Sopas 80 Reputation points
    2025-04-14T18:26:16.05+00:00

    Hello. Thanks for the answer. Can you tell me which software I can use to create a jsonl file? And a specific sample for

    {"prompt": "input text", "completion": "desired response"}
    
    
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.