Is there an estimate of how long a fine-tuning process would take per 1k tokens? Is the process linear?

Kayla 75 Reputation points
2024-06-28T00:08:38.5666667+00:00

I'm starting a fine-tuning job that is pretty intensive. There are about 7,099,648 tokens across the training and validation files (about 45,000 lines). I want to estimate my time and price cost but can't find any resources that help. I saw two examples of how long 5,000- and 2,000-token files took, but I assume the process isn't linear (extrapolating that way gave me an estimate of a few months). Does anyone have an idea, or good examples of how long their heavier projects took?


Accepted answer
  dupammi 8,540 Reputation points Microsoft Vendor
    2024-06-28T06:51:47.4966667+00:00

    Hi @Kayla Farivar

    Thank you for using the Microsoft Q&A platform.

    For fine-tuning on Azure OpenAI with ~7.1 million tokens, the duration depends on several factors, including system-level throughput, per-call latency, and hardware configuration. System-level throughput determines your deployment's capacity in requests per minute and total tokens, while per-call latency depends on prompt size, generation size, model type, and system load.

    On high-end GPUs, a processing rate of around 1,000 tokens per second per GPU is plausible, which gives a rough estimate of about 1.97 hours for your dataset, though actual times vary due to overheads. A quick sketch of that arithmetic follows below.
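    Here is that back-of-envelope calculation spelled out. The ~1,000 tokens/second/GPU rate is an assumption for illustration, not a documented Azure figure, and it ignores that fine-tuning typically runs multiple passes (epochs) over the data plus queueing and checkpointing overhead:

        # Rough time estimate under the assumed per-GPU throughput.
        total_tokens = 7_099_648      # training + validation tokens from the question
        tokens_per_second = 1_000     # assumed processing rate per GPU
        epochs = 1                    # multiply by the actual epoch count

        seconds = total_tokens * epochs / tokens_per_second
        print(f"~{seconds / 3600:.2f} hours per epoch")  # ~1.97 hours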

    Throughput is also influenced by whether your deployment is provisioned, which affects how input size, output size, and call rate impact processing. Azure's documentation indicates that PTU requirements scale roughly linearly with call rate and workload size (possibly sublinearly) when the workload distribution remains constant; for example, 800-token prompts need about 100 PTUs to sustain 30 calls per minute.
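    Extrapolating linearly from that documented baseline gives a quick capacity sketch. Treat it as an upper-bound rule of thumb, since actual requirements can be sublinear:

        # Linear rule-of-thumb extrapolation from the documented baseline
        # (100 PTUs at 30 calls/min with 800-token prompts).
        BASELINE_PTUS = 100
        BASELINE_CALLS_PER_MIN = 30

        def estimate_ptus(calls_per_min: float) -> float:
            """Upper-bound estimate; actual scaling can be sublinear."""
            return BASELINE_PTUS * calls_per_min / BASELINE_CALLS_PER_MIN

        print(estimate_ptus(60))  # ~200 PTUs at double the baseline call rate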

    To optimize, use Azure Monitor to track the tokens processed and adjust your training parameters accordingly. Fine-tuning doesn't scale linearly with token count, since data handling and model optimizations introduce additional complexity, so accurate time estimates require benchmarking with your real traffic and workload characteristics.
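    If it helps, below is a minimal sketch of pulling token metrics with the azure-monitor-query client library. The resource ID is a placeholder, and the "ProcessedPromptTokens" metric name is an assumption; check the metrics actually listed for your Azure OpenAI resource in the portal:

        from datetime import timedelta

        from azure.identity import DefaultAzureCredential
        from azure.monitor.query import MetricAggregationType, MetricsQueryClient

        client = MetricsQueryClient(DefaultAzureCredential())

        # Placeholder resource ID; substitute your Azure OpenAI resource.
        resource_uri = (
            "/subscriptions/<sub-id>/resourceGroups/<rg>"
            "/providers/Microsoft.CognitiveServices/accounts/<account>"
        )

        response = client.query_resource(
            resource_uri,
            metric_names=["ProcessedPromptTokens"],  # assumed metric name
            timespan=timedelta(days=1),
            granularity=timedelta(hours=1),
            aggregations=[MetricAggregationType.TOTAL],
        )

        # Print hourly totals of tokens processed over the last day.
        for metric in response.metrics:
            for series in metric.timeseries:
                for point in series.data:
                    print(point.timestamp, point.total)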

    If you still need more details after going through the above, please raise a support case through the Azure portal.

    I hope this helps. Thank you.


    Please don't forget to click Accept Answer and Yes if this answer was helpful.

    1 person found this answer helpful.
