Understanding token count in GPT model fine-tuning
We recently experimented with fine-tuning the GPT-4o model in OpenAI Studio. We trained with the same data and the same prompt, varying only the batch size and the number of epochs, and we are confused about the token counts reported after training finished:
| Batch Size | Epochs | Tokens |
|------------|--------|--------------|
| 15         | 2      | 23 million   |
| 32         | 4      | 92.2 million |

We are under the impression that the token count for the 4-epoch run should simply be twice that of the 2-epoch run, since it processes the exact same records, just for twice as many passes. Can you explain why the tokens are actually about 4x (92.2 million vs. 23 million)? Also, if you could explain how these token counts are calculated, that would be great!
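For context, here is a minimal sketch of the arithmetic we are assuming, namely that trained tokens ≈ tokens in the training file × number of epochs. The file name, the chat-format JSONL layout, and the use of tiktoken's o200k_base encoding are just our assumptions for illustration, not necessarily how OpenAI computes the billed number:

```python
# Sketch of the scaling we expected: trained tokens ~= tokens per epoch * epochs.
# This is our assumption, not a statement of how OpenAI actually bills tokens.
import json

import tiktoken

# o200k_base is the GPT-4o encoding in recent tiktoken releases; treat the
# resulting counts as an approximation of what the service reports.
enc = tiktoken.get_encoding("o200k_base")


def count_file_tokens(path: str) -> int:
    """Rough token count for a chat-format JSONL training file."""
    total = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            for message in example["messages"]:
                # Count only message text; tool calls etc. are ignored here.
                total += len(enc.encode(message.get("content") or ""))
    return total


# Placeholder path for our training data.
tokens_per_epoch = count_file_tokens("training_data.jsonl")

for epochs in (2, 4):
    print(f"{epochs} epochs -> expected trained tokens: {tokens_per_epoch * epochs}")
```

Under this assumption the 4-epoch run should report twice the tokens of the 2-epoch run, which is why the roughly 4x figure surprised us.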