How to predict exact labels from a fine-tuned Curie model

Kempkes, Rainer 0 Reputation points
2023-06-06T12:10:43.8733333+00:00

I fine-tuned the OpenAI Curie model and deployed it successfully. Now I want to predict the trained labels on new data. Trying it with the Completion API doesn't return the exact trained labels, only free-form text.
How can I predict exactly the labels the model was fine-tuned with? The Completion API doesn't seem to work properly here.
It must be possible somehow, because how else could the model measure accuracy on the provided test data?
regards,
Rainer

Azure OpenAI Service

2 answers

  1. Sedat SALMAN 14,140 Reputation points MVP
    2023-06-06T13:57:35.7633333+00:00

    For a classification task like this, OpenAI recommends using ada: it is the fastest engine and, after fine-tuning, is capable of making good predictions. At least 100 examples per class are needed for fine-tuning, and performance tends to increase linearly as the number of examples doubles.

    To get a confidence score, use the log probability of the first generated completion token. In a binary yes/no classifier, for example, the higher the log probability of the "yes" token, the more confident the prediction that the output is supported.
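
    For example, a minimal sketch using the legacy Completions API with the 0.x openai Python package; the resource name, key, deployment name, and prompt separator below are placeholders:

    ```python
    import openai

    # Azure OpenAI setup: resource name, key, and API version are placeholders
    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
    openai.api_version = "2023-05-15"
    openai.api_key = "YOUR-KEY"

    response = openai.Completion.create(
        engine="my-finetuned-deployment",  # your fine-tuned model's deployment name
        prompt="why is x?\n\n###\n\n",     # same separator suffix used in training
        max_tokens=1,                      # only the first completion token matters here
        temperature=0,                     # deterministic: always the most likely token
        logprobs=5,                        # return log probs of the top 5 candidate tokens
    )

    choice = response["choices"][0]
    # Log probs of the top candidates for the first generated token
    first_token_logprobs = choice["logprobs"]["top_logprobs"][0]
    print(choice["text"], first_token_logprobs)
    ```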

    To determine a log probability threshold above which an ad (the running example in OpenAI's documentation) is likely to be supported more than 98% of the time, follow these steps; a code sketch follows the list:

    1. Use the discriminator to predict the probability of "yes" on a held-out dataset.
    2. Convert the log probability measure into percentiles.
    3. For each percentile, compute the precision: the share of genuinely truthful ads found above that threshold.
    4. Find a percentile at which the precision is just above 98%. The log probability threshold needed to obtain a precision of at least 98% is then the log probability at this percentile on the held-out dataset.
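
    Here is a numpy sketch of steps 1 to 4, assuming the "yes" log probs and the ground-truth flags from the held-out dataset have already been collected:

    ```python
    import numpy as np

    def precision_threshold(yes_logprobs, is_truthful, target=0.98):
        """Find the lowest log-prob threshold whose precision is >= target.

        yes_logprobs: log prob of the "yes" token per held-out example (step 1)
        is_truthful:  booleans, True where the ad really is supported
        """
        scores = np.asarray(yes_logprobs, dtype=float)
        truth = np.asarray(is_truthful, dtype=bool)

        # Steps 2-3: walk the percentiles from loosest to strictest, computing
        # the precision (share of truly supported ads) above each candidate.
        for pct in range(101):
            thresh = np.percentile(scores, pct)
            above = scores >= thresh
            if not above.any():
                break
            precision = truth[above].mean()
            # Step 4: the first percentile whose precision clears the target
            # gives the log-prob threshold to use at inference time.
            if precision >= target:
                return thresh
        return None  # the target precision is unreachable on this held-out set
    ```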

    By generating several samples and then picking the one with the highest log probability, you can increase the probability that the selected ad is indeed truthful. For example, with 3 samples you can reach a truthfulness rate of 98.3%; this rises to 99.6% with 10 samples and 99.8% with 17 samples. Returns diminish as the number of samples increases.
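
    A sketch of that best-of-n selection (deployment name and prompt are placeholders; it assumes the same Azure configuration as the snippet above):

    ```python
    import openai

    response = openai.Completion.create(
        engine="my-finetuned-deployment",  # placeholder deployment name
        prompt="<ad prompt here>",         # placeholder prompt
        max_tokens=40,
        temperature=0.7,                   # some randomness so the samples differ
        n=3,                               # 3 samples: ~98.3% in the figures above
        logprobs=1,                        # needed to score each sample
    )

    # Keep the sample whose tokens have the highest total log probability
    best = max(
        response["choices"],
        key=lambda c: sum(c["logprobs"]["token_logprobs"]),
    )
    print(best["text"])
    ```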


  2. Kempkes, Rainer 0 Reputation points
    2023-06-07T11:14:04.4133333+00:00

    I will use ada for fine-tuning. My use case is predicting ~300 topics from user queries.

    My training data looks like this:
    [{prompt: 'why is x?', completion: 'topic a'}, {prompt: 'weather is nice', completion: 'chitchat b'},...]
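
    If I understand OpenAI's fine-tuning guide correctly, each prompt should end with a fixed separator and each completion should start with a space and end with a fixed stop sequence. A sketch of preparing the JSONL that way (file name, separator, and stop sequence are placeholders):

    ```python
    import json

    SEPARATOR = "\n\n###\n\n"  # fixed suffix marking the end of every prompt
    STOP = "\n"                # fixed suffix marking the end of every completion

    examples = [
        {"prompt": "why is x?", "completion": "topic a"},
        {"prompt": "weather is nice", "completion": "chitchat b"},
    ]

    with open("train.jsonl", "w") as f:
        for ex in examples:
            record = {
                "prompt": ex["prompt"] + SEPARATOR,
                # leading space so the label starts on a clean token boundary
                "completion": " " + ex["completion"] + STOP,
            }
            f.write(json.dumps(record) + "\n")
    ```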

    "To get confidence, the log probability of the first generated completion token can be used."

    I know neither how to get this log prob from the Completion API nor whether the Completion API is the best way to get the classification prediction.

    "The higher the log probability of the 'yes' token, the more confident the prediction that the output is supported."

    I want the Completion API to predict exactly one of the 300 fine-tuned topics. But what it does is generate free text (and that is, of course, the original meaning of "completion"... ;-) )
    Example:
    prompt: 'why is x?'
    expected result: label 'reason y' (one of the n trained labels)
    Completion API returns: 'reso_y and this and that' (none of the n trained labels, just text resembling the expected one)
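
    From what I've read, the call should be constrained so the model can only emit a label: temperature 0, a small max_tokens, and the training-time separator and stop sequence. All of these are placeholders below and must match the training data:

    ```python
    import openai  # assumes the Azure configuration shown earlier in the thread

    response = openai.Completion.create(
        engine="my-finetuned-deployment",    # placeholder deployment name
        prompt="why is x?" + "\n\n###\n\n",  # must match the training separator
        temperature=0,                       # greedy: most likely label only
        max_tokens=5,                        # just enough for the longest label
        stop=["\n"],                         # must match the training stop sequence
        logprobs=5,                          # optional: confidence for the label
    )

    predicted = response["choices"][0]["text"].strip()

    labels = {"reason y", "topic a", "chitchat b"}  # stand-ins for the ~300 labels
    if predicted not in labels:
        # the model drifted off-label; fall back or snap to the nearest label
        print("off-label output:", predicted)
    ```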

