Fine Tunes - Create

Creates a job that fine-tunes a specified model from a given training file. The response includes details of the enqueued job, including its status and hyperparameters. The name of the fine-tuned model is added to the response once the job completes.

POST {endpoint}/openai/fine-tunes?api-version=2022-12-01

URI Parameters

Name In Required Type Description
endpoint
path True

string

url

Supported Cognitive Services endpoint (protocol and hostname), for example: https://aoairesource.openai.azure.com. Replace "aoairesource" with your Azure OpenAI account name.

api-version
query True

string

The requested API version.
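As a minimal sketch, the two URI parameters can be combined into the request URL as follows. The helper name build_fine_tunes_url is hypothetical and not part of any SDK; only the path and the api-version value are taken from this page.

```python
from urllib.parse import urlencode, urlsplit


def build_fine_tunes_url(endpoint: str, api_version: str = "2022-12-01") -> str:
    """Combine the endpoint (protocol and hostname) with the fine-tunes path."""
    parts = urlsplit(endpoint)
    if not parts.scheme or not parts.netloc:
        raise ValueError("endpoint must include protocol and hostname")
    query = urlencode({"api-version": api_version})
    # Any path or trailing slash on the endpoint is dropped on purpose.
    return f"{parts.scheme}://{parts.netloc}/openai/fine-tunes?{query}"


print(build_fine_tunes_url("https://aoairesource.openai.azure.com/"))
```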

Request Header

Name Required Type Description
api-key True

string

Provide your Cognitive Services Azure OpenAI account key here.
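The sketch below shows one way to attach the api-key header using only the Python standard library. It builds the request without sending it. The function name make_create_request and the Content-Type header are assumptions for illustration; this page only documents the api-key header.

```python
import json
import urllib.request


def make_create_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for Fine Tunes - Create."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "api-key": api_key,  # account key, as described above
            "Content-Type": "application/json",  # assumption: JSON body
        },
    )


request = make_create_request(
    "https://aoairesource.openai.azure.com/openai/fine-tunes?api-version=2022-12-01",
    api_key="<your-account-key>",  # placeholder, not a real key
    body={"model": "curie", "training_file": "file-181a1cbdcdcf4677ada87f63a0928099"},
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```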

Request Body

Name Required Type Description
model True

string

The identifier (model-id) of the base model used for this fine-tune.

training_file True

string

The file identity (file-id) that is used for training this fine tuned model.

batch_size

integer

The batch size to use for training. The batch size is the number of training examples used in a single forward and backward pass. In general, we've found that larger batch sizes tend to work better for larger datasets. The default value as well as the maximum value for this property are specific to a base model.

classification_betas

number[]

The classification beta values. If this is provided, we calculate F-beta scores at the specified beta values. The F-beta score is a generalization of the F-1 score. This is only used for binary classification. With a beta of 1 (i.e., the F-1 score), precision and recall are given the same weight. A larger beta puts more weight on recall and less on precision. A smaller beta puts more weight on precision and less on recall.

classification_n_classes

integer

The number of classes in a classification task. This parameter is required for multiclass classification.

classification_positive_class

string

The positive class in binary classification. This parameter is needed to generate precision, recall, and F1 metrics when doing binary classification.

compute_classification_metrics

boolean

A value indicating whether to compute classification metrics. If set, we calculate classification-specific metrics such as accuracy and F-1 score using the validation set at the end of every epoch. These metrics can be viewed in the results file. In order to compute classification metrics, you must provide a validation_file. Additionally, you must specify classification_n_classes for multiclass classification or classification_positive_class for binary classification.

learning_rate_multiplier

number

The learning rate multiplier to use for training. The fine-tuning learning rate is the original learning rate used for pre-training multiplied by this value. Larger learning rates tend to perform better with larger batch sizes. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results.

n_epochs

integer

The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

prompt_loss_weight

number

The weight to use for loss on the prompt tokens. This controls how much the model tries to learn to generate the prompt (as compared to the completion which always has a weight of 1.0), and can add a stabilizing effect to training when completions are short. If prompts are extremely long (relative to completions), it may make sense to reduce this weight so as to avoid over-prioritizing learning the prompt.

suffix

string

The suffix used to identify the fine-tuned model. The suffix can contain up to 40 characters (a-z, A-Z, 0-9, - and _) that will be added to your fine-tuned model name.

validation_file

string

The file identity (file-id) that is used to evaluate the fine tuned model during training.
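The constraints stated in the request body descriptions can be checked on the client before calling the service. The following is a sketch under the assumption that the rules are exactly as worded above; check_fine_tune_body is a hypothetical helper, and the service may apply different or additional validation.

```python
import re

# Allowed suffix characters and length, per the suffix description above.
SUFFIX_RE = re.compile(r"[A-Za-z0-9_-]{0,40}")


def check_fine_tune_body(body: dict) -> list:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for name in ("model", "training_file"):
        if not body.get(name):
            problems.append(f"{name} is required")
    suffix = body.get("suffix")
    if suffix is not None and not SUFFIX_RE.fullmatch(suffix):
        problems.append("suffix allows up to 40 of a-z, A-Z, 0-9, - and _")
    if body.get("compute_classification_metrics"):
        if not body.get("validation_file"):
            problems.append("classification metrics need a validation_file")
        if ("classification_n_classes" not in body
                and "classification_positive_class" not in body):
            problems.append(
                "set classification_n_classes or classification_positive_class"
            )
    return problems


print(check_fine_tune_body({"model": "curie", "suffix": "my model"}))
```

Note that the classification sample request on this page sets compute_classification_metrics without a validation_file, so the service may be more lenient than this check.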

Responses

Name Type Description
201 Created

FineTune

The fine tune has been successfully created.

Headers

Location: string

Other Status Codes

ErrorResponse

An error occurred.

Security

api-key

Provide your Cognitive Services Azure OpenAI account key here.

Type: apiKey
In: header

Examples

Creating a fine tune job for classification.

Sample Request

POST https://aoairesource.openai.azure.com/openai/fine-tunes?api-version=2022-12-01


{
  "compute_classification_metrics": true,
  "classification_n_classes": 4,
  "model": "curie",
  "training_file": "file-181a1cbdcdcf4677ada87f63a0928099"
}

Sample Response

location: https://aoairesource.openai.azure.com/openai/fine-tunes/ft-72a2792ef7d24ba7b82c7fe4a37e379f
{
  "hyperparams": {
    "compute_classification_metrics": true,
    "classification_n_classes": 4,
    "batch_size": 32,
    "learning_rate_multiplier": 1,
    "n_epochs": 2,
    "prompt_loss_weight": 0.1
  },
  "model": "curie",
  "training_files": [
    {
      "statistics": {
        "tokens": 42,
        "examples": 23
      },
      "bytes": 140,
      "purpose": "fine-tune",
      "filename": "puppy.jsonl",
      "id": "file-181a1cbdcdcf4677ada87f63a0928099",
      "status": "succeeded",
      "created_at": 1646126127,
      "updated_at": 1646127311,
      "object": "file"
    }
  ],
  "id": "ft-72a2792ef7d24ba7b82c7fe4a37e379f",
  "status": "notRunning",
  "created_at": 1646126127,
  "updated_at": 1646127311,
  "object": "fine-tune"
}
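In the sample above, the last path segment of the location header matches the id field of the response body. Assuming that holds in general, a small helper (hypothetical name job_id_from_location) can recover the job id from the header alone:

```python
from urllib.parse import urlsplit


def job_id_from_location(location: str) -> str:
    """Return the last path segment of a Location header value."""
    path = urlsplit(location).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


location = (
    "https://aoairesource.openai.azure.com/openai/fine-tunes/"
    "ft-72a2792ef7d24ba7b82c7fe4a37e379f"
)
print(job_id_from_location(location))  # ft-72a2792ef7d24ba7b82c7fe4a37e379f
```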

Creating a fine tune job.

Sample Request

POST https://aoairesource.openai.azure.com/openai/fine-tunes?api-version=2022-12-01


{
  "model": "curie",
  "training_file": "file-181a1cbdcdcf4677ada87f63a0928099"
}

Sample Response

location: https://aoairesource.openai.azure.com/openai/fine-tunes/ft-72a2792ef7d24ba7b82c7fe4a37e379f
{
  "hyperparams": {
    "batch_size": 32,
    "learning_rate_multiplier": 1,
    "n_epochs": 2,
    "prompt_loss_weight": 0.1
  },
  "model": "curie",
  "training_files": [
    {
      "statistics": {
        "tokens": 42,
        "examples": 23
      },
      "bytes": 140,
      "purpose": "fine-tune",
      "filename": "puppy.jsonl",
      "id": "file-181a1cbdcdcf4677ada87f63a0928099",
      "status": "succeeded",
      "created_at": 1646126127,
      "updated_at": 1646127311,
      "object": "file"
    }
  ],
  "id": "ft-72a2792ef7d24ba7b82c7fe4a37e379f",
  "status": "notRunning",
  "created_at": 1646126127,
  "updated_at": 1646127311,
  "object": "fine-tune"
}
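A response like the one above can be reduced to the handful of fields most callers track. This is only a sketch: summarize_job is a hypothetical helper, and the JSON below is an abbreviated copy of the sample response.

```python
import json

# Abbreviated copy of the sample response above.
SAMPLE = """
{
  "hyperparams": {"batch_size": 32, "learning_rate_multiplier": 1,
                  "n_epochs": 2, "prompt_loss_weight": 0.1},
  "model": "curie",
  "training_files": [{"id": "file-181a1cbdcdcf4677ada87f63a0928099",
                      "statistics": {"tokens": 42, "examples": 23}}],
  "id": "ft-72a2792ef7d24ba7b82c7fe4a37e379f",
  "status": "notRunning",
  "object": "fine-tune"
}
"""


def summarize_job(job: dict) -> dict:
    """Pick out the fields most callers need from a FineTune object."""
    return {
        "id": job["id"],
        "status": job["status"],
        "base_model": job["model"],
        "n_epochs": job["hyperparams"].get("n_epochs"),
        "training_file_ids": [f["id"] for f in job.get("training_files", [])],
    }


summary = summarize_job(json.loads(SAMPLE))
print(summary["id"], summary["status"], summary["n_epochs"])
```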

Definitions

Name Description
Error

Error

ErrorCode

ErrorCode

ErrorResponse

ErrorResponse

Event

Event

File

File

FileStatistics

FileStatistics

FineTune

FineTune

FineTuneCreation

FineTuneCreation

HyperParameters

HyperParameters

InnerError

InnerError

InnerErrorCode

InnerErrorCode

LogLevel

LogLevel

Purpose

Purpose

State

State

TypeDiscriminator

TypeDiscriminator

Error

Error

Name Type Description
code

ErrorCode

ErrorCode
Error codes as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).

details

Error[]

The error details if available.

innererror

InnerError

InnerError
Inner error as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).

message

string

The message of this error.

target

string

The location where the error happened if available.

ErrorCode

ErrorCode

Name Type Description
conflict

string

The requested operation conflicts with the current resource state.

fileImportFailed

string

Import of file failed.

forbidden

string

The operation is forbidden for the current user or API key.

internalFailure

string

Internal error. Please retry.

invalidPayload

string

The request data is invalid for this operation.

itemDoesAlreadyExist

string

The item already exists.

jsonlValidationFailed

string

Validation of JSONL data failed.

notFound

string

The resource is not found.

quotaExceeded

string

Quota exceeded.

serviceUnavailable

string

The service is currently not available.

unexpectedEntityState

string

The operation cannot be executed in the current resource's state.
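One possible way to act on these codes is to retry only the ones whose descriptions suggest a transient problem. The sketch below assumes that internalFailure ("Please retry") and serviceUnavailable are the only retryable codes. That classification, the helper names, and the backoff parameters are all illustrative choices, not documented behavior.

```python
# Assumption: only these two codes are treated as transient.
RETRYABLE_CODES = {"internalFailure", "serviceUnavailable"}


def should_retry(error_code: str, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether a failed call is worth repeating."""
    return error_code in RETRYABLE_CODES and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


for attempt in range(4):
    print(attempt, should_retry("serviceUnavailable", attempt), backoff_seconds(attempt))
```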

ErrorResponse

ErrorResponse

Name Type Description
error

Error

Error
Error content as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).
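Because an Error can carry a chain of nested innererror objects, it can help to flatten the response into a single line for logging. The following is a sketch: describe_error is a hypothetical helper, and the payload is made up to match the shape of the ErrorResponse definition rather than captured from the service.

```python
import json


def describe_error(response_body: str) -> str:
    """Flatten an ErrorResponse into one line, following nested innererror."""
    error = json.loads(response_body).get("error", {})
    parts = [f"{error.get('code', 'unknown')}: {error.get('message', '')}"]
    if error.get("target"):
        parts.append(f"target={error['target']}")
    inner = error.get("innererror")
    while inner:  # innererror may itself contain an innererror
        if inner.get("code"):
            parts.append(f"inner={inner['code']}")
        inner = inner.get("innererror")
    return "; ".join(parts)


# Hypothetical payload shaped like the ErrorResponse definition.
body = json.dumps({
    "error": {
        "code": "invalidPayload",
        "message": "The request data is invalid for this operation.",
        "target": "training_file",
        "innererror": {"code": "invalidPayload"},
    }
})
print(describe_error(body))
```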

Event

Event

Name Type Description
created_at

integer

A timestamp when this event was created (Unix epoch time, in seconds).

level

LogLevel

LogLevel
The verbosity level of an event.

message

string

The message describing the event. This can be a change of state, e.g., enqueued, started, failed or completed, or other events like uploaded results.

object

TypeDiscriminator

TypeDiscriminator
Defines the type of an object.

File

File

Name Type Description
bytes

integer

The size of this file when available (can be null). File sizes larger than 2^53-1 are not supported to ensure compatibility with JavaScript integers.

created_at

integer

A timestamp when this job or item was created (Unix epoch time, in seconds).

error

Error

Error
Error content as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).

filename

string

The name of the file.

id

string

The identity of this item.

object

TypeDiscriminator

TypeDiscriminator
Defines the type of an object.

purpose

Purpose

Purpose
The intended purpose of the uploaded documents. Use "fine-tune" for fine-tuning. This allows us to validate the format of the uploaded file.

statistics

FileStatistics

FileStatistics
A file is a document usable for training and validation. It can also be a service generated document with result details.

status

State

State
The state of a job or item.

updated_at

integer

A timestamp when this job or item was last modified (Unix epoch time, in seconds).

FileStatistics

FileStatistics

Name Type Description
examples

integer

The number of contained training examples in files of kind "fine-tune" once validation of file content is complete.

tokens

integer

The number of tokens used in prompts and completions for files of kind "fine-tune" once validation of file content is complete.

FineTune

FineTune

Name Type Description
created_at

integer

A timestamp when this job or item was created (Unix epoch time, in seconds).

error

Error

Error
Error content as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).

events

Event[]

The events that show the progress of the fine-tune run including queued, running and completed.

fine_tuned_model

string

The identifier (model-id) of the resulting fine tuned model. This property is only populated for successfully completed fine-tune runs. Use this identifier to create a deployment for inferencing.

hyperparams

HyperParameters

HyperParameters
The hyperparameter settings used in a fine-tune job.

id

string

The identity of this item.

model

string

The identifier (model-id) of the base model used for the fine-tune.

object

TypeDiscriminator

TypeDiscriminator
Defines the type of an object.

organisation_id

string

The organisation id of this fine tune job. Unused on Azure OpenAI; compatibility for OpenAI only.

result_files

File[]

The result file identities (file-id) containing training and evaluation metrics in CSV format. The files are only available for successfully completed fine-tune runs.

status

State

State
The state of a job or item.

suffix

string

The suffix used to identify the fine-tuned model.

training_files

File[]

The file identities (file-id) that are used for training the fine tuned model.

updated_at

integer

A timestamp when this job or item was last modified (Unix epoch time, in seconds).

user_id

string

The user id of this fine tune job. Unused on Azure OpenAI; compatibility for OpenAI only.

validation_files

File[]

The file identities (file-id) that are used to evaluate the fine tuned model during training.
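Since fine_tuned_model is only populated for successfully completed runs, client code can guard on status before trying to create a deployment. A minimal sketch follows, with a hypothetical helper name and a made-up model identifier:

```python
from typing import Optional


def deployable_model_id(job: dict) -> Optional[str]:
    """Return fine_tuned_model for a succeeded job, otherwise None."""
    if job.get("status") != "succeeded":
        return None
    return job.get("fine_tuned_model")


# Hypothetical jobs; the model name below is made up for illustration.
pending = {"id": "ft-72a2792ef7d24ba7b82c7fe4a37e379f", "status": "notRunning"}
done = {
    "id": "ft-72a2792ef7d24ba7b82c7fe4a37e379f",
    "status": "succeeded",
    "fine_tuned_model": "curie.ft-example",
}
print(deployable_model_id(pending), deployable_model_id(done))
```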

FineTuneCreation

FineTuneCreation

Name Type Description
batch_size

integer

The batch size to use for training. The batch size is the number of training examples used in a single forward and backward pass. In general, we've found that larger batch sizes tend to work better for larger datasets. The default value as well as the maximum value for this property are specific to a base model.

classification_betas

number[]

The classification beta values. If this is provided, we calculate F-beta scores at the specified beta values. The F-beta score is a generalization of the F-1 score. This is only used for binary classification. With a beta of 1 (i.e., the F-1 score), precision and recall are given the same weight. A larger beta puts more weight on recall and less on precision. A smaller beta puts more weight on precision and less on recall.

classification_n_classes

integer

The number of classes in a classification task. This parameter is required for multiclass classification.

classification_positive_class

string

The positive class in binary classification. This parameter is needed to generate precision, recall, and F1 metrics when doing binary classification.

compute_classification_metrics

boolean

A value indicating whether to compute classification metrics. If set, we calculate classification-specific metrics such as accuracy and F-1 score using the validation set at the end of every epoch. These metrics can be viewed in the results file. In order to compute classification metrics, you must provide a validation_file. Additionally, you must specify classification_n_classes for multiclass classification or classification_positive_class for binary classification.

learning_rate_multiplier

number

The learning rate multiplier to use for training. The fine-tuning learning rate is the original learning rate used for pre-training multiplied by this value. Larger learning rates tend to perform better with larger batch sizes. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results.

model

string

The identifier (model-id) of the base model used for this fine-tune.

n_epochs

integer

The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

prompt_loss_weight

number

The weight to use for loss on the prompt tokens. This controls how much the model tries to learn to generate the prompt (as compared to the completion which always has a weight of 1.0), and can add a stabilizing effect to training when completions are short. If prompts are extremely long (relative to completions), it may make sense to reduce this weight so as to avoid over-prioritizing learning the prompt.

suffix

string

The suffix used to identify the fine-tuned model. The suffix can contain up to 40 characters (a-z, A-Z, 0-9, - and _) that will be added to your fine-tuned model name.

training_file

string

The file identity (file-id) that is used for training this fine tuned model.

validation_file

string

The file identity (file-id) that is used to evaluate the fine tuned model during training.
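The FineTuneCreation properties map naturally onto a small typed container. The dataclass below is a sketch, not an SDK type. It mirrors the property names listed above and omits unset optional fields from the payload, on the assumption that the service then applies its own defaults (as the simple sample request on this page suggests).

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class FineTuneCreation:
    model: str
    training_file: str
    validation_file: Optional[str] = None
    batch_size: Optional[int] = None
    learning_rate_multiplier: Optional[float] = None
    n_epochs: Optional[int] = None
    prompt_loss_weight: Optional[float] = None
    compute_classification_metrics: Optional[bool] = None
    classification_n_classes: Optional[int] = None
    classification_positive_class: Optional[str] = None
    classification_betas: Optional[List[float]] = None
    suffix: Optional[str] = None

    def to_payload(self) -> dict:
        """Drop unset fields so only explicit choices are sent."""
        return {k: v for k, v in asdict(self).items() if v is not None}


payload = FineTuneCreation(
    model="curie",
    training_file="file-181a1cbdcdcf4677ada87f63a0928099",
    compute_classification_metrics=True,
    classification_n_classes=4,
).to_payload()
print(payload)
```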

HyperParameters

HyperParameters

Name Type Description
batch_size

integer

The batch size to use for training. The batch size is the number of training examples used in a single forward and backward pass. In general, we've found that larger batch sizes tend to work better for larger datasets. The default value as well as the maximum value for this property are specific to a base model.

classification_betas

number[]

The classification beta values. If this is provided, we calculate F-beta scores at the specified beta values. The F-beta score is a generalization of the F-1 score. This is only used for binary classification. With a beta of 1 (i.e., the F-1 score), precision and recall are given the same weight. A larger beta puts more weight on recall and less on precision. A smaller beta puts more weight on precision and less on recall.

classification_n_classes

integer

The number of classes in a classification task. This parameter is required for multiclass classification.

classification_positive_class

string

The positive class in binary classification. This parameter is needed to generate precision, recall, and F1 metrics when doing binary classification.

compute_classification_metrics

boolean

A value indicating whether to compute classification metrics. If set, we calculate classification-specific metrics such as accuracy and F-1 score using the validation set at the end of every epoch. These metrics can be viewed in the results file. In order to compute classification metrics, you must provide a validation_file. Additionally, you must specify classification_n_classes for multiclass classification or classification_positive_class for binary classification.

learning_rate_multiplier

number

The learning rate multiplier to use for training. The fine-tuning learning rate is the original learning rate used for pre-training multiplied by this value. Larger learning rates tend to perform better with larger batch sizes. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results.

n_epochs

integer

The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

prompt_loss_weight

number

The weight to use for loss on the prompt tokens. This controls how much the model tries to learn to generate the prompt (as compared to the completion which always has a weight of 1.0), and can add a stabilizing effect to training when completions are short. If prompts are extremely long (relative to completions), it may make sense to reduce this weight so as to avoid over-prioritizing learning the prompt.
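To make the effect of classification_betas concrete, the sketch below uses the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. Whether the service computes its metric exactly this way is an assumption, and the precision and recall values are made up.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; returns 0.0 when the denominator is zero."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


precision, recall = 0.5, 1.0  # made-up values for illustration
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(precision, recall, beta), 4))
```

With recall higher than precision, as here, the score rises as beta grows, which reflects the extra weight a larger beta gives to recall.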

InnerError

InnerError

Name Type Description
code

InnerErrorCode

InnerErrorCode
Inner error codes as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).

innererror

InnerError

InnerError
Inner error as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).

InnerErrorCode

InnerErrorCode

Name Type Description
invalidPayload

string

The request data is invalid for this operation.

LogLevel

LogLevel

Name Type Description
error

string

This message represents a non-recoverable issue.

info

string

This event is for information only.

warning

string

This event represents a mitigated issue.

Purpose

Purpose

Name Type Description
fine-tune

string

This file contains training data for a fine tune job.

fine-tune-results

string

This file contains the results of a fine tune job.

State

State

Name Type Description
canceled

string

The operation has been canceled and is incomplete.

deleted

string

The entity has been deleted but may still be referenced by other entities predating the deletion.

failed

string

The operation has completed processing with a failure and cannot be further consumed.

notRunning

string

The operation was created and is not queued to be processed in the future.

running

string

The operation has started to be processed.

succeeded

string

The operation has been successfully processed and is ready for consumption.
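A client waiting on a job typically polls until the status stops changing. The sketch below assumes that succeeded, failed, canceled and deleted are final states and that notRunning and running are not. That grouping is an interpretation of the descriptions above. The fetch_status callable is a stand-in for a real request that reads the job's status.

```python
import time
from typing import Callable

# Assumption: these states are final; notRunning and running are not.
TERMINAL_STATES = {"succeeded", "failed", "canceled", "deleted"}


def wait_for_job(fetch_status: Callable[[], str], interval: float = 30.0,
                 max_polls: int = 100) -> str:
    """Call fetch_status until it returns a terminal state or polls run out."""
    status = fetch_status()
    for _ in range(max_polls - 1):
        if status in TERMINAL_STATES:
            break
        time.sleep(interval)
        status = fetch_status()
    return status


# Stand-in for a real status lookup: replays a fixed sequence of states.
states = iter(["notRunning", "running", "running", "succeeded"])
print(wait_for_job(lambda: next(states), interval=0.0))
```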

TypeDiscriminator

TypeDiscriminator

Name Type Description
deployment

string

This object represents a deployment.

file

string

This object represents a file.

fine-tune

string

This object represents a fine tune job.

fine-tune-event

string

This object represents an event of a fine tune job.

list

string

This object represents a list of other objects.

model

string

This object represents a model (can be a base model or a fine-tune job result).