
Ingestion Jobs - Create

Creates an ingestion job with the specified job id.

PUT {endpoint}/openai/ingestion/jobs/{job-id}?api-version=2024-05-01-preview

URI Parameters

Name | In | Required | Type | Description
endpoint | path | True | string (url) | Supported Cognitive Services endpoints (protocol and hostname, for example: https://aoairesource.openai.azure.com. Replace "aoairesource" with your Azure OpenAI account name).
job-id | path | True | string | The id of the job that will be created.
api-version | query | True | string | The requested API version.
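
As a minimal sketch of how the three URI parameters combine, the following Python helper assembles the request URL. The function name `build_job_url` and the choice to percent-encode the job id are assumptions made for illustration; they are not part of the service contract.

```python
from urllib.parse import quote, urlencode

# API version taken from the request line documented above.
API_VERSION = "2024-05-01-preview"


def build_job_url(endpoint: str, job_id: str, api_version: str = API_VERSION) -> str:
    """Return the URL for creating the ingestion job `job_id` (hypothetical helper)."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    # Percent-encode the job id so it stays a single path segment (assumption).
    path = f"/openai/ingestion/jobs/{quote(job_id, safe='')}"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


print(build_job_url("https://aoairesource.openai.azure.com", "ingestion-job"))
```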

Request Header

Name | Required | Type | Description
api-key | True | string | Provide your Cognitive Services Azure OpenAI account key here.
mgmt-user-token |  | string | The token used to access the workspace (needed only for user compute jobs).
aml-user-token |  | string | The token used to access the resources within the job in the workspace (needed only for user compute jobs).
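
The header rules above can be sketched as a small Python helper. The name `build_headers` and the `user_compute` flag are hypothetical, and rejecting a user compute job that lacks either token is an assumption drawn from the "needed only for user compute jobs" notes, not documented service behavior.

```python
from typing import Dict, Optional


def build_headers(
    api_key: str,
    user_compute: bool = False,
    mgmt_user_token: Optional[str] = None,
    aml_user_token: Optional[str] = None,
) -> Dict[str, str]:
    """Return the request headers for a create call (hypothetical helper)."""
    headers = {"api-key": api_key}  # always required
    if user_compute:
        # Both tokens are described as needed for user compute jobs.
        if not (mgmt_user_token and aml_user_token):
            raise ValueError("user compute jobs need mgmt-user-token and aml-user-token")
        headers["mgmt-user-token"] = mgmt_user_token
        headers["aml-user-token"] = aml_user_token
    return headers


print(build_headers("<api-key>"))
print(build_headers("<api-key>", True, "<mgmt-token>", "<aml-token>"))
```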

Request Body

The request body can be one of the following:

Name Description
IngestionJobSystemCompute
IngestionJobUserCompute

IngestionJobSystemCompute

Name | Required | Type | Description
kind | True | string: system | IngestionJobType. The job type.
completionAction |  | IngestionJobCompletionAction | The completion action.
dataRefreshIntervalInHours |  | integer |
datasource |  | SystemComputeDatasource | SystemComputeDatasource
jobId |  | string |
searchServiceConnection |  | BaseConnection | BaseConnection. A connection to a resource.

IngestionJobUserCompute

Name | Required | Type | Description
kind | True | string: user | IngestionJobType. The job type.
workspaceId | True | string |
compute |  | JobCompute | JobCompute. The compute settings of the job.
dataRefreshIntervalInHours |  | integer |
datasource |  | UserComputeDatasource | UserComputeDatasource
jobId |  | string |
target |  | TargetIndex | TargetIndex. Information about the index to be created.

Responses

Name | Type | Description
200 OK | IngestionJob | Success
Other Status Codes | ErrorResponse | An error occurred.
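
A client might treat the two response shapes as follows. This is a sketch only: `IngestionJobError` and `parse_create_response` are hypothetical names, and the code assumes the response body is JSON in both cases.

```python
import json


class IngestionJobError(Exception):
    """Raised for any non-200 response (hypothetical exception type)."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_create_response(status: int, body: str) -> dict:
    """Return the IngestionJob on 200 OK, otherwise raise from the ErrorResponse."""
    payload = json.loads(body)
    if status == 200:
        return payload
    # ErrorResponse wraps a single "error" object with "code" and "message".
    error = payload.get("error", {})
    raise IngestionJobError(status, error.get("code"), error.get("message"))


job = parse_create_response(200, '{"kind": "SystemCompute", "jobId": "ingestion-job"}')
print(job["jobId"])
```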

Security

api-key

Provide your Cognitive Services Azure OpenAI account key here.

Type: apiKey
In: header

Examples

Create a system-compute ingestion job

Sample Request

PUT {endpoint}/openai/ingestion/jobs/{job-id}?api-version=2024-05-01-preview

{
  "kind": "SystemCompute",
  "searchServiceConnection": {
    "kind": "EndpointWithManagedIdentity",
    "endpoint": "https://aykame-dev-search.search.windows.net"
  },
  "datasource": {
    "kind": "Storage",
    "storageAccountConnection": {
      "kind": "EndpointWithManagedIdentity",
      "endpoint": "https://mystorage.blob.core.windows.net/",
      "resourceId": "ResourceId=/subscriptions/1234567-abcd-1234-5678-1234abcd/resourceGroups/my-resource/providers/Microsoft.Storage/storageAccounts/mystorage"
    },
    "containerName": "container",
    "chunkingSettings": {
      "maxChunkSizeInTokens": 2048
    },
    "embeddingsSettings": [
      {
        "embeddingResourceConnection": {
          "kind": "RelativeConnection"
        },
        "modelProvider": "AOAI",
        "deploymentName": "Ada"
      }
    ]
  },
  "dataRefreshIntervalInHours": 24,
  "completionAction": 0
}

Sample Response

operation-location: https://aoairesource.openai.azure.com/openai/ingestion/jobs/ingestion-job/runs/72a2792ef7d24ba7b82c7fe4a37e379f?api-version=2024-05-01-preview
{
  "kind": "SystemCompute",
  "jobId": "ingestion-job",
  "searchServiceConnection": {
    "kind": "EndpointWithManagedIdentity",
    "endpoint": "https://aykame-dev-search.search.windows.net"
  },
  "datasource": {
    "kind": "Storage",
    "storageAccountConnection": {
      "kind": "EndpointWithManagedIdentity",
      "endpoint": "https://mystorage.blob.core.windows.net/",
      "resourceId": "ResourceId=/subscriptions/1234567-abcd-1234-5678-1234abcd/resourceGroups/my-resource/providers/Microsoft.Storage/storageAccounts/mystorage"
    },
    "containerName": "container",
    "chunkingSettings": {
      "maxChunkSizeInTokens": 2048
    },
    "embeddingsSettings": [
      {
        "embeddingResourceConnection": {
          "kind": "RelativeConnection"
        },
        "modelProvider": "AOAI",
        "deploymentName": "Ada"
      }
    ]
  },
  "dataRefreshIntervalInHours": 24,
  "completionAction": 0
}
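
The sample response carries an operation-location header whose URL has the shape .../jobs/{job-id}/runs/{run-id}. The sketch below pulls those two ids out of the header value; reading the last path segment as a run id is an assumption based on the URL shape shown in the sample, and `parse_operation_location` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def parse_operation_location(value: str) -> dict:
    """Split an operation-location URL into job id, run id, and API version."""
    parts = urlsplit(value.strip())
    segments = [s for s in parts.path.split("/") if s]
    # Expected tail of the path: jobs/{job-id}/runs/{run-id}
    if len(segments) < 4 or segments[-4] != "jobs" or segments[-2] != "runs":
        raise ValueError(f"unexpected operation-location path: {parts.path!r}")
    query = parse_qs(parts.query)
    return {
        "job_id": segments[-3],
        "run_id": segments[-1],
        "api_version": query.get("api-version", [None])[0],
    }


header = (
    "https://aoairesource.openai.azure.com/openai/ingestion/jobs/ingestion-job"
    "/runs/72a2792ef7d24ba7b82c7fe4a37e379f?api-version=2024-05-01-preview"
)
print(parse_operation_location(header))
```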

Create a user-compute ingestion job

Sample Request

PUT {endpoint}/openai/ingestion/jobs/{job-id}?api-version=2024-05-01-preview

{
  "kind": "UserCompute",
  "workspaceId": "/subscriptions/f375b912-331c-4fc5-8e9f-2d7205e3e036/resourceGroups/adrama-copilot-demo/providers/Microsoft.MachineLearningServices/workspaces/adrama-rag-dev",
  "compute": {
    "kind": "ServerlessCompute"
  },
  "target": {
    "kind": "AzureAISearch",
    "connectionId": "/subscriptions/f375b912-331c-4fc5-8e9f-2d7205e3e036/resourceGroups/adrama-copilot-demo/providers/Microsoft.MachineLearningServices/workspaces/adrama-rag-dev/connections/search-connection"
  },
  "datasource": {
    "kind": "Dataset",
    "datasetId": "azureml://locations/centraluseuap/workspaces/83317fe6-efa6-4e4a-b020-d0edd11ec382/data/PlainText/versions/1",
    "datasetType": "uri_folder"
  }
}

Sample Response

operation-location: https://aoairesource.openai.azure.com/openai/ingestion/jobs/ingestion-job/runs/72a2792ef7d24ba7b82c7fe4a37e379f?api-version=2024-05-01-preview
{
  "kind": "UserCompute",
  "jobId": "ingestion-job",
  "workspaceId": "/subscriptions/f375b912-331c-4fc5-8e9f-2d7205e3e036/resourceGroups/adrama-copilot-demo/providers/Microsoft.MachineLearningServices/workspaces/adrama-rag-dev",
  "compute": {
    "kind": "ServerlessCompute"
  },
  "target": {
    "kind": "AzureAISearch",
    "connectionId": "/subscriptions/f375b912-331c-4fc5-8e9f-2d7205e3e036/resourceGroups/adrama-copilot-demo/providers/Microsoft.MachineLearningServices/workspaces/adrama-rag-dev/connections/search-connection"
  },
  "datasource": {
    "kind": "Dataset",
    "datasetId": "azureml://locations/centraluseuap/workspaces/83317fe6-efa6-4e4a-b020-d0edd11ec382/data/PlainText/versions/1",
    "datasetType": "uri_folder"
  }
}
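
For illustration, the user-compute request above could be assembled with Python's standard library as shown below. The helper name, the placeholder ids and tokens, and the `Content-Type: application/json` header are assumptions; the sketch builds the request object without sending it.

```python
import json
from urllib.request import Request


def build_user_compute_request(endpoint, job_id, api_key, mgmt_user_token, aml_user_token, body):
    """Build (but do not send) the PUT request for a user-compute job."""
    url = (
        f"{endpoint.rstrip('/')}/openai/ingestion/jobs/{job_id}"
        "?api-version=2024-05-01-preview"
    )
    headers = {
        "api-key": api_key,
        "mgmt-user-token": mgmt_user_token,  # needed only for user compute jobs
        "aml-user-token": aml_user_token,    # needed only for user compute jobs
        "Content-Type": "application/json",  # assumption: JSON request body
    }
    data = json.dumps(body).encode("utf-8")
    return Request(url, data=data, headers=headers, method="PUT")


# Body mirrors the sample request; the angle-bracket values are placeholders.
body = {
    "kind": "UserCompute",
    "workspaceId": "<workspace-resource-id>",
    "compute": {"kind": "ServerlessCompute"},
    "target": {"kind": "AzureAISearch", "connectionId": "<search-connection-id>"},
    "datasource": {"kind": "Dataset", "datasetId": "<dataset-id>", "datasetType": "uri_folder"},
}
request = build_user_compute_request(
    "https://aoairesource.openai.azure.com", "ingestion-job",
    "<api-key>", "<mgmt-token>", "<aml-token>", body,
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access or credentials.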

Definitions

Name | Description
ACSIndex | ACS Index.
BaseConnection | BaseConnection
ChunkingSettings | ChunkingSettings
ComputeType | The compute type.
ConnectionStringConnection | Connection string connection.
ConnectionType | The connection type.
CosmosDBIndex | CosmosDB Index.
CrawlingSettings | CrawlingSettings
CustomCompute | Custom compute.
DatasourceType | The datasource type.
DeploymentConnection | Relative deployment connection.
EndpointKeyConnection | Endpoint key connection.
EndpointMIConnection | Endpoint Managed Identity connection.
Error | Error
ErrorCode | ErrorCode
ErrorResponse | ErrorResponse
GenericEmbeddingSettings | ConnectionEmbeddingSettings
IngestionJobCompletionAction | The completion action.
IngestionJobSystemCompute |
IngestionJobType | IngestionJobType
IngestionJobUserCompute |
InnerError | InnerError
InnerErrorCode | InnerErrorCode
PineconeIndex | Pinecone Index.
ServerlessCompute | Serverless compute.
SystemComputeDatasource | SystemComputeDatasource
SystemComputeStorage | SystemComputeStorage
SystemComputeUrl | SystemComputeUrl
TargetType | The target type.
UserComputeDataset | UserComputeStorage
UserComputeUrl | UserComputeUrl
WorkspaceConnection | AML Workspace connection.
WorkspaceConnectionEmbeddingSettings | WorkspaceConnectionEmbeddingSettings

ACSIndex

ACS Index.

Name | Type | Description
connectionId | string | The id of the connection pointing to the ACS Index.
kind | string: acs | The target type.

BaseConnection

BaseConnection

Name | Type | Description
kind | ConnectionType | The connection type.

ChunkingSettings

ChunkingSettings

Name | Type | Description
maxChunkSizeInTokens | integer |

ComputeType

The compute type.

Name | Type | Description
custom | string | Custom user compute.
serverless | string | Serverless user compute.

ConnectionStringConnection

Connection string connection.

Name | Type | Description
connectionString | string | Connection string
kind | ConnectionType | The connection type.

ConnectionType

The connection type.

Name | Type | Description
connectionString | string | Connection string.
endpointKey | string | Endpoint and key connection.
endpointMI | string | Endpoint and managed identity.
workspace | string | AML Workspace connection.

CosmosDBIndex

CosmosDB Index.

Name | Type | Description
collectionName | string | The name of the Cosmos DB collection.
connectionId | string | The id of the connection pointing to the Cosmos DB account.
databaseName | string | The name of the Cosmos DB database.
kind | string: cosmosdb | The target type.

CrawlingSettings

CrawlingSettings

Name | Type | Description
maxCrawlDepth | integer |
maxCrawlTimeInMins | integer |
maxDownloadTimeInMins | integer |
maxFileSize | integer |
maxFiles | integer |
maxRedirects | integer |

CustomCompute

Custom compute.

Name | Type | Description
computeId | string | Id of the custom compute
kind | string: custom | The compute type.

DatasourceType

The datasource type.

Name | Type | Description
storage | string | Azure Storage Account.
urls | string | URLs.

DeploymentConnection

Relative deployment connection.

Name | Type | Description
kind | ConnectionType | The connection type.

EndpointKeyConnection

Endpoint key connection.

Name | Type | Description
endpoint | string | Endpoint
key | string | Key
kind | ConnectionType | The connection type.

EndpointMIConnection

Endpoint Managed Identity connection.

Name | Type | Description
endpoint | string | Endpoint
kind | ConnectionType | The connection type.

Error

Error

Name | Type | Description
code | ErrorCode | ErrorCode. Error codes as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).
details | Error[] | The error details if available.
innererror | InnerError | InnerError. Inner error as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).
message | string | The message of this error.
target | string | The location where the error happened if available.

ErrorCode

ErrorCode

Name | Type | Description
conflict | string | The requested operation conflicts with the current resource state.
contentFilter | string | Image generation failed as a result of our safety system.
fileImportFailed | string | Import of file failed.
forbidden | string | The operation is forbidden for the current user/api key.
internalFailure | string | Internal error. Please retry.
invalidPayload | string | The request data is invalid for this operation.
itemDoesAlreadyExist | string | The item already exists.
jsonlValidationFailed | string | Validation of jsonl data failed.
notFound | string | The resource is not found.
quotaExceeded | string | Quota exceeded.
serviceUnavailable | string | The service is currently not available.
tooManyRequests | string | Too many requests. Please retry later.
unauthorized | string | The current user/api key is not authorized for the operation.
unexpectedEntityState | string | The operation cannot be executed in the current resource's state.

ErrorResponse

ErrorResponse

Name | Type | Description
error | Error | Error. Error content as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).

GenericEmbeddingSettings

ConnectionEmbeddingSettings

Name | Type | Description
connection | BaseConnection | BaseConnection. A connection to a resource.
deploymentName | string |
modelName | string |

IngestionJobCompletionAction

The completion action.

Name | Type | Description
cleanUpAssets | string | Will clean up intermediate assets created during the ingestion process.
keepAllAssets | string | Will not clean up any of the intermediate assets created during the ingestion process.

IngestionJobSystemCompute

Name | Type | Description
completionAction | IngestionJobCompletionAction | The completion action.
dataRefreshIntervalInHours | integer |
datasource | SystemComputeDatasource | SystemComputeDatasource
jobId | string |
kind | string: system | IngestionJobType. The job type.
searchServiceConnection | BaseConnection | BaseConnection. A connection to a resource.

IngestionJobType

IngestionJobType

Name | Type | Description
system | string | Jobs that run on service-owned resources.
user | string | Jobs that run on a user-owned workspace.

IngestionJobUserCompute

Name | Type | Description
compute | JobCompute | JobCompute. The compute settings of the job.
dataRefreshIntervalInHours | integer |
datasource | UserComputeDatasource | UserComputeDatasource
jobId | string |
kind | string: user | IngestionJobType. The job type.
target | TargetIndex | TargetIndex. Information about the index to be created.
workspaceId | string |

InnerError

InnerError

Name | Type | Description
code | InnerErrorCode | InnerErrorCode. Inner error codes as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).
innererror | InnerError | InnerError. Inner error as defined in the Microsoft REST guidelines (https://github.com/microsoft/api-guidelines/blob/vNext/Guidelines.md#7102-error-condition-responses).
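
Because an Error can carry an InnerError, which can in turn nest another InnerError, a client may want the full chain of codes from outermost to innermost. The following is a minimal sketch under that reading of the schema; `error_code_chain` is a hypothetical helper and the sample payload is invented for illustration.

```python
def error_code_chain(error_response: dict) -> list:
    """Collect error codes from the top-level error down through nested innererror objects."""
    codes = []
    node = error_response.get("error")
    while isinstance(node, dict):
        if "code" in node:
            codes.append(node["code"])
        node = node.get("innererror")  # descend one level, or stop when absent
    return codes


# Invented payload that follows the ErrorResponse / Error / InnerError shapes.
sample = {
    "error": {
        "code": "invalidPayload",
        "message": "The request data is invalid for this operation.",
        "innererror": {"code": "invalidPayload", "innererror": {}},
    }
}
print(error_code_chain(sample))
```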

InnerErrorCode

InnerErrorCode

Name | Type | Description
invalidPayload | string | The request data is invalid for this operation.

PineconeIndex

Pinecone Index.

Name | Type | Description
connectionId | string | The id of the connection pointing to the Pinecone index.
kind | string: pinecone | The target type.

ServerlessCompute

Serverless compute.

Name | Type | Description
instanceCount | integer | The count of instances to run the job on.
kind | string: serverless | The compute type.
sku | string | SKU Level

SystemComputeDatasource

SystemComputeDatasource

Name | Type | Description
kind | DatasourceType | The datasource type.

SystemComputeStorage

SystemComputeStorage

Name | Type | Description
chunking | ChunkingSettings | ChunkingSettings. Chunking settings
connection | BaseConnection | BaseConnection. A connection to a resource.
containerName | string | container name
embeddings | GenericEmbeddingSettings[] | ConnectionEmbeddingSettings. Connection Embedding Settings
kind | DatasourceType | The datasource type.

SystemComputeUrl

SystemComputeUrl

Name | Type | Description
chunking | ChunkingSettings | ChunkingSettings. Chunking settings
connection | BaseConnection | BaseConnection. A connection to a resource.
containerName | string | container name
crawling | CrawlingSettings | CrawlingSettings. Crawling settings
embeddings | GenericEmbeddingSettings[] | ConnectionEmbeddingSettings. Connection Embedding Settings
kind | DatasourceType | The datasource type.
urls | string[] |

TargetType

The target type.

Name | Type | Description
acs | string | Azure AI Search Index.
cosmosdb | string | CosmosDB Index.
pinecone | string | Pinecone Index.

UserComputeDataset

UserComputeStorage

Name | Type | Description
chunking | ChunkingSettings | ChunkingSettings. Chunking settings
datasetId | string |
datasetType | string |
embeddings | WorkspaceConnectionEmbeddingSettings[] | WorkspaceConnectionEmbeddingSettings. Connection id to the embedding model
kind | string: dataset | The datasource type.

UserComputeUrl

UserComputeUrl

Name | Type | Description
chunking | ChunkingSettings | ChunkingSettings. Chunking settings
crawling | CrawlingSettings | CrawlingSettings. Crawling settings
embeddings | WorkspaceConnectionEmbeddingSettings[] | WorkspaceConnectionEmbeddingSettings. Connection id to the embedding model
kind | string: urls | The datasource type.
urls | string[] |

WorkspaceConnection

AML Workspace connection.

Name | Type | Description
connectionId | string | ConnectionId
kind | ConnectionType | The connection type.

WorkspaceConnectionEmbeddingSettings

WorkspaceConnectionEmbeddingSettings

Name | Type | Description
connectionId | string |
deploymentName | string |
modelName | string |