Azure OpenAI text completion input binding for Azure Functions

Important

The Azure OpenAI extension for Azure Functions is currently in preview.

The Azure OpenAI text completion input binding allows you to bring the results of the text completion APIs into your code executions. You can define the binding to use a predefined prompt with parameters, or to pass through an entire prompt.

For information on setup and configuration details of the Azure OpenAI extension, see Azure OpenAI extensions for Azure Functions. To learn more about Azure OpenAI completions, see Learn how to generate or manipulate text.

Note

While both C# process models are supported, only isolated worker model examples are provided. For Node.js, references and examples are provided only for the v4 programming model; for Python, only for the v2 programming model.

Example

This C# example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

[Function(nameof(WhoIs))]
public static HttpResponseData WhoIs(
    [HttpTrigger(AuthorizationLevel.Function, Route = "whois/{name}")] HttpRequestData req,
    [TextCompletionInput("Who is {name}?", Model = "%CHAT_MODEL_DEPLOYMENT_NAME%")] TextCompletionResponse response)
{
    HttpResponseData responseData = req.CreateResponse(HttpStatusCode.OK);
    responseData.WriteString(response.Content);
    return responseData;
}

This C# example takes a prompt as input, sends it directly to the completions API, and returns the response as the output. The {Prompt} binding expression resolves against the JSON body of the POST request, so the request payload must include a Prompt property.

[Function(nameof(GenericCompletion))]
public static HttpResponseData GenericCompletion(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req,
    [TextCompletionInput("{Prompt}", Model = "%CHAT_MODEL_DEPLOYMENT_NAME%")] TextCompletionResponse response,
    ILogger log)
{
    HttpResponseData responseData = req.CreateResponse(HttpStatusCode.OK);
    responseData.WriteString(response.Content);
    return responseData;
}

This Java example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

@FunctionName("WhoIs")
public HttpResponseMessage whoIs(
    @HttpTrigger(
        name = "req", 
        methods = {HttpMethod.GET},
        authLevel = AuthorizationLevel.ANONYMOUS, 
        route = "whois/{name}") 
        HttpRequestMessage<Optional<String>> request,
    @BindingName("name") String name,
    @TextCompletion(prompt = "Who is {name}?", model = "%CHAT_MODEL_DEPLOYMENT_NAME%", name = "response") TextCompletionResponse response,
    final ExecutionContext context) {
    return request.createResponseBuilder(HttpStatus.OK)
        .header("Content-Type", "application/json")
        .body(response.getContent())
        .build();
}

This Java example takes a prompt as input, sends it directly to the completions API, and returns the response as the output.

@FunctionName("GenericCompletion")
public HttpResponseMessage genericCompletion(
    @HttpTrigger(
        name = "req", 
        methods = {HttpMethod.POST},
        authLevel = AuthorizationLevel.ANONYMOUS) 
        HttpRequestMessage<Optional<String>> request,
    @TextCompletion(prompt = "{prompt}", model = "%CHAT_MODEL_DEPLOYMENT_NAME%", name = "response") TextCompletionResponse response,
    final ExecutionContext context) {
    return request.createResponseBuilder(HttpStatus.OK)
        .header("Content-Type", "application/json")
        .body(response.getContent())
        .build();
}

This TypeScript example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

import { app, input } from "@azure/functions";

// This OpenAI completion input requires a {name} binding value.
const openAICompletionInput = input.generic({
    prompt: 'Who is {name}?',
    maxTokens: '100',
    type: 'textCompletion',
    model: '%CHAT_MODEL_DEPLOYMENT_NAME%'
})

app.http('whois', {
    methods: ['GET'],
    route: 'whois/{name}',
    authLevel: 'function',
    extraInputs: [openAICompletionInput],
    handler: async (_request, context) => {
        const response: any = context.extraInputs.get(openAICompletionInput);
        return { body: response.content.trim() };
    }
});

This PowerShell example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

Here's the function.json file for TextCompletionResponse:

{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "Request",
      "route": "whois/{name}",
      "methods": [
        "get"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "Response"
    },
    {
      "type": "textCompletion",
      "direction": "in",
      "name": "TextCompletionResponse",
      "prompt": "Who is {name}?",
      "maxTokens": "100",
      "model": "%CHAT_MODEL_DEPLOYMENT_NAME%"
    }
  ]
}

For more information about function.json file properties, see the Configuration section.

The PowerShell code returns the text from the completions API as the response:

using namespace System.Net

param($Request, $TriggerMetadata, $TextCompletionResponse)

Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
        StatusCode = [HttpStatusCode]::OK
        Body       = $TextCompletionResponse.Content
    })

This Python example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

@app.route(route="whois/{name}", methods=["GET"])
@app.text_completion_input(arg_name="response", prompt="Who is {name}?", max_tokens="100", model = "%CHAT_MODEL_DEPLOYMENT_NAME%")
def whois(req: func.HttpRequest, response: str) -> func.HttpResponse:
    response_json = json.loads(response)
    return func.HttpResponse(response_json["content"], status_code=200)

This Python example takes a prompt as input, sends it directly to the completions API, and returns the response as the output.

@app.route(route="genericcompletion", methods=["POST"])
@app.text_completion_input(arg_name="response", prompt="{Prompt}", model = "%CHAT_MODEL_DEPLOYMENT_NAME%")
def genericcompletion(req: func.HttpRequest, response: str) -> func.HttpResponse:
    response_json = json.loads(response)
    return func.HttpResponse(response_json["content"], status_code=200)

Attributes

The specific attribute you apply to define a text completion input binding depends on your C# process model.

In the isolated worker model, apply TextCompletionInput to define a text completion input binding.

The attribute supports these parameters:

Prompt: Gets or sets the prompt to generate completions for, encoded as a string.
Model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
Temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either Temperature or TopP, but not both.
TopP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either Temperature or TopP, but not both.
MaxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
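
For example, here's a minimal sketch of the earlier whois function with the optional Temperature and MaxTokens parameters applied; the function name and route are hypothetical:

[Function(nameof(WhoIsFocused))]
public static HttpResponseData WhoIsFocused(
    [HttpTrigger(AuthorizationLevel.Function, Route = "whoisfocused/{name}")] HttpRequestData req,
    [TextCompletionInput(
        "Who is {name}?",
        Model = "%CHAT_MODEL_DEPLOYMENT_NAME%",
        Temperature = "0.2",  // lower temperature for more focused, deterministic output
        MaxTokens = "150")] TextCompletionResponse response)
{
    // Return the completion text, as in the earlier examples.
    HttpResponseData responseData = req.CreateResponse(HttpStatusCode.OK);
    responseData.WriteString(response.Content);
    return responseData;
}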

Annotations

The TextCompletion annotation enables you to define a text completion input binding, which supports these parameters:

name: Gets or sets the name of the input binding.
prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or topP, but not both.
topP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or topP, but not both.
maxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
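
For example, here's a minimal sketch of the earlier whoIs function with the optional temperature and maxTokens elements set; the function name and route are hypothetical:

@FunctionName("WhoIsFocused")
public HttpResponseMessage whoIsFocused(
    @HttpTrigger(
        name = "req",
        methods = {HttpMethod.GET},
        authLevel = AuthorizationLevel.ANONYMOUS,
        route = "whoisfocused/{name}")
        HttpRequestMessage<Optional<String>> request,
    @BindingName("name") String name,
    @TextCompletion(
        prompt = "Who is {name}?",
        model = "%CHAT_MODEL_DEPLOYMENT_NAME%",
        temperature = "0.2", // lower temperature for more focused, deterministic output
        maxTokens = "150",
        name = "response") TextCompletionResponse response,
    final ExecutionContext context) {
    return request.createResponseBuilder(HttpStatus.OK)
        .header("Content-Type", "application/json")
        .body(response.getContent())
        .build();
}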

Decorators

During the preview, define the input binding with the text_completion_input decorator (a binding of type textCompletion), as shown in the earlier examples. The decorator supports these parameters:

arg_name: The name of the variable that represents the binding parameter.
prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or top_p, but not both.
top_p: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or top_p, but not both.
max_tokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
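
For example, here's a minimal sketch of the earlier whois function with the optional temperature and max_tokens parameters applied; the function name and route are hypothetical:

@app.route(route="whoisfocused/{name}", methods=["GET"])
@app.text_completion_input(
    arg_name="response",
    prompt="Who is {name}?",
    temperature="0.2",  # lower temperature for more focused, deterministic output
    max_tokens="150",
    model="%CHAT_MODEL_DEPLOYMENT_NAME%")
def whois_focused(req: func.HttpRequest, response: str) -> func.HttpResponse:
    # The binding payload is a JSON string; the completion text is in "content".
    response_json = json.loads(response)
    return func.HttpResponse(response_json["content"], status_code=200)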

Configuration

The binding supports these configuration properties that you set in the function.json file.

type: Must be textCompletion.
direction: Must be in.
name: The name of the input binding.
prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or topP, but not both.
topP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or topP, but not both.
maxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
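
For example, here's a sketch of the earlier textCompletion binding definition with the optional temperature property added; the values are illustrative:

{
  "type": "textCompletion",
  "direction": "in",
  "name": "TextCompletionResponse",
  "prompt": "Who is {name}?",
  "temperature": "0.2",
  "maxTokens": "150",
  "model": "%CHAT_MODEL_DEPLOYMENT_NAME%"
}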

Configuration

The binding supports these properties, which are defined in your code:

prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or topP, but not both.
topP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or topP, but not both.
maxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
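
For example, here's a sketch of the earlier input definition with the optional temperature property added; the value is illustrative:

import { input } from "@azure/functions";

// Text completion input with a lower temperature for more focused output.
const openAICompletionInput = input.generic({
    prompt: 'Who is {name}?',
    maxTokens: '100',
    temperature: '0.2',
    type: 'textCompletion',
    model: '%CHAT_MODEL_DEPLOYMENT_NAME%'
});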

Usage

See the Example section for complete examples.