Azure OpenAI text completion input binding for Azure Functions

Important

The Azure OpenAI extension for Azure Functions is currently in preview.

The Azure OpenAI text completion input binding allows you to bring the results of the text completion APIs into your code executions. You can define the binding to use a predefined prompt with parameters, or to pass through an entire prompt.

For information on setup and configuration details of the Azure OpenAI extension, see Azure OpenAI extensions for Azure Functions. To learn more about Azure OpenAI completions, see Learn how to generate or manipulate text.

Note

While both C# process models are supported, only isolated worker model examples are provided. For Node.js, references and examples are provided only for the v4 programming model; for Python, only for the v2 programming model.

Example

This C# example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

[Function(nameof(WhoIs))]
public static HttpResponseData WhoIs(
    [HttpTrigger(AuthorizationLevel.Function, Route = "whois/{name}")] HttpRequestData req,
    [TextCompletionInput("Who is {name}?", Model = "%CHAT_MODEL_DEPLOYMENT_NAME%")] TextCompletionResponse response)
{
    HttpResponseData responseData = req.CreateResponse(HttpStatusCode.OK);
    responseData.WriteString(response.Content);
    return responseData;
}

This C# example takes a prompt as input, sends it directly to the completions API, and returns the response as the output. The {Prompt} binding expression resolves against the JSON body of the POST request, so the request payload must include a Prompt property.

[Function(nameof(GenericCompletion))]
public static HttpResponseData GenericCompletion(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req,
    [TextCompletionInput("{Prompt}", Model = "%CHAT_MODEL_DEPLOYMENT_NAME%")] TextCompletionResponse response,
    ILogger log)
{
    HttpResponseData responseData = req.CreateResponse(HttpStatusCode.OK);
    responseData.WriteString(response.Content);
    return responseData;
}

This Java example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

@FunctionName("WhoIs")
public HttpResponseMessage whoIs(
    @HttpTrigger(
        name = "req", 
        methods = {HttpMethod.GET},
        authLevel = AuthorizationLevel.ANONYMOUS, 
        route = "whois/{name}") 
        HttpRequestMessage<Optional<String>> request,
    @BindingName("name") String name,
    @TextCompletion(prompt = "Who is {name}?", model = "%CHAT_MODEL_DEPLOYMENT_NAME%", name = "response") TextCompletionResponse response,
    final ExecutionContext context) {
    return request.createResponseBuilder(HttpStatus.OK)
        .header("Content-Type", "application/json")
        .body(response.getContent())
        .build();
}

This Java example takes a prompt as input, sends it directly to the completions API, and returns the response as the output.

@FunctionName("GenericCompletion")
public HttpResponseMessage genericCompletion(
    @HttpTrigger(
        name = "req", 
        methods = {HttpMethod.POST},
        authLevel = AuthorizationLevel.ANONYMOUS) 
        HttpRequestMessage<Optional<String>> request,
    @TextCompletion(prompt = "{prompt}", model = "%CHAT_MODEL_DEPLOYMENT_NAME%", name = "response") TextCompletionResponse response,
    final ExecutionContext context) {
    return request.createResponseBuilder(HttpStatus.OK)
        .header("Content-Type", "application/json")
        .body(response.getContent())
        .build();
}

This TypeScript example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

import { app, input } from "@azure/functions";

// This OpenAI completion input requires a {name} binding value.
const openAICompletionInput = input.generic({
    prompt: 'Who is {name}?',
    maxTokens: '100',
    type: 'textCompletion',
    model: '%CHAT_MODEL_DEPLOYMENT_NAME%'
})

app.http('whois', {
    methods: ['GET'],
    route: 'whois/{name}',
    authLevel: 'function',
    extraInputs: [openAICompletionInput],
    handler: async (_request, context) => {
        const response: any = context.extraInputs.get(openAICompletionInput);
        return { body: response.content.trim() };
    }
});

This PowerShell example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

Here's the function.json file for TextCompletionResponse:

{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "Request",
      "route": "whois/{name}",
      "methods": [
        "get"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "Response"
    },
    {
      "type": "textCompletion",
      "direction": "in",
      "name": "TextCompletionResponse",
      "prompt": "Who is {name}?",
      "maxTokens": "100",
      "model": "%CHAT_MODEL_DEPLOYMENT_NAME%"
    }
  ]
}

For more information about function.json file properties, see the Configuration section.

The PowerShell code returns the text from the completions API as the response:

using namespace System.Net

param($Request, $TriggerMetadata, $TextCompletionResponse)

Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
        StatusCode = [HttpStatusCode]::OK
        Body       = $TextCompletionResponse.Content
    })

This Python example demonstrates the templating pattern, where the HTTP trigger function takes a name parameter and embeds it into a text prompt, which the extension then sends to the Azure OpenAI completions API. The response to the prompt is returned in the HTTP response.

@app.route(route="whois/{name}", methods=["GET"])
@app.text_completion_input(arg_name="response", prompt="Who is {name}?", max_tokens="100", model = "%CHAT_MODEL_DEPLOYMENT_NAME%")
def whois(req: func.HttpRequest, response: str) -> func.HttpResponse:
    response_json = json.loads(response)
    return func.HttpResponse(response_json["content"], status_code=200)

This Python example takes a prompt as input, sends it directly to the completions API, and returns the response as the output.

@app.route(route="genericcompletion", methods=["POST"])
@app.text_completion_input(arg_name="response", prompt="{Prompt}", model = "%CHAT_MODEL_DEPLOYMENT_NAME%")
def genericcompletion(req: func.HttpRequest, response: str) -> func.HttpResponse:
    response_json = json.loads(response)
    return func.HttpResponse(response_json["content"], status_code=200)

Attributes

The specific attribute you apply to define a text completion input binding depends on your C# process model.

In the isolated worker model, apply TextCompletionInput to define a text completion input binding.

The attribute supports these parameters:

Prompt: Gets or sets the prompt to generate completions for, encoded as a string.
Model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
Temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either Temperature or TopP, but not both.
TopP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either Temperature or TopP, but not both.
MaxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
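
For example, here's a minimal sketch of the earlier whois function with the optional Temperature and MaxTokens parameters applied; the function name and route are hypothetical:

[Function(nameof(WhoIsFocused))]
public static HttpResponseData WhoIsFocused(
    [HttpTrigger(AuthorizationLevel.Function, Route = "whoisfocused/{name}")] HttpRequestData req,
    [TextCompletionInput(
        "Who is {name}?",
        Model = "%CHAT_MODEL_DEPLOYMENT_NAME%",
        Temperature = "0.2",  // lower temperature for more focused, deterministic output
        MaxTokens = "150")] TextCompletionResponse response)
{
    // Return the completion text, as in the earlier examples.
    HttpResponseData responseData = req.CreateResponse(HttpStatusCode.OK);
    responseData.WriteString(response.Content);
    return responseData;
}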

Annotations

The TextCompletion annotation enables you to define a text completion input binding, which supports these parameters:

name: Gets or sets the name of the input binding.
prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or topP, but not both.
topP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or topP, but not both.
maxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
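
For example, here's a minimal sketch of the earlier whoIs function with the optional temperature and maxTokens elements set; the function name and route are hypothetical:

@FunctionName("WhoIsFocused")
public HttpResponseMessage whoIsFocused(
    @HttpTrigger(
        name = "req",
        methods = {HttpMethod.GET},
        authLevel = AuthorizationLevel.ANONYMOUS,
        route = "whoisfocused/{name}")
        HttpRequestMessage<Optional<String>> request,
    @BindingName("name") String name,
    @TextCompletion(
        prompt = "Who is {name}?",
        model = "%CHAT_MODEL_DEPLOYMENT_NAME%",
        temperature = "0.2", // lower temperature for more focused, deterministic output
        maxTokens = "150",
        name = "response") TextCompletionResponse response,
    final ExecutionContext context) {
    return request.createResponseBuilder(HttpStatus.OK)
        .header("Content-Type", "application/json")
        .body(response.getContent())
        .build();
}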

Decorators

During the preview, define the input binding with the text_completion_input decorator (a binding of type textCompletion), as shown in the earlier examples. The decorator supports these parameters:

arg_name: The name of the variable that represents the binding parameter.
prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or top_p, but not both.
top_p: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or top_p, but not both.
max_tokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
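
For example, here's a minimal sketch of the earlier whois function with the optional temperature and max_tokens parameters applied; the function name and route are hypothetical:

@app.route(route="whoisfocused/{name}", methods=["GET"])
@app.text_completion_input(
    arg_name="response",
    prompt="Who is {name}?",
    temperature="0.2",  # lower temperature for more focused, deterministic output
    max_tokens="150",
    model="%CHAT_MODEL_DEPLOYMENT_NAME%")
def whois_focused(req: func.HttpRequest, response: str) -> func.HttpResponse:
    # The binding payload is a JSON string; the completion text is in "content".
    response_json = json.loads(response)
    return func.HttpResponse(response_json["content"], status_code=200)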

Configuration

The binding supports these configuration properties that you set in the function.json file.

type: Must be textCompletion.
direction: Must be in.
name: The name of the input binding.
prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or topP, but not both.
topP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or topP, but not both.
maxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
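
For example, here's a sketch of the earlier textCompletion binding definition with the optional temperature property added; the values are illustrative:

{
  "type": "textCompletion",
  "direction": "in",
  "name": "TextCompletionResponse",
  "prompt": "Who is {name}?",
  "temperature": "0.2",
  "maxTokens": "150",
  "model": "%CHAT_MODEL_DEPLOYMENT_NAME%"
}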

Configuration

The binding supports these properties, which are defined in your code:

prompt: Gets or sets the prompt to generate completions for, encoded as a string.
model: Gets or sets the ID of the model to use as a string, with a default value of gpt-3.5-turbo.
temperature: Optional. Gets or sets the sampling temperature to use, as a string between 0 and 2. Higher values, like 0.8, make the output more random, while lower values, like 0.2, make it more focused and deterministic. You should use either temperature or topP, but not both.
topP: Optional. Gets or sets an alternative to sampling with temperature, called nucleus sampling, as a string. In this sampling method, the model considers the results of the tokens with top_p probability mass. So a value of 0.1 means only the tokens comprising the top 10% probability mass are considered. You should use either temperature or topP, but not both.
maxTokens: Optional. Gets or sets the maximum number of tokens to generate in the completion, as a string with a default of 100. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2,048 tokens (except for the newest models, which support 4,096).
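
For example, here's a sketch of the earlier input definition with the optional temperature property added; the value is illustrative:

import { input } from "@azure/functions";

// Text completion input with a lower temperature for more focused output.
const openAICompletionInput = input.generic({
    prompt: 'Who is {name}?',
    maxTokens: '100',
    temperature: '0.2',
    type: 'textCompletion',
    model: '%CHAT_MODEL_DEPLOYMENT_NAME%'
});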

Usage

See the Example section for complete examples.