OpenAI GPT-4V (preview)

The OpenAI GPT-4V tool enables you to use OpenAI's GPT-4 with vision, also referred to as GPT-4V or `gpt-4-vision-preview` in the API, to take images as input and answer questions about them.

Important

The OpenAI GPT-4V tool is currently in public preview. This preview is provided without a service-level agreement, and it isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Prerequisites

Connection

Set up connections to provisioned resources in prompt flow.

| Type   | Name     | API key  |
|--------|----------|----------|
| OpenAI | Required | Required |
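As a minimal sketch, an OpenAI connection can be defined in YAML and created with the open-source prompt flow CLI. The connection name and the key placeholder below are illustrative, and the exact fields may vary by promptflow version:

```yaml
# Illustrative connection definition for the prompt flow CLI
# (create it with: pf connection create -f connection.yaml)
name: open_ai_connection
type: open_ai
api_key: "<your-openai-api-key>"
```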

Inputs

| Name | Type | Description | Required |
|------|------|-------------|----------|
| connection | OpenAI | The OpenAI connection to be used in the tool. | Yes |
| model | string | The language model to use. Currently, only `gpt-4-vision-preview` is supported. | Yes |
| prompt | string | The text prompt that the language model uses to generate its response. The Jinja template for composing prompts in this tool follows a structure similar to the chat API in the LLM tool. To represent an image input within your prompt, use the syntax `![image]({{INPUT NAME}})`. Image input can be passed in the user, system, and assistant messages. | Yes |
| max_tokens | integer | The maximum number of tokens to generate in the response. The default is a low value decided by the OpenAI API. | No |
| temperature | float | The randomness of the generated text. The default is 1. | No |
| stop | list | The stopping sequence for the generated text. The default is null. | No |
| top_p | float | The probability of using the top choice from the generated tokens. The default is 1. | No |
| presence_penalty | float | A value that controls the model's behavior regarding repeating phrases. The default is 0. | No |
| frequency_penalty | float | A value that controls the model's behavior regarding generating rare phrases. The default is 0. | No |
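For example, a prompt that passes an image in the user message can be composed with the `![image]({{INPUT NAME}})` syntax described above. The input names `question` and `test_image` here are illustrative placeholders for inputs you define in your flow:

```jinja
# system:
You are an assistant that answers questions about the image provided by the user.

# user:
{{question}}
![image]({{test_image}})
```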

Outputs

| Return type | Description |
|-------------|-------------|
| string | The text of one response of the conversation |

Next step

Learn more about how to process images in prompt flow.