Vision in Azure OpenAI Assistants API

Uralstech 20 Reputation points
2025-04-22T14:20:09.15+00:00

I've been trying to get vision working with the Azure OpenAI Assistants API. So far, I've tried 3 things:

  1. Uploaded the image as a file with the purpose "vision" (which works in the OpenAI Assistants API) and included it as an image_file content block in the thread message. This gives me the following "purpose contains an invalid purpose" error:
    ImageFileVisionPurpose
  2. Same as method 1, but with the purpose "assistants". This gives me the following "gpt-4o-2024-11-20 does not support image message content types" error:
    ImageFileAssistantsPurpose
  3. Uploaded the image as a file and added it to the thread message as an attachment, as you would with other file types. This gives me the following "Files with extension [.png] are not supported for retrieval" error:
    AttachmentAssistantsPurpose
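For reference, here is a minimal sketch of the request payloads the three attempts produce. The file ID, prompt, and helper names are illustrative, not from a real run; with the `openai` Python SDK these dicts correspond to what `client.beta.threads.messages.create` is given.

```python
def vision_message(file_id: str, prompt: str) -> dict:
    """Thread message carrying an image_file content block (attempts 1 and 2).

    The only difference between the two attempts is the purpose string
    ("vision" vs "assistants") used when the file was uploaded.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_file", "image_file": {"file_id": file_id}},
        ],
    }


def attachment_message(file_id: str, prompt: str) -> dict:
    """Thread message attaching the file for file search (attempt 3)."""
    return {
        "role": "user",
        "content": prompt,
        "attachments": [
            {"file_id": file_id, "tools": [{"type": "file_search"}]},
        ],
    }


# Hypothetical file ID for illustration only.
msg = vision_message("assistant-abc123", "What is in this image?")
print(msg["content"][1]["type"])  # image_file
```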
Azure OpenAI Service

Accepted answer
  Manas Mohanty 3,780 Reputation points Microsoft External Staff Moderator
    2025-04-22T16:47:42.8933333+00:00

    Hi Uralstech

    Here is the analysis and suggestion based on the errors you encountered:

    Analysis

    It seems you have uploaded the image to a vector store and used file search mode.

    1. Purpose "vision" is not valid: The purpose tag "vision" is not supported in Azure OpenAI. Please check the supported purpose tags here.
    2. "gpt-4o-2024-11-20 does not support image message content types": The model "gpt-4o-2024-11-20" does not support image message content types such as .jpg and .png directly in assistants with file search mode. These formats are supported in code interpreter mode.
    3. File search mode does not support .png or .jpg: The file search mode does not support image formats like .png or .jpg. For more information, you can refer to the Supported file types documentation for file search.

    Remediation

    1. Please use the code interpreter (see the Code interpreter documentation) to interact with images, for example to debug or get advice on code screenshots.

    Attached are screenshots from code interpreter trials with gpt-4-vision (version: turbo-2024-04-09) for reference. The prompt given here was: "Could you analyze the attached image and tell whether Functions/Tools is supported for O1 preview models"

    Screenshot (100)

    Screenshot (101)

    2. To interact with images and get any other desired behavior, you might need to use a custom function (function calling) that utilizes an image-enabled model.
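    A sketch of the code interpreter route described above, assuming the `openai` Python SDK. The file name, thread, and IDs are placeholders; the live SDK calls are shown as comments, and the helper builds the attachment payload that routes the image to code interpreter instead of file search.

    ```python
    def code_interpreter_attachment(file_id: str) -> dict:
        # Attach the file to the code_interpreter tool rather than
        # file_search, since file search rejects .png/.jpg files.
        return {"file_id": file_id, "tools": [{"type": "code_interpreter"}]}


    # With a live AzureOpenAI client (placeholders, not executed here),
    # and an assistant created with tools=[{"type": "code_interpreter"}]:
    #
    # image = client.files.create(
    #     file=open("screenshot.png", "rb"), purpose="assistants"
    # )
    # client.beta.threads.messages.create(
    #     thread_id=thread.id,
    #     role="user",
    #     content="Could you analyze the attached image?",
    #     attachments=[code_interpreter_attachment(image.id)],
    # )

    att = code_interpreter_attachment("assistant-abc123")
    print(att["tools"][0]["type"])  # code_interpreter
    ```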

    Hope it helps.

    Thank you

