Azure OpenAI with Python: add an image to the prompt for GPT-4o-mini

Simon Nakhoul 5 Reputation points
2024-10-04T08:46:14.1+00:00

Hi all, not sure if this is the place to ask the question, but here we go:

I am able to use the web interface of Azure OpenAI Studio in the chat playground to analyze images, but I would like to do the same using Python. So far it isn't working, and I could not find a reference online on how to include an image in the prompt. Could anyone please help, or provide a reference on how I can do this? Or is it even possible?

My code:

    import os
    from openai import AzureOpenAI
    
    client = AzureOpenAI(
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version=os.getenv("AZURE_API_VERSION"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    )
    
    deployment_name = "gpt-4o-mini"  
    # Send a completion call to generate an answer
    print("Sending a test completion job")
    #image local path or I provided the URL for the image in the blob storage. 
    image_input = r"c:/users/..../image.jpeg"
    prompt = "Tell me what do you see in this image ![image]({{image_input}}) "
    
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
    )
    generated_text = response.choices[0].message.content
    print("Response:", generated_text)
Response: It seems that I can't view images directly. However, if you describe the image or provide details about its content, I can help analyze it or provide insights based on your description!
Azure OpenAI Service

3 answers

  1. Simon Nakhoul 5 Reputation points
    2024-10-17T08:28:11.5833333+00:00

  2. santoshkc 9,235 Reputation points Microsoft Vendor
    2024-10-04T11:42:43.5833333+00:00

    Hi @Simon Nakhoul,

    Thank you for reaching out to Microsoft Q&A forum!

    Currently, a plain-text prompt cannot carry an image: the Azure OpenAI chat completions API does not interpret markdown image links or file paths embedded in the prompt string, which is why the model replies that it cannot view images. Unlike the web interface in Azure OpenAI Studio, where you can upload images for analysis, the API does not accept image files or URLs placed inside the prompt text.
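That said, depending on the API version available to your resource, GPT-4o-family deployments can accept an image as a structured `image_url` content part instead of text. A minimal sketch below; the deployment name and API version are assumptions, and the client call is shown commented because it needs real credentials:

```python
# Sketch: pass an image to a vision-capable deployment as a structured
# content part, rather than embedding it in the prompt string.

def build_image_message(text, image_url):
    """Build a user message whose content is a list of parts:
    one text part plus one image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# With a client configured as in the question, the call would look like:
#
#   from openai import AzureOpenAI
#   client = AzureOpenAI(
#       api_key=os.getenv("AZURE_OPENAI_API_KEY"),
#       api_version="2024-02-15-preview",  # assumed vision-capable version
#       azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
#   )
#   response = client.chat.completions.create(
#       model="gpt-4o-mini",  # your deployment name
#       messages=[build_image_message("What do you see in this image?",
#                                     "https://example.com/image.jpeg")],
#       temperature=0,
#   )
#   print(response.choices[0].message.content)
```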

    As a workaround, you can use Azure Computer Vision to analyze the image and retrieve a description. Then, you can use that description as input to Azure OpenAI, but you cannot directly include the image in the prompts.
    Install Required Packages: pip install azure-cognitiveservices-vision-computervision openai

    Use the Following Code:

    import os
    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
    from msrest.authentication import CognitiveServicesCredentials
    from openai import AzureOpenAI
    
    # Set your Azure Computer Vision credentials
    subscription_key = "YOUR_AZURE_SUBSCRIPTION_KEY"  # Replace with your Azure Computer Vision key
    endpoint = "YOUR_AZURE_COMPUTER_VISION_ENDPOINT"  # Replace with your Azure Computer Vision endpoint
    
    # Set your Azure OpenAI API key
    openai_api_key = "YOUR_AZURE_OPENAI_API_KEY" # Replace with your Azure OpenAI API key
    
    # Create a Computer Vision client
    computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))
    
    # Create an Azure OpenAI client
    openai_client = AzureOpenAI(
        api_key=openai_api_key,
        api_version="2023-05-15", # Ensure this is the correct version
        azure_endpoint="https://YOUR_AZURE_OPENAI_ENDPOINT" # Replace with your Azure OpenAI endpoint
    )
    
    def analyze_image(image_path):
        # Open the image file
        with open(image_path, "rb") as image_file:
            # Call the Azure API for image analysis
            response = computervision_client.analyze_image_in_stream(
                image_file,
                visual_features=[VisualFeatureTypes.description]
            )
    
        # Extract description from Azure's response
        if response.description.captions:
            description = response.description.captions[0].text
        else:
            description = "No description available."
        return description
    
    def generate_response_with_openai(image_description):
        prompt = f"Based on the following description of an image, please provide insights: {image_description}"
        messages = [{"role": "user", "content": prompt}]
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.7,
        )
        return response.choices[0].message.content
    
    # Example usage
    if __name__ == "__main__":
        image_path = r"C:\Users\XXXXXXXXXXXXXXX\orange.jpg" # Replace with your image path
        image_description = analyze_image(image_path)
        print("Azure Description:", image_description)
    
        # Generate a response using OpenAI
        openai_response = generate_response_with_openai(image_description)
        print("OpenAI Generated Response:", openai_response)
    
    

    Output: (image attachment showing the script's output)

    I hope you understand. Thank you.
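If the image only exists on disk (as with the local path in the original question), another option worth noting is to base64-encode it into a data URL and send that as the `image_url` of a content part, with no blob storage needed. A small sketch using only the standard library:

```python
import base64
import mimetypes

def image_to_data_url(path):
    """Encode a local image file as a base64 data URL that can be used
    wherever an image URL is expected, e.g. in an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/jpeg"  # fall back when the type cannot be guessed
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# The resulting string would then go into the same message shape, e.g.:
# {"type": "image_url", "image_url": {"url": image_to_data_url("photo.jpg")}}
```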


  3. Simon Nakhoul 5 Reputation points
    2024-10-17T08:26:30.5+00:00
