How to send an input file to GPT-4o (multimodal) along with the text prompt

Baijnath Singh 60 Reputation points
2024-05-28T02:53:52.71+00:00

I am testing the GPT-4o multimodal LLM. My Python application should send a file along with a text prompt. I am trying the code below, but it throws an error. Can someone suggest the right approach to send the file and text as a prompt to GPT-4o? Any input is much appreciated.

Code snippet:

        if file:
            file_path = os.path.join(app.config['UPLOAD_FOLDER'], file.filename)
            file.save(file_path)
            print(f'File uploaded and saved at: {file_path}')

            mime_type, _ = guess_type(file_path)
            if mime_type is None:
                mime_type = 'application/octet-stream'  # Default MIME type if none is found
            
            # Read and encode the image file
            with open(file_path, "rb") as image_file:
                base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')
            # Construct the data URL
            data_url = f"data:{mime_type};base64,{base64_encoded_data}"
        # Prepare the messages for the chat completion
        messages = [
            {
                "role": "system", 
                "content": f"You need to translate the following document into {target_language}"
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": data_url
                        }
                    }
                ]
            },
        ]

If I dump data_url to the console, it displays the encoded content of the file.

Error during REST API call:
{

"error": "Error code: 400 - {'error': {'code': 'BadRequest', 'message': 'Invalid image (base64) data.', 'param': None, 'type': None}}"

}

Azure OpenAI Service

Accepted answer
  1. navba-MSFT 27,540 Reputation points Microsoft Employee Moderator
    2024-05-28T05:09:10.3733333+00:00

    @Baijnath Singh Welcome to the Microsoft Q&A Forum. Thank you for posting your query here!


    I used the sample below with my GPT-4o model and it worked fine:


    import base64
    import requests
    
    # Encode the image to base64 (the with-block closes the file after reading)
    sLongImageFn='MyImage.png'
    with open(sLongImageFn, 'rb') as fImage:
        sImageData = base64.b64encode(fImage.read()).decode('ascii')
    sEndpoint='https://XXXXXXXXXXX.openai.azure.com/'
    sKey='5f3fb92XXXXXXXXXXXXXXXXXXXXXX372c51'
    sDeployment='XXXXXXX'
    
    # dData is copied from the tutorial
    dData = {
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this picture:"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            # The MIME type should match the file format (PNG here)
                            "url": f"data:image/png;base64,{sImageData}",
                        }
                    }
                ]
            }
        ],
        "max_tokens": 100,
        "stream": False
    }
    
    # Make the API request
    response = requests.post(
        f'{sEndpoint}openai/deployments/{sDeployment}/chat/completions?api-version=2024-02-01',
        headers={'api-key': sKey, 'Content-Type': 'application/json'},
        json=dData
    )
    
    # Print the response
    print(response.json())
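
    For the upload scenario in the question, here is a minimal sketch of building the user message from a file path. It assumes the uploaded file is an image: the `image_url` content part carries image data, so a non-image upload (for example a PDF or Word document) may be one cause of the `Invalid image (base64) data` error, though that is not confirmed in this thread. The helper names `build_data_url` and `build_user_message` and the `SUPPORTED_IMAGE_TYPES` set are illustrative, not part of any SDK; check the current service documentation for the accepted formats.

```python
import base64
from mimetypes import guess_type

# Assumption: image formats accepted by the vision-enabled chat endpoint.
# Verify this list against the current service documentation.
SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def build_data_url(file_path: str) -> str:
    """Return a base64 data URL for an image file; raise if it is not an image."""
    mime_type, _ = guess_type(file_path)
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Not a supported image type: {mime_type!r} ({file_path})")
    with open(file_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_user_message(prompt: str, file_path: str) -> dict:
    """Combine a text prompt and an image into one multimodal user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": build_data_url(file_path)}},
        ],
    }
```

    The returned dictionary can replace the hand-written user entry in `dData["messages"]`. Rejecting unsupported types before the request gives a clearer local error than a 400 from the service.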
    


    Please note: Before running the sample, update the endpoint, key, and deployment name accordingly.
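
    One way to avoid hard-coding those values is to read them from environment variables. The sketch below is only an illustration: the variable names `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `AZURE_OPENAI_DEPLOYMENT` and the helper `load_settings` are assumptions chosen for this example, and the API version is the one used in the sample above.

```python
import os


def load_settings(env=None) -> dict:
    """Build the request URL and headers from environment-style settings.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    # Normalize the endpoint so it always ends with exactly one slash.
    endpoint = env["AZURE_OPENAI_ENDPOINT"].rstrip("/") + "/"
    deployment = env["AZURE_OPENAI_DEPLOYMENT"]
    return {
        "url": (
            f"{endpoint}openai/deployments/{deployment}"
            "/chat/completions?api-version=2024-02-01"
        ),
        "headers": {
            "api-key": env["AZURE_OPENAI_API_KEY"],
            "Content-Type": "application/json",
        },
    }
```

    With this in place, the request becomes `requests.post(settings["url"], headers=settings["headers"], json=dData)`, where `settings = load_settings()`, and the key stays out of the source file.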


    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
