Hello @Billy Zhou,
This is probably a limitation on the model side.
A possible workaround is to send one image per request and collect the summaries in a list, as below.
from typing import List

def summarize_images_one_by_one(image_paths: List[str]) -> str:
    model = get_llama_maverick_instruct_llm()
    summaries = []
    # Send one request per image and collect the individual summaries.
    for path in image_paths:
        message = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Summarize the content of this image:"},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encode_image_to_base64(path)}"}},
                ],
            }
        ]
        response = model.invoke(message)
        summaries.append(response.content)
    return "\n".join(summaries)
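The snippet above calls `encode_image_to_base64`, which is not shown in this thread, and hardcodes `image/jpeg` in the data URL even for `.png` files. Below is a minimal sketch of what such a helper might look like, assuming it simply base64-encodes the raw file bytes. `image_to_data_url` is a hypothetical name added here for illustration; it guesses the MIME type from the file extension.

```python
import base64
import mimetypes


def encode_image_to_base64(path: str) -> str:
    """Assumed behaviour of the helper used above: base64-encode the raw file bytes."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def image_to_data_url(path: str) -> str:
    """Hypothetical helper: build a data URL whose MIME type matches the file extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        # Fall back to the MIME type the original snippet hardcodes.
        mime = "image/jpeg"
    return f"data:{mime};base64,{encode_image_to_base64(path)}"
```

With this in place, the `"url"` value in the message could be `image_to_data_url(path)` instead of the hardcoded f-string.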
Alternatively, combine the images into one and make a single request:
from PIL import Image
import io
import base64

image_paths = [r" 2025-03-28 145848.png", r"2025-03-28 145737.png"]
images = [Image.open(x) for x in image_paths]

# The combined canvas is as wide as all images together and as tall as the tallest one.
widths, heights = zip(*(i.size for i in images))
total_width = sum(widths)
max_height = max(heights)
new_im = Image.new('RGB', (total_width, max_height))

# Paste the images side by side, left to right.
x_offset = 0
for im in images:
    new_im.paste(im, (x_offset, 0))
    x_offset += im.size[0]
new_im.save('test.jpg')
def summarize_images(image_path: str) -> str:
    model = get_llama_maverick_instruct_llm()
    image_contents = f"data:image/jpeg;base64,{encode_image_to_base64(image_path)}"
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the input image"},
                {"type": "image_url", "image_url": {"url": image_contents}},
            ],
        }
    ]
    response = model.invoke(messages)
    # Return only the text of the reply, as in the one-by-one version.
    return response.content

summary = summarize_images("test.jpg")
print("Summary:", summary)
Sample output I got:
'The image shows two Microsoft Azure portal windows side by side, with the left window displaying a PowerShell terminal and the right window showing a diagnostic settings page for a storage account.\n\n**Left Window: PowerShell Terminal**\n\n* The terminal is open to a directory path `/home/jaya/storage`\n* A Terraform plan is being executed, with the output displayed in the terminal\n* The plan involves creating and modifying resources, including a storage account and diagnostic settings\n* The output indicates that 1 resource will be added, 1 changed, and 0 destroyed\n* The user is prompted to confirm the actions by typing \'yes\'\n* After confirming, the Terraform apply command is executed, and the resources are created/modified successfully\n\n**Right Window: Diagnostic Settings Page**\n\n* The page is titled "samyustorage | Diagnostic settings" and displays the diagnostic settings for a storage account named "samyustorage"\n* The storage account is part of a resource group named "samyutha-terraform"\n* The diagnostic settings are enabled for the storage account, as well as for a blob storage account within it\n* Other storage accounts (queue, table, file) have their diagnostic settings disabled\n\n**Overall**\n\n* The image suggests that the user is using Terraform to manage Azure resources, including storage accounts and diagnostic settings\n* The Terraform plan and apply commands are being used to create and modify these resources\n* The diagnostic settings page provides a visual representation of the diagnostic settings for the storage account and its sub-resources.'
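If writing `test.jpg` to disk is not desirable, the stitching step above could also be done in memory. The following is only a sketch under that assumption: `combine_images_to_data_url` is a hypothetical helper name, not part of any library, and it uses a white background rather than the default black so that shorter images are padded less obtrusively.

```python
import base64
import io
from typing import List

from PIL import Image


def combine_images_to_data_url(images: List[Image.Image]) -> str:
    """Hypothetical helper: paste images side by side and return a JPEG data URL."""
    total_width = sum(im.width for im in images)
    max_height = max(im.height for im in images)
    # White canvas, wide enough for all images and as tall as the tallest one.
    canvas = Image.new("RGB", (total_width, max_height), "white")
    x_offset = 0
    for im in images:
        # Convert first so RGBA or palette PNGs paste cleanly onto the RGB canvas.
        canvas.paste(im.convert("RGB"), (x_offset, 0))
        x_offset += im.width
    # Encode to JPEG in memory instead of saving a file.
    buffer = io.BytesIO()
    canvas.save(buffer, format="JPEG")
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"
```

The returned string can be used directly as the `"url"` value of the `image_url` content part.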
However, I would recommend sending a single image per request until the model supports multiple images.
If you have any questions, please let us know in the comments or by private message.
Thank you