Hi Giacomo Maccagni,
I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.
Ask: Phi-4-Multimodal-Instruct - Unable to Send Audio Input to Phi-4-Multimodal-Instruct – "Invalid Input" Error
Solution: The issue is resolved. You solved the audio issue by using the correct request format, which is:
# image_b64 and audio_b64 hold the base64-encoded file contents; stream is a bool.
payload_text = {
    "model": "Phi-4-multimodal-instruct",
    "messages": [
        {
            "role": "system",
            # "content": 'What is the result of 1+1?'
            "content": [
                {"text": "What's in this image?", "type": "text"},
                {"image_url": {"url": f"data:image/jpeg;base64,{image_b64}", "detail": "low"}, "type": "image_url"},
                {
                    "audio_url": {
                        "url": f"data:audio/mp3;base64,{audio_b64}",
                        "format": "mp3"
                    },
                    "type": "audio_url"
                }
            ]
        }
    ],
    "temperature": 0.10,
    "top_p": 0.70,
    "stream": stream
}
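For anyone assembling this request from scratch, here is a minimal sketch of how the base64 strings and the payload might be produced. The helper names `to_b64` and `build_payload` are hypothetical and not part of the original solution; the payload shape, including the "system" role, simply mirrors the one above. Sending the request is left out because the endpoint and authentication details depend on your deployment.

```python
import base64
import json


def to_b64(data: bytes) -> str:
    """Return raw bytes as a base64 ASCII string (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


def build_payload(image_b64: str, audio_b64: str, stream: bool = False) -> dict:
    """Build a request body in the same shape as the payload in the post."""
    return {
        "model": "Phi-4-multimodal-instruct",
        "messages": [
            {
                "role": "system",  # role exactly as used in the post
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            # Image passed inline as a data URL
                            "url": f"data:image/jpeg;base64,{image_b64}",
                            "detail": "low",
                        },
                    },
                    {
                        "type": "audio_url",
                        "audio_url": {
                            # Audio passed inline as a data URL, with its format named
                            "url": f"data:audio/mp3;base64,{audio_b64}",
                            "format": "mp3",
                        },
                    },
                ],
            }
        ],
        "temperature": 0.10,
        "top_p": 0.70,
        "stream": stream,
    }
```

With real files, the inputs could be read as, for example, `to_b64(open("clip.mp3", "rb").read())` (the file name is a placeholder), and the resulting dict can be serialized with `json.dumps(build_payload(...))` before it is posted to your endpoint.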
If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.
If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.
Please don’t forget to click Accept Answer and Yes for "Was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.
Thank you.