Thank you for reaching out to Microsoft Q&A forum!
To send images as tokens for processing with Azure GPT-4o mini, you'll need to input the images into a format such as base64. Currently, GPT-4o mini supports both text and vision tasks. Ensure that your model deployment settings are configured to accept and process image data. As multimodal support for image, video, and audio inputs will expand in future updates, it's essential to stay informed about any enhancements that may further streamline this process.
I hope you understand! Thank you.