Thanks for reaching out to us! There is a repo which the sample team is managing related question.
Could you please post your question in the repo so that the engineering team managing it can help?
Repo - https://github.com/microsoft/sample-app-aoai-chatGPT/issues
Generally, there are four ways may help you achieve this goal.
- Custom Web Interface: Develop a custom web application where users can upload images as input. You can use a frontend framework like React, Angular, or Vue.js combined with a backend framework (Node.js, Flask, Django, etc.). In the backend, you can handle image uploads, process them if needed (e.g., resizing, format conversion), and then pass them along with text inputs to the GPT-4o model for processing.
- API Gateway Integration: Use an API gateway service like Azure API Management to create an API endpoint that accepts image and text inputs. This endpoint can then forward the inputs to a serverless function (AWS Lambda, Google Cloud Functions, Azure Functions) or a backend service where GPT-4o is hosted. The function or service processes the inputs and returns the generated text output.
- Platform SDKs: If your deployment environment supports SDKs for integrating models, check if there's an SDK available for GPT-4o that includes image input capabilities. SDKs often provide pre-built components and APIs that simplify the integration process, allowing you to focus on application logic rather than low-level implementation details.
- Containerization: Containerize your GPT-4o model along with any necessary preprocessing and image handling logic using Docker or similar tools. Deploy these containers on a platform like Kubernetes for scalability and manageability. Your interface can then communicate with these containers via RESTful APIs or messaging queues.
I hope those cluses helps!
Regards,
Yutong
-Please kindly accept the answer if you feel helpful to support the community, thanks a lot.