In the new playground, you can see the token breakdown here:
If you send an image and then enable [show JSON], you'll see that the image is encoded in base64 and sent to the server side.
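What the playground does there can be reproduced in a few lines. A minimal sketch (the helper name and the placeholder bytes are my own):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL, the same shape
    the playground puts into the JSON payload."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# In a real call you would read the bytes from a file:
# to_data_url(open("photo.png", "rb").read())
```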
Based on info from the OpenAI forum, the `detail` field inside `image_url` controls whether the image is processed in high, low, or auto mode. The discussion below suggests that `detail` defaults to `auto` when unspecified; my understanding is that in auto mode the vision model decides based on the actual resolution of the image. The most reliable way to find the actual tokens consumed by a request is to make a curl call to the endpoint with a JSON payload containing the base64 image. You can get a curl sample using the View code option, then choose curl. Make sure to add `-i` to the curl command so that it prints the response headers too.
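For reference, the payload that curl sample sends looks roughly like this (a sketch: the model name and the base64 placeholder are assumptions, not from the View code output):

```python
import json

# Shape of a chat completions request carrying a base64 image.
payload = {
    "model": "gpt-4o",  # assumed model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        # placeholder; paste the real base64 string here
                        "url": "data:image/png;base64,<BASE64_BYTES>",
                        "detail": "low",
                    },
                },
            ],
        }
    ],
}

print(json.dumps(payload, indent=2))
```

The consumed token counts come back in the response body under `usage.prompt_tokens` / `usage.total_tokens`; `-i` additionally shows headers such as the rate-limit ones.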
"image_url": { "url": "https://your-image-url.com", "detail": "low" }
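If you just want an estimate before making the call, this is a sketch of the image-token formula OpenAI published for the GPT-4 vision models: low detail is a flat 85 tokens; high detail scales the image to fit 2048x2048, then scales the shortest side down to 768, and charges 170 tokens per 512x512 tile plus an 85-token base. Newer models may use different constants, so treat the numbers as an approximation.

```python
import math

def estimate_image_tokens(width: int, height: int, detail: str = "auto") -> int:
    """Approximate image token cost per the published GPT-4 vision formula."""
    if detail == "low":
        return 85  # low detail is a fixed cost regardless of size
    # Treat "auto" like "high" here for a worst-case estimate.
    # Step 1: fit within a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Step 2: scale so the shortest side is at most 768.
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # Step 3: 170 tokens per 512 x 512 tile, plus an 85-token base.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85

print(estimate_image_tokens(1024, 1024, detail="high"))  # → 765
```

For example, a 1024x1024 image in high detail comes out to 4 tiles, i.e. 170 * 4 + 85 = 765 tokens, which you can then compare against the `usage` numbers from the actual request.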
https://community.openai.com/t/gpt-4-vision-preview-fidelity-detail-parameter/477563