I am using the following request body
{
"enhancements": {
"ocr": {
"enabled": true
},
"grounding": {
"enabled": true
}
},
"dataSources": [
{
"type": "AzureComputerVision",
"parameters": {
"endpoint": "XXXX",
"key": "XXXX"
}
}],
"messages": [
{
"role": "system",
"content": "Using the image file provided, you will need to analyze the image based on the prompt and return an array of x and y coordinates for the top left and bottom right of each item detected. "
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Give me back all of the small icons for each item on the list. they should contain a small representation of the item with a solid blue shape of each gun"
},
{
"type": "image_url",
"image_url": {
"url":"http://yomotherboard.com/grounding-test.jpg"
}
}
]
}
],
"max_tokens": 4000,
"stream": false
}
The API responses with the following. It seems to have correctly picked up each item in the list as it has detected the correct name but it doesnt seem to be able to grab the correct coordinates for the icons. The icons start around 400px across.
{
"id": "xxxx",
"object": "chat.completion",
"created": 1710342480,
"model": "gpt-4",
"prompt_filter_results": [
{
"prompt_index": 0,
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
}
}
],
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "The array of x and y coordinates for the top left and bottom right of each small icon (with a solid blue shape) for the items in the list are as follows:\n\n1. Machine Gun: [37, 175], [91, 229]\n2. Anti-Materiel Rifle: [37, 251], [91, 305]\n3. Stalwart: [37, 327], [91, 381]\n4. Expendable Anti-Tank: [37, 403], [91, 457]\n5. Recoilless Rifle: [37, 479], [91, 533]\n6. Flamethrower: [37, 555], [91, 609]\n7. Autocannon: [37, 631], [91, 685]\n8. Railgun: [37, 707], [91, 761]\n9. Spear: [37, 783], [91, 837]"
},
"content_filter_results": {
"hate": {
"filtered": false,
"severity": "safe"
},
"self_harm": {
"filtered": false,
"severity": "safe"
},
"sexual": {
"filtered": false,
"severity": "safe"
},
"violence": {
"filtered": false,
"severity": "safe"
}
},
"enhancements": {
"grounding": {
"lines": [
{
"text": "The array of x and y coordinates for the top left and bottom right of each small icon (with a solid blue shape) for the items in the list are as follows:\n\n1. Machine Gun: [37, 175], [91, 229]\n2. Anti-Materiel Rifle: [37, 251], [91, 305]\n3. Stalwart: [37, 327], [91, 381]\n4. Expendable Anti-Tank: [37, 403], [91, 457]\n5. Recoilless Rifle: [37, 479], [91, 533]\n6. Flamethrower: [37, 555], [91, 609]\n7. Autocannon: [37, 631], [91, 685]\n8. Railgun: [37, 707], [91, 761]\n9. Spear: [37, 783], [91, 837]",
"spans": []
}
],
"status": "Success"
}
}
}
],
"usage": {
"prompt_tokens": 1513,
"completion_tokens": 200,
"total_tokens": 1713
}
}