hOCR generator sample skill for AI search
This custom skill generates an hOCR document from the output of the OCR skill.
Requirements
This skill has no requirements beyond those described in the root README.md file.
Settings
This function doesn't require any application settings.
Deployment
Sample Input:
{
    "values": [
        {
            "recordId": "r1",
            "data": {
                "ocrImageMetadataList": [
                    {
                        "layoutText": {
                            "language": "en",
                            "text": "Hello World. -John",
                            "lines": [
                                {
                                    "boundingBox": [
                                        { "x": 10, "y": 10 },
                                        { "x": 50, "y": 10 },
                                        { "x": 50, "y": 30 },
                                        { "x": 10, "y": 30 }
                                    ],
                                    "text": "Hello World."
                                },
                                {
                                    "boundingBox": [
                                        { "x": 110, "y": 10 },
                                        { "x": 150, "y": 10 },
                                        { "x": 150, "y": 30 },
                                        { "x": 110, "y": 30 }
                                    ],
                                    "text": "-John"
                                }
                            ],
                            "words": [
                                {
                                    "boundingBox": [
                                        { "x": 10, "y": 10 },
                                        { "x": 50, "y": 10 },
                                        { "x": 50, "y": 14 },
                                        { "x": 10, "y": 14 }
                                    ],
                                    "text": "Hello"
                                },
                                {
                                    "boundingBox": [
                                        { "x": 10, "y": 16 },
                                        { "x": 50, "y": 16 },
                                        { "x": 50, "y": 30 },
                                        { "x": 10, "y": 30 }
                                    ],
                                    "text": "World."
                                },
                                {
                                    "boundingBox": [
                                        { "x": 110, "y": 10 },
                                        { "x": 150, "y": 10 },
                                        { "x": 150, "y": 30 },
                                        { "x": 110, "y": 30 }
                                    ],
                                    "text": "-John"
                                }
                            ]
                        },
                        "imageStoreUri": "https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff",
                        "width": 40,
                        "height": 200
                    }
                ],
                "wordAnnotations": [
                    {
                        "value": "Hello",
                        "description": "An annotation on 'Hello'"
                    }
                ]
            }
        }
    ]
}
Sample Output:
{
    "values": [
        {
            "recordId": "r1",
            "data": {
                "hocrDocument": {
                    "metadata": "\r\n <?xml version='1.0' encoding='UTF-8'?>\r\n <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>\r\n <html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>\r\n <head>\r\n <title></title>\r\n <meta http-equiv='Content-Type' content='text/html;charset=utf-8' />\r\n <meta name='ocr-system' content='Microsoft Cognitive Services' />\r\n <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>\r\n </head>\r\n <body>\r\n<div class='ocr_page' id='page_0' title='image \"https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff\"; bbox 0 0 40 200; ppageno 0'>\r\n<div class='ocr_carea' id='block_0_1'>\r\n<span class='ocr_line' id='line_0_0' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_0_0' title='bbox 10 10 50 14' data-annotation='An annotation on 'Hello''>Hello</span>\r\n<span class='ocrx_word' id='word_0_0_1' title='bbox 10 16 50 30' >World.</span>\r\n</span>\r\n<span class='ocr_line' id='line_0_1' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_1_2' title='bbox 110 10 150 30' >-John</span>\r\n</span>\r\n</div>\r\n</div>\r\n\r\n</body></html>",
                    "text": "Hello World. -John "
                }
            },
            "errors": [],
            "warnings": []
        }
    ]
}
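In the samples above, each word's four-point boundingBox in the input appears in the output as an hOCR `bbox x0 y0 x1 y1` value (for example, "Hello" becomes `bbox 10 10 50 14`). The following Python sketch reproduces that mapping by taking the minimum and maximum coordinates. The function name `to_hocr_bbox` is hypothetical, and the min/max approach is an assumption that matches the sample values; the skill's own implementation may compute the box differently.

```python
from typing import Dict, List

Point = Dict[str, int]


def to_hocr_bbox(bounding_box: List[Point]) -> str:
    """Collapse corner points into an hOCR 'bbox x0 y0 x1 y1' string.

    Assumption: the box is the axis-aligned envelope of the points
    (min x, min y, max x, max y), which matches the sample output.
    """
    xs = [point["x"] for point in bounding_box]
    ys = [point["y"] for point in bounding_box]
    return f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"


# Bounding box of the word "Hello" from the sample input.
hello_box = [
    {"x": 10, "y": 10},
    {"x": 50, "y": 10},
    {"x": 50, "y": 14},
    {"x": 10, "y": 14},
]

print(to_hocr_bbox(hello_box))  # bbox 10 10 50 14
```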
Sample Skillset Integration
To use this skill in an AI search pipeline, you'll need to add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):
{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Generate HOCR for webpage rendering",
    "uri": "[AzureFunctionEndpointUrl]/api/hocr-generator?code=[AzureFunctionDefaultHostKey]",
    "batchSize": 1,
    "context": "/document",
    "inputs": [
        {
            "name": "ocrImageMetadataList",
            "source": "/document/normalized_images/*/ocrImageMetadata"
        },
        {
            "name": "wordAnnotations",
            "source": "/document/acronyms"
        }
    ],
    "outputs": [
        {
            "name": "hocrDocument",
            "targetName": "hocrDocument"
        }
    ]
}
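Before wiring the skill into a skillset, it can help to call the function directly with a request shaped like the sample input. The Python sketch below only builds such a request; it does not send it. The helper name `build_skill_request`, the `https://example.invalid` endpoint, and the `hypothetical-key` value are placeholders for illustration, and the URL layout is taken from the `uri` in the skill definition above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def build_skill_request(
    endpoint_url: str, host_key: str, records: List[Dict[str, Any]]
) -> urllib.request.Request:
    """Build a POST request carrying records in the {"values": [...]} envelope."""
    # URL-encode the key, since function keys may contain reserved characters.
    query = urllib.parse.urlencode({"code": host_key})
    url = f"{endpoint_url.rstrip('/')}/api/hocr-generator?{query}"
    body = json.dumps({"values": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder record; substitute the "data" object from the sample input.
record = {
    "recordId": "r1",
    "data": {"ocrImageMetadataList": [], "wordAnnotations": []},
}

# Hypothetical endpoint and key, standing in for the bracketed placeholders.
request = build_skill_request("https://example.invalid", "hypothetical-key", [record])
print(request.get_method(), request.full_url)
```

Once real values are substituted, the request could be sent with `urllib.request.urlopen(request)` and the response body parsed with `json.loads` to read `values[0].data.hocrDocument`.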