hOCR generator sample skill for AI search

Code Sample
11/17/2023

This custom skill generates an hOCR document from the output of the OCR skill.

Requirements

This skill has no requirements beyond those described in the root README.md file.

Settings

This function doesn't require any application settings.

Deployment

Sample Input:

{
	"values": [
	    {
	        "recordId": "r1",
	        "data": {
	            "ocrImageMetadataList": [
	                {
	                    "layoutText": {
	                        "language": "en",
	                        "text": "Hello World. -John",
	                        "lines": [
	                            {
	                                "boundingBox": [
	                                    { "x": 10, "y": 10 },
	                                    { "x": 50, "y": 10 },
	                                    { "x": 50, "y": 30 },
	                                    { "x": 10, "y": 30 }
	                                ],
	                                "text": "Hello World."
	                            },
	                            {
	                                "boundingBox": [
	                                    { "x": 110, "y": 10 },
	                                    { "x": 150, "y": 10 },
	                                    { "x": 150, "y": 30 },
	                                    { "x": 110, "y": 30 }
	                                ],
	                                "text": "-John"
	                            }
	                        ],
	                        "words": [
	                            {
	                                "boundingBox": [
	                                    { "x": 10, "y": 10 },
	                                    { "x": 50, "y": 10 },
	                                    { "x": 50, "y": 14 },
	                                    { "x": 10, "y": 14 }
	                                ],
	                                "text": "Hello"
	                            },
	                            {
	                                "boundingBox": [
	                                    { "x": 10, "y": 16 },
	                                    { "x": 50, "y": 16 },
	                                    { "x": 50, "y": 30 },
	                                    { "x": 10, "y": 30 }
	                                ],
	                                "text": "World."
	                            },
	                            {
	                                "boundingBox": [
	                                    { "x": 110, "y": 10 },
	                                    { "x": 150, "y": 10 },
	                                    { "x": 150, "y": 30 },
	                                    { "x": 110, "y": 30 }
	                                ],
	                                "text": "-John"
	                            }
	                        ]
	                    },
	                    "imageStoreUri": "https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff",
	                    "width": 40,
	                    "height": 200
	                }
	            ],
	            "wordAnnotations": [
	                {
	                    "value": "Hello",
	                    "description": "An annotation on 'Hello'"
	                }
	            ]
	        }
	    }
	]
}
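
For local testing, it can be handy to assemble the request body above programmatically. The following is a minimal sketch, not part of the sample itself. The helper name `build_request` is hypothetical, and it assumes only the envelope shape visible in the sample input (`values`, `recordId`, `data`, `ocrImageMetadataList`, `wordAnnotations`).

```python
import json


def build_request(record_id, ocr_image_metadata_list, word_annotations):
    """Wrap OCR results in the request envelope shown in the sample input.

    Hypothetical helper: field names are copied from the sample input,
    nothing else about the skill's contract is assumed.
    """
    return {
        "values": [
            {
                "recordId": record_id,
                "data": {
                    "ocrImageMetadataList": ocr_image_metadata_list,
                    "wordAnnotations": word_annotations,
                },
            }
        ]
    }


# Example: one image with a single recognized word and one annotation.
word = {
    "boundingBox": [
        {"x": 10, "y": 10},
        {"x": 50, "y": 10},
        {"x": 50, "y": 14},
        {"x": 10, "y": 14},
    ],
    "text": "Hello",
}
image = {
    "layoutText": {"language": "en", "text": "Hello", "lines": [], "words": [word]},
    "imageStoreUri": "https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff",
    "width": 40,
    "height": 200,
}
annotations = [{"value": "Hello", "description": "An annotation on 'Hello'"}]

payload = build_request("r1", [image], annotations)
body = json.dumps(payload)  # string suitable for an HTTP POST body
```

The resulting `body` string can be posted to the function endpoint with any HTTP client.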

Sample Output:

{
    "values": [
        {
            "recordId": "r1",
            "data": {
                "hocrDocument": {
                    "metadata": "\r\n            <?xml version='1.0' encoding='UTF-8'?>\r\n            <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>\r\n            <html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>\r\n            <head>\r\n                <title></title>\r\n                <meta http-equiv='Content-Type' content='text/html;charset=utf-8' />\r\n                <meta name='ocr-system' content='Microsoft Cognitive Services' />\r\n                <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>\r\n            </head>\r\n            <body>\r\n<div class='ocr_page' id='page_0' title='image \"https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff\"; bbox 0 0 40 200; ppageno 0'>\r\n<div class='ocr_carea' id='block_0_1'>\r\n<span class='ocr_line' id='line_0_0' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_0_0' title='bbox 10 10 50 14' data-annotation='An annotation on 'Hello''>Hello</span>\r\n<span class='ocrx_word' id='word_0_0_1' title='bbox 10 16 50 30' >World.</span>\r\n</span>\r\n<span class='ocr_line' id='line_0_1' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_1_2' title='bbox 110 10 150 30' >-John</span>\r\n</span>\r\n</div>\r\n</div>\r\n\r\n</body></html>",
                    "text": "Hello World. -John "
                }
            },
            "errors": [],
            "warnings": []
        }
    ]
}
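
The sample output suggests how each word maps to hOCR: the four corner points of a word's `boundingBox` collapse into a `bbox left top right bottom` title, and a matching entry in `wordAnnotations` becomes a `data-annotation` attribute. The sketch below reproduces that mapping in Python as an illustration only. It is not the skill's actual implementation; the function names `hocr_bbox` and `word_span` are hypothetical, and taking the min/max of the corner coordinates is an assumption that happens to match the sample values. Unlike the sample output, the sketch escapes quotes inside attribute values.

```python
from html import escape


def hocr_bbox(bounding_box):
    """Collapse four corner points into an hOCR 'bbox x0 y0 x1 y1' string.

    Assumption: the box is the min/max extent of the corner points.
    """
    xs = [point["x"] for point in bounding_box]
    ys = [point["y"] for point in bounding_box]
    return f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"


def word_span(word, span_id, annotations):
    """Render one OCR word as an ocrx_word span (hypothetical helper)."""
    # Look up an annotation whose value matches the word text exactly.
    lookup = {a["value"]: a["description"] for a in annotations}
    description = lookup.get(word["text"])
    attr = ""
    if description is not None:
        attr = f" data-annotation='{escape(description, quote=True)}'"
    title = hocr_bbox(word["boundingBox"])
    text = escape(word["text"])
    return f"<span class='ocrx_word' id='{span_id}' title='{title}'{attr}>{text}</span>"


hello = {
    "boundingBox": [
        {"x": 10, "y": 10},
        {"x": 50, "y": 10},
        {"x": 50, "y": 14},
        {"x": 10, "y": 14},
    ],
    "text": "Hello",
}
annotations = [{"value": "Hello", "description": "An annotation on 'Hello'"}]

print(hocr_bbox(hello["boundingBox"]))  # bbox 10 10 50 14
print(word_span(hello, "word_0_0_0", annotations))
```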

Sample Skillset Integration

To use this skill in an AI search pipeline, add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):

{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Generate HOCR for webpage rendering",
    "uri": "[AzureFunctionEndpointUrl]/api/hocr-generator?code=[AzureFunctionDefaultHostKey]",
    "batchSize": 1,
    "context": "/document",
    "inputs": [
        {
            "name": "ocrImageMetadataList",
            "source": "/document/normalized_images/*/ocrImageMetadata"
        },
        {
            "name": "wordAnnotations",
            "source": "/document/acronyms"
        }
    ],
    "outputs": [
        {
            "name": "hocrDocument",
            "targetName": "hocrDocument"
        }
    ]
}
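
If you generate skillsets from a script, the bracketed placeholders in the definition above can be filled in programmatically. The following is a minimal sketch under that assumption. The function name `hocr_skill_definition` and the example endpoint and key values are hypothetical; the field values are copied from the sample definition.

```python
import json


def hocr_skill_definition(function_endpoint_url, host_key):
    """Return the sample skill definition with the URI placeholders filled in.

    Hypothetical helper: inputs and outputs are taken verbatim from the
    sample and should still be adapted to your own skillset.
    """
    base = function_endpoint_url.rstrip("/")
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "Generate HOCR for webpage rendering",
        "uri": f"{base}/api/hocr-generator?code={host_key}",
        "batchSize": 1,
        "context": "/document",
        "inputs": [
            {
                "name": "ocrImageMetadataList",
                "source": "/document/normalized_images/*/ocrImageMetadata",
            },
            {"name": "wordAnnotations", "source": "/document/acronyms"},
        ],
        "outputs": [{"name": "hocrDocument", "targetName": "hocrDocument"}],
    }


# Placeholder values for illustration only.
skill = hocr_skill_definition("https://example-function.azurewebsites.net/", "EXAMPLE_KEY")
print(json.dumps(skill, indent=4))
```

The returned dictionary can be appended to the `skills` array of a skillset definition before it is submitted.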
