Recognize Printed Text

Optical Character Recognition (OCR) detects text in an image and extracts the recognized characters into a machine-usable character stream. On success, the OCR results are returned. On failure, an error code and an error message are returned. The error code can be one of InvalidImageUrl, InvalidImageFormat, InvalidImageSize, NotSupportedImage, NotSupportedLanguage, or InternalServerError.

POST {Endpoint}/vision/v2.1/ocr?detectOrientation={detectOrientation}
POST {Endpoint}/vision/v2.1/ocr?detectOrientation={detectOrientation}&language={language}

URI Parameters

| Name | In | Required | Type | Description |
| --- | --- | --- | --- | --- |
| Endpoint | path | True | string | Supported Cognitive Services endpoints. |
| detectOrientation | query | True | boolean | Whether to detect the text orientation in the image. With detectOrientation=true, the OCR service tries to detect the image orientation and correct it before further processing (for example, if the image is upside down). |
| language | query | | OcrLanguages | The BCP-47 language code of the text to be detected in the image. The default value is 'unk'. |

Request Header

| Name | Required | Type | Description |
| --- | --- | --- | --- |
| Ocp-Apim-Subscription-Key | True | string | |

Request Body

| Name | Required | Type | Description |
| --- | --- | --- | --- |
| url | True | string | Publicly reachable URL of an image. |
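As a rough illustration of how the URI parameters, request header, and request body fit together, the following Python sketch builds the POST request with the standard library only. The helper names `build_ocr_request` and `recognize_printed_text` are hypothetical, and the `Content-Type: application/json` header and the lowercase `true`/`false` encoding of `detectOrientation` are assumptions rather than something this reference states.

```python
import json
import urllib.parse
import urllib.request


def build_ocr_request(endpoint, subscription_key, image_url,
                      detect_orientation=True, language="unk"):
    """Build (but do not send) a POST request for the v2.1 OCR operation.

    Hypothetical helper: only the path, query parameters, header name, and
    the `url` body field come from the reference above.
    """
    query = urllib.parse.urlencode({
        # Assumption: the boolean is sent as lowercase "true"/"false".
        "detectOrientation": "true" if detect_orientation else "false",
        "language": language,
    })
    url = f"{endpoint.rstrip('/')}/vision/v2.1/ocr?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumption: the JSON body is declared with this content type.
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def recognize_printed_text(request, timeout=30):
    """Send the request and return the decoded JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without calling the service.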

Responses

| Name | Type | Description |
| --- | --- | --- |
| 200 OK | OcrResult | The OCR results in the hierarchy of region/line/word. The results include text, bounding box for regions, lines and words. The angle, in radians, of the detected text with respect to the closest horizontal or vertical direction. After rotating the input image clockwise by this angle, the recognized text lines become horizontal or vertical. In combination with the orientation property it can be used to overlay recognition results correctly on the original image, by rotating either the original image or recognition results by a suitable angle around the center of the original image. If the angle cannot be confidently detected, this property is not present. If the image contains text at different angles, only part of the text will be recognized correctly. |
| Other Status Codes | ComputerVisionError | Error response. |

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header

Examples

Successful RecognizePrintedText request

Sample request

POST https://westus.api.cognitive.microsoft.com/vision/v2.1/ocr?detectOrientation=true&language=en


{
  "url": "{url}"
}

Sample response

{
  "language": "en",
  "textAngle": -2.0000000000000338,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "462,379,497,258",
      "lines": [
        {
          "boundingBox": "462,379,497,74",
          "words": [
            {
              "boundingBox": "462,379,41,73",
              "text": "A"
            },
            {
              "boundingBox": "523,379,153,73",
              "text": "GOAL"
            },
            {
              "boundingBox": "694,379,265,74",
              "text": "WITHOUT"
            }
          ]
        },
        {
          "boundingBox": "565,471,289,74",
          "words": [
            {
              "boundingBox": "565,471,41,73",
              "text": "A"
            },
            {
              "boundingBox": "626,471,150,73",
              "text": "PLAN"
            },
            {
              "boundingBox": "801,472,53,73",
              "text": "IS"
            }
          ]
        },
        {
          "boundingBox": "519,563,375,74",
          "words": [
            {
              "boundingBox": "519,563,149,74",
              "text": "JUST"
            },
            {
              "boundingBox": "683,564,41,72",
              "text": "A"
            },
            {
              "boundingBox": "741,564,153,73",
              "text": "WISH"
            }
          ]
        }
      ]
    }
  ]
}
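The region/line/word hierarchy in the sample response can be flattened into plain text lines. The sketch below is one minimal way to do that; `extract_lines` is a hypothetical helper, and joining words with a single space is an assumption that suits space-delimited languages such as English but not necessarily others.

```python
def extract_lines(ocr_result):
    """Return the recognized text as a list of strings, one per OCR line.

    `ocr_result` is the decoded JSON object of a 200 OK response.
    Assumption: words within a line are joined with a single space.
    """
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines
```

Applied to the sample response above, this yields the three lines "A GOAL WITHOUT", "A PLAN IS", and "JUST A WISH".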

Definitions

| Name | Description |
| --- | --- |
| ComputerVisionError | Details about the API request error. |
| ComputerVisionErrorCodes | The error code. |
| ImageUrl | |
| OcrLanguages | The BCP-47 language code of the text to be detected in the image. The default value is 'unk'. |
| OcrLine | An object describing a single recognized line of text. |
| OcrRegion | A region consists of multiple lines (e.g. a column of text in a multi-column document). |
| OcrResult | |
| OcrWord | Information on a recognized word. |

ComputerVisionError

Details about the API request error.

| Name | Type | Description |
| --- | --- | --- |
| code | ComputerVisionErrorCodes | The error code. |
| message | string | A message explaining the error reported by the service. |
| requestId | string | A unique request identifier. |
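A client will usually want to turn an error body into an exception that carries the three fields listed above. The following sketch shows one possible shape; the exception class and the `parse_error` helper are hypothetical, and falling back to `Unspecified` when `code` is missing is an assumption, not documented service behavior.

```python
import json


class ComputerVisionError(Exception):
    """Hypothetical exception mirroring the ComputerVisionError definition."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id


def parse_error(body):
    """Convert an error response body (str or bytes) into an exception object."""
    payload = json.loads(body)
    return ComputerVisionError(
        # Assumption: fall back to "Unspecified" if the service omits `code`.
        code=payload.get("code", "Unspecified"),
        message=payload.get("message", ""),
        request_id=payload.get("requestId"),
    )
```

With `urllib`, such a body would typically be read from the `urllib.error.HTTPError` raised for a non-2xx status and then passed to `parse_error`.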

ComputerVisionErrorCodes

The error code.

Value Description
InvalidImageFormat
UnsupportedMediaType
InvalidImageUrl
NotSupportedFeature
NotSupportedImage
Timeout
InternalServerError
InvalidImageSize
BadArgument
DetectFaceError
NotSupportedLanguage
InvalidThumbnailSize
InvalidDetails
InvalidModel
CancelledRequest
NotSupportedVisualFeature
FailedToProcess
Unspecified
StorageException

ImageUrl

| Name | Type | Description |
| --- | --- | --- |
| url | string | Publicly reachable URL of an image. |

OcrLanguages

The BCP-47 language code of the text to be detected in the image. The default value is 'unk'.

Value Description
unk
zh-Hans
zh-Hant
cs
da
nl
en
fi
fr
de
el
hu
it
ja
ko
nb
pl
pt
ru
es
sv
tr
ar
ro
sr-Cyrl
sr-Latn
sk
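Because an unsupported code results in a NotSupportedLanguage error, it can be convenient to check the `language` value on the client before sending a request. The sketch below copies the values listed above into a set; `resolve_language` is a hypothetical helper, and treating the codes as case-sensitive is an assumption.

```python
# Values copied from the OcrLanguages list above.
OCR_LANGUAGES = frozenset({
    "unk", "zh-Hans", "zh-Hant", "cs", "da", "nl", "en", "fi", "fr",
    "de", "el", "hu", "it", "ja", "ko", "nb", "pl", "pt", "ru", "es",
    "sv", "tr", "ar", "ro", "sr-Cyrl", "sr-Latn", "sk",
})


def resolve_language(code=None):
    """Return a language code suitable for the `language` query parameter.

    None maps to the documented default 'unk'. Assumption: codes are
    matched case-sensitively, exactly as listed.
    """
    if code is None:
        return "unk"
    if code not in OCR_LANGUAGES:
        raise ValueError(f"unsupported OCR language code: {code!r}")
    return code
```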

OcrLine

An object describing a single recognized line of text.

| Name | Type | Description |
| --- | --- | --- |
| boundingBox | string | Bounding box of a recognized line. The four integers represent the x-coordinate of the left edge, the y-coordinate of the top edge, width, and height of the bounding box, in the coordinate system of the input image, after it has been rotated around its center according to the detected text angle (see textAngle property), with the origin at the top-left corner, and the y-axis pointing down. |
| words | OcrWord[] | An array of objects, where each object represents a recognized word. |
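Since `boundingBox` arrives as a comma-separated string rather than as numbers, it usually has to be parsed before use. The following sketch does that for the left, top, width, height layout described above; the `BoundingBox` type and `parse_bounding_box` function are hypothetical names, and the derived `right` and `bottom` properties are a convenience added here, not fields of the response.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Left/top/width/height in pixels, origin at the top-left, y-axis down."""
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self):
        # Derived convenience value, not part of the API response.
        return self.left + self.width

    @property
    def bottom(self):
        # Derived convenience value, not part of the API response.
        return self.top + self.height


def parse_bounding_box(value):
    """Parse a string such as '462,379,497,74' into a BoundingBox."""
    parts = [int(part) for part in value.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected four integers, got {value!r}")
    return BoundingBox(*parts)
```

The same parser applies to the `boundingBox` of regions, lines, and words, since all three use the same four-integer layout.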

OcrRegion

A region consists of multiple lines (e.g. a column of text in a multi-column document).

| Name | Type | Description |
| --- | --- | --- |
| boundingBox | string | Bounding box of a recognized region. The four integers represent the x-coordinate of the left edge, the y-coordinate of the top edge, width, and height of the bounding box, in the coordinate system of the input image, after it has been rotated around its center according to the detected text angle (see textAngle property), with the origin at the top-left corner, and the y-axis pointing down. |
| lines | OcrLine[] | An array of recognized lines of text. |

OcrResult

| Name | Type | Description |
| --- | --- | --- |
| language | string | The BCP-47 language code of the text in the image. |
| orientation | string | Orientation of the text recognized in the image, if requested. The value (up, down, left, or right) refers to the direction that the top of the recognized text is facing, after the image has been rotated around its center according to the detected text angle (see textAngle property). If detection of the orientation was not requested, or no text is detected, the value is 'NotDetected'. |
| regions | OcrRegion[] | An array of objects, where each object represents a region of recognized text. |
| textAngle | number (double) | The angle, in radians, of the detected text with respect to the closest horizontal or vertical direction. After rotating the input image clockwise by this angle, the recognized text lines become horizontal or vertical. In combination with the orientation property it can be used to overlay recognition results correctly on the original image, by rotating either the original image or recognition results by a suitable angle around the center of the original image. If the angle cannot be confidently detected, this property is not present. If the image contains text at different angles, only part of the text will be recognized correctly. |
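To make the rotation described for textAngle concrete, the sketch below rotates a point around the image center in image coordinates (origin at the top-left, y-axis pointing down). Both function names are hypothetical. It assumes that bounding-box coordinates live in the image after a clockwise rotation by textAngle, so mapping a point back onto the original image is a rotation by the negated angle. It also assumes the image center stays fixed and ignores the orientation property, which a full overlay would need to take into account.

```python
import math


def rotate_clockwise(point, angle, center):
    """Rotate `point` around `center` by `angle` radians.

    With the y-axis pointing down, the standard rotation matrix turns
    points clockwise as seen on screen.
    """
    x, y = point
    cx, cy = center
    dx, dy = x - cx, y - cy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cx + dx * cos_a - dy * sin_a,
            cy + dx * sin_a + dy * cos_a)


def to_original_image(point, text_angle, image_size):
    """Map a point from the rotated coordinate system back to the original.

    Assumption: undoing the clockwise rotation by `text_angle` is enough;
    the `orientation` property is not handled here.
    """
    width, height = image_size
    center = (width / 2, height / 2)
    return rotate_clockwise(point, -text_angle, center)
```

When textAngle is absent from the response, no angle was confidently detected, so callers would skip this step rather than assume an angle of zero.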

OcrWord

Information on a recognized word.

| Name | Type | Description |
| --- | --- | --- |
| boundingBox | string | Bounding box of a recognized word. The four integers represent the x-coordinate of the left edge, the y-coordinate of the top edge, width, and height of the bounding box, in the coordinate system of the input image, after it has been rotated around its center according to the detected text angle (see textAngle property), with the origin at the top-left corner, and the y-axis pointing down. |
| text | string | String value of a recognized word. |