Azure Computer Vision Cognitive Services - how to perform OCR on image captured via getUserMedia camera stream (JavaScript)

Jason Brown 1 Reputation point
2020-08-13T14:49:14.123+00:00

Hello!

I'm using the Computer Vision Cognitive Services (JavaScript) to build a web app where the user can capture an image with the device camera and have OCR performed on it.

Previously I used the JavaScript Tesseract library (https://tesseract.projectnaptha.com/) to do this and used the following code:

Open camera stream

var video = document.getElementById('video');
if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
    const constraints = {
        "video": {
            "facingMode": { "ideal": "environment" }
        }
    };
    navigator.mediaDevices.getUserMedia(constraints)
        .then(function(stream) {
            video.srcObject = stream;
            video.play();
        })
        .catch(function(err) {
            unsupportedBrowser(); //custom function I wrote to execute if the browser didn't support camera stream
        });
} else {
    unsupportedBrowser();
}

Press capture button

function capture() {
    var canvas = document.getElementById('canvas');
    var context = canvas.getContext('2d');
    context.drawImage(video, 0, 0, 640, 480);
    let tesseractSettings = {
        lang: 'eng'
    };
    Tesseract.recognize(context, tesseractSettings).then(function(result) {
        var scannedText = result.text ? result.text.trim() : '';
        closeScanningUi(scannedText); //text from image scanned with Tesseract would be displayed in another UI
    });
}

Essentially, a still from the camera stream would be taken when the user pressed the 'capture' button and then Tesseract would perform the OCR on it.

I now want to use the Computer Vision Cognitive Service instead of Tesseract because it's more accurate and works on a much wider variety of documents.

I've had a look at the tutorial here: https://github.com/Azure-Samples/cognitive-services-javascript-computer-vision-tutorial and have successfully gotten this example to work, but it requires the URL of an image to perform the OCR.

Is there a way to get Computer Vision Cognitive Services to perform the OCR on the still that is captured from my camera stream instead of from an image located at a specific URL?

Cheers!


1 answer

  1. Nikhil Gupta 1 Reputation point
    2020-10-17T16:19:56.567+00:00

    Hi,

    I did the same in one of my projects, where the user can capture or upload an image and get a result based on the prediction.

    A few things to remember:

    1. Prediction API - POST an AJAX request to url="your_endpoint", send Content-Type="application/octet-stream" and Prediction-key="your_prediction_key" in the headers, and send the data as an octet-stream. Remember to set processData=false so jQuery makes no attempt to encode the data as a query string.
    2. Make sure you convert your image into binary format (convert base64 to raw binary data in JavaScript itself).
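
    The two points above can be sketched with the browser's fetch API instead of $.ajax. This is a hedged sketch: the endpoint URL and key are placeholders, and when the image comes from a canvas, canvas.toBlob already yields raw binary, so no base64 round trip is needed at all.

    ```javascript
    // Sketch only - endpoint and key are placeholders, not real values.
    // Builds the request options per point 1: raw bytes + key in headers.
    function buildPredictionRequest(predictionKey, imageBlob) {
        return {
            method: 'POST',
            headers: {
                'Content-Type': 'application/octet-stream', // raw image bytes, not a form/query string
                'Prediction-key': predictionKey             // sensitive - see the security note
            },
            body: imageBlob
        };
    }

    // Capture path: canvas.toBlob gives binary directly (point 2 becomes unnecessary)
    function predictFromCanvas(canvas, endpoint, predictionKey) {
        return new Promise(function (resolve, reject) {
            canvas.toBlob(function (blob) {
                fetch(endpoint, buildPredictionRequest(predictionKey, blob))
                    .then(function (res) { return res.json(); })
                    .then(resolve, reject);
            }, 'image/jpeg');
        });
    }
    ```

    fetch sends a Blob body as-is, which is why no processData-style flag is needed here.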

    Below are the values to send in the AJAX call. I was passing my captured or uploaded image's data_uri to the function dataURItoBlob(data_uri) to convert the base64 into raw binary data.

    {
        'url': 'Your-Endpoint',
        'method': 'POST',
        'timeout': '0',
        'processData': false,
        'headers': {
            'Content-Type': 'application/octet-stream',
            'Prediction-key': 'Your-Prediction-Key'
        },
        'data': dataURItoBlob(data_uri)
    };
    
    
    function dataURItoBlob(dataURI) {
        // convert base64 to raw binary data held in a string
        // doesn't handle URLEncoded DataURIs
        var byteString = atob(dataURI.split(',')[1]);

        // separate out the mime component
        var mimeString = dataURI.split(',')[0].split(':')[1].split(';')[0];

        // write the bytes of the string to an ArrayBuffer
        var ab = new ArrayBuffer(byteString.length);
        var ia = new Uint8Array(ab);
        for (var i = 0; i < byteString.length; i++) {
            ia[i] = byteString.charCodeAt(i);
        }

        // write the ArrayBuffer to a blob, and you're done
        var bb = new Blob([ab], {type: mimeString});
        return bb;
    }
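
    As a quick sanity check, the conversion can be exercised with a tiny hand-made data URI. The helper is restated here only so the snippet runs on its own; in the page you would instead pass it the output of canvas.toDataURL('image/jpeg').

    ```javascript
    // Standalone check of the base64 -> Blob conversion
    // (same helper as above, copied so this snippet is self-contained)
    function dataURItoBlob(dataURI) {
        var byteString = atob(dataURI.split(',')[1]);
        var mimeString = dataURI.split(',')[0].split(':')[1].split(';')[0];
        var ia = new Uint8Array(byteString.length);
        for (var i = 0; i < byteString.length; i++) {
            ia[i] = byteString.charCodeAt(i);
        }
        return new Blob([ia], { type: mimeString });
    }

    // 'SGkh' is base64 for the three bytes of 'Hi!'
    var blob = dataURItoBlob('data:text/plain;base64,SGkh');
    // blob.size is 3 and blob.type is 'text/plain'
    ```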
      
    

    NOTE:

    1. your_endpoint will change if you publish a new iteration.
    2. Security recommendation => either follow the answer at https://learn.microsoft.com/en-us/answers/questions/121143/azure-computer-vision-api.html or call the Prediction API from your backend so the user can't access your endpoint and Prediction-key, as both are sensitive information.
    3. Performance recommendation => better to capture first and then call the Prediction API on a separate event, since the captured image might have been taken by mistake (this avoids unneeded Prediction API calls).
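
    Recommendation 2 could be sketched like this: a minimal Node.js proxy using only the built-in http/https modules, so the Prediction-key stays on the server. The endpoint URL, port, and environment variable name are placeholder assumptions, not real values.

    ```javascript
    // Sketch of a server-side proxy - endpoint and key are placeholders.
    const http = require('http');
    const https = require('https');

    const PREDICTION_ENDPOINT = 'https://example.cognitiveservices.azure.com/your-prediction-path'; // placeholder
    const PREDICTION_KEY = process.env.PREDICTION_KEY; // kept server-side, never shipped to the browser

    // Headers forwarded to the Prediction API (same as the $.ajax call above)
    function predictionHeaders(key, contentLength) {
        return {
            'Content-Type': 'application/octet-stream',
            'Prediction-key': key,
            'Content-Length': contentLength
        };
    }

    // The browser POSTs the image bytes here instead of calling Azure directly
    const server = http.createServer(function (req, res) {
        if (req.method !== 'POST') { res.writeHead(405); return res.end(); }
        const chunks = [];
        req.on('data', function (c) { chunks.push(c); });
        req.on('end', function () {
            const body = Buffer.concat(chunks);
            const upstream = https.request(PREDICTION_ENDPOINT, {
                method: 'POST',
                headers: predictionHeaders(PREDICTION_KEY, body.length)
            }, function (azureRes) {
                res.writeHead(azureRes.statusCode, { 'Content-Type': 'application/json' });
                azureRes.pipe(res); // relay Azure's JSON answer back to the browser
            });
            upstream.on('error', function () { res.writeHead(502); res.end(); });
            upstream.end(body);
        });
    });
    // server.listen(3000);
    ```

    The browser-side code then POSTs to your own server's URL, with no key in the page at all.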

    Hope this helps!

    Regards,
    Nikhil Gupta