Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Text recognition, also known as optical character recognition (OCR), is supported in Windows AI Foundry through a set of artificial intelligence (AI)-backed APIs that can detect and extract text within images and convert it into machine readable character streams.
These APIs can identify characters, words, lines, polygonal text boundaries, and provide confidence levels for each match. They are also exclusively supported by hardware acceleration in in devices with a neural processing unit (NPU), making them faster and more accurate than the legacy Windows.Media.Ocr.OcrEngine APIs in the Windows platform SDK.
For API details, see API ref for Text Recognition (OCR).
Important
The following is a list of Windows AI features and the Windows App SDK release in which they are currently supported.
Version 1.8 Experimental (1.8.0-experimental1) - Object Erase, Phi Silica, LoRA fine-tuning for Phi Silica, Conversation Summarization (Text Intelligence)
Private preview - Semantic Search
Version 1.7.1 (1.7.250401001) - All other APIs
These APIs will only be functional on Windows Insider Preview (WIP) devices that have received the May 7th update. On May 28-29, an optional update will be released to non-WIP devices, followed by the Jun 10 update. This update will bring with it the AI models required for the Windows AI APIs to function. These updates will also require that any app using Windows AI APIs will be unable to do so until the app has been granted package identity at runtime.
What can I do with AI Text Recognition?
Use AI Text Recognition features to identify and recognize text in an image. You can also get the text boundaries and confidence scores for the recognized text.
Create an ImageBuffer from a file
In this WinUI example we call a LoadImageBufferFromFileAsync
function to get an ImageBuffer from an image file.
In the LoadImageBufferFromFileAsync function, we complete the following steps:
- Create a StorageFile object from the specified file path.
- Open a stream on the StorageFile using OpenAsync.
- Create a BitmapDecoder for the stream.
- Call GetSoftwareBitmapAsync on the bitmap decoder to get a SoftwareBitmap object.
- Return an image buffer from CreateBufferAttachedToBitmap.
using Microsoft.Windows.Vision;
using Microsoft.Graphics.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;
public async Task<ImageBuffer> LoadImageBufferFromFileAsync(string filePath)
{
StorageFile file = await StorageFile.GetFileFromPathAsync(filePath);
IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read);
BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
SoftwareBitmap bitmap = await decoder.GetSoftwareBitmapAsync();
if (bitmap == null)
{
return null;
}
return ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
}
#include <iostream>
#include <sstream>
#include <winrt/Microsoft.Windows.AI.Imaging.h>
#include <winrt/Windows.Graphics.Imaging.h>
#include <winrt/Microsoft.Graphics.Imaging.h>
#include <winrt/Microsoft.UI.Xaml.Controls.h>
#include<winrt/Microsoft.UI.Xaml.Media.h>
#include<winrt/Microsoft.UI.Xaml.Shapes.h>
using namespace winrt;
using namespace Microsoft::UI::Xaml;
using namespace Microsoft::Windows::AI;
using namespace Microsoft::Windows::AI::Imaging;
using namespace winrt::Microsoft::UI::Xaml::Controls;
using namespace winrt::Microsoft::UI::Xaml::Media;
winrt::Windows::Foundation::IAsyncOperation<winrt::hstring>
MainWindow::RecognizeTextFromSoftwareBitmap(
Windows::Graphics::Imaging::SoftwareBitmap const& bitmap)
{
winrt::Microsoft::Windows::AI::Imaging::TextRecognizer textRecognizer =
EnsureModelIsReady().get();
Microsoft::Graphics::Imaging::ImageBuffer imageBuffer =
Microsoft::Graphics::Imaging::ImageBuffer::CreateForSoftwareBitmap(bitmap);
RecognizedText recognizedText =
textRecognizer.RecognizeTextFromImage(imageBuffer);
std::wstringstream stringStream;
for (const auto& line : recognizedText.Lines())
{
stringStream << line.Text().c_str() << std::endl;
}
co_return winrt::hstring{ stringStream.str()};
}
Recognize text in a bitmap image
The following example shows how to recognize some text in a SoftwareBitmap object as a single string value:
- Create a TextRecognizer object through a call to the
EnsureModelIsReady
function, which also confirms there is a language model present on the system. - Using the bitmap obtained in the previous snippet, we call the
RecognizeTextFromSoftwareBitmap
function. - Call CreateBufferAttachedToBitmap on the image file to get an ImageBuffer object.
- Call RecognizeTextFromImage to get the recognized text from the ImageBuffer.
- Create a wstringstream object and load it with the recognized text.
- Return the string.
Note
The EnsureModelIsReady
function is used to check the readiness state of the text recognition model (and install it if necessary).
using Microsoft.Windows.Vision;
using Microsoft.Windows.AI;
using Microsoft.Graphics.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;
public async Task<string> RecognizeTextFromSoftwareBitmap(SoftwareBitmap bitmap)
{
TextRecognizer textRecognizer = await EnsureModelIsReady();
ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
RecognizedText recognizedText = textRecognizer.RecognizeTextFromImage(imageBuffer);
StringBuilder stringBuilder = new StringBuilder();
foreach (var line in recognizedText.Lines)
{
stringBuilder.AppendLine(line.Text);
}
return stringBuilder.ToString();
}
public async Task<TextRecognizer> EnsureModelIsReady()
{
if (TextRecognizer.GetReadyState() == AIFeatureReadyState.EnsureNeeded)
{
var loadResult = await TextRecognizer.EnsureReadyAsync();
if (loadResult.Status != PackageDeploymentStatus.CompletedSuccess)
{
throw new Exception(loadResult.ExtendedError().Message);
}
}
return await TextRecognizer.CreateAsync();
}
winrt::Windows::Foundation::IAsyncOperation<winrt::Microsoft::Windows::AI::Imaging::TextRecognizer> MainWindow::EnsureModelIsReady()
{
if (winrt::Microsoft::Windows::AI::Imaging::TextRecognizer::GetReadyState() == AIFeatureReadyState::NotReady)
{
auto loadResult = TextRecognizer::EnsureReadyAsync().get();
if (loadResult.Status() != AIFeatureReadyResultState::Success)
{
throw winrt::hresult_error(loadResult.ExtendedError());
}
}
return winrt::Microsoft::Windows::AI::Imaging::TextRecognizer::CreateAsync();
}
Get word bounds and confidence
Here we show how to visualize the BoundingBox of each word in a SoftwareBitmap object as a collection of color-coded polygons on a Grid element.
Note
For this example we assume a TextRecognizer object has already been created and passed in to the function.
using Microsoft.Windows.Vision;
using Microsoft.Graphics.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;
public void VisualizeWordBoundariesOnGrid(
SoftwareBitmap bitmap,
Grid grid,
TextRecognizer textRecognizer)
{
ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);
SolidColorBrush greenBrush = new SolidColorBrush(Microsoft.UI.Colors.Green);
SolidColorBrush yellowBrush = new SolidColorBrush(Microsoft.UI.Colors.Yellow);
SolidColorBrush redBrush = new SolidColorBrush(Microsoft.UI.Colors.Red);
foreach (var line in result.Lines)
{
foreach (var word in line.Words)
{
PointCollection points = new PointCollection();
var bounds = word.BoundingBox;
points.Add(bounds.TopLeft);
points.Add(bounds.TopRight);
points.Add(bounds.BottomRight);
points.Add(bounds.BottomLeft);
Polygon polygon = new Polygon();
polygon.Points = points;
polygon.StrokeThickness = 2;
if (word.Confidence < 0.33)
{
polygon.Stroke = redBrush;
}
else if (word.Confidence < 0.67)
{
polygon.Stroke = yellowBrush;
}
else
{
polygon.Stroke = greenBrush;
}
grid.Children.Add(polygon);
}
}
}
void MainWindow::VisualizeWordBoundariesOnGrid(
Windows::Graphics::Imaging::SoftwareBitmap const& bitmap,
Grid const& grid,
TextRecognizer const& textRecognizer)
{
Microsoft::Graphics::Imaging::ImageBuffer imageBuffer =
Microsoft::Graphics::Imaging::ImageBuffer::CreateForSoftwareBitmap(bitmap);
RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);
auto greenBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Green());
auto yellowBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Yellow());
auto redBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Red());
for (const auto& line : result.Lines())
{
for (const auto& word : line.Words())
{
PointCollection points;
const auto& bounds = word.BoundingBox();
points.Append(bounds.TopLeft);
points.Append(bounds.TopRight);
points.Append(bounds.BottomRight);
points.Append(bounds.BottomLeft);
winrt::Microsoft::UI::Xaml::Shapes::Polygon polygon{};
polygon.Points(points);
polygon.StrokeThickness(2);
if (word.MatchConfidence() < 0.33)
{
polygon.Stroke(redBrush);
}
else if (word.MatchConfidence() < 0.67)
{
polygon.Stroke(yellowBrush);
}
else
{
polygon.Stroke(greenBrush);
}
grid.Children().Append(polygon);
}
}
}
Responsible AI
We have used a combination of the following steps to ensure these imaging APIs are trustworthy, secure, and built responsibly. We recommend reviewing the best practices described in Responsible Generative AI Development on Windows when implementing AI features in your app.