Get started with AI text recognition (OCR)

2025-07-03

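Text recognition, also known as optical character recognition (OCR), is supported in Windows AI Foundry through a set of artificial intelligence (AI)-based APIs that detect and extract text in images and convert it into machine-readable character streams.

These APIs can identify characters, words, lines, and polygonal text boundaries, and provide a confidence level for each match. They are supported only on devices with a neural processing unit (NPU), where hardware acceleration makes them faster and more accurate than the legacy Windows.Media.Ocr.OcrEngine APIs in the Windows platform SDK.

For API details, see the API reference for text recognition (OCR).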

Important

The following is a list of Windows AI features and the Windows App SDK release in which they are currently supported.

Version 1.8 Experimental (1.8.0-experimental1) - Object Erase, Phi Silica, LoRA fine-tuning for Phi Silica, Conversation Summarization (Text Intelligence)

Private preview - Semantic Search

Version 1.7.1 (1.7.250401001) - All other APIs

These APIs work only on Windows Insider Preview (WIP) devices that have received the May 7th update. On May 28-29, an optional update will be released to non-WIP devices, followed by the June 10 update. This update provides the AI models that the Windows AI APIs require to function. These updates also introduce a requirement: an app cannot use the Windows AI APIs until it has been granted package identity at runtime.

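What can I do with AI text recognition?

Use the AI text recognition features to identify and recognize text in an image. You can also get the text boundaries and confidence scores for the recognized text.

Create an ImageBuffer from a file

In this WinUI example, we call a LoadImageBufferFromFileAsync function to get an ImageBuffer from an image file.

The LoadImageBufferFromFileAsync function performs the following steps:

1. Create a StorageFile object from the specified file path.
2. Open a stream on the StorageFile using OpenAsync.
3. Create a BitmapDecoder for the stream.
4. Call GetSoftwareBitmapAsync on the bitmap decoder to get a SoftwareBitmap object.
5. Return an image buffer from CreateBufferAttachedToBitmap.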

// C#: load an image file into an ImageBuffer.
using System.Threading.Tasks;
using Microsoft.Graphics.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

public async Task<ImageBuffer> LoadImageBufferFromFileAsync(string filePath)
{
    // Open the file and decode it into a SoftwareBitmap.
    StorageFile file = await StorageFile.GetFileFromPathAsync(filePath);
    using IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read);
    BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
    SoftwareBitmap bitmap = await decoder.GetSoftwareBitmapAsync();

    if (bitmap == null)
    {
        return null;
    }

    // Wrap the bitmap in an ImageBuffer for use with the imaging APIs.
    return ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
}

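// C++/WinRT: recognize the text in a SoftwareBitmap and return it as one string.
// EnsureModelIsReady is defined in the next section.
#include <iostream>
#include <sstream>
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Foundation.Collections.h>
#include <winrt/Windows.Graphics.Imaging.h>
#include <winrt/Microsoft.Graphics.Imaging.h>
#include <winrt/Microsoft.Windows.AI.h>
#include <winrt/Microsoft.Windows.AI.Imaging.h>
#include <winrt/Microsoft.UI.Xaml.Controls.h>
#include <winrt/Microsoft.UI.Xaml.Media.h>
#include <winrt/Microsoft.UI.Xaml.Shapes.h>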

using namespace winrt;
using namespace Microsoft::UI::Xaml;
using namespace Microsoft::Windows::AI;
using namespace Microsoft::Windows::AI::Imaging;
using namespace winrt::Microsoft::UI::Xaml::Controls;
using namespace winrt::Microsoft::UI::Xaml::Media;


winrt::Windows::Foundation::IAsyncOperation<winrt::hstring> 
    MainWindow::RecognizeTextFromSoftwareBitmap(
        Windows::Graphics::Imaging::SoftwareBitmap const& bitmap)
{
    winrt::Microsoft::Windows::AI::Imaging::TextRecognizer textRecognizer = 
        EnsureModelIsReady().get();
    Microsoft::Graphics::Imaging::ImageBuffer imageBuffer = 
        Microsoft::Graphics::Imaging::ImageBuffer::CreateForSoftwareBitmap(bitmap);
    RecognizedText recognizedText = 
        textRecognizer.RecognizeTextFromImage(imageBuffer);
    std::wstringstream stringStream;
    for (const auto& line : recognizedText.Lines())
    {
        stringStream << line.Text().c_str() << std::endl;
    }
    co_return winrt::hstring{ stringStream.str()};
}

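Recognize text in a bitmap image

The following example shows how to recognize the text in a SoftwareBitmap object as a single string value, using these steps:

1. Create a TextRecognizer object through a call to the EnsureModelIsReady function, which also confirms that a language model is present on the system.
2. Using the bitmap obtained in the previous snippet, call the RecognizeTextFromSoftwareBitmap function.
3. Call CreateBufferAttachedToBitmap (C#) or CreateForSoftwareBitmap (C++) on the bitmap to get an ImageBuffer object.
4. Call RecognizeTextFromImage to get the recognized text from the ImageBuffer.
5. Create a StringBuilder (C#) or wstringstream (C++) object and load it with the recognized text.
6. Return the string.

Note

The EnsureModelIsReady function is used to check the readiness state of the text recognition model (and install it if necessary).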

// C#: recognize the text in a SoftwareBitmap and return it as one string.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Graphics.Imaging;
using Microsoft.Windows.AI;
using Microsoft.Windows.AI.Imaging;
using Windows.Graphics.Imaging;

public async Task<string> RecognizeTextFromSoftwareBitmap(SoftwareBitmap bitmap)
{
    TextRecognizer textRecognizer = await EnsureModelIsReady();
    ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
    RecognizedText recognizedText = textRecognizer.RecognizeTextFromImage(imageBuffer);
    StringBuilder stringBuilder = new StringBuilder();

    // Append each recognized line on its own line of output.
    foreach (var line in recognizedText.Lines)
    {
        stringBuilder.AppendLine(line.Text);
    }

    return stringBuilder.ToString();
}

public async Task<TextRecognizer> EnsureModelIsReady()
{
    // Install the text recognition model if it is not yet available on this device.
    if (TextRecognizer.GetReadyState() == AIFeatureReadyState.NotReady)
    {
        var loadResult = await TextRecognizer.EnsureReadyAsync();
        if (loadResult.Status != AIFeatureReadyResultState.Success)
        {
            throw new Exception(loadResult.ExtendedError.Message);
        }
    }

    return await TextRecognizer.CreateAsync();
}

// C++/WinRT: ensure the text recognition model is installed, then create the recognizer.
winrt::Windows::Foundation::IAsyncOperation<winrt::Microsoft::Windows::AI::Imaging::TextRecognizer> MainWindow::EnsureModelIsReady()
{
    if (winrt::Microsoft::Windows::AI::Imaging::TextRecognizer::GetReadyState() == AIFeatureReadyState::NotReady)
    {
        auto loadResult = TextRecognizer::EnsureReadyAsync().get();
           
        if (loadResult.Status() != AIFeatureReadyResultState::Success)
        {
            throw winrt::hresult_error(loadResult.ExtendedError());
        }
    }

    return winrt::Microsoft::Windows::AI::Imaging::TextRecognizer::CreateAsync();
}

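Get word bounds and confidence

Here we show how to visualize the BoundingBox of each word in a SoftwareBitmap object as a collection of color-coded polygons on a Grid element.

Note

This example assumes that a TextRecognizer object has already been created and passed in to the function.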

// C#: outline each recognized word on the grid, colored by match confidence.
using Microsoft.Graphics.Imaging;
using Microsoft.UI.Xaml.Controls;
using Microsoft.UI.Xaml.Media;
using Microsoft.UI.Xaml.Shapes;
using Microsoft.Windows.AI.Imaging;
using Windows.Graphics.Imaging;

public void VisualizeWordBoundariesOnGrid(
    SoftwareBitmap bitmap,
    Grid grid,
    TextRecognizer textRecognizer)
{
    ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
    RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);

    SolidColorBrush greenBrush = new SolidColorBrush(Microsoft.UI.Colors.Green);
    SolidColorBrush yellowBrush = new SolidColorBrush(Microsoft.UI.Colors.Yellow);
    SolidColorBrush redBrush = new SolidColorBrush(Microsoft.UI.Colors.Red);

    foreach (var line in result.Lines)
    {
        foreach (var word in line.Words)
        {
            // Build a four-point polygon from the corners of the word's bounding box.
            PointCollection points = new PointCollection();
            var bounds = word.BoundingBox;
            points.Add(bounds.TopLeft);
            points.Add(bounds.TopRight);
            points.Add(bounds.BottomRight);
            points.Add(bounds.BottomLeft);

            Polygon polygon = new Polygon();
            polygon.Points = points;
            polygon.StrokeThickness = 2;

            // Red for low, yellow for medium, and green for high confidence.
            if (word.MatchConfidence < 0.33)
            {
                polygon.Stroke = redBrush;
            }
            else if (word.MatchConfidence < 0.67)
            {
                polygon.Stroke = yellowBrush;
            }
            else
            {
                polygon.Stroke = greenBrush;
            }

            grid.Children.Add(polygon);
        }
    }
}

// C++/WinRT: outline each recognized word on the grid, colored by match confidence.
void MainWindow::VisualizeWordBoundariesOnGrid(
    Windows::Graphics::Imaging::SoftwareBitmap const& bitmap,
    Grid const& grid,
    TextRecognizer const& textRecognizer)
{
    Microsoft::Graphics::Imaging::ImageBuffer imageBuffer = 
        Microsoft::Graphics::Imaging::ImageBuffer::CreateForSoftwareBitmap(bitmap);

    RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);

    auto greenBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Green());
    auto yellowBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Yellow());
    auto redBrush = SolidColorBrush(winrt::Microsoft::UI::Colors::Red());
    for (const auto& line : result.Lines())
    {
        for (const auto& word : line.Words())
        {
            PointCollection points;
            const auto& bounds = word.BoundingBox();
            points.Append(bounds.TopLeft);
            points.Append(bounds.TopRight);
            points.Append(bounds.BottomRight);
            points.Append(bounds.BottomLeft);

            winrt::Microsoft::UI::Xaml::Shapes::Polygon polygon{};
            polygon.Points(points);
            polygon.StrokeThickness(2);
            if (word.MatchConfidence() < 0.33)
            {
                polygon.Stroke(redBrush);
            }
            else if (word.MatchConfidence() < 0.67)
            {
                polygon.Stroke(yellowBrush);
            }
            else
            {
                polygon.Stroke(greenBrush);
            }

            grid.Children().Append(polygon);
        }
    }
}
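
The color-coding rule used in the samples above can be isolated into a small helper, which makes the thresholds easy to check without a TextRecognizer or any UI. The following is a minimal sketch in standard C++, not part of the Windows AI APIs: the names ConfidenceBand, ClassifyConfidence, and StrokeColorName are hypothetical, and the 0.33 and 0.67 cut-offs are simply the ones the samples use, assuming the confidence value lies between 0.0 and 1.0.

```cpp
#include <string>

// Hypothetical helper type for illustration; not part of the Windows AI APIs.
enum class ConfidenceBand { Low, Medium, High };

// Maps a word's match confidence (assumed to lie in the range 0.0 to 1.0)
// to a band, using the same thresholds as the samples above.
constexpr ConfidenceBand ClassifyConfidence(float confidence) noexcept
{
    if (confidence < 0.33f) return ConfidenceBand::Low;
    if (confidence < 0.67f) return ConfidenceBand::Medium;
    return ConfidenceBand::High;
}

// Returns the name of the stroke color the samples associate with each band.
inline std::string StrokeColorName(ConfidenceBand band)
{
    switch (band)
    {
    case ConfidenceBand::Low:    return "Red";
    case ConfidenceBand::Medium: return "Yellow";
    case ConfidenceBand::High:   return "Green";
    }
    return "Green"; // Not reached for valid enum values.
}
```

In the WinUI samples, the returned band would select among redBrush, yellowBrush, and greenBrush in place of the inline if/else chain.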

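Responsible AI

We used a combination of steps to ensure that these imaging APIs are trustworthy, secure, and built responsibly. When you implement AI features in your app, we recommend reviewing the best practices described in Responsible Generative AI Development on Windows.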
