Overview of prebuilt document processing in Microsoft Syntex

Note

Through June 2025, you can try out prebuilt document processing and other selected Syntex services at no cost if you have pay-as-you-go billing set up. For information and limitations, see Try out Microsoft Syntex and explore its services.

In addition to custom models, Microsoft Syntex provides prebuilt models to automate the extraction of information.

Note

Microsoft respects the privacy and ownership of data you use to train and process models in Syntex. None of your organization’s data is used or transferred by Microsoft to train AI models, large language models, or any other models. Your data remains securely within your organization’s tenant. For more information, see Microsoft data protection and privacy.

Introduction to prebuilt models

Prebuilt document processing uses prebuilt models that are preconfigured to recognize documents and the structured information in them. Instead of creating a new custom model from scratch, you can iterate on an existing pretrained model to add specific fields that fit the needs of your organization.

Prebuilt models use optical character recognition (OCR) combined with deep learning models to identify and extract predefined text and data fields common to specific document types. You start by analyzing one of your files against the prebuilt model. You then select the detected fields that make sense for your purpose. If the model doesn't detect the fields that you need, you can analyze again by using a different file.
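The select-and-retry step described above can be pictured with a small sketch. This is only an illustration in plain Python, not a Syntex API: in Syntex you do this through the content center. The `DetectedField` type, the `select_fields` helper, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedField:
    """One field reported after analyzing a sample file (hypothetical shape)."""
    name: str   # label of the detected field
    value: str  # text extracted from the sample file


def select_fields(detected, wanted):
    """Split the wanted field names into those detected and those missing."""
    by_name = {field.name: field for field in detected}
    selected = [by_name[name] for name in wanted if name in by_name]
    missing = [name for name in wanted if name not in by_name]
    return selected, missing


# Made-up result of analyzing one sample invoice against a prebuilt model.
detected = [
    DetectedField("CustomerName", "Contoso Ltd."),
    DetectedField("DueDate", "2024-05-01"),
    DetectedField("AmountDue", "1,250.00"),
]

selected, missing = select_fields(
    detected, ["CustomerName", "AmountDue", "BillingAddress"]
)

# Selected fields are the ones you would keep for extraction.
columns = {field.name: field.value for field in selected}

# A non-empty `missing` list corresponds to the case where you would
# analyze again by using a different file.
needs_another_sample = bool(missing)
```

Here `missing` contains the one wanted field that the sample did not surface, which is the situation where analyzing a different file is worth trying.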

Like other models, prebuilt models are created and managed in the content center. When applied to a SharePoint document library, the model is associated with a content type and has columns to store the information being extracted.

After publishing your model, use the content center to apply it to any SharePoint document library that you have access to.

Available prebuilt models

Currently, there are four prebuilt models available: contracts, invoices, receipts, and sensitive information.

  • Contracts. The contracts prebuilt model analyzes contract documents in various formats and extracts key contract information, such as client name and address, contract duration, and renewal date.

  • Invoices. The invoices prebuilt model analyzes sales invoices in various formats and extracts key invoice information, such as customer name, billing address, due date, and amount due.

  • Receipts. The receipts prebuilt model analyzes printed and handwritten sales receipts and extracts key receipt information, such as merchant name, merchant phone number, transaction date, tax, and transaction total.

  • Sensitive information. The sensitive information prebuilt model analyzes documents in various formats and detects and extracts key sensitive information, such as personal and financial identification numbers, physical and email addresses, and phone numbers.

Additional prebuilt models will be available in future releases.

Requirements and limitations

For information about requirements to consider when choosing a prebuilt model, see Requirements and limitations for models in Microsoft Syntex.