Overview of unstructured document processing in SharePoint

2025-06-24

Note

Through December 2025, you can try out a limited amount of unstructured document processing and other selected content services at no cost if you have pay-as-you-go billing set up. For information and limitations, see Try out pay-as-you-go services.

Use the unstructured document processing model (teaching method) to automatically classify files and extract information. It works best for unstructured documents, such as letters or contracts.

Note

Microsoft respects the privacy and ownership of data you use to train and process models in Syntex. None of your organization's data is used or transferred by Microsoft to train AI models, large language models, or any other models. Your data remains securely within your organization’s tenant. For more information, see Microsoft data protection and privacy.

Introduction to unstructured models

Unstructured document processing models use artificial intelligence (AI) to analyze and extract information from documents. These models rely on identifiable text, such as phrases or patterns, to determine both how a document is classified and which data to extract from it.

Note

For more information about how to use Syntex and scenario examples, see Get started driving adoption of Microsoft Syntex and Scenarios and use cases for Microsoft Syntex.

You create and manage unstructured document processing models in a SharePoint content center. When you apply a model to a SharePoint document library, it’s associated with a content type that includes columns for storing extracted information. You can create a new content type or use an existing one from the SharePoint content type gallery.

Note

Read-only or sealed content types can't be updated, so they can't be used in a model.

Add classifiers and extractors to your unstructured document processing models to identify documents and pull information from them:

Classifiers identify and classify documents that are uploaded to the document library. For example, a classifier can be "trained" to identify all contract renewal documents that are uploaded to the library. You define the contract renewal content type when you create your classifier.
Extractors pull information from these documents. For example, for each contract renewal document identified in your document library, columns display the Service Start Date and Client.
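
The split between classifying and extracting can be pictured with a small conceptual sketch. This is not how Syntex is implemented and it doesn't call any Syntex or SharePoint API; it only illustrates matching on phrases and patterns. The phrase list, the label patterns, the function names, and the sample text are all hypothetical.

```python
import re

# Hypothetical phrases that suggest a document is a contract renewal.
RENEWAL_PHRASES = ["contract renewal", "renewal agreement"]

# Hypothetical patterns: each one captures the rest of the line after a label.
# The keys stand in for the library columns that would hold the values.
EXTRACTOR_PATTERNS = {
    "Service Start Date": re.compile(r"Service Start Date:\s*(.+)"),
    "Client": re.compile(r"Client:\s*(.+)"),
}


def classify(text):
    """Return True if any renewal phrase appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in RENEWAL_PHRASES)


def extract(text):
    """Return a dict mapping column name to extracted value (None if not found)."""
    values = {}
    for column, pattern in EXTRACTOR_PATTERNS.items():
        match = pattern.search(text)
        values[column] = match.group(1).strip() if match else None
    return values


# Made-up sample document text, for illustration only.
sample = """Contract Renewal Notice
Client: Contoso Ltd.
Service Start Date: 2025-07-01
"""

if classify(sample):
    print(extract(sample))
```

In this sketch, classification decides whether a document belongs to the content type, and extraction runs only on documents that were classified. A real model is trained on example files rather than on hand-written rules like these.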

Use example files to train and test your classifiers and extractors. These files help the model learn what to look for when identifying and extracting data. For example, train your contract renewal model using real contract renewal documents from your organization. You can also use these files to validate your model’s accuracy.

After publishing your model, use the content center to apply it to any SharePoint document library that you have access to.

Requirements and limitations

For information about requirements to consider when choosing this model, see the requirements and limitations for unstructured document processing.
