SharePoint indexer is not indexing .docx files

Question


Aditya Vishwakarma (LTIMINDTREE LIMITED), Microsoft External Staff

The SharePoint indexer is not indexing .docx files, but it can index PDF files.


Answer 1

@Aditya Vishwakarma (LTIMINDTREE LIMITED)

Thanks for reaching out to us. Could you please share the document, if it isn't confidential? DOCX files should be supported, according to the documentation below:

You can control which files are indexed by setting inclusion and exclusion criteria in the "parameters" section of the indexer definition.

Include specific file extensions by setting "indexedFileNameExtensions" to a comma-separated list of file extensions (with a leading dot). Exclude specific file extensions by setting "excludedFileNameExtensions" to the extensions that should be skipped. If the same extension is in both lists, it's excluded from indexing.
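A minimal sketch of the selection rule described above, which can help check whether an "indexedFileNameExtensions" value such as ".pdf" alone would skip .docx files. The function names are hypothetical, and two details are assumptions rather than documented behavior: an empty include list is treated as "all extensions", and extensions are compared case-insensitively.

```python
def parse_extensions(value):
    """Split a comma-separated list such as ".pdf, .docx" into a set of lowercase extensions."""
    if not value:
        return set()
    return {ext.strip().lower() for ext in value.split(",") if ext.strip()}


def is_selected(file_name, indexed="", excluded=""):
    """Return True if file_name passes the include/exclude rules.

    Assumptions: an empty include list means every extension is allowed,
    and extensions are compared case-insensitively.
    """
    dot = file_name.rfind(".")
    ext = file_name[dot:].lower() if dot != -1 else ""
    included = parse_extensions(indexed)
    excluded_set = parse_extensions(excluded)
    # Exclusion wins when an extension appears in both lists.
    if ext in excluded_set:
        return False
    return not included or ext in included


print(is_selected("report.docx", indexed=".pdf"))          # only PDFs are included
print(is_selected("report.docx", indexed=".pdf, .docx"))   # DOCX added to the list
```

If the first call reflects the indexer's current configuration, adding ".docx" to the include list (and making sure it is not in the exclude list) is the first thing to check.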

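PUT /indexers/[indexer name]?api-version=2020-06-30
{
    "parameters" : {
        "configuration" : {
            "indexedFileNameExtensions" : ".pdf, .docx",
            "excludedFileNameExtensions" : ".png, .jpg"
        }
    }
}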
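Because a PUT replaces the whole indexer, one way to apply these parameters is to fetch the existing definition, change only parameters.configuration, and send the complete object back. The sketch below assumes the definition has already been retrieved (for example with GET /indexers/[indexer name]) and parsed into a dict; the helper name and the indexer, data source, and index names are hypothetical, and the HTTP call itself is left out.

```python
import copy
import json


def with_extension_filters(definition, indexed=None, excluded=None):
    """Return a copy of an indexer definition with file-extension filters set.

    All other properties (dataSourceName, targetIndexName, ...) are kept,
    since the PUT request must carry the full definition.
    """
    updated = copy.deepcopy(definition)
    # "parameters" and its "configuration" object may be missing or null.
    parameters = updated.get("parameters") or {}
    configuration = parameters.get("configuration") or {}
    if indexed is not None:
        configuration["indexedFileNameExtensions"] = ", ".join(indexed)
    if excluded is not None:
        configuration["excludedFileNameExtensions"] = ", ".join(excluded)
    parameters["configuration"] = configuration
    updated["parameters"] = parameters
    return updated


# Hypothetical existing definition that only includes PDFs.
existing = {
    "name": "sharepoint-indexer",
    "dataSourceName": "sharepoint-datasource",
    "targetIndexName": "sharepoint-index",
    "parameters": {"configuration": {"indexedFileNameExtensions": ".pdf"}},
}

body = with_extension_filters(existing, indexed=[".pdf", ".docx"])
print(json.dumps(body["parameters"], indent=2))
```

Working on a deep copy keeps the fetched definition unchanged, so it can be compared with the updated body before sending.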

There are also some prerequisites and other details you may want to consider:

SharePoint in Microsoft 365 cloud service

Files in a document library

Supported document formats

The SharePoint indexer can extract text from the following document formats:

CSV (see Indexing CSV blobs)
EML
EPUB
GZ
HTML
JSON (see Indexing JSON blobs)
KML (XML for geographic representations)
Microsoft Office formats: DOCX/DOC/DOCM, XLSX/XLS/XLSM, PPTX/PPT/PPTM, MSG (Outlook emails), XML (both 2003 and 2006 WORD XML)
Open Document formats: ODT, ODS, ODP
PDF
Plain text files (see also Indexing plain text)
RTF
XML
ZIP

Limitations and considerations

Here are the limitations of this feature:

Indexing SharePoint Lists isn't supported.

Indexing SharePoint .ASPX site content isn't supported.

OneNote notebook files aren't supported.

Private endpoint isn't supported.

Renaming a SharePoint folder doesn't trigger incremental indexing. A renamed folder is treated as new content.

SharePoint supports a granular authorization model that determines per-user access at the document level. The indexer doesn't pull these permissions into the index, and Azure AI Search doesn't support document-level authorization. When a document is indexed from SharePoint into a search service, its content is available to anyone who has read access to the index. If you require document-level permissions, consider using security filters to trim results, and automate copying file-level permissions into a field in the index.
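Security trimming is usually done by storing, for each document, the IDs of the groups allowed to read it in a filterable string collection field, then filtering every query by the current user's groups with the OData any() and search.in() functions. A sketch that builds such a filter string, assuming a hypothetical field named group_ids; copying the permissions from SharePoint into that field is not shown.

```python
def security_filter(field, group_ids):
    """Build an OData $filter that keeps documents sharing at least one group with the user.

    `field` is assumed to be a filterable Collection(Edm.String) field
    (hypothetical name, e.g. "group_ids").
    """
    if not group_ids:
        # A user with no groups should see nothing; fail loudly instead of sending an empty list.
        raise ValueError("at least one group id is required")
    for gid in group_ids:
        if "," in gid:
            raise ValueError(f"group id contains the delimiter: {gid!r}")
    # Single quotes inside an OData string literal are escaped by doubling them.
    values = ",".join(gid.replace("'", "''") for gid in group_ids)
    # The third argument makes "," the only delimiter search.in() splits on.
    return f"{field}/any(g:search.in(g, '{values}', ','))"


print(security_filter("group_ids", ["group_id1", "group_id2"]))
```

Passing an explicit delimiter avoids search.in() also splitting on spaces, which it does by default.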

Indexing user-encrypted files, Information Rights Management (IRM) protected files, password-protected ZIP files, or similar encrypted content isn't supported. For encrypted content to be processed, a user with the proper permissions on the specific file must remove the encryption, so that the item can be indexed on the indexer's next scheduled run.

Here are the considerations when using this feature:

If you need a SharePoint content indexing solution in a production environment, consider creating a custom connector with SharePoint Webhooks that calls the Microsoft Graph API to export the data to an Azure Blob container, and then using the Azure Blob indexer for incremental indexing.
If your SharePoint configuration allows Microsoft 365 processes to update SharePoint file system metadata, be aware that these updates can trigger the SharePoint indexer, causing the indexer to ingest documents multiple times. Because the SharePoint indexer is a third-party connector to Azure, the indexer can't read the configuration or vary its behavior. It responds to changes in new and changed content, regardless of how those updates are made. For this reason, make sure that you test your setup and understand the document processing count prior to using the indexer and any AI enrichment.
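One way to understand the document processing count is to read the indexer's execution history. The sketch below assumes the response of GET /indexers/[indexer name]/status has already been parsed into a dict, with an "executionHistory" list whose entries carry "itemsProcessed", "itemsFailed", and an "errors" list; the sample data is hypothetical. A total well above the number of files in the library can point to repeated ingestion, and the listed error keys may show which documents (for example .docx files) failed.

```python
def summarize_runs(status):
    """Summarize an indexer status response that has already been parsed into a dict."""
    runs = status.get("executionHistory") or []
    processed = sum(run.get("itemsProcessed", 0) for run in runs)
    failed = sum(run.get("itemsFailed", 0) for run in runs)
    # Keys of documents that reported errors; worth checking when a file type is missing from the index.
    error_keys = [err.get("key") for run in runs for err in (run.get("errors") or [])]
    return {
        "runs": len(runs),
        "itemsProcessed": processed,
        "itemsFailed": failed,
        "errorKeys": error_keys,
    }


# Hypothetical status payload with two runs over the same ten files.
sample_status = {
    "executionHistory": [
        {"itemsProcessed": 10, "itemsFailed": 1, "errors": [{"key": "doc-1", "errorMessage": "example error"}]},
        {"itemsProcessed": 10, "itemsFailed": 0, "errors": []},
    ]
}
print(summarize_runs(sample_status))
```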

If you can share the file, we can help you debug further. Please let us know how it goes; I hope this helps.

Regards,

Yutong

-Please kindly accept the answer if you found it helpful, to support the community. Thanks a lot.
