SharePoint indexer does not index .docx files

Aditya Vishwakarma (LTIMINDTREE LIMITED) 0 Reputation points Microsoft External Staff
2024-03-26T10:20:57.6066667+00:00

The SharePoint indexer is not indexing .docx files, but I am able to index PDF files.

Microsoft 365 and Office | SharePoint | For business | Windows
Azure AI services

1 answer

  1. YutongTie-MSFT 53,971 Reputation points Moderator
    2024-03-27T00:19:50.53+00:00

    @Aditya Vishwakarma (LTIMINDTREE LIMITED)

    Thanks for reaching out to us. Could you please share the document if it is not confidential? DOCX files should be supported according to the documentation, subject to the requirements below.

    You can control which files are indexed by setting inclusion and exclusion criteria in the "parameters" section of the indexer definition.

    Include specific file extensions by setting "indexedFileNameExtensions" to a comma-separated list of file extensions (with a leading dot). Exclude specific file extensions by setting "excludedFileNameExtensions" to the extensions that should be skipped. If the same extension is in both lists, it's excluded from indexing.
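    As a rough illustration of the rule above, the following Python sketch mimics how the two lists might interact. The helper names (`parse_extensions`, `is_indexed`, `build_parameters`) are hypothetical and not part of any SDK. Two assumptions are baked in: an empty inclusion list places no restriction, and both settings sit under a "configuration" object inside "parameters".

```python
import json
from pathlib import PurePosixPath


def parse_extensions(value: str) -> set[str]:
    """Split a comma-separated extension list such as ".pdf, .docx"."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def is_indexed(file_name: str, included: str = "", excluded: str = "") -> bool:
    """Approximate the rule from the text: exclusion wins over inclusion.

    Assumption: an empty inclusion list means "no restriction".
    """
    ext = PurePosixPath(file_name).suffix.lower()
    if ext in parse_extensions(excluded):
        return False
    allowed = parse_extensions(included)
    return not allowed or ext in allowed


def build_parameters(included: str, excluded: str) -> dict:
    """Build the "parameters" section of an indexer definition (assumed layout)."""
    return {
        "parameters": {
            "configuration": {
                "indexedFileNameExtensions": included,
                "excludedFileNameExtensions": excluded,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_parameters(".pdf, .docx", ".png"), indent=2))
    # A .docx file is skipped when the inclusion list names only .pdf.
    print(is_indexed("report.docx", included=".pdf"))
```

    If the indexer definition in question lists only ".pdf" under "indexedFileNameExtensions", or lists ".docx" under "excludedFileNameExtensions", that alone could explain why PDF files are indexed while DOCX files are not, so it is worth checking first.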

    PUT /indexers/[indexer name]?api-version=2020-06-30
    {
        "parameters": {
            "configuration": { "indexedFileNameExtensions": ".pdf, .docx" }
        }
    }

    There are also prerequisites and other details you may want to review:

    SharePoint in Microsoft 365 cloud service

    Files in a document library

    Supported document formats

    The SharePoint indexer can extract text from the following document formats:

    • CSV (see Indexing CSV blobs)
    • EML
    • EPUB
    • GZ
    • HTML
    • JSON (see Indexing JSON blobs)
    • KML (XML for geographic representations)
    • Microsoft Office formats: DOCX/DOC/DOCM, XLSX/XLS/XLSM, PPTX/PPT/PPTM, MSG (Outlook emails), XML (both 2003 and 2006 WORD XML)
    • Open Document formats: ODT, ODS, ODP
    • PDF
    • Plain text files (see also Indexing plain text)
    • RTF
    • XML
    • ZIP

    Limitations and considerations

    Here are the limitations of this feature:

    Indexing SharePoint Lists isn't supported.

    Indexing SharePoint .ASPX site content isn't supported.

    OneNote notebook files aren't supported.

    Private endpoint isn't supported.

    Renaming a SharePoint folder doesn't trigger incremental indexing. A renamed folder is treated as new content.

    SharePoint supports a granular authorization model that determines per-user access at the document level. The indexer doesn't pull these permissions into the index, and Azure AI Search doesn't support document-level authorization. When a document is indexed from SharePoint into a search service, the content is available to anyone who has read access to the index. If you require document-level permissions, you should consider security filters to trim results and automate copying the permissions at a file level to a field in the index.
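    The security-filter approach mentioned above usually comes down to sending an OData filter with each query. Below is a minimal Python sketch that builds such a filter string. It assumes the index has a filterable collection field holding group identifiers; the field name `group_ids` and the helper `build_security_filter` are hypothetical, and copying the permissions from SharePoint into that field is not shown.

```python
def build_security_filter(group_ids: list[str], field: str = "group_ids") -> str:
    """Return an OData filter matching documents shared with any given group.

    Assumption: `field` is a filterable Collection(Edm.String) field in the index.
    """
    if not group_ids:
        raise ValueError("at least one group id is required")
    for gid in group_ids:
        # search.in splits its list on commas and spaces by default,
        # so ids containing those (or a quote) would corrupt the filter.
        if any(ch in gid for ch in (",", "'", " ")):
            raise ValueError(f"unsupported character in group id: {gid!r}")
    joined = ",".join(group_ids)
    return f"{field}/any(g: search.in(g, '{joined}'))"


if __name__ == "__main__":
    print(build_security_filter(["g1", "g2"]))
    # prints: group_ids/any(g: search.in(g, 'g1,g2'))
```

    The resulting string would be passed as the filter of a search request, so that only documents tagged with one of the caller's groups are returned.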

    Indexing user-encrypted files, Information Rights Management (IRM) protected files, ZIP files with passwords or similar encrypted content isn't supported. For encrypted content to be processed, the user with proper permissions to the specific file must remove the encryption so the item can be indexed accordingly when the indexer runs the next scheduled iteration.

    Here are the considerations when using this feature:

    • If you need a SharePoint content indexing solution in a production environment, consider creating a custom connector with SharePoint Webhooks, calling Microsoft Graph API to export the data to an Azure Blob container, and then use the Azure Blob indexer for incremental indexing.
    • If your SharePoint configuration allows Microsoft 365 processes to update SharePoint file system metadata, be aware that these updates can trigger the SharePoint indexer, causing the indexer to ingest documents multiple times. Because the SharePoint indexer is a third-party connector to Azure, the indexer can't read the configuration or vary its behavior. It responds to changes in new and changed content, regardless of how those updates are made. For this reason, make sure that you test your setup and understand the document processing count prior to using the indexer and any AI enrichment.

    If you can share the file, we can help you debug further. Please let us know how it goes. I hope this helps.

    Regards,

    Yutong

    Please accept the answer if you found it helpful, to support the community. Thanks a lot.

    1 person found this answer helpful.