How to add a page number (sourcepage) to an Azure AI Search index

Deepak Dange 0 Reputation points
2024-02-13T11:34:34.97+00:00

Hi Azure Support,

I hope this message finds you well. I'm reaching out because I have encountered an issue while customizing an Azure Search Index using Python. Specifically, I am attempting to add the source page information to the index schema for reference purposes in my application. However, my attempts to achieve this by utilizing metadata have been unsuccessful. Here's a brief overview of my situation:

  • I have created an Azure Search Index using Python code.
  • I have customized the index schema to align with my specific use case.
  • The documents utilized to create the index include PDFs, DOCX files, and PPTX files.
  • My objective is to include the source page information within the index schema.

Could you please provide guidance on how to successfully incorporate the source page information into the index schema for documents of these types? Your assistance in resolving this matter would be greatly appreciated. Thank you in advance for your support. Best regards, Deepak Dange

1 answer

  1. brtrach-MSFT 15,256 Reputation points Microsoft Employee
    2024-02-15T03:37:00.6133333+00:00

    @Deepak Dange To include source page information in the index, add a dedicated string field to the index schema (for example, sourcepage) and populate it yourself when you upload your documents. The metadata_storage_content_type field is not a good place for it: when a blob indexer is used, that field is populated automatically with the content type of the document being indexed (for example, application/pdf), not with a page reference. Here's an example of an index schema that includes both fields:

    from azure.search.documents.indexes.models import SearchFieldDataType, SearchableField, SimpleField

    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="content", type=SearchFieldDataType.String),
        # Content type of the source file, e.g. application/pdf
        SimpleField(name="metadata_storage_content_type", type=SearchFieldDataType.String),
        # Source page reference, supplied by your own code
        SimpleField(name="sourcepage", type=SearchFieldDataType.String, filterable=True),
    ]

    In this example, metadata_storage_content_type holds the content type and sourcepage holds the page reference. Note that SearchFieldDataType, SearchableField, and SimpleField are imported from azure.search.documents.indexes.models. As far as I know, the blob indexer's standard metadata fields do not include a per-page number, so you would split each PDF, DOCX, or PPTX file into pages (or chunks) in your own code, upload one document per page, and set sourcepage on each one. Please also note that metadata_storage_content_type is a property of the blob itself, so it is not limited to PDFs, DOCX files, and PPTX files; it simply does not carry page information for any file type.
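
As a minimal sketch of populating page information at upload time, assuming your own code has already split each file into per-page text, the helper below builds one document per page with a separate sourcepage field. The field name sourcepage comes from the question title; the helper name build_page_documents, the file#page=N value format, and the key scheme are hypothetical choices for illustration, not part of any Azure API. The index schema would need a matching sourcepage string field.

```python
from typing import Dict, List


def build_page_documents(file_name: str, pages: List[str]) -> List[Dict[str, str]]:
    """Build one index document per page, each tagged with its source page.

    `pages` is assumed to hold the text of each page in order; extracting
    that text from a PDF, DOCX, or PPTX file is outside this sketch.
    """
    # Keep the key to ASCII letters and digits plus "_" and "-", which are
    # characters Azure AI Search accepts in document keys.
    stem = "".join(ch if ch.isascii() and ch.isalnum() else "_" for ch in file_name)
    documents = []
    for page_number, text in enumerate(pages, start=1):
        documents.append(
            {
                "id": f"{stem}-{page_number}",
                "content": text,
                # Hypothetical convention: "<file name>#page=<1-based page number>"
                "sourcepage": f"{file_name}#page={page_number}",
            }
        )
    return documents


docs = build_page_documents("report.pdf", ["Intro text", "Results text"])
print(docs[1]["sourcepage"])  # report.pdf#page=2

# With an already configured azure.search.documents.SearchClient (not created here):
# search_client.upload_documents(documents=docs)
```

Storing the page reference in its own field keeps it separate from the content type, and it lets the application show the reference next to each search result.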