How to convert SharePoint ASPX pages to PDF or another file type that can be used as a data source so they can be indexed by Azure AI Search

Babiker Assie 0 Reputation points
2024-03-20T09:14:55.06+00:00

I want to use Azure AI Search with my organization's site. The site contains ASPX pages that I want to use as a data source. According to the documentation, indexing ASPX pages is not supported, so I'm looking into converting them to another type that can be indexed.

Is my assumption correct that converting the ASPX pages is the only way I can use the Azure AI solution?

How can I convert my ASPX pages, and is there a recommended file type?

Kind regards :)

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
SharePoint
A group of Microsoft Products and technologies used for sharing and managing content, knowledge, and applications.

2 answers

  1. Mzoughi Ilyes 80 Reputation points
    2024-03-24T03:19:09.6133333+00:00

    Hi Babiker Assie,

    You can have the contents of your ASPX pages appear in search results, but Azure Cognitive Search and the SharePoint indexer have some limitations; for example, indexing SharePoint .ASPX site content is not supported.

    https://learn.microsoft.com/en-us/sharepoint/make-site-content-searchable#show-contents-of-aspx-pages-in-search-results

    https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online#limitations-and-considerations

    So I suggest using a Power Automate flow to convert your ASPX pages, saving the output in the same document library or in another location.
    You will find the steps in this post: https://sharepointstuff.com/2022/06/15/convert-sharepoint-pages-into-pdf/comment-page-1/

    Best regards,

    IM

    #Azure #cognitive #AI #Copilote

  2. brtrach-MSFT 16,356 Reputation points Microsoft Employee
    2024-03-26T02:08:40.93+00:00

    @Babiker Assie To add to the great answer that IM provided, here is a sample that uses iTextSharp to convert an ASPX page to a PDF:

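    using System.IO;
    using System.Net;
    using iTextSharp.text;
    using iTextSharp.text.html.simpleparser; // namespace that contains HTMLWorker
    using iTextSharp.text.pdf;

    // Get the ASPX page content.
    // Note: a SharePoint page usually requires authentication, so set
    // client.Credentials (or an auth header) before downloading.
    string url = "http://example.com/mypage.aspx";
    string html;
    using (WebClient client = new WebClient())
    {
        html = client.DownloadString(url);
    }

    // Convert the HTML to PDF
    using (FileStream output = new FileStream("mypage.pdf", FileMode.Create))
    {
        Document document = new Document();
        PdfWriter.GetInstance(document, output);
        document.Open();
        // HTMLWorker is marked obsolete in iTextSharp 5 but still handles simple HTML
        HTMLWorker worker = new HTMLWorker(document);
        worker.Parse(new StringReader(html));
        document.Close(); // flushes the PDF and closes the output stream
    }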
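
    If you have more than one page to convert, the same approach can be wrapped in a loop. The following is only a minimal sketch under some assumptions: the class name AspxBatchConverter, the helpers PdfNameFor and ConvertPage, the pdf-output folder, and the second URL are all hypothetical names made up for illustration, and the pages are assumed to be readable by the WebClient as configured.

    ```csharp
    using System;
    using System.IO;
    using System.Net;
    using iTextSharp.text;
    using iTextSharp.text.html.simpleparser;
    using iTextSharp.text.pdf;

    static class AspxBatchConverter
    {
        // Hypothetical helper: turns ".../mypage.aspx" into "mypage.pdf".
        static string PdfNameFor(string url)
        {
            string page = Path.GetFileNameWithoutExtension(new Uri(url).AbsolutePath);
            return page + ".pdf";
        }

        // Downloads one page and writes it as a PDF into outputDir.
        static void ConvertPage(WebClient client, string url, string outputDir)
        {
            string html = client.DownloadString(url);
            string target = Path.Combine(outputDir, PdfNameFor(url));

            using (FileStream output = new FileStream(target, FileMode.Create))
            {
                Document document = new Document();
                PdfWriter.GetInstance(document, output);
                document.Open();
                HTMLWorker worker = new HTMLWorker(document);
                worker.Parse(new StringReader(html));
                document.Close();
            }
        }

        static void Main()
        {
            // Placeholder URLs; replace them with your own pages.
            string[] urls =
            {
                "http://example.com/mypage.aspx",
                "http://example.com/otherpage.aspx"
            };
            string outputDir = "pdf-output";
            Directory.CreateDirectory(outputDir);

            using (WebClient client = new WebClient())
            {
                foreach (string url in urls)
                {
                    try
                    {
                        ConvertPage(client, url, outputDir);
                        Console.WriteLine("Converted " + url);
                    }
                    catch (Exception ex)
                    {
                        // Keep going so one failing page does not stop the batch.
                        Console.WriteLine("Failed " + url + ": " + ex.Message);
                    }
                }
            }
        }
    }
    ```

    The resulting PDF files could then be uploaded to a data source that the indexer supports.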
    

    Let us know if you have any further questions.
