How to extract SharePoint Online content (pages, documents) using Azure Functions or other Azure services?
Hi,
I’m trying to build a solution to extract data from SharePoint Online, including:
Site Pages (.aspx files)
Documents and files from libraries
Possibly metadata from lists as well
I would like to understand how this can be done using Azure services, primarily Azure Functions, but I’m also open to using Logic Apps, Power Automate, or other recommended tools.
A few specific questions I have:
What is the recommended way to authenticate securely from an Azure Function to SharePoint Online (using app registration, managed identity, etc.)?
Is it possible to extract the content or structure of .aspx pages programmatically?
How can I access and download documents from a document library via Azure?
Are there any sample implementations or best practices for this kind of SharePoint integration?
How should I handle large data volumes or throttling during extraction?
Any guidance, examples, or architectural suggestions would be very helpful.
Thanks!