We have a subset of Azure web applications on a private Azure cloud. These web apps are just a collection of dynamic web pages, and we want to run a web crawler over that content.
-- Clarification -- We are not crawling SharePoint. We are crawling Azure web application sites.
But how do we authenticate? When we go to a page, it prompts for Microsoft single sign-on. Username/password methods such as NTLM, or form-based auth (via HTTP or Selenium), are not available; by default we only allow single sign-on through Azure Active Directory cloud login.
We know application registrations, a service account, and maybe OAuth might be involved, but we are having a hard time finding the specifics of what exactly to do here.
What is the method of obtaining Federated Auth/SPOIDCRL cookies for crawling Azure web sites?
Should we use an SDK? Or is this something we can set up in curl, Postman, etc.?
Note: I had to post this as an answer, as the "Comment" functionality appears to be broken and I cannot reply to your comment.
@Alfredo Revilla - Upwork Top Talent | IAM SWE SWA hi, sorry it took me so long to get the information together.
We are able to obtain the bearer token:
curl --location --request POST 'https://login.microsoftonline.com/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/oauth2/v2.0/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Cookie: fpc=xxxxxxxxxxxxxxxxxxxx; stsservicecookie=estsfd; x-ms-gateway-slice=estsfd' \
--data-urlencode 'client_secret=xxxxxxxxxxxxxxxxxxxxxxxxxx' \
--data-urlencode 'scope=api://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/.default' \
--data-urlencode 'client_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' \
--data-urlencode 'grant_type=client_credentials'
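For reference, this is roughly how we consume that response: the /token endpoint returns JSON, so we extract access_token and send it back as a Bearer header. The token value and host name below are placeholders, not our real ones:

```shell
# Sample /token JSON response (placeholder token, not a real credential)
TOKEN_JSON='{"token_type":"Bearer","expires_in":3599,"access_token":"eyJ0eXAiOiJKV1QifQ.e30.sig"}'
# Extract access_token from the JSON (python3 used for parsing; jq works too)
ACCESS_TOKEN=$(printf '%s' "$TOKEN_JSON" | python3 -c 'import sys,json; print(json.load(sys.stdin)["access_token"])')
echo "$ACCESS_TOKEN"
# Then call the protected site with it (placeholder host):
# curl --location 'https://<yourapp>.azurewebsites.net/' \
#      --header "Authorization: Bearer $ACCESS_TOKEN"
```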
Then, when attempting to access the site using our bearer token, we get a 401.71 error with Win32 status 2147500037 (0x80004005).
The application log shows:
HTTP Error 401.71 - Unauthorized
You do not have permission to view this directory or page.
Most likely causes:
The authenticated user does not have access to a resource needed to process the request.
Things you can try:
Create a tracing rule to track failed requests for this HTTP status code.
Detailed Error Information:
Module: EasyAuthModule_32bit
Notification: AuthenticateRequest
Handler: ExtensionlessUrlHandler-Integrated-4.0
Error Code: 0x80004005
Requested URL: http://xxxxxxxxxx:80/
Physical Path: D:\home\site\wwwroot
Logon Method: Not yet determined
Logon User: Not yet determined
More Information:
This is the generic Access Denied error returned by IIS. Typically, there is a substatus code associated with this error that describes why the server denied the request. Check the IIS log file to determine whether a substatus code is associated with this failure.
Then the IIS log shows this (the W3C fields after the URL read sc-status 401, sc-substatus 71, sc-win32-status 2147500037, i.e. 0x80004005):
2022-09-21 16:21:19 XXXXXXXXXXXXXXXXX GET / X-ARR-LOG-ID=2017f9cd-64d3-4305-924d-029d37c53390 80 - ::1 AlwaysOn ARRAffinity=270fc76c7a748acb7bb3a328ed3b3e85783de79ee41831feff7c3c2118b4802a - XXXXXXXXXXXXXXXXX.azurewebsites.net 401 71 2147500037 705 693 13
So it looks like the bearer token itself is accepted, but the request then fails with some sort of permissions problem.
When I set this same thing up on my test site, it works and I can access the page.
The Azure web application's folder permissions are probably the culprit here, but I don't really know what to look at in terms of how to grant this access.
So our enterprise Azure web app team needs to update something, but we don't know what.
Should we open a support ticket to get assistance with that?
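One thing that may be worth ruling out before blaming folder permissions (this is an assumption, not a confirmed diagnosis): with client-credentials tokens, Easy Auth can reject a request when the token's "aud" claim does not match the app's allowed token audiences. A quick way to eyeball the claim is to base64url-decode the middle segment of the JWT. The token below is a synthetic sample built inline just so the snippet runs; paste a real token instead:

```shell
# Build a synthetic sample token whose payload is {"aud": "api://xxxx"}
SAMPLE_PAYLOAD=$(python3 -c 'import base64,json; print(base64.urlsafe_b64encode(json.dumps({"aud":"api://xxxx"}).encode()).decode().rstrip("="))')
ACCESS_TOKEN="header.$SAMPLE_PAYLOAD.signature"   # substitute your real token here
# The payload is the second dot-separated segment of the JWT
PAYLOAD=$(printf '%s' "$ACCESS_TOKEN" | cut -d. -f2)
# Decode it (restoring stripped base64url padding) and print the "aud" claim
printf '%s' "$PAYLOAD" | python3 -c 'import sys,base64,json
s = sys.stdin.read()
s += "=" * (-len(s) % 4)
print(json.loads(base64.urlsafe_b64decode(s))["aud"])'
# prints: api://xxxx
```

If the printed audience does not appear in the app's Easy Auth configuration, that mismatch, rather than filesystem permissions, could explain the 401.71.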
Hello @nicholas dipiazza, and thanks for reaching out. In order to crawl Azure AD protected web apps, the crawler should not need to worry about the specific protocol used (OIDC, OAuth, SAML, etc.), since web apps usually abstract it. It will, however, need to interact with the Azure AD login UI, pass credentials, and react to additional prompts such as MFA. This requires, among other things, core browser capabilities such as cookie management, client-side storage, and JavaScript content rendering (the latter two for JavaScript-enabled web apps).
Let us know if you need additional assistance. If the answer was helpful, please accept it and complete the quality survey so that others can find a solution.