
google web bots deny by microsoft waf

Darrin Werre - Azure Admin 0 Reputation points
2026-03-24T20:08:35.39+00:00

We are getting reports that Google web crawlers are being blocked by Azure WAF. I am not seeing anything blocked in the logs, although Google claims it is being blocked. The web crawler test shows that the Google crawler is allowed, although from my research it seems the algorithms differ between the test and full production.

Azure Web Application Firewall

3 answers

  1. Praveen Bandaru 11,390 Reputation points Microsoft External Staff Moderator
    2026-03-28T18:51:47.6966667+00:00

    Hello **Darrin Werre - Azure Admin**,

    I understand that your Google crawlers are being denied access, even though you aren’t seeing any entries marked as “blocked” by the WAF.

    Please let me know which WAF you are using: Azure Front Door (AFD) or Application Gateway?

    • Azure Front Door (Standard/Premium), Application Gateway, and CDN each have their own Bot Manager integration and logging methods. Ensure that your front door, app-gw, or CDN endpoint is the one actually handling the traffic from Google.
    • In the Managed Rules section of your WAF policy, make sure the Microsoft_BotManagerRuleSet (or BotManagerRuleSet_1.1 for Front Door Premium) is assigned. Verify that “Good bots” are allowed and not blocked by any custom rule. Also, confirm the policy is set to Prevention mode, rather than Detection.
    • Activate WAF diagnostics to Log Analytics, Storage, or Event Hub. Apply filters for bot rule IDs, such as 300600 on Front Door, or search for “BotManager” in the logs. Check for entries where the user-agent includes “Googlebot” or where the rule action is “Block” or “Log”.
    • Google provides its crawler IP ranges here: https://developers.google.com/crawling/docs/crawlers-fetchers/verify-google-requests
    • Perform a reverse DNS lookup on incoming IPs to confirm they are from Google. If any are incorrectly identified (such as appearing as “unknown bot”), you might need to use an allow-list. You can set up a custom WAF exclusion rule to permit traffic from these IP ranges or the UA string while troubleshooting.
    • If you have custom OWASP rules in place, such as for SQLi, XSS, or size constraints, please double-check that none of them are unintentionally triggering on the Googlebot user agent or its query patterns. Since custom rules are processed before managed rule sets, a general rule like “block on unknown UA” could be causing the issue.
    • Microsoft’s “test” crawler may only simulate a user agent check, while the production throttling and algorithms can be stricter. Try using curl or a basic HTTP client from a Googlebot IP to your public endpoint, and be sure to capture the complete WAF logs.
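The reverse DNS check mentioned above can be scripted. The following is a minimal sketch, assuming the verification procedure Google documents (reverse lookup, hostname suffix check, then a forward lookup that must return the original IP). The function name `is_google_crawler` and the injectable `reverse`/`forward` resolvers are illustrative choices, not part of any Azure or Google tooling.

```python
import socket
from typing import Callable, List

# Hostname suffixes Google documents for its crawlers and fetchers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse(ip: str) -> str:
    # PTR lookup: IP address -> primary hostname
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    # Forward lookup: hostname -> all resolved addresses (IPv4 and IPv6)
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_google_crawler(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if reverse and forward DNS both point at Google."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror / socket.gaierror
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # Plain string comparison; IPv6 addresses may need normalizing first.
        return ip in forward(host)
    except OSError:
        return False
```

Calling `is_google_crawler(client_ip)` on the client IPs taken from the WAF logs would then separate genuine crawler traffic from requests that only claim a Googlebot user agent. The resolvers are parameters so the check can be exercised without network access.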

    See the following reference documents for more detail:

    bot protection for Web Application Firewall

    Custom rules for Azure Web Application Firewall on Azure Front Door

    Azure Web Application Firewall on Azure Application Gateway bot protection overview


    I hope the above answer helps you! Please let us know if you have any further questions.

    Please don't forget to "upvote" wherever the information provided helps you; this can be beneficial to other members of the community.


  2. Venkatesan S 6,920 Reputation points Microsoft External Staff Moderator
    2026-03-24T20:17:47.01+00:00

    Hi Darrin Werre - Azure Admin,

    Thanks for reaching out on the Microsoft Q&A forum.

    Google web crawlers like Googlebot are classified as good bots by Azure WAF's Bot Protection (Bot Manager rule set) and are allowed by default, unless overridden by custom rules or other configurations. Blocks might not appear in your logs because production crawler traffic may trigger anomaly scoring, JS challenges, or silent drops that are not recorded as explicit "Block" actions.

    Verify Bot Protection Settings:

    In your WAF policy (Azure Front Door or Application Gateway):

    Confirm Bot Protection is enabled.

    • Ensure Good bots action is set to Allow (not Block/Redirect).
    • Check Unknown bots action (Log instead of Block) to safely analyze traffic.

    Custom rules are evaluated first, so review them for conditions that could match Googlebot traffic (e.g., IP ranges, geo-filtering, rate limits, paths like /robots.txt).
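To illustrate why rule order matters, here is a toy model of first-match-wins evaluation. It is a simplified sketch, not Azure's engine: the `CustomRule` class, the `evaluate` function, and both rule names are hypothetical, and only Allow/Block actions are modeled. It assumes that lower priority numbers are evaluated first and that a matching custom rule ends processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]


@dataclass
class CustomRule:
    name: str
    priority: int                        # lower number = evaluated earlier
    matches: Callable[[Request], bool]   # hypothetical match condition
    action: str                          # "Allow" or "Block"


def evaluate(request: Request, custom_rules: List[CustomRule],
             managed_action: str = "Allow") -> str:
    # Custom rules run in ascending priority order; the first match decides.
    for rule in sorted(custom_rules, key=lambda r: r.priority):
        if rule.matches(request):
            return f"{rule.action} (custom rule: {rule.name})"
    # Only requests that match no custom rule reach the managed rule sets.
    return f"{managed_action} (managed rules)"


rules = [
    CustomRule("allow-googlebot-ua", 100,
               lambda r: "Googlebot" in r.get("user_agent", ""), "Allow"),
    CustomRule("block-non-us", 50,
               lambda r: r.get("country") != "US", "Block"),
]

request = {
    "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "country": "IE",
}
print(evaluate(request, rules))  # Block (custom rule: block-non-us)
```

In this sketch the geo rule has the lower priority number, so it blocks the crawler before the allow rule is ever consulted. An allow rule for Googlebot only helps if it is evaluated ahead of the rule that blocks.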

    Analyze WAF Logs:

    Query Log Analytics with:

    AzureDiagnostics
    | where Category == "FrontDoorWebApplicationFirewallLog" // or "FrontdoorWebApplicationFirewallLog" for Classic
    | where TimeGenerated > ago(1d)
    | where userAgent_s contains "Googlebot"
    | where action_s in ("Block", "Redirect", "ManagedChallenge")
    | project TimeGenerated, clientIp_s, requestUri_s, ruleId_s, details_msg_s
    

    Correlate timestamps/URIs with Google's crawl reports to identify triggering rules.
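The correlation step can be sketched in a few lines once both data sets are exported. This example assumes WAF rows are dictionaries keyed by the column names projected in the query above, and that Google's crawl failures are available as records with hypothetical `url` and `time` fields. The sample rule ID and the five-minute tolerance are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def correlate(waf_rows: List[Dict], crawl_failures: List[Dict],
              tolerance: timedelta = timedelta(minutes=5)) -> List[Tuple[str, str, str]]:
    """Pair each crawl failure with WAF rows for the same URI near the same time."""
    matches = []
    for failure in crawl_failures:
        for row in waf_rows:
            same_uri = row["requestUri_s"] == failure["url"]
            close_in_time = abs(row["TimeGenerated"] - failure["time"]) <= tolerance
            if same_uri and close_in_time:
                matches.append((failure["url"], row["ruleId_s"], row["action_s"]))
    return matches


# Illustrative data only; real exports may need URL normalization first.
waf_rows = [
    {"TimeGenerated": datetime(2026, 3, 24, 10, 0, 30),
     "requestUri_s": "https://www.example.com/products",
     "ruleId_s": "hypothetical-geo-rule", "action_s": "Block"},
    {"TimeGenerated": datetime(2026, 3, 24, 14, 0, 0),
     "requestUri_s": "https://www.example.com/about",
     "ruleId_s": "hypothetical-rate-limit", "action_s": "Block"},
]
crawl_failures = [
    {"url": "https://www.example.com/products",
     "time": datetime(2026, 3, 24, 10, 2, 0)},
]

for url, rule_id, action in correlate(waf_rows, crawl_failures):
    print(url, rule_id, action)
```

A crawl failure with no matching WAF row in the window suggests the block is happening somewhere other than the WAF.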

    Then apply the following mitigations as needed:

    • Custom Allow Rule (priority 0-1000): If http.userAgent contains "Googlebot" → Allow.
    • URI Exclusions: Exclude /robots.txt or query args Googlebot uses.
    • Disable Problem Rules: Tune specific DRS/Bot rules causing false positives (Detection mode first).
    • Other Checks: Disable geo-blocks/rate limits temporarily; confirm Prevention mode.

    Test with Google's live URL Inspection tool after changes (allow 24-48 hours for recrawl).

    Kindly let us know if the above helps or you need further assistance on this issue.

    Please do not forget to “up-vote” wherever the information provided helps you; this can be beneficial to other community members.


  3. Q&A Assist
    2026-03-24T20:08:55.5166667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    Azure WAF bot protection distinguishes between Good, Bad, and Unknown bots and treats verified search engine crawlers (such as Googlebot) as Good bots. By default, Good bots are allowed, while Bad and some Unknown bots can be blocked or logged depending on configuration.

    To investigate and mitigate Google crawler blocking:

    1. Confirm bot protection configuration
      • In the WAF policy (Front Door or Application Gateway), check whether the Bot Protection / Bot Manager rule set is enabled.
      • Verify the action configured for Good bots. It should be Allow, not Block or Redirect.
      • On Azure Front Door, the bot protection rule set supports custom actions per bot category (Bad, Good, Unknown) and WAF actions include Allow, Block, Log, and Redirect. Ensure no custom rule is overriding Good bots with a more restrictive action.
    2. Check for custom rules taking precedence (Front Door)
      • Custom rules are evaluated before the Default Rule Set and bot protection rules.
      • If a request matches a custom rule, that rule’s action is applied and no other rules are processed.
      • Review custom rules for conditions that might inadvertently match Googlebot traffic (e.g., IP ranges, paths, rate limiting, geo-filtering) and block or redirect it.
    3. Review WAF logs specifically for Googlebot traffic
      • WAF logs show every request that is matched or blocked.
      • Ensure WAF logging is enabled and query:
        • For Front Door Standard/Premium:
                 AzureDiagnostics
                 | where Category == "FrontDoorWebApplicationFirewallLog"
                 | where TimeGenerated > ago(1d)
                 | where action_s == "Block"
          
        • For Front Door Classic:
                 AzureDiagnostics
                 | where Category == "FrontdoorWebApplicationFirewallLog"
                 | where TimeGenerated > ago(1d)
                 | where action_s == "Block"
          
      • Narrow further by:
        • Request URI paths Google is crawling.
        • Known Googlebot user-agent strings.
      • If Googlebot is being blocked, the log entry will show which rule (custom, DRS, or bot rule) is responsible.
    4. Validate that Googlebot is classified as a Good bot
      • Good bots include verified search engine bots (such as Googlebot and Bingbot).
      • If Googlebot requests are being treated as Unknown or Bad bots, they might be coming from IPs not recognized as legitimate Googlebot sources or with modified user agents.
      • In that case, adjust bot protection actions so that Unknown bots are logged instead of blocked, then analyze logs to see how Googlebot traffic is being categorized.
    5. Use exclusions or tuning if a rule is causing false positives
      • If a specific WAF rule (from the Default Rule Set or a custom rule) is blocking legitimate Googlebot traffic, tune the WAF:
        • Create exclusions for specific request attributes (paths, query parameters, headers) that Googlebot uses.
        • Disable or adjust only the problematic rule for the affected site/URI.
      • Tuning is expected and normal for OWASP-based rule sets.
    6. Check for other WAF features impacting crawlers
      • Geo-filtering: If enabled, ensure the countries/regions where Google’s crawlers operate are not blocked.
      • IP restriction: Confirm that IP restriction rules are not blocking Googlebot IP ranges.
      • Rate limiting: Ensure rate limiting rules are not throttling or blocking Google’s crawl patterns on busy endpoints.
    7. Verify mode on Application Gateway WAF
      • If using Application Gateway WAF, confirm whether it is in Detection or Prevention mode.
      • In Detection mode, WAF will only log; if Google reports blocking but logs show only detections, the block might be elsewhere (e.g., app, network, or another security layer).
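For the IP-related checks in steps 4 and 6, a client IP from the logs can be tested against Google's published crawler ranges. The sketch below assumes the ranges file is JSON with a `prefixes` list whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key; verify that shape against the file you download. The sample uses documentation-only address ranges as placeholders, not real Google ranges.

```python
import ipaddress
import json
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_networks(ranges_json: str) -> List[Network]:
    """Parse a ranges document (assumed shape) into network objects."""
    data = json.loads(ranges_json)
    networks = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def in_ranges(ip: str, networks: List[Network]) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in networks)


# Placeholder ranges (RFC 5737 / RFC 3849 documentation blocks).
sample = json.dumps({
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
})
networks = load_networks(sample)
print(in_ranges("192.0.2.17", networks))    # True
print(in_ranges("198.51.100.9", networks))  # False
```

A request with a Googlebot user agent whose IP falls outside the published ranges would not be expected to qualify as a verified Good bot.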

    By correlating Google’s reported crawl failures with WAF logs (timestamp, URI, client IP, user agent) and reviewing bot protection and custom rules, it is possible to identify whether Azure WAF is actually blocking Googlebot and then either allow Good bots, relax Unknown bot handling, or tune specific rules.
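As a rough way to see which rule is responsible, exported log rows can be tallied by rule and action for requests carrying a Googlebot user agent. This is a minimal sketch that assumes the rows are dictionaries using the `userAgent_s`, `ruleId_s`, and `action_s` column names from the queries in this thread; the sample rule IDs are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize(rows: List[Dict[str, str]],
              ua_marker: str = "Googlebot") -> List[Tuple[Tuple[str, str], int]]:
    """Count (rule, action) pairs for rows whose user agent contains ua_marker."""
    counts: Counter = Counter()
    for row in rows:
        if ua_marker.lower() in row.get("userAgent_s", "").lower():
            key = (row.get("ruleId_s", "unknown"), row.get("action_s", "unknown"))
            counts[key] += 1
    # Most frequent (rule, action) pairs first.
    return counts.most_common()


rows = [
    {"userAgent_s": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
     "ruleId_s": "hypothetical-geo-rule", "action_s": "Block"},
    {"userAgent_s": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
     "ruleId_s": "hypothetical-geo-rule", "action_s": "Block"},
    {"userAgent_s": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
     "ruleId_s": "hypothetical-bot-rule", "action_s": "Log"},
    {"userAgent_s": "curl/8.0",
     "ruleId_s": "hypothetical-geo-rule", "action_s": "Block"},
]

for (rule_id, action), count in summarize(rows):
    print(rule_id, action, count)
```

User agent strings can be spoofed, so a tally like this is a starting point for investigation rather than proof that genuine Googlebot traffic was affected.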

