Technical

Cloudflare bot protection and AEO — the silent killer

Cloudflare's default bot challenge blocks the AI crawlers you want to be cited by. How to detect it, and the WAF rules to selectively unblock the major AI bots.

The single most common reason a site fails the AEO Site Checker’s fetch_direct check — the heaviest check in the rubric, worth 18 points out of 100 — is Cloudflare bot protection.

Cloudflare’s defaults are conservative. If you’ve ever clicked “Under Attack Mode” or enabled the AI Bots block in their dashboard, every AI crawler trying to read your site gets a 403 or a JavaScript challenge. Your robots.txt is correct, your content is fine, but the request never reaches your origin.

Here’s how to detect this and fix it without losing actual bot protection.

How to tell if Cloudflare is the problem

If your AEO score includes a fetch_direct failure with the result “BrightData unlocker required” or “403 Forbidden”, and your site is behind Cloudflare, you almost certainly have this issue.

Three quick verifications:

1. curl with a normal User-Agent vs. an AI bot UA

# Pretend to be Chrome
curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36" \
     -I https://yoursite.com
# 200 OK — fine

# Pretend to be ChatGPT-User
curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot" \
     -I https://yoursite.com
# 403 Forbidden, cf-ray: ... — Cloudflare is blocking

2. Check the response headers

In a blocked response you’ll typically see:

HTTP/2 403
cf-mitigated: challenge
cf-ray: 8f1234567890abcd-DFW
server: cloudflare

The cf-mitigated header is the giveaway.
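If you want to script this check, the two signals above (a `cf-mitigated` header and a bare 403) are easy to test for. A minimal POSIX shell sketch; `yoursite.com` in the usage comment is a placeholder:

```shell
# Classify a set of HTTP response headers (read from stdin) as blocked
# or ok. A cf-mitigated header means Cloudflare challenged or blocked
# the request; a 403 without it is still blocked, but possibly by
# something other than Cloudflare.
cf_verdict() {
  headers=$(cat)
  if printf '%s\n' "$headers" | grep -qi '^cf-mitigated:'; then
    echo "blocked (cloudflare)"
  elif printf '%s\n' "$headers" | grep -q '^HTTP/[0-9.]\{1,\} 403'; then
    echo "blocked"
  else
    echo "ok"
  fi
}

# Example (network call; yoursite.com is a placeholder):
#   curl -sI -A "GPTBot/1.0" https://yoursite.com | cf_verdict
```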

3. Cloudflare dashboard

Sign in → your site → Bots → Configure. Look at:

  • Bot Fight Mode — should be off if you want AI crawlers through.
  • Block AI Bots — explicitly designed to block GPTBot, Claude-User, etc. Off if you want AEO.
  • Super Bot Fight Mode (paid) — leave on, but configure exceptions (below).
  • WAF managed rules — check for any rule that includes cf.client.bot matchers.

You probably do still want bot protection — for credential stuffing, scraping abuse, ad fraud. The goal is to let AI crawlers through while keeping malicious bots out.

Step 1: turn off the global “AI Bots” toggle

In Cloudflare → Bots → Configure → uncheck “Block AI Bots”.

This single toggle is the most common cause of AEO failures we audit.

Step 2: configure Super Bot Fight Mode allowlists

If you’re on Pro / Business / Enterprise, Super Bot Fight Mode lets you allowlist verified bots. Cloudflare verifies bots by checking requests against each operator’s published IP ranges (and reverse-DNS lookups where the operator supports them), so a spoofed User-Agent alone won’t qualify.

Go to Bots → Super Bot Fight Mode and set:

  • Definitely Automated → Block (default)
  • Likely Automated → Allow (don’t block — this is where many crawlers fall)
  • Verified Bots → Allow

Then add a custom rule to allowlist by User-Agent, covered in the next step.

Step 3: WAF custom rule

Create a custom rule that explicitly allows the AI crawler User-Agents you care about:

(http.user_agent contains "GPTBot")
or (http.user_agent contains "ChatGPT-User")
or (http.user_agent contains "OAI-SearchBot")
or (http.user_agent contains "ClaudeBot")
or (http.user_agent contains "Claude-User")
or (http.user_agent contains "Claude-SearchBot")
or (http.user_agent contains "PerplexityBot")
or (http.user_agent contains "Perplexity-User")
or (http.user_agent contains "Google-Extended")
or (http.user_agent contains "Applebot-Extended")

Set the action to Skip → all WAF managed rules, all rate limits, all bot challenges. This is what allows the bot through to your origin.

User-Agent allowlisting alone is a weak signal (anyone can fake the UA), so combine it with verified-bot status if you can.
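For example, a stricter version of the rule above would require both signals at once. A sketch in Cloudflare’s Rules language (note that `cf.bot_management.*` fields require the Bot Management add-on; the bot list is abbreviated here):

```
(cf.bot_management.verified_bot and (
     (http.user_agent contains "GPTBot")
  or (http.user_agent contains "ClaudeBot")
  or (http.user_agent contains "PerplexityBot")
))
```

With this shape, a scraper faking `GPTBot` in its User-Agent still fails the verified-bot check and hits your normal protections.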

Step 4: confirm with Cloudflare’s bot verification

Cloudflare maintains a list of officially verified bots at radar.cloudflare.com/bots. The major AI bot operators are there:

  • OpenAI (GPTBot, ChatGPT-User, OAI-SearchBot)
  • Anthropic (ClaudeBot, Claude-User, Claude-SearchBot)
  • Perplexity (PerplexityBot, Perplexity-User)
  • Google (Googlebot, Google-Extended)
  • Apple (Applebot, Applebot-Extended)

In your custom rule, set the match to (cf.client.bot) — or, if you have the Enterprise Bot Management add-on, (cf.bot_management.verified_bot) — and the action to Skip. This is more robust than UA-only matching because it requires the request to come from a verified bot operator’s IP space.
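Under the hood, verified-bot matching boils down to checking the client IP against the operator’s published ranges. A toy IPv4 illustration in shell — the CIDR used below is the RFC 5737 documentation range, not a real GPTBot range:

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Usage: ip_in_cidr 203.0.113.7 203.0.113.0/24  (exit 0 if inside)
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# A claimed GPTBot request passes only if its source IP falls inside
# one of OpenAI's published ranges (203.0.113.0/24 is a placeholder).
```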

Why this matters more than other AEO fixes

In our scoring rubric, fetch_direct is worth 18 points — the largest individual check. If you fail it, the maximum score you can achieve is 82.

But the practical consequence is worse: even if you score 82 on our auditor (because we fall back to BrightData’s Web Unlocker), real AI crawlers will not have BrightData. They’ll just see the 403 and skip your site. So the real-world AEO of a Cloudflare-blocked site is closer to 0 than to 82.

Fixing the bot config typically takes 10 minutes and moves the score from F to A on otherwise-well-built sites.

Other WAFs to check

The same pattern shows up on:

  • AWS WAF — the AWSManagedRulesBotControlRuleSet managed rule group blocks AI bots by default. Add explicit allow rules for the User-Agents you want through.
  • Akamai Bot Manager — has an “AI Bots” category that’s blocked by default. Move it to “Allow.”
  • Imperva — similar story; check the “Bot Mitigation” policy.
  • Datadome / PerimeterX — both default to “block all bots.” If you’re using one of these and want AEO, you need to explicitly allow OpenAI/Anthropic/Perplexity bot ASNs.

The auditor’s fetch_direct check is WAF-agnostic — it just measures whether a plain GET works. If it doesn’t, the underlying cause is almost always a WAF rule, not your robots.txt.
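In shell terms, that measurement reduces to something like the sketch below — an assumption on my part beyond “a plain GET must succeed”, where “succeed” is taken to mean any 2xx status:

```shell
# Pass iff the HTTP status is 2xx.
classify_status() {
  case "$1" in
    2??) echo "pass" ;;
    *)   echo "fail" ;;
  esac
}

# A WAF-agnostic probe: plain GET with a given User-Agent, classified
# by status code. (Network call; URL and UA are parameters.)
fetch_direct() {
  classify_status "$(curl -s -o /dev/null -w '%{http_code}' -A "$2" "$1")"
}
```

If `fetch_direct https://yoursite.com "GPTBot/1.0"` prints fail while the same call with a browser UA prints pass, the WAF is the culprit.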

A note on legitimate concerns

There are real reasons to block training crawlers:

  • You’re a paid-content publisher who doesn’t want to subsidize model training.
  • You’re concerned about scraping for competitive intel.
  • You’ve had abusive traffic from a bot ASN.

If those apply, the right pattern is block training, allow live:

# robots.txt
User-agent: GPTBot
Disallow: /            # Training crawler — blocked

User-agent: ChatGPT-User
Allow: /               # Live fetch when a user asks ChatGPT — allowed

User-agent: OAI-SearchBot
Allow: /               # ChatGPT Search index — allowed

# (Repeat the same pattern for ClaudeBot vs Claude-User, etc.)

This way you opt out of future training while staying in current citation surfaces. Many publishers (NYTimes, Reuters) have moved to this stance.
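You can sanity-check the resulting robots.txt with a few lines of awk. A toy parser, assuming the simple one-rule-per-agent shape above (a real parser must also handle grouped user-agents, wildcards, and longest-match precedence):

```shell
# List each User-agent in a robots.txt (read from stdin) with whether
# it is allowed at or blocked from the site root. Toy logic: only
# literal "Allow: /" and "Disallow: /" lines are considered.
robots_root_policy() {
  awk '
    tolower($1) == "user-agent:" { ua = $2 }
    tolower($1) == "disallow:" && $2 == "/" { print ua ": blocked" }
    tolower($1) == "allow:"    && $2 == "/" { print ua ": allowed" }
  '
}

# Example (yoursite.com is a placeholder):
#   curl -s https://yoursite.com/robots.txt | robots_root_policy
```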



Ready to score your site? Run an audit →