Crawlability & Indexation Framework Guide

Uncovering Hostinger’s Silent GPTBot Filtering on Shared Hosting (28.04.2026)

During the very first pre‑production tests of this framework (28.04.2026), the system uncovered an infrastructure‑level GPTBot blockade on Hostinger shared hosting.

The audit detected a consistent HTTP 429 (Too Many Requests) response for GPTBot while all other crawlers received 200 OK. Hostinger’s own technical support later confirmed the failure, stating that the block is applied globally across all shared hosting plans and that individual users cannot disable it or whitelist GPTBot.

This was the framework’s first real‑world validation: it exposed an AI‑visibility failure that was invisible to humans, undocumented by the provider, and enforced at the server edge before the site’s code even loads.

Hostinger’s filtering logic treats GPTBot as a high‑risk or hostile user‑agent, while allowing every other major crawler, including ChatGPT‑User, to pass without restriction.
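
A check of this kind can be sketched in Python. This is a minimal illustration, not the framework's actual implementation: the user-agent values below are shortened tokens rather than the full strings real crawlers send, and `fetch_status` and `find_blocked_agents` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Illustrative, shortened user-agent tokens (assumption: real crawlers
# send longer strings; substitute the exact values you want to test).
USER_AGENTS = {
    "GPTBot": "GPTBot",
    "ChatGPT-User": "ChatGPT-User",
    "Googlebot": "Googlebot",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives this user-agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return error.code


def find_blocked_agents(statuses: dict[str, int]) -> list[str]:
    """List agents that were rejected while at least one other agent got 200."""
    if 200 not in statuses.values():
        return []  # The page fails for everyone, so this is not selective filtering.
    return sorted(name for name, code in statuses.items() if code != 200)
```

Collecting `{name: fetch_status(url, ua) for name, ua in USER_AGENTS.items()}` for a page and passing the result to `find_blocked_agents` would flag the pattern described above: a result such as `{"GPTBot": 429, "ChatGPT-User": 200, "Googlebot": 200}` yields `["GPTBot"]`. A host may also filter on IP range, so a request with a substituted user-agent from your own machine is only an approximation of what the real crawler receives.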

Crawlability and indexation form the access layer of AI SEO. Before an AI system can interpret, classify, or embed content, it must be able to reach that content and extract it in a stable, text‑only environment. AI crawlers do not behave like full browsers. They do not execute complex JavaScript, wait for asynchronous content to load, or interact with modals, overlays, or dynamic components. They operate more like lightweight, resource‑constrained agents whose goal is to obtain a clean, unambiguous text representation of the page.

If the content is not accessible in a text‑only crawl, AI will not see it.
If AI cannot see it, it cannot embed it.
If it cannot embed it, it cannot retrieve it.

This is why crawlability is not a “technical SEO hygiene” issue — it is a prerequisite for machine comprehension.


Why Crawlability Matters

AI systems build their understanding of a site from the raw text they can extract. They do not rely on rendered DOM snapshots or client‑side hydration. They rely on:

  • the HTML returned on first response
  • the text available without interaction
  • the content visible without scripts
  • the structure present without dynamic rendering

If any part of the content is hidden behind JavaScript, modals, client‑side routing, or unstable rendering, that content effectively does not exist for AI.

This is not a ranking issue.
It is an existence issue.

How Crawlability Fails in Real Sites

Most crawlability failures are not obvious to humans because humans see the rendered page. AI sees the raw, unrendered response: script‑free and interaction‑free.

Common failure modes include:

Blocked resources

Critical CSS, JS, or API endpoints blocked by robots.txt prevent the crawler from accessing essential content or layout structure.

JS‑dependent content

If the main content is injected client‑side, AI crawlers will see an empty shell.
This is one of the most common and most damaging failures.

Hidden content behind modals

Cookie banners, newsletter popups, location selectors, and consent gates often obscure the primary content in the raw DOM.

Unstable rendering

If the DOM shifts during load (due to hydration, A/B testing scripts, or personalization), AI may extract incomplete or corrupted content.

Missing or inaccurate sitemaps

If sitemaps do not reflect the actual entity structure of the site, AI cannot discover key pages.
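
One way to test this is to compare the URLs a sitemap lists against the pages you consider essential. The sketch below uses only the Python standard library; the sample sitemap, the `KEY_PAGES` set, and the function names are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(sitemap_xml: str, key_pages: set[str]) -> set[str]:
    """Return the key pages that the sitemap does not list."""
    return key_pages - sitemap_urls(sitemap_xml)


# Hypothetical sitemap and page list, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/</loc></url>
</urlset>"""

KEY_PAGES = {
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/about/",
}

print(missing_from_sitemap(SAMPLE_SITEMAP, KEY_PAGES))
# prints {'https://example.com/about/'}
```

Any URL in the result is a page a crawler would have to discover through internal links alone.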

Robots.txt conflicts

Overly broad disallow rules, misconfigured directives, or blocked directories can prevent AI from accessing essential content or resources.
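
Rules like these can be checked per crawler with Python's built-in `urllib.robotparser`. The robots.txt content below is a made-up example, and `access_report` is a hypothetical helper name; the sketch only shows how one broad rule and one crawler-specific rule play out for different agents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one crawler-specific block and one broad rule.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /api/
"""


def access_report(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user-agent to whether robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


print(access_report(SAMPLE_ROBOTS, "https://example.com/blog/post", ["GPTBot", "Googlebot"]))
# prints {'GPTBot': False, 'Googlebot': True}
```

With the same sample, a content endpoint such as `https://example.com/api/content.json` is reported as blocked for every agent, which is the kind of overly broad rule that hides content a page depends on.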

Each of these failures creates blind spots in the site’s semantic graph.

How AI Crawlers Actually Behave

AI crawlers are not full browsers.
They behave more like:

  • text‑only fetchers
  • partial renderers
  • limited JavaScript executors
  • non‑interactive agents

They do not:

  • click buttons
  • scroll
  • wait for hydration
  • accept cookies
  • close modals
  • execute heavy scripts
  • resolve client‑side routing
  • load content hidden behind user interaction

They extract whatever is available immediately in the HTML response.

This is why the “text‑only crawl” is the closest approximation of what AI sees.

If the content is not present in that environment, it is invisible.
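
A text-only crawl can be approximated with Python's standard library. This is a minimal sketch, assuming that dropping script, style, and template content is a fair stand-in for what a non-rendering crawler extracts; real crawlers may differ, and `TextOnlyExtractor` and `text_only_view` are illustrative names.

```python
from html.parser import HTMLParser


class TextOnlyExtractor(HTMLParser):
    """Collect text from raw HTML, ignoring script, style, and template content."""

    SKIPPED_TAGS = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def text_only_view(raw_html: str) -> str:
    """Return the text present in the HTML response, with no scripts run."""
    extractor = TextOnlyExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return extractor.text()


# Hypothetical responses: a client-rendered shell and a server-rendered page.
SHELL = '<html><body><div id="root"></div><script>render("Pricing plans")</script></body></html>'
SERVER_RENDERED = "<html><body><h1>Pricing plans</h1><p>Audits for 5 URLs.</p></body></html>"

print(repr(text_only_view(SHELL)))            # prints ''
print(repr(text_only_view(SERVER_RENDERED)))  # prints 'Pricing plans Audits for 5 URLs.'
```

The client-rendered shell produces an empty string even though a browser would show the heading, which is the gap between what humans see and what a text-only crawl returns.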

What Happens When Crawlability Is Weak

When AI cannot access or fully extract the content:

  • Entities are missing — the model cannot identify what the page is about
  • Relationships break — internal linking and structured data cannot be interpreted
  • Embeddings become incomplete — missing sections lead to partial or incorrect vectors
  • Retrieval becomes unreliable — AI surfaces the wrong page or no page at all
  • Authority signals disappear — the model cannot see the content that establishes expertise
  • Multilingual alignment collapses — translations become disconnected because crawlers cannot access them consistently

A site with crawlability issues is not “underperforming.”
It is invisible.

What Proper Crawlability Looks Like

A crawlable site ensures that:

  • the primary content is available server‑side
  • the DOM is stable on first load
  • no essential content depends on JavaScript
  • no modals or overlays block the text
  • sitemaps reflect the true structure of the site
  • robots.txt allows access to all necessary resources
  • language versions are discoverable and linked
  • internal links are visible in the raw HTML
  • canonical URLs resolve cleanly without redirects

This creates a complete, accessible, machine‑readable representation of the site.
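
Several items on this checklist can be read straight from the raw HTML. The sketch below pulls the canonical URL, the hreflang alternates, and the anchor links from an unrendered response; the sample page and the names `RawHtmlSignals` and `raw_html_signals` are hypothetical. It does not cover the redirect check, which would need a live request.

```python
from html.parser import HTMLParser


class RawHtmlSignals(HTMLParser):
    """Collect canonical, hreflang, and anchor links from unrendered HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None
        self.alternates: dict[str, str] = {}
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if tag == "link" and href:
            rel = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.canonical = href
            elif "alternate" in rel and attributes.get("hreflang"):
                self.alternates[attributes["hreflang"]] = href
        elif tag == "a" and href:
            self.links.append(href)


def raw_html_signals(raw_html: str) -> dict:
    """Summarise what a non-rendering crawler can discover from the markup."""
    parser = RawHtmlSignals()
    parser.feed(raw_html)
    parser.close()
    return {
        "canonical": parser.canonical,
        "alternates": parser.alternates,
        "links": parser.links,
    }


# Hypothetical page: one real link, and one "link" that only exists in script.
SAMPLE_PAGE = """<html><head>
<link rel="canonical" href="https://example.com/en/pricing/">
<link rel="alternate" hreflang="de" href="https://example.com/de/preise/">
</head><body>
<a href="/en/contact/">Contact</a>
<button onclick="go('/en/demo/')">Demo</button>
</body></html>"""

print(raw_html_signals(SAMPLE_PAGE)["links"])  # prints ['/en/contact/']
```

The demo page in the sample is reachable only through a click handler, so it never appears in the extracted links, while the contact page and the German alternate are both discoverable.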

See a Real Crawlability Audit Example

Here is a full example of an AI‑native Crawlability & Indexation audit generated by our system:

That particular example costs €12.24 because it includes additional features such as the full roadmap, but you can always choose only what you need: packages start at €1 for a 5‑URL basic audit with the same level of analysis shown here.

For this type of analysis, agencies normally package the work into four‑figure “discovery” fees.


The Goal

The goal of crawlability is not simply to “let Google crawl the site.”
The goal is to ensure that AI systems can extract the full semantic content of every page in a stable, predictable, script‑free environment.

If the content is not accessible in a text‑only crawl, it does not exist for AI.
If it does not exist for AI, it cannot be embedded.
If it cannot be embedded, it cannot be retrieved.

Crawlability is the entry point of the entire AI SEO framework.
Everything else depends on it.
