Crawlability & Indexation Technical Framework Guide

Uncovering Hostinger’s Silent GPTBot Filtering on Shared Hosting (28.04.2026)

During the very first pre‑production tests of this framework (28.04.2026), the system uncovered an infrastructure‑level block on GPTBot on Hostinger shared hosting.

The audit detected a consistent HTTP 429 (Too Many Requests) response for GPTBot while all other crawlers received 200 OK. Hostinger’s own technical support later confirmed the finding, stating that the block is applied globally across all shared hosting plans and cannot be disabled or whitelisted by individual users.

This was the framework’s first real‑world validation: it exposed an AI‑visibility failure that was invisible to humans, undocumented by the provider, and enforced at the server edge before the site’s code even loads.

Hostinger’s filtering logic treats GPTBot as a high‑risk or hostile user‑agent, while allowing every other major crawler in the audit, including ChatGPT‑User, to pass without restriction.
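
A per-crawler status check along the following lines is enough to surface this kind of block. This is a minimal sketch, not the framework's actual implementation: the function names (fetch_status, probe, blocked_agents) and the abbreviated user-agent strings are assumptions made for the example. It requests the same URL once per crawler identity and reports which identities did not receive a 2xx response.

```python
from typing import Callable, Dict, List
import urllib.error
import urllib.request

# Illustrative, abbreviated user-agent strings (assumptions): check each
# vendor's documentation for the exact values their crawlers send.
USER_AGENTS: Dict[str, str] = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "ChatGPT-User": "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for this user-agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return error.code


def probe(url: str, fetch: Callable[[str, str], int] = fetch_status) -> Dict[str, int]:
    """Request the same URL once per crawler identity and collect status codes."""
    return {name: fetch(url, agent) for name, agent in USER_AGENTS.items()}


def blocked_agents(statuses: Dict[str, int]) -> List[str]:
    """Names of crawlers that did not receive a 2xx response."""
    return sorted(name for name, code in statuses.items() if not 200 <= code < 300)
```

Because this only varies the User-Agent header, it detects user-agent filtering. A host that filters by source IP range would need a different test.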

Crawlability and indexation form the access layer of AI SEO. Before an AI system can interpret, classify, or embed content, it must be able to reach it and extract it in a stable, text‑only environment. AI crawlers do not behave like full browsers. They typically do not execute complex JavaScript, they do not wait for asynchronous content to load, and they do not interact with modals, overlays, or dynamic components. They operate more like lightweight, resource‑constrained agents whose goal is to obtain a clean, unambiguous text representation of the page.

If the content is not accessible in a text‑only crawl, AI will not see it.
If AI cannot see it, it cannot embed it.
If it cannot embed it, it cannot retrieve it.

This is why crawlability is not a “technical SEO hygiene” issue — it is a prerequisite for machine comprehension.

Why Crawlability Matters

AI systems build their understanding of a site from the raw text they can extract. They do not rely on rendered DOM snapshots or client‑side hydration. They rely on:

the HTML returned on first response
the text available without interaction
the content visible without scripts
the structure present without dynamic rendering

If any part of the content is hidden behind JavaScript, modals, client‑side routing, or unstable rendering, that content effectively does not exist for AI.

This is not a ranking issue.
It is an existence issue.
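
One way to approximate this view is to extract text from the first HTML response while ignoring everything a script would add later. The sketch below uses only Python's standard html.parser. The class and function names (TextOnlyExtractor, text_only) are hypothetical, and real crawlers may differ in which elements they keep or drop.

```python
from html.parser import HTMLParser
from typing import List


class TextOnlyExtractor(HTMLParser):
    """Collect text from raw HTML, skipping the bodies of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_only(html: str) -> str:
    """Return the text a script-free, interaction-free fetch would expose."""
    parser = TextOnlyExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

If a phrase that matters to the page (a product name, a price, a heading) is missing from the output of text_only, it is reasonable to assume a text-only crawler will not see it either.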

How Crawlability Fails in Real Sites

Most crawlability failures are not obvious to humans because humans see the rendered page. AI sees the unrendered, script‑free, interaction‑free version.

Common failure modes include:

Blocked resources

Critical CSS, JS, or API endpoints blocked by robots.txt prevent the crawler from accessing essential content or layout structure.

JS‑dependent content

If the main content is injected client‑side, AI crawlers will see an empty shell.
This is one of the most common and most damaging failures.
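
A rough heuristic for spotting an empty shell is to count the words left in the raw HTML once scripts, styles, and tags are removed. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the 50-word threshold is arbitrary, and regex-based tag stripping is crude compared with a real HTML parser.

```python
import re


def visible_word_count(html: str) -> int:
    """Count words in raw HTML after dropping script/style bodies and all tags."""
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", without_code)
    return len(re.findall(r"\w+", text))


def looks_like_empty_shell(html: str, min_words: int = 50) -> bool:
    """Flag pages that load scripts but ship almost no text in the first response.

    min_words is an arbitrary threshold chosen for this example.
    """
    has_scripts = re.search(r"(?i)<script\b", html) is not None
    return has_scripts and visible_word_count(html) < min_words
```

A page that returns True here is a candidate for server-side rendering or pre-rendering of its main content. A False result does not prove the page is fully crawlable.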

Hidden content behind modals

Cookie banners, newsletter popups, location selectors, and consent gates often obscure the primary content in the raw DOM.

Unstable rendering

If the DOM shifts during load (due to hydration, A/B testing scripts, or personalization), AI may extract incomplete or corrupted content.

Missing or inaccurate sitemaps

If sitemaps do not reflect the actual entity structure of the site, AI cannot discover key pages.
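
Sitemap coverage can be checked by comparing the URLs a sitemap lists against the pages that are expected to be discoverable. The sketch below assumes a standard sitemaps.org urlset document. The function names and the list of expected URLs are hypothetical, and sitemap index files are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> Set[str]:
    """Return every <loc> value listed under <url> in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(xml_text: str, expected: Iterable[str]) -> Set[str]:
    """Expected page URLs that the sitemap does not list."""
    return set(expected) - sitemap_urls(xml_text)
```

Any URL returned by missing_from_sitemap is a page that crawlers would have to find through internal links alone.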

Robots.txt conflicts

Overly broad disallow rules, misconfigured directives, or blocked directories can prevent AI from accessing essential content or resources.
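
Robots.txt rules can be tested per crawler before they reach production. The sketch below uses Python's standard urllib.robotparser. The helper name crawler_access is hypothetical, and the sample rules in the usage note are invented for illustration.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser


def crawler_access(robots_txt: str, url: str, agents: Iterable[str]) -> Dict[str, bool]:
    """Report, per user-agent token, whether robots.txt permits fetching the URL."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

For example, with rules that disallow everything for GPTBot and only /admin/ for all other agents, crawler_access reports False for GPTBot and True for Googlebot on a public page. Python's parser applies the first matching rule in file order, which can differ from crawlers that use longest-match precedence, so treat the result as an approximation.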

Each of these failures creates blind spots in the site’s semantic graph.

How AI Crawlers Actually Behave

AI crawlers are not full browsers.
They behave more like:

text‑only fetchers
partial renderers
limited JavaScript executors
non‑interactive agents

They do not:

click buttons
scroll
wait for hydration
accept cookies
close modals
execute heavy scripts
resolve client‑side routing
load content hidden behind user interaction

They extract whatever is available immediately in the HTML response.

This is why the “text‑only crawl” is the closest approximation of what AI sees.

If the content is not present in that environment, it is invisible.

What Happens When Crawlability Is Weak

When AI cannot access or fully extract the content:

Entities are missing — the model cannot identify what the page is about
Relationships break — internal linking and structured data cannot be interpreted
Embeddings become incomplete — missing sections lead to partial or incorrect vectors
Retrieval becomes unreliable — AI surfaces the wrong page or no page at all
Authority signals disappear — the model cannot see the content that establishes expertise
Multilingual alignment collapses — translations become disconnected because crawlers cannot access them consistently

A site with crawlability issues is not “underperforming.”
It is invisible.

What Proper Crawlability Looks Like

A crawlable site ensures that:

the primary content is available server‑side
the DOM is stable on first load
no essential content depends on JavaScript
no modals or overlays block the text
sitemaps reflect the true structure of the site
robots.txt allows access to all necessary resources
language versions are discoverable and linked
internal links are visible in the raw HTML
canonical URLs resolve cleanly without redirects

This creates a complete, accessible, machine‑readable representation of the site.
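
Several items on this checklist can be verified directly against the raw HTML. The sketch below collects the canonical URL, the hreflang alternates, and the internal links present without any script execution. The class and function names are hypothetical, and the check covers only what appears in the markup, not whether the canonical URL redirects.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class HeadAndLinkAudit(HTMLParser):
    """Collect canonical, hreflang, and anchor hrefs found in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None
        self.hreflang: Dict[str, str] = {}
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "a":
            # Links that exist as real anchors, not script-driven buttons.
            self.links.append(href)
        elif tag == "link" and "canonical" in rel:
            self.canonical = href
        elif tag == "link" and "alternate" in rel and attributes.get("hreflang"):
            self.hreflang[attributes["hreflang"]] = href


def audit_raw_html(html: str) -> HeadAndLinkAudit:
    """Parse the HTML once and return the populated audit object."""
    parser = HeadAndLinkAudit()
    parser.feed(html)
    parser.close()
    return parser
```

An empty links list or a missing canonical on a page that shows both in the browser suggests those elements are being injected client-side.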

See a Real Crawlability Audit Example

Here is a full example of an AI‑native Crawlability & Indexation audit generated by our system:

→ View the Crawlability & Indexation Audit Example

That particular example costs €12.24 because it includes additional features such as the full roadmap, but you can always choose only what you need: packages start at €1 for a 5‑URL basic audit with the same level of analysis shown here.

For this type of analysis, agencies normally package the work into four‑figure “discovery” fees.

The Goal

The goal of crawlability is not simply to “let Google crawl the site.”
The goal is to ensure that AI systems can extract the full semantic content of every page in a stable, predictable, script‑free environment.

If the content is not accessible in a text‑only crawl, it does not exist for AI.
If it does not exist for AI, it cannot be embedded.
If it cannot be embedded, it cannot be retrieved.

Crawlability is the entry point of the entire AI SEO framework.
Everything else depends on it.

