How It Works — Lab451

You paste your URL

Enter any publicly accessible website URL. Lab451 accepts root domains, subdomains, and specific path prefixes. No authentication, no API key, no installs required on your end.

Tip: If you only want to index a subsection of your site (e.g., /blog), you can specify a path prefix and we'll limit the crawl scope accordingly.
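One way to picture the path-prefix scoping is a check that every discovered URL is compared against the URL you entered. The sketch below is illustrative only: the function name `in_scope` and the segment-matching rule are assumptions, not Lab451's actual implementation.

```python
from urllib.parse import urlsplit


def in_scope(url: str, start_url: str) -> bool:
    """Return True if url is on the same host as start_url and under its path prefix."""
    target, start = urlsplit(url), urlsplit(start_url)
    if target.hostname != start.hostname:
        return False
    prefix = start.path.rstrip("/")
    if not prefix:
        # A root URL has no prefix, so the whole host is in scope.
        return True
    # Match whole path segments so /blog does not also match /blogroll.
    return target.path == prefix or target.path.startswith(prefix + "/")


print(in_scope("https://example.com/blog/post-1", "https://example.com/blog"))
print(in_scope("https://example.com/blogroll", "https://example.com/blog"))
```

Matching on whole segments is a design choice in this sketch; a simple string prefix test would treat /blogroll as part of /blog.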

Our crawler maps your site structure

We send a polite crawl bot (identifiable by its user-agent Lab451Bot/1.0) to walk your site's link graph. It respects your existing robots.txt — even the one it's about to replace.
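To see what respecting robots.txt means for a given user-agent, you can evaluate your own rules with Python's standard `urllib.robotparser`. This is a sketch of the general technique; the sample rules and URLs are made up, and it does not claim to mirror how Lab451Bot is built internally.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "Lab451Bot/1.0"


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, used only for illustration.
sample = """\
User-agent: *
Disallow: /admin/
"""

rules = build_rules(sample)
blog_allowed = rules.can_fetch(USER_AGENT, "https://example.com/blog/")
admin_allowed = rules.can_fetch(USER_AGENT, "https://example.com/admin/users")
print(blog_allowed, admin_allowed)
```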

Link graph walking

Follows internal links to discover all public pages

Content extraction

Strips nav/footer/boilerplate, extracts main content

Metadata parsing

Reads title tags, meta descriptions, Open Graph data

Date detection

Finds publish dates for accurate sitemap lastmod values
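The link graph walk can be sketched as a breadth-first traversal that follows only same-host links. Everything here is an assumption made for illustration: the `walk` and `LinkExtractor` names, the in-memory `PAGES` dictionary standing in for real HTTP fetches, and the choice to drop URL fragments.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def walk(start_url, fetch):
    """Breadth-first walk of internal links; fetch(url) returns HTML or None."""
    host = urlsplit(start_url).hostname
    seen, queue, pages = {start_url}, deque([start_url]), []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link or non-HTML resource
        pages.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute = urldefrag(urljoin(url, href)).url
            if urlsplit(absolute).hostname == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical site held in memory so the sketch runs without network access.
PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog/">Blog</a> '
                            '<a href="https://other.example.org/">External</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog/": '<a href="post-1#comments">Post</a>',
    "https://example.com/blog/post-1": "<p>Hello</p>",
}

discovered = walk("https://example.com/", PAGES.get)
print(discovered)
```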

We generate your four files

With your site fully mapped, Lab451 compiles each file individually. Each is built to a specific spec:

llms.txt (llmstxt.org spec)

A concise, Markdown-formatted document describing your site: its purpose, main sections, key topics, and preferred AI interaction guidelines. Think of it as a README for AI models.
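For a sense of the file's shape, the llmstxt.org format is a Markdown document with an H1 title, a blockquote summary, and H2 sections containing link lists. The sketch below renders that layout from a small data structure; the function name, the sample site, and its URLs are hypothetical, and the real generated file may contain more detail.

```python
def render_llms_txt(site_name, summary, sections):
    """Render a minimal llms.txt: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, used only for illustration.
llms_txt = render_llms_txt(
    "Example Co",
    "Example Co sells widgets and publishes API documentation.",
    {
        "Docs": [("API reference", "https://example.com/docs/api", "Endpoints and auth")],
        "Product": [("Pricing", "https://example.com/pricing", "")],
    },
)
print(llms_txt)
```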

llms-full.txt (Extended spec)

A full-content companion that includes the actual text of every page (cleaned and structured). Used by RAG pipelines and AI systems that do deep document retrieval rather than just crawling.

sitemap.xml (Sitemaps.org protocol)

A standards-compliant XML sitemap with accurate lastmod dates, appropriate changefreq hints, and canonical URLs. Submission-ready for Google Search Console.
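A sitemap of this kind can be sketched with Python's standard `xml.etree.ElementTree`. The namespace is the one defined by the Sitemaps.org protocol; the `build_sitemap` name and the sample pages are assumptions for illustration, not Lab451's generator.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as bytes) from (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. YYYY-MM-DD
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages, used only for illustration.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2024-01-15", "weekly"),
    ("https://example.com/blog/post-1", "2024-01-10", "monthly"),
])
print(sitemap_xml.decode("utf-8"))
```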

robots.txt (RFC 9309)

Configured to welcome major AI crawlers (GPTBot, ClaudeBot, Googlebot, etc.) while blocking known scrapers. Auto-references your new sitemap.xml location.
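The general shape of such a robots.txt (one group per user-agent, followed by a Sitemap line) can be sketched as follows. The function name is an assumption, and `ExampleScraperBot` is a made-up placeholder: the actual list of blocked scrapers is not stated here.

```python
def render_robots_txt(allowed_agents, blocked_agents, sitemap_url):
    """Render robots.txt text with one group per agent and a Sitemap reference."""
    lines = []
    for agent in allowed_agents:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


robots_txt = render_robots_txt(
    allowed_agents=["GPTBot", "ClaudeBot", "Googlebot"],
    blocked_agents=["ExampleScraperBot"],  # hypothetical name for illustration
    sitemap_url="https://example.com/sitemap.xml",
)
print(robots_txt)
```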

You download, upload, and you're done

Download all four files in a single ZIP, then upload them to your web server's root directory. That's literally it. No ongoing maintenance required on the free plan.
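Before uploading, it can be worth confirming that the archive really contains all four files. The following is a small sketch using Python's standard `zipfile`; the helper names and the in-memory demo archive are assumptions, and the actual layout of the downloaded ZIP may differ.

```python
import io
import zipfile

EXPECTED = {"llms.txt", "llms-full.txt", "sitemap.xml", "robots.txt"}


def missing_files(zip_bytes: bytes) -> set:
    """Return the expected file names that are absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Compare base names so files inside a subfolder still count.
        names = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return EXPECTED - names


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP for the demo below."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Demo archive that deliberately omits robots.txt.
demo = make_zip({"llms.txt": "# Example", "llms-full.txt": "...", "sitemap.xml": "<urlset/>"})
print(sorted(missing_files(demo)))
```

With a real download, you would read the file's bytes from disk and pass them to `missing_files` in the same way.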

AI models find you on their next crawl

AI companies run their crawlers on their own schedules — typically every few weeks. Once your files are live, the next crawl will pick them up. There's no "submit to AI" button (yet) — this is the right way to do it at scale.

Control: Lab451 doesn't fast-track your content into any model's training data. What we do is make sure your site is structured, legible, and maximally discoverable for AI retrieval systems, which is the part you can actually control.

How often should I re-generate my files?

Re-generate whenever significant content changes, such as new product features, updated API documentation, or updated FAQs. For active sites, a weekly or bi-weekly update cadence is recommended.

Tip: Some AI crawlers now fetch llms.txt and llms-full.txt daily. Because these files exist to keep AI from giving inaccurate or outdated information about your products, regular updates are essential for accuracy.

Compatibility

Which AI models does this help with?

llms.txt is a proposed standard that helps large language models understand what your site is about. It is presently supported by:

AI product    Crawler               Status
ChatGPT       GPTBot                Supported
Claude        ClaudeBot             Supported
Gemini        Googlebot             Supported
Perplexity    PerplexityBot         Supported
Grok          xAI-Bot               Supported
Copilot       Bingbot               Supported
Mistral       MistralBot            Partial
Llama         Meta-ExternalAgent    Partial
