What is llms.txt? The New Google Lighthouse Audit Explained

Google now checks whether your site ships an llms.txt file. Here is everything agencies need to know about the specification, why it matters for AI discoverability, and how to create one.

By Kris Zwart · ZwartifyDesign · May 2026

The Rise of Machine-Readable Site Manifests

For decades, robots.txt told search-engine crawlers which pages they could crawl and which to leave alone. It was a simple contract between website owners and bots. But large language models do not crawl the way Googlebot does. They ingest content, summarise it, and reference it in conversational answers. That fundamental shift in how information is consumed demanded a new contract, and that contract is llms.txt.

The llms.txt file is a plain-text manifest placed at the root of a domain (e.g. https://example.com/llms.txt) that describes a website in a format optimised for large language models. Rather than HTML with navigation chrome, scripts, and visual styling, llms.txt gives AI systems a clean, structured overview of what a site contains, how it is organised, and what each section covers.

The llmstxt.org Specification

The formal specification lives at llmstxt.org and was originally proposed by Jeremy Howard. The spec defines a Markdown file with a strict structure: a top-level (H1) heading with the site name, an optional blockquote summary, then a series of second-level (H2) sections whose headings indicate content categories. Each section contains a list of links to relevant pages, each with an optional short description.

Two variants are in common use. The standard llms.txt is a concise overview with links. The extended llms-full.txt is an expanded version that includes the actual content from each linked page, giving LLMs everything they need in a single file without following links. Both files sit in the site root alongside robots.txt and sitemap.xml.

The key design principle is simplicity. No XML schemas, no JSON-LD, no namespace prefixes. Just clean Markdown that any LLM can parse without ambiguity. This low barrier to entry is exactly why adoption has been fast.

Why Google Made It a Lighthouse Audit

In early 2026, Google added an Agentic Browsing category to its Lighthouse auditing tool. This category evaluates how well a website supports AI agents, meaning systems that autonomously browse, extract, and act on web content. The very first audit in that category checks for the presence and validity of an llms.txt file.

This was a signal moment. Google effectively endorsed llms.txt as a web standard for AI readability. When Lighthouse flags a missing llms.txt as a failed audit, every developer and SEO professional running a site audit now sees it. The message is clear: if you want your site to perform well in an AI-first search landscape, you need this file.

The audit checks three things. First, does /llms.txt return a 200 response? Second, does the file follow the llmstxt.org structure with a valid top-level heading? Third, are the links in the file resolvable and not broken? Sites that pass all three checks get a green tick in the Agentic Browsing section.
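
To make the three checks concrete, here is a minimal Python sketch of what such a validation could look like. It is not Lighthouse's actual implementation, which this article does not reproduce. The function name check_llms_txt and the link_ok callback are hypothetical; in practice link_ok might issue a HEAD request for each URL, but here it is injected so the logic can be run offline.

```python
import re
from typing import Callable

# Matches Markdown links of the form [text](url)
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def check_llms_txt(status: int, body: str,
                   link_ok: Callable[[str], bool]) -> dict[str, bool]:
    """Apply three illustrative checks to an already-fetched /llms.txt.

    status  -- HTTP status code returned for /llms.txt
    body    -- response body as text
    link_ok -- hypothetical callback that reports whether a URL resolves
    """
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    # Check 2: the first non-empty line must be a single-# heading with text.
    has_h1 = bool(lines) and lines[0].startswith("# ")
    # Check 3: every Markdown link target must resolve.
    links = [url for _, url in LINK_RE.findall(body)]
    return {
        "returns_200": status == 200,  # Check 1
        "has_top_level_heading": has_h1,
        "links_resolve": all(link_ok(url) for url in links),
    }


sample = (
    "# Example Co\n\n"
    "> Example Co sells widgets.\n\n"
    "## Docs\n\n"
    "- [Guide](https://example.com/guide): Getting started\n"
)
result = check_llms_txt(200, sample, lambda url: url.startswith("https://"))
```

A file with no links passes the third check trivially in this sketch; a stricter validator might treat that as a warning.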

How to Create an llms.txt File

Creating an llms.txt manually is straightforward but time-consuming. You need to audit every page on the site, write a concise description of each one, organise them into logical categories, and format the result in valid Markdown. For a five-page brochure site, that takes fifteen minutes. For a 500-page e-commerce store with product categories, blog posts, and landing pages, it can take days.

The Manual Approach

Start with a top-level heading using your site name. Add a blockquote that summarises what the site does in one or two sentences. Then create sections for each content category: Products, Blog, Documentation, About, and so on. Under each section, list the key pages with a short description and a direct URL. Save the file as llms.txt in your site root.
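
The steps above can be sketched as a small Python helper that assembles the file from data you have gathered by hand. The function name render_llms_txt, the example.com URLs, and the sample site details are all illustrative assumptions, not part of the spec.

```python
def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Build llms.txt text: H1 title, blockquote summary, H2 sections.

    sections maps a category heading to (page title, URL, description) tuples.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        parts.append(f"## {heading}")
        parts.append("")
        for title, url, description in pages:
            # One link per line, followed by a short description.
            parts.append(f"- [{title}]({url}): {description}")
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"


text = render_llms_txt(
    "Example Co",
    "Example Co sells handmade widgets and publishes repair guides.",
    {
        "Products": [
            ("Widgets", "https://example.com/widgets", "Our widget range"),
        ],
        "Blog": [
            ("Repair guide", "https://example.com/blog/repair", "How to fix a widget"),
        ],
    },
)
```

Writing the returned string to a file named llms.txt and uploading it to the site root completes the manual process.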

The Automated Approach

For agencies managing multiple client sites, manual creation does not scale. You need a tool that can crawl a domain, extract page titles and meta descriptions, infer a logical site structure, and generate both the standard and full-text variants automatically. This is precisely what the AI SEO DOJO llms.txt generator does. It crawls the site, builds the manifest, validates the output against the llmstxt.org spec, and deploys it — all in one click.
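
As an illustration of the extraction step, the following Python sketch pulls the page title and meta description out of an HTML document using only the standard library. It is a simplified assumption of how such a step could work, not a description of how AI SEO DOJO implements it; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PageMeta(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> tuple[str, str]:
    """Return (title, meta description) for an HTML string."""
    parser = PageMeta()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


page = (
    "<html><head><title> Widgets | Example Co </title>"
    '<meta name="description" content="Our widget range.">'
    "</head><body><h1>Widgets</h1></body></html>"
)
title, description = extract_meta(page)
```

Each (title, description) pair, together with the page URL, supplies one link entry for the manifest. Pages with no meta description return an empty string and would need a fallback, such as the first paragraph of body text.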

Why Agencies Need to Act Now

The shift from traditional SEO to AI-optimised content is accelerating. Agencies that wait until llms.txt becomes as ubiquitous as robots.txt will be late. Right now, fewer than 8% of websites in the Alexa top 100,000 have an llms.txt file. That gap represents an enormous opportunity for agencies that move first.

Every client site you manage is a candidate. Running a Lighthouse audit, showing the client the red flag in the Agentic Browsing section, and then offering to fix it is one of the easiest upsells in modern SEO. It is tangible, measurable, and directly tied to a Google audit score.

Beyond the immediate Lighthouse pass, an llms.txt file improves your client's chances of being cited by ChatGPT, Perplexity, Gemini, and Claude. These models use structured site information to determine which sources are authoritative and relevant. A well-crafted llms.txt file is like a cover letter for your client's website, addressed directly to every AI system that encounters it.

How AI SEO DOJO Automates llms.txt Generation

AI SEO DOJO includes a dedicated llms.txt generator as part of its 18-task GEO scoring engine. When you add a client domain to the platform, the crawler indexes every accessible page, extracts metadata, and builds both the standard and full-text variants of the llms.txt file.

The generator handles edge cases that trip up manual creation: pagination sequences, duplicate content behind different URL parameters, JavaScript-rendered pages that require headless browsing, and multilingual sites with hreflang tags. It also validates every internal link in the output, so you never deploy a manifest with broken references.
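
One of the edge cases named above, duplicate content behind different URL parameters, can be illustrated with a short Python sketch. This is a generic normalisation technique shown under stated assumptions, not AI SEO DOJO's implementation: the list of tracking parameters is an example, and the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example set of parameters that do not change page content (an assumption).
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
}


def canonical_url(url: str) -> str:
    """Normalise a URL so that trivially different variants compare equal."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    query.sort()  # parameter order should not create a new page
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment: it never reaches the server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each canonical URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        canonical = canonical_url(url)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


crawled = [
    "https://Example.com/shoes/?utm_source=news&color=red",
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?color=blue",
]
unique_pages = dedupe(crawled)
```

Parameters such as color are kept because they may select genuinely different content; deciding which parameters are safe to drop is site-specific.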

For WordPress sites, the included AI SEO DOJO plugin can deploy the file directly to the site root without requiring FTP access or manual upload. For static sites and custom platforms, the generated file is available as a one-click download or can be deployed via API.

The result is a Lighthouse-passing, spec-compliant llms.txt file generated in seconds rather than hours. Agencies using AI SEO DOJO typically roll llms.txt out across their entire client portfolio in a single afternoon.

Key Takeaways

  • llms.txt is the new robots.txt for AI. It tells large language models what your site is about in a format they can parse directly.
  • Google Lighthouse now audits it. The Agentic Browsing category checks for presence, structure, and link validity.
  • Manual creation does not scale. Agencies need automated tooling to roll this out across client portfolios efficiently.
  • Early movers win. Fewer than 8% of top sites have an llms.txt file. The competitive advantage for agencies that act now is significant.

Generate llms.txt for every client in one click

AI SEO DOJO crawls your client sites, builds spec-compliant llms.txt files, and deploys them automatically. Pass the Lighthouse Agentic Browsing audit across your entire portfolio.