llms-txt-gen

llms-txt-gen is a zero-dependency Node.js CLI that generates a starter `llms.txt` file from a website’s sitemap. It helps site owners create a publishable draft quickly, then edit and publish it at the site root.

What it is

llms-txt-gen is built for site owners who want a fast way to create a publishable `llms.txt` draft without hand-assembling the page list.

The tool reads `/sitemap.xml`, including sitemap indexes, collects the URLs it finds, and emits a Markdown scaffold you can edit before publishing at the root of your site. Its output is intentionally a starting point: the README tells you to replace the placeholder description, prune the page list, and publish the file yourself.
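As an illustration of the sitemap-reading step, the sketch below pulls `<loc>` entries out of sitemap XML and tells a sitemap index apart from a flat sitemap. It is not the tool's actual source: the function names `extractLocs` and `isSitemapIndex`, the regex-based parsing, and the sample XML are all assumptions, chosen only to stay within the zero-dependency constraint.

```javascript
// Hypothetical helper: collect every <loc> value from a sitemap XML string.
// A regex keeps the sketch dependency-free; a real XML parser would be more robust.
function extractLocs(xml) {
  const locs = [];
  const pattern = /<loc>\s*([\s\S]*?)\s*<\/loc>/gi;
  for (const match of xml.matchAll(pattern)) {
    // Unwrap an optional CDATA section and decode the entity most common in URLs.
    const value = match[1]
      .replace(/^<!\[CDATA\[([\s\S]*?)\]\]>$/, "$1")
      .replace(/&amp;/g, "&")
      .trim();
    if (value) locs.push(value);
  }
  return locs;
}

// A sitemap index uses a <sitemapindex> root; a flat sitemap uses <urlset>.
function isSitemapIndex(xml) {
  return /<sitemapindex[\s>]/i.test(xml);
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs?a=1&amp;b=2</loc></url>
</urlset>`;

console.log(isSitemapIndex(sample)); // false
console.log(extractLocs(sample));
// [ 'https://example.com/', 'https://example.com/docs?a=1&b=2' ]
```

In a sitemap index, the same `<loc>` extraction would return child sitemap URLs rather than page URLs, which is why the two cases need to be distinguished.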

The repository positions the tool around answer engine optimization and AI search visibility, with examples that reference ChatGPT, Perplexity, Gemini, and Google AI Overviews. The package is designed for simple local use through `npx`, and the code requires Node 18+ so it can use the built-in `fetch`.
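The Node 18+ requirement can be checked up front. The following is a minimal sketch of such a guard, assuming a hypothetical `checkRuntime` helper; the listing does not say whether the tool performs a check like this or simply fails when `fetch` is missing.

```javascript
// Hypothetical guard: returns an error message, or null when the runtime is usable.
function checkRuntime(nodeVersion, hasFetch) {
  // process.versions.node looks like "18.17.0" (no leading "v").
  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  if (Number.isNaN(major) || major < 18) {
    return `Node 18+ is required (found ${nodeVersion}).`;
  }
  if (!hasFetch) {
    return "The built-in fetch is not available in this runtime.";
  }
  return null;
}

const problem = checkRuntime(process.versions.node, typeof fetch === "function");
console.log(problem ?? "Runtime OK");
```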

Core capabilities

One-command CLI usage

Runs from the command line with `npx`, so you can generate a starter file without adding the package to a project first.

Sitemap-based URL collection

Fetches `/sitemap.xml` from a site and reads the URLs listed there, including sitemap indexes that point to child sitemaps.

Structured starter output

Creates a spec-shaped `llms.txt` scaffold with a site heading, a placeholder summary, and a pages list that you can curate and edit before publishing.

Stdout or file output

Lets you choose whether to print the result to stdout or write it directly to a file with `--output`.

URL limiting and deduplication

Accepts a `--max` limit for the number of URLs included, with a default of 100, and removes duplicates before output.
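Deduplication followed by a cap can be sketched in a few lines. The helper name `dedupeAndLimit` is hypothetical, and the sketch assumes duplicates are matched by exact string comparison; whether the tool normalizes URLs first (for example, trailing slashes) is not stated here.

```javascript
// Hypothetical helper: drop repeated URLs (keeping first occurrences), then cap the list.
function dedupeAndLimit(urls, max = 100) {
  // A Set preserves insertion order, so the original ordering survives.
  const unique = [...new Set(urls)];
  return unique.slice(0, max);
}

const urls = [
  "https://example.com/",
  "https://example.com/a",
  "https://example.com/",
  "https://example.com/b",
];

console.log(dedupeAndLimit(urls, 2));
// [ 'https://example.com/', 'https://example.com/a' ]
```

Deduplicating before applying the cap means repeated entries do not use up slots in the limit.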

Lightweight runtime footprint

Uses no external dependencies and only requires Node 18+ because it depends on the built-in `fetch`.

Typical use cases

  • Bootstrap a new llms.txt

    Create a first-pass `llms.txt` file from an existing sitemap when you want to publish a curated LLM-facing summary quickly.

  • Curate a site’s key pages

    Turn an established sitemap into a draft that can be trimmed down to the most relevant pages before publishing.

  • Prepare a file for manual review

    Generate a local output file that can be reviewed in a text editor and then uploaded to the site root.

  • Support AI search visibility work

    Produce a structured, human-editable page list as a starting point for AI search visibility work.

Pros and Cons

Pros

  • Zero external dependencies, which keeps installation and execution simple.
  • Works with `npx`, so you can run it without a full project setup.
  • Supports sitemap indexes as well as a flat sitemap file.
  • Can write to a file or print to stdout, which fits both scripting and interactive use.
  • Deduplicates URLs and lets you cap the list with `--max`.

Cons

  • It depends on a site exposing `/sitemap.xml`; if that endpoint is missing or inaccessible, generation fails.
  • The generated file is a starter template, so it still needs human editing before publishing.
  • It works only from sitemap data; the repository does not describe any broader site crawling or content analysis.

FAQ

How do you run llms-txt-gen?

It runs as a Node.js CLI and can be invoked with `npx github:NitishSamurai/llms-txt-gen <site-url>`. The tool reads `/sitemap.xml`, supports sitemap indexes, and can write the generated content to a file with `-o` or print it to stdout.

What runtime does it require?

Node 18 or newer. The package metadata sets that requirement, and the code notes that it relies on the built-in `fetch` available in Node 18+.

What does it generate?

The generated output is a starter `llms.txt` file based on sitemap URLs. The README shows a Markdown structure with a site heading, a placeholder description, and a `Pages` section of linked URLs that you are expected to edit before publishing.
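To make that structure concrete, here is a rough sketch of a renderer that produces a heading, a placeholder description, and a `Pages` section of links. The function names `renderLlmsTxt` and `labelFor`, the placeholder wording, the blockquote for the description, and the link labels derived from URL paths are all assumptions; the tool's real output may differ in each of these details.

```javascript
// Hypothetical renderer for a starter llms.txt scaffold.
function renderLlmsTxt(siteUrl, urls) {
  const host = new URL(siteUrl).hostname;
  const lines = [
    `# ${host}`,
    "",
    "> TODO: replace this with a one-sentence description of the site.",
    "",
    "## Pages",
    "",
    ...urls.map((url) => `- [${labelFor(url)}](${url})`),
  ];
  return lines.join("\n") + "\n";
}

// Assumed labeling rule: use the URL path without slashes at either end, or "Home" for the root.
function labelFor(url) {
  const path = new URL(url).pathname.replace(/\/+$/, "");
  return path === "" ? "Home" : path.slice(1);
}

console.log(
  renderLlmsTxt("https://example.com", [
    "https://example.com/",
    "https://example.com/docs/start/",
  ])
);
```

Path-derived labels are only a stand-in. Replacing them with descriptive titles is part of the manual editing pass the README asks for.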

Is it free to use?

The package is MIT-licensed and published on GitHub, so it can be used, modified, and evaluated without a license fee. Because it is a zero-dependency CLI, trying it out does not install a dependency tree.

Does it crawl pages beyond the sitemap?

Not according to the repository, which describes a conservative workflow: fetch the site sitemap, optionally follow sitemap indexes, de-duplicate URLs, and cap the list with the `--max` option. It does not claim to crawl the website beyond the sitemap.
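The workflow described above (fetch the sitemap, follow an index if there is one, de-duplicate, cap) could look roughly like the sketch below. It is an illustration under stated assumptions, not the tool's code: `collectUrls` and the injected `fetchText` parameter are hypothetical, only one level of sitemap index is followed, and the in-memory `fakeSite` stands in for real HTTP responses so the example runs offline.

```javascript
// Hypothetical workflow sketch. `fetchText` is injected so the logic can run offline;
// on Node 18+ it could be: (url) => fetch(url).then((res) => res.text())
async function collectUrls(siteUrl, fetchText, max = 100) {
  const locsIn = (xml) =>
    [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)].map((m) => m[1]);

  const rootXml = await fetchText(new URL("/sitemap.xml", siteUrl).href);
  let pages = locsIn(rootXml);

  // A sitemap index lists child sitemaps rather than pages, so follow one level down.
  if (/<sitemapindex[\s>]/i.test(rootXml)) {
    const children = pages;
    pages = [];
    for (const child of children) {
      pages.push(...locsIn(await fetchText(child)));
      if (new Set(pages).size >= max) break; // enough unique URLs; stop fetching
    }
  }

  // De-duplicate (first occurrence wins), then apply the cap.
  return [...new Set(pages)].slice(0, max);
}

// In-memory stand-in for a site with a sitemap index and two child sitemaps.
const fakeSite = {
  "https://example.com/sitemap.xml":
    "<sitemapindex><sitemap><loc>https://example.com/a.xml</loc></sitemap>" +
    "<sitemap><loc>https://example.com/b.xml</loc></sitemap></sitemapindex>",
  "https://example.com/a.xml":
    "<urlset><url><loc>https://example.com/</loc></url>" +
    "<url><loc>https://example.com/docs</loc></url></urlset>",
  "https://example.com/b.xml":
    "<urlset><url><loc>https://example.com/docs</loc></url>" +
    "<url><loc>https://example.com/blog</loc></url></urlset>",
};
const fetchFake = async (url) => fakeSite[url];

collectUrls("https://example.com", fetchFake, 3).then((urls) => console.log(urls));
// [ 'https://example.com/', 'https://example.com/docs', 'https://example.com/blog' ]
```

Nothing in this flow requests the pages themselves. Only sitemap documents are fetched, which matches the sitemap-only scope described in the answer.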

Quick Facts

Category: Developer Tool
Platform: Node.js CLI
Runtime: Node 18+
License: MIT
Source domain: github.com
Output: Starter llms.txt Markdown