Table of Contents Generation in Webdown
Webdown can generate a Table of Contents (TOC) for web pages you convert to Markdown or Claude XML format.
Basic Usage
To generate a TOC, use the --toc flag.
This will:
1. Extract all headings (h1-h6) from the page
2. Create a nested TOC based on the heading levels
3. Add the TOC at the beginning of the output
4. Create anchor links to each heading
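The four steps above can be sketched in Python as follows. This is a minimal illustration, not Webdown's implementation. It assumes heading extraction has already produced (level, text) pairs. The names slug, build_toc, and add_toc are hypothetical, and slug uses a simplified link-target rule.

```python
import re


def slug(text: str) -> str:
    # Hypothetical, simplified link-target rule: lowercase, spaces to hyphens,
    # then drop anything that is not a letter, digit, underscore, or hyphen.
    return re.sub(r"[^\w-]", "", text.lower().replace(" ", "-"))


def build_toc(headings: list[tuple[int, str]]) -> str:
    """Render (level, text) pairs as a nested Markdown list of anchor links."""
    if not headings:
        return ""
    top = min(level for level, _ in headings)  # shallowest level forms the outer list
    lines = ["# Table of Contents"]
    for level, text in headings:
        indent = "  " * (level - top)  # two spaces per nesting level
        lines.append(f"{indent}- [{text}](#{slug(text)})")
    return "\n".join(lines)


def add_toc(markdown_body: str, headings: list[tuple[int, str]]) -> str:
    """Place the TOC at the start of the converted Markdown."""
    toc = build_toc(headings)
    return f"{toc}\n\n{markdown_body}" if toc else markdown_body
```

Indentation is measured from the shallowest heading level present. A page whose first heading is an h2 therefore still produces an unindented top-level list.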
Customizing TOC Generation
Controlling TOC Depth
By default, Webdown includes all heading levels (h1-h6) in the TOC. You can limit the depth with the --toc-depth option. For example, a depth of 3 includes only h1, h2, and h3 headings in the TOC.
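Conceptually, a depth limit is a filter on heading level that runs before the TOC is rendered. Here is a minimal Python sketch. It assumes headings are available as (level, text) pairs and that depth refers to the absolute h-level. limit_depth is a hypothetical name, not part of Webdown's API.

```python
def limit_depth(headings: list[tuple[int, str]], max_depth: int) -> list[tuple[int, str]]:
    """Keep only headings whose level is <= max_depth (e.g. 3 keeps h1-h3)."""
    if not 1 <= max_depth <= 6:
        raise ValueError("max_depth must be between 1 and 6")
    # Order is preserved, so the filtered list can go straight to the TOC renderer.
    return [(level, text) for level, text in headings if level <= max_depth]
```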
TOC Title
By default, the TOC is titled "Table of Contents". You can customize it with the --toc-title option.
Placement
The TOC is always placed at the beginning of the document, after any metadata but before the main content.
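The title and placement rules can be combined in one small assembly step. The Python sketch below is illustrative only. It assumes the metadata block is already rendered as a string and the TOC entries as Markdown list lines. place_toc and its title parameter are hypothetical; the title parameter mirrors what --toc-title controls.

```python
def place_toc(metadata: str, toc_items: list[str], content: str,
              title: str = "Table of Contents") -> str:
    """Assemble metadata, then the titled TOC, then the main content."""
    toc = "\n".join([f"# {title}", *toc_items])
    # Drop empty parts (for example, no metadata) so no stray blank lines appear.
    return "\n\n".join(part for part in (metadata, toc, content) if part)
```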
How TOC Links Work
Link Generation
For each heading in the document, Webdown:
- Extracts the heading text
- Removes HTML tags (if any remain)
- Converts the text to lowercase
- Replaces spaces with hyphens
- Removes special characters that would break Markdown links
- Creates a unique ID if duplicate headings exist (by appending -1, -2, etc.)
For example:
- Heading "Getting Started" becomes #getting-started
- Heading "Section 2.1: Examples" becomes #section-21-examples
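The link-generation steps listed above map onto a few regular-expression passes. This Python sketch follows the listed order and reproduces both examples. Webdown's actual rules may differ in edge cases. heading_to_anchor is a hypothetical name, and duplicate handling is left out.

```python
import re


def heading_to_anchor(heading: str) -> str:
    """Turn heading text into a Markdown anchor using the listed steps."""
    text = re.sub(r"<[^>]+>", "", heading)  # remove leftover HTML tags
    text = text.strip().lower()             # convert to lowercase
    text = text.replace(" ", "-")           # replace spaces with hyphens
    text = re.sub(r"[^\w-]", "", text)      # drop characters that break links
    return "#" + text
```

In "Section 2.1: Examples", the period and colon are removed only after spaces become hyphens. That ordering yields section-21-examples.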
Duplicate Heading Handling
Webdown automatically detects duplicate heading text and adds numeric suffixes (-1, -2, etc.) so that each link in the TOC is unique.
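Suffix assignment can be sketched as a pass over the generated slugs. This Python version is an assumption rather than Webdown's code. It follows the GitHub-style convention where the first occurrence keeps the plain slug and later ones get -1, -2, and so on. It also skips any suffix that would collide with an existing slug. unique_ids is a hypothetical name.

```python
def unique_ids(slugs: list[str]) -> list[str]:
    """Make each slug unique by appending -1, -2, ... to repeats."""
    used: set[str] = set()
    counts: dict[str, int] = {}  # last suffix number tried per base slug
    result = []
    for base in slugs:
        candidate = base
        n = counts.get(base, 0)
        # Keep bumping the suffix until the ID is unused (handles literal "x-1" headings).
        while candidate in used:
            n += 1
            candidate = f"{base}-{n}"
        counts[base] = n
        used.add(candidate)
        result.append(candidate)
    return result
```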
Integration with Claude XML
When you combine --claude-xml with --toc, Webdown:
- Generates the TOC with proper Markdown formatting
- Places it at the beginning of the document
- Properly escapes all content within the XML tags
Example output:
<answer>
# Table of Contents
- [Introduction](#introduction)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Main Content](#main-content)
- [Section 1](#section-1)
- [Section 2](#section-2)
# Introduction
## Getting Started
...content continues...
</answer>
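The wrapping and escaping step can be approximated with the standard library. This Python sketch takes the answer element name from the example output above. It escapes &, <, and > with xml.sax.saxutils.escape, which may not be the mechanism Webdown itself uses. to_claude_xml is a hypothetical name, and any other parts of Webdown's Claude XML structure are not reproduced.

```python
from xml.sax.saxutils import escape


def to_claude_xml(toc: str, body: str, tag: str = "answer") -> str:
    """Wrap the TOC and body in one XML element, escaping &, <, and > in the content."""
    content = f"{toc}\n\n{body}"
    return f"<{tag}>\n{escape(content)}\n</{tag}>"
```

Escaping keeps characters such as < in the page text from being read as XML markup. The Markdown link syntax itself needs no escaping.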
Technical Details
Heading Detection
Webdown extracts headings from the HTML document using a combination of:
- Standard h1-h6 tags
- Elements with heading roles
- Elements with heading-like styling
This ensures comprehensive heading detection across different website structures.
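The first two detection strategies can be illustrated with Python's built-in html.parser. This sketch collects h1-h6 elements and elements with role="heading". For the latter it reads the level from aria-level and falls back to 2, the WAI-ARIA default, when the attribute is missing or invalid. Styling-based detection is omitted. HeadingCollector and heading_level are hypothetical names, not Webdown's API.

```python
from html.parser import HTMLParser
from typing import Optional


def heading_level(tag: str, attrs: dict) -> Optional[int]:
    """Return a heading level for h1-h6 or role="heading" elements, else None."""
    if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
        return int(tag[1])
    if attrs.get("role") == "heading":
        try:
            return max(1, min(6, int(attrs.get("aria-level") or 2)))
        except ValueError:
            return 2  # unparsable aria-level: use the ARIA default
    return None


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for headings found in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list = []
        self._current: Optional[tuple] = None  # (tag, level) of the open heading
        self._depth = 0                        # nesting count of that tag
        self._parts: list = []                 # text fragments inside the heading

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            if tag == self._current[0]:
                self._depth += 1  # same tag nested inside the heading element
            return
        level = heading_level(tag, dict(attrs))
        if level is not None:
            self._current, self._depth, self._parts = (tag, level), 1, []

    def handle_endtag(self, tag):
        if self._current is None or tag != self._current[0]:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            if text:
                self.headings.append((self._current[1], text))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data)
```

Usage: create a HeadingCollector, call feed(html) and then close(), and read the headings attribute.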
Limitations
- Links to headings inside code blocks might not work, because code blocks often contain # characters that aren't headings
- Headings with unusual characters may get simplified link targets
- Markdown viewers differ slightly in how they generate heading IDs, so some TOC links may not resolve in every viewer
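The code-block limitation arises whenever headings are scanned from Markdown text, where a "# install deps" line inside a code block looks like a heading. The Python sketch below assumes such a Markdown-level scan, which may not be how Webdown works. It shows one mitigation: it skips lines inside ``` or ~~~ fences. It handles only ATX headings and ignores setext headings and indented code blocks. markdown_headings is a hypothetical name.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")  # ATX heading, optional closing #s
FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")              # opening or closing code fence


def markdown_headings(markdown: str) -> list:
    """Return (level, text) for ATX headings, skipping lines inside fenced code."""
    headings = []
    fence = None  # the opening fence marker while inside a code block
    for line in markdown.splitlines():
        m = FENCE_RE.match(line)
        if m:
            marker = m.group(1)
            if fence is None:
                fence = marker
            elif marker[0] == fence[0] and len(marker) >= len(fence):
                fence = None  # matching closing fence
            continue
        if fence is not None:
            continue  # '# comment' lines inside code are not headings
        h = HEADING_RE.match(line)
        if h:
            headings.append((len(h.group(1)), h.group(2)))
    return headings
```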
Best Practices
For optimal TOC results:
- Consider using CSS selectors to extract only the main content.
- For very large documents, limit the TOC depth.
- When creating content for Claude, combine --toc with other relevant options, such as --claude-xml.