Command-Line Interface
Command-line interface for webdown.
This module provides the command-line interface (CLI) for Webdown, a tool for converting web pages or local HTML files to clean, readable Markdown format. The CLI allows users to customize various aspects of the conversion process, from content selection to formatting options.
For a complete reference, see the CLI Reference documentation.
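Basic Usage
The most basic usage is to convert a URL, passed with -u (or --url). A local HTML file is converted the same way with -f (or --file).
Either form converts the content to Markdown and prints the result to stdout. To save the output to a file instead, add -o FILE (or --output FILE).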
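The three basic invocations described above can be sketched as argument lists. This is only an illustration that prints the command lines rather than running them; the URL and file names are placeholders borrowed from the main() examples in the function reference, and shlex.join is used purely to render each list as a shell command.

```python
import shlex

# Placeholder URL and file names (taken from the main() docstring examples).
basic_invocations = {
    "url_to_stdout": ["webdown", "-u", "https://example.com"],
    "file_to_stdout": ["webdown", "-f", "page.html"],
    "url_to_file": ["webdown", "-u", "https://example.com", "-o", "output.md"],
}

# Render each argument list as the command a user would type in a shell.
for name, argv in basic_invocations.items():
    print(f"{name}: {shlex.join(argv)}")
```

The same argument lists, minus the leading program name, are what main() accepts when it is called from Python.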
Common Options
The CLI offers various options to customize the conversion:
Source Options
-u, --url URL
: URL of the web page to convert
-f, --file FILE
: Path to local HTML file to convert
Input/Output Options
-o, --output FILE
: Write output to FILE instead of stdout
-p, --progress
: Show download progress bar (only for URL downloads)
Content Options
-t, --toc
: Generate a table of contents based on headings
-L, --no-links
: Strip hyperlinks, converting them to plain text
-I, --no-images
: Exclude images from the output
-s, --css SELECTOR
: Extract only content matching the CSS selector (e.g., "main")
-c, --compact
: Remove excessive blank lines from the output
-w, --width N
: Set line width for wrapped text (0 for no wrapping)
-V, --version
: Show version information and exit
-h, --help
: Show help message and exit
Note: For large web pages (over 10MB), webdown automatically uses streaming mode to optimize memory usage.
Claude XML Options
Options for generating Claude XML format, optimized for use with Claude AI:
--claude-xml
: Output in Claude XML format instead of Markdown
--metadata
: Include metadata section in XML (default: True)
--no-metadata
: Exclude metadata section from XML
--no-date
: Don't include current date in metadata
Example Scenarios
Web Page Conversion Examples
- Basic conversion with a table of contents (-t).
- Extract only the main content area with compact output and text wrapping (-s, -c, -w).
- Create a plain text version with no links or images (-L, -I).
- Show download progress for large pages (-p).
- Extract content from a specific div (-s with a selector matching that div).
- Process a large webpage with a progress bar (-p); streaming is automatic for large pages.
- Generate output in Claude XML format for use with Claude AI (--claude-xml).
- Create Claude XML without metadata (--claude-xml with --no-metadata).
- Complete example combining multiple options.
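The web page scenarios above can be sketched as command lines built only from the documented flags. The exact commands from the original examples are not shown here, so everything concrete is an assumption: the URL, the "#content" selector, the width of 80, and the output file names are all placeholders. The two progress-bar scenarios resolve to the same command, so only one entry appears.

```python
import shlex

URL = "https://example.com"  # placeholder URL

# (scenario label, flags) pairs; selector, width and file names are assumed values.
web_examples = [
    ("table of contents", ["-u", URL, "-t"]),
    ("main content, compact, wrapped", ["-u", URL, "-s", "main", "-c", "-w", "80"]),
    ("plain text, no links or images", ["-u", URL, "-L", "-I"]),
    ("download progress bar", ["-u", URL, "-p"]),
    ("specific div", ["-u", URL, "-s", "#content"]),  # hypothetical selector
    ("Claude XML", ["-u", URL, "--claude-xml", "-o", "output.xml"]),
    ("Claude XML without metadata", ["-u", URL, "--claude-xml", "--no-metadata"]),
    ("combined options", ["-u", URL, "-t", "-s", "main", "-c", "-w", "80", "-o", "output.md"]),
]

# shlex.join quotes arguments such as "#content" so they survive the shell.
commands = [shlex.join(["webdown", *flags]) for _, flags in web_examples]

for (label, _), command in zip(web_examples, commands):
    print(f"# {label}")
    print(command)
```

Quoting matters for the selector case: an unquoted # would start a shell comment, which is why the rendered command wraps it in single quotes.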
Local HTML File Conversion Examples
- Convert a local HTML file to Markdown (-f).
- Convert a local file with a table of contents (-f with -t).
- Extract only the main content from a local file (-f with -s).
- Create plain text from a local file, with no links or images (-f with -L and -I).
- Convert a local file to Claude XML format (-f with --claude-xml).
- Complete local file example combining multiple options.
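For local files, the same flags apply with -f in place of -u. The sketch below uses a small helper, local_command, which is a hypothetical name introduced only for this illustration; the file names and the width value are placeholders as well.

```python
import shlex

HTML_FILE = "page.html"  # placeholder input file


def local_command(*flags: str) -> str:
    """Render a webdown command line for HTML_FILE plus extra flags."""
    return shlex.join(["webdown", "-f", HTML_FILE, *flags])


# One entry per local-file scenario; output names and width are assumed values.
local_commands = {
    "basic": local_command(),
    "toc": local_command("-t"),
    "main_only": local_command("-s", "main"),
    "plain_text": local_command("-L", "-I"),
    "claude_xml": local_command("--claude-xml", "-o", "page.xml"),
    "combined": local_command("-t", "-s", "main", "-c", "-w", "80", "-o", "page.md"),
}

for name, command in local_commands.items():
    print(f"{name}: {command}")
```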
The entry point is the main() function, which is called when the command webdown is executed.
Functions Reference
Functions
main(args: Optional[List[str]] = None) -> int
Execute the webdown command-line interface.
This function is the main entry point for the webdown command-line tool. It handles the entire workflow:
1. Parsing command-line arguments
2. Converting the content (URL or file) to Markdown with the specified options
3. Writing the output to a file or stdout
4. Error handling and reporting
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args | Optional[List[str]] | Command line arguments as a list of strings. If None, defaults to sys.argv[1:] (the command-line arguments passed to the script). | None |
Returns:
Type | Description |
---|---|
int | Exit code: 0 for success, 1 for errors |
Examples:
>>> main(['-u', 'https://example.com']) # Convert URL and print to stdout
0
>>> main(['-u', 'https://example.com', '-o', 'output.md']) # Convert URL and write to file
0
>>> main(['-f', 'page.html']) # Convert file and print to stdout
0
>>> main(['-f', 'page.html', '-o', 'output.md']) # Convert file and write to file
0
>>> main(['-u', 'invalid-url']) # Handle error
1
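The four-step workflow and the 0/1 exit codes described for main() can be sketched as follows. This is not the code in webdown/cli.py: sketch_main and its convert parameter are hypothetical stand-ins, the parser covers only three of the documented options, and the set of exceptions treated as errors is an assumption.

```python
import argparse
import sys
from typing import Callable, List, Optional


def sketch_main(
    args: Optional[List[str]] = None,
    convert: Callable[[str], str] = lambda source: f"# Converted {source}\n",
) -> int:
    """Mimic the documented workflow; `convert` stands in for the real conversion."""
    parser = argparse.ArgumentParser(prog="webdown-sketch")
    parser.add_argument("-u", "--url")
    parser.add_argument("-f", "--file")
    parser.add_argument("-o", "--output")
    try:
        parsed = parser.parse_args(args)  # 1. parse arguments
        source = parsed.url or parsed.file
        if source is None:
            raise ValueError("either --url or --file is required")
        markdown = convert(source)  # 2. convert the content
        if parsed.output:  # 3. write to a file or stdout
            with open(parsed.output, "w", encoding="utf-8") as handle:
                handle.write(markdown)
        else:
            sys.stdout.write(markdown)
        return 0
    except (ValueError, OSError) as error:  # 4. report errors, exit code 1
        print(f"Error: {error}", file=sys.stderr)
        return 1
```

Returning an integer instead of calling sys.exit() inside the function keeps it easy to test; a console entry point can then pass the return value to sys.exit().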
Source code in webdown/cli.py
parse_args(args: Optional[List[str]] = None) -> argparse.Namespace
Parse command line arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args | Optional[List[str]] | Command line arguments (defaults to sys.argv[1:] if None) | None |
Returns:
Type | Description |
---|---|
Namespace | Parsed arguments |
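To make the shape of the returned Namespace concrete, here is a minimal argparse sketch that mirrors the options documented on this page. It is an approximation, not webdown's actual parser: build_parser and parse_args_sketch are hypothetical names, the attribute names are whatever argparse derives from the long flags, the width default of 0 is assumed, and the version string is a placeholder.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the documented options (sketch only)."""
    parser = argparse.ArgumentParser(prog="webdown")
    parser.add_argument("-u", "--url", metavar="URL")
    parser.add_argument("-f", "--file", metavar="FILE")
    parser.add_argument("-o", "--output", metavar="FILE")
    parser.add_argument("-p", "--progress", action="store_true")
    parser.add_argument("-t", "--toc", action="store_true")
    parser.add_argument("-L", "--no-links", action="store_true")
    parser.add_argument("-I", "--no-images", action="store_true")
    parser.add_argument("-s", "--css", metavar="SELECTOR")
    parser.add_argument("-c", "--compact", action="store_true")
    # Default of 0 (no wrapping) is an assumption, not stated in the docs.
    parser.add_argument("-w", "--width", type=int, default=0, metavar="N")
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s (sketch)")
    parser.add_argument("--claude-xml", action="store_true")
    # --metadata and --no-metadata share one destination, defaulting to True.
    parser.add_argument("--metadata", dest="metadata",
                        action="store_true", default=True)
    parser.add_argument("--no-metadata", dest="metadata", action="store_false")
    parser.add_argument("--no-date", action="store_true")
    return parser


def parse_args_sketch(args: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `args`, falling back to sys.argv[1:] when it is None."""
    return build_parser().parse_args(args)
```

Passing None through to parse_args is what gives the documented fallback to sys.argv[1:], since argparse reads the process arguments when no list is supplied.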