Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.7.0] - 2025-04-05

Breaking Changes

  • CLI interface now requires explicit -u/--url or -f/--file flags for source input
  • URL must now be specified with -u/--url flag instead of as a positional argument
  • All CLI examples in documentation have been updated to reflect the new syntax
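
As an illustration of the new requirement, exactly one source flag can be modelled as a required, mutually exclusive argument group. This is a minimal sketch using Python's argparse, not the project's actual parser; the `build_parser` name, the `webdown` program name, and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser that requires exactly one source flag."""
    parser = argparse.ArgumentParser(prog="webdown")  # program name is assumed
    # Exactly one of -u/--url or -f/--file must be given; a bare positional
    # URL is no longer accepted.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-u", "--url", help="URL of the page to convert")
    source.add_argument("-f", "--file", help="path to a local HTML file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-u", "https://example.com"])
    print(args.url, args.file)
```

With a parser shaped like this, passing a bare URL or passing both flags exits with a usage error instead of silently picking a source.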

Added

  • Support for converting local HTML files to Markdown with -f/--file option
  • New convert_file function in the Python API for local HTML file conversion
  • New read_html_file function for reading HTML from the local filesystem
  • Added error codes for file-related errors (FILE_NOT_FOUND, PERMISSION_DENIED, IO_ERROR)
  • Comprehensive documentation with examples for local file conversion
  • Comprehensive CLI test script (scripts/test_cli.sh) for testing all CLI options with real websites
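
The file-related error codes could map onto Python's built-in exceptions roughly as follows. This is a hedged sketch only: the names `read_html_file` and `WebdownError` come from this changelog, but the signature, the `ErrorCode` enum, and the `code` attribute shown here are assumptions, not the library's confirmed API.

```python
from enum import Enum
from pathlib import Path


class ErrorCode(Enum):
    """Hypothetical enum holding the three file-related codes."""

    FILE_NOT_FOUND = "FILE_NOT_FOUND"
    PERMISSION_DENIED = "PERMISSION_DENIED"
    IO_ERROR = "IO_ERROR"


class WebdownError(Exception):
    """Stand-in for the project's exception; the `code` attribute is assumed."""

    def __init__(self, message: str, code: ErrorCode) -> None:
        super().__init__(message)
        self.code = code


def read_html_file(path: str, encoding: str = "utf-8") -> str:
    """Read HTML from disk, translating OS errors into error codes."""
    try:
        return Path(path).read_text(encoding=encoding)
    except FileNotFoundError as exc:
        raise WebdownError(f"File not found: {path}", ErrorCode.FILE_NOT_FOUND) from exc
    except PermissionError as exc:
        raise WebdownError(f"Permission denied: {path}", ErrorCode.PERMISSION_DENIED) from exc
    except OSError as exc:
        # Any other I/O failure (for example, the path is a directory)
        raise WebdownError(f"Could not read {path}: {exc}", ErrorCode.IO_ERROR) from exc
```

The specific exceptions are caught before the generic `OSError` because `FileNotFoundError` and `PermissionError` are both subclasses of it.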

[0.6.3] - 2025-03-22

Changed

  • Updated project status from Alpha to Beta to reflect stability and completeness
  • Enhanced PyPI metadata with improved descriptions and keywords
  • Added additional classifiers for better project categorization
  • Added project URLs for documentation, bug tracker, source code, and changelog

[0.6.2] - 2025-03-22

Changed

  • Added poetry.lock to version control for reproducible builds
  • Updated all dependencies to their latest compatible versions
  • Pinned safety package to version 3.3.1
  • Enhanced GitHub Actions workflows with better cache handling to avoid conflicts

Security

  • Updated safety scanner from 2.3.5 to 3.3.1 for improved vulnerability detection

[0.6.1] - 2025-03-22

Fixed

  • Code style and formatting issues in test files
  • GitHub Actions cache conflict warnings by improving cache key uniqueness
  • Added Python version to GitHub Actions cache keys for better matrix build separation

[0.6.0] - 2025-03-22

Added

  • Comprehensive documentation for troubleshooting and error handling
  • Advanced Table of Contents (TOC) documentation with examples and customization guidance
  • Documentation for streaming large files, explaining the 10MB threshold implementation
  • Improved API documentation with correct module references after refactoring
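
A fixed streaming threshold of this kind can be sketched as a simple size check. The constant name, the `should_stream` helper, the reading of "10MB" as 10 × 1024 × 1024 bytes, and the fallback for an unknown size are all assumptions made for illustration; the project's own implementation may differ.

```python
from typing import Optional

# Assumed interpretation of the fixed "10MB" threshold.
STREAM_THRESHOLD_BYTES = 10 * 1024 * 1024


def should_stream(content_length: Optional[int]) -> bool:
    """Return True when a document is large enough to process in chunks.

    An unknown size (None) falls back to non-streaming in this sketch.
    """
    if content_length is None:
        return False
    return content_length > STREAM_THRESHOLD_BYTES
```

Keeping the threshold as a constant rather than a parameter mirrors the simplification described in the entries for this release.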

Changed

  • Major refactoring to improve code organization and maintainability:
      • Split monolithic converter.py into logical modules (html_parser.py, markdown_converter.py, xml_converter.py)
      • Replaced complex placeholder system for code blocks with direct processing
      • Reduced XML helper functions from 7+ to 3-4 clear functions
      • Created reusable validation and error handling utilities
      • Consolidated configuration into dedicated classes (WebdownConfig, DocumentOptions)
      • Eliminated duplicate code in XML converter with new _process_paragraphs() helper function
  • Improved test coverage to 100% across all application code
  • Simplified streaming implementation with fixed 10MB threshold
  • Updated configuration class documentation to reflect actual implementation
  • Reorganized codebase following clean architecture principles

Fixed

  • Documentation build after modular architecture refactoring
  • Corrected API references in documentation
  • Validation issues in error handling code
  • Improved error reporting for various failure scenarios

[0.5.0] - 2025-03-21

Added

  • Claude XML format support with --claude-xml flag
  • Optimized XML structure for use with Anthropic's Claude AI models
  • Metadata handling in Claude XML output with --no-metadata and --no-date options
  • New test suite for Claude XML functionality
  • Documentation for Claude XML format

Changed

  • Simplified streaming implementation with fixed 10MB threshold
  • Removed stream_threshold parameter from WebdownConfig
  • Removed advanced HTML2Text options to simplify the API
  • Improved README documentation for clarity and simplicity
  • Added code quality improvement tasks to TODO.md

Fixed

  • Improved streaming mode detection reliability
  • Better error handling in the streaming implementation

[0.4.2] - 2025-03-16

Improved

  • Migrated documentation to MkDocs with Material theme for better API reference
  • Added proper documentation site with auto-generation from docstrings
  • Fixed documentation deployment to GitHub Pages
  • Improved docstrings to be more consistent across modules

[0.4.1] - 2025-03-15

Added

  • Added pdoc documentation generation with make docs and make docs-serve commands
  • Generated documentation now available in the docs/ directory

Improved

  • Enhanced CLI documentation with detailed explanations and practical examples
  • Improved command-line help with logically organized option groups and better descriptions
  • Added epilog with link to project repository

[0.4.0] - 2025-03-15

Added

  • Introduced WebdownConfig class for better parameter organization and configuration
  • Added comprehensive support for advanced HTML2Text options in both CLI and API:
      • Single line break mode (--single-line-break)
      • Unicode character support (--unicode)
      • HTML tables preservation (--tables-as-html)
      • Custom emphasis and strong emphasis markers (--emphasis-mark, --strong-mark)
      • Link protection, image handling options, and more in the API
  • Improved CLI with advanced options group for better help display

Changed

  • Simplified the exception hierarchy to a single WebdownError class
  • Updated API to support both parameter-based and config-based approaches
  • Improved documentation with detailed examples for new features
  • Updated all dependencies to latest versions
      • html2text updated from 2020.1.16 to 2024.2.26
      • beautifulsoup4, requests, tqdm and all dev dependencies updated to latest versions

[0.3.1] - 2025-03-15

Changed

  • Updated all dependencies to latest versions
      • html2text updated from 2020.1.16 to 2024.2.26
      • beautifulsoup4, requests, tqdm and all dev dependencies updated to latest versions

[0.3.0] - 2025-03-15

Added

  • Command-line option -w/--width to set html2text body_width for text wrapping
  • Progress bar for downloads with new -p/--progress flag
  • Support for CSS selectors with -s/--css to extract specific page sections
  • Compact output option with -c/--compact to remove excessive blank lines
  • Automatic removal of zero-width spaces and other invisible characters
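
The invisible-character cleanup and the compact option can be approximated with two regular expressions. This is a sketch under stated assumptions: the `clean_markdown` helper is hypothetical, and the exact set of characters the project strips is not listed in this changelog, so the set below is a guess.

```python
import re

# Assumed set: zero-width space, zero-width non-joiner, zero-width joiner,
# word joiner, and the byte order mark.
INVISIBLE_CHARS = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

# Three or more consecutive newlines, i.e. two or more blank lines in a row.
EXCESS_BLANK_LINES = re.compile(r"\n{3,}")


def clean_markdown(text: str, compact: bool = False) -> str:
    """Strip invisible characters; optionally collapse runs of blank lines."""
    text = INVISIBLE_CHARS.sub("", text)
    if compact:
        # Keep at most one blank line between blocks.
        text = EXCESS_BLANK_LINES.sub("\n\n", text)
    return text
```

Note that removing the zero-width joiner would also split some emoji sequences, which is one reason the real character set may be narrower than this one.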

Changed

  • Migrated to modern Python packaging using Poetry
  • Updated Python requirements to 3.10+
  • Changed CSS selector flag from -c to -s to avoid conflict with compact flag
  • Improved documentation with comprehensive docstrings
  • Enhanced test coverage to 100% (excluding integration tests)

[0.2.0] - 2025-03-12

Added

  • Initial release with basic web to markdown conversion
  • Support for table of contents generation
  • Link and image handling options