philschmid/clipper.js

HTML to Markdown converter and crawler.

Score: 40 / 100 (Emerging)

Leverages Mozilla's Readability for intelligent content extraction and Turndown for HTML-to-Markdown conversion, with optional Playwright-based crawling for batch processing entire sites. Supports multiple input formats (URLs, local HTML files, directories) and output formats (Markdown, JSONL), making it useful for dataset generation and web archival workflows. Can be chained with tools like poppler for PDF-to-Markdown conversion pipelines.
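As a rough illustration of the JSONL output mentioned above, here is a minimal sketch of writing converted pages as JSON Lines for dataset use. The `PageRecord` shape and the `toJsonl` / `fromJsonl` helpers are hypothetical names introduced for this example; the schema clipper.js actually emits may differ.

```typescript
// Hypothetical record shape; clipper.js's actual JSONL schema may differ.
interface PageRecord {
  url: string;
  markdown: string;
}

// Serialize records as JSON Lines: one JSON object per line.
// JSON.stringify escapes newlines inside strings, so multi-line
// Markdown still occupies exactly one line per record.
function toJsonl(records: PageRecord[]): string {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}

// Parse JSONL text back into records, skipping blank lines.
function fromJsonl(text: string): PageRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as PageRecord);
}
```

One record per line keeps the file appendable and streamable, which is likely why the format suits crawls that produce many pages.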

614 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 14 / 25

Stars: 614
Forks: 39
Language: TypeScript
License: Apache-2.0
Last pushed: Jan 09, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/philschmid/clipper.js"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
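
The same request can be made from TypeScript. This is a sketch only: `qualityUrl` and `fetchQuality` are hypothetical helper names, the response is assumed to be JSON (the page does not document its shape, so it is typed as `unknown`), and the global `fetch` assumes Node 18+ or a browser.

```typescript
// Base path taken from the curl example above.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag";

// Build the endpoint URL for a given owner/repo pair.
function qualityUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetch the quality data. Assumption: the endpoint returns JSON.
// The payload shape is not documented here, so it stays `unknown`.
async function fetchQuality(owner: string, repo: string): Promise<unknown> {
  const res = await fetch(qualityUrl(owner, repo));
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Calling `fetchQuality("philschmid", "clipper.js")` would issue the same GET as the curl command; with the keyless limit of 100 requests/day, caching the result locally is worth considering.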