sigoden/rag-crawler

Crawl a website to generate a knowledge file for RAG

52 / 100 (Established)

Extracts page content via CSS selectors and outputs structured JSON or individual markdown files, with configurable concurrency limits and path exclusion patterns. Includes auto-detected presets for popular platforms like GitHub Wiki and Markdown repositories, eliminating manual configuration for common documentation sources. Supports both HTML crawling and direct GitHub tree traversal for markdown-native documentation.
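
The two crawl controls mentioned above, path exclusion patterns and a concurrency limit, can be illustrated with a minimal sketch. This is not rag-crawler's actual code or API: the names `isAllowed` and `mapWithLimit` are hypothetical, and the use of regular expressions for exclusion patterns is an assumption made for illustration.

```typescript
// Hypothetical helpers for illustration only; not part of rag-crawler.

/** Returns true when the path matches none of the exclusion patterns.
 *  Assumes patterns are non-global regular expressions. */
function isAllowed(path: string, excludePatterns: RegExp[]): boolean {
  return !excludePatterns.some((pattern) => pattern.test(path));
}

/** Runs `worker` over `items` with at most `limit` calls in flight.
 *  Results are returned in the same order as the input. */
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next item until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

A crawler built this way would first filter discovered paths, for example `paths.filter((p) => isAllowed(p, [/^\/blog\//]))`, and then pass the remaining paths to `mapWithLimit` with a page-fetching worker. The fixed pool of runners keeps the number of simultaneous requests bounded without needing an external queue library.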

No commits in the last 6 months. Available on npm.

Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 17 / 25


Stars: 50
Forks: 11
Language: TypeScript
License: MIT
Last pushed: Apr 03, 2025
Monthly downloads: 11
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/sigoden/rag-crawler"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
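
The same request can be made from TypeScript. The sketch below assumes the path segments after `/quality/` are a category, an owner, and a repository name (inferred from the single URL shown above), and that the endpoint returns JSON; neither is confirmed here. The names `qualityUrl` and `fetchQuality` are hypothetical, and the global `fetch` requires Node 18+ or a browser.

```typescript
// Base path taken from the curl example; the segment meaning is an assumption.
const QUALITY_BASE = "https://pt-edge.onrender.com/api/v1/quality";

/** Builds the quality endpoint URL for a category, owner, and repository. */
function qualityUrl(category: string, owner: string, repo: string): string {
  const segments = [category, owner, repo].map((s) => encodeURIComponent(s));
  return `${QUALITY_BASE}/${segments.join("/")}`;
}

/** Fetches the quality record without an API key.
 *  The response shape is not documented here, so it is typed as unknown. */
async function fetchQuality(
  category: string,
  owner: string,
  repo: string,
): Promise<unknown> {
  const response = await fetch(qualityUrl(category, owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `qualityUrl("rag", "sigoden", "rag-crawler")` reproduces the URL used in the curl command. How a key is supplied for the higher rate limit is not stated, so the sketch makes only unauthenticated requests.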