lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into clean, structured JSON files. Presented at WWW 2025 in Sydney, Australia (https://dl.acm.org/doi/10.1145/3701716.3715289)

53 / 100 (Established)

Uses EDGAR API filtering by year, quarter, and filing type to enable targeted bulk downloads across US public companies. Item-level parsing then isolates and cleanly extracts standardized sections: Items 1, 1A, etc. for 10-K; Part I/II items for 10-Q; and event items for 8-K. The toolkit is designed to bootstrap financial NLP research by producing machine-readable JSON that language models and text-analysis pipelines can consume directly, and it was used to build EDGAR-CORPUS, a large-scale dataset on Hugging Face.
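The filter-then-parse workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not edgar-crawler's actual code or API: it assumes index rows in the pipe-delimited layout of EDGAR's full-index `master.idx` files (CIK|Company Name|Form Type|Date Filed|Filename), uses made-up sample rows and already-cleaned plain filing text, and splits sections with a simple regex that only handles 10-K-style item numbers such as `1`, `1A`, and `7`. The helpers `filter_index`, `extract_items`, and `build_record` are hypothetical names.

```python
import json
import re

# Hypothetical sample rows in the pipe-delimited master.idx layout
# (assumption: real downloads come from EDGAR, not this literal string).
SAMPLE_INDEX = """\
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000001|EXAMPLE CORP|10-K|2024-02-15|edgar/data/1000001/0001000001-24-000001.txt
1000002|SAMPLE INC|8-K|2024-01-10|edgar/data/1000002/0001000002-24-000003.txt
1000003|DEMO HOLDINGS|10-Q|2024-03-01|edgar/data/1000003/0001000003-24-000002.txt
"""

# Hypothetical 10-K text; real filings are HTML and need cleaning first.
SAMPLE_10K = """\
Item 1. Business
We make widgets.
Item 1A. Risk Factors
Demand may fall.
Item 7. Management's Discussion and Analysis
Revenue grew.
"""

# Matches 10-K-style headings such as "Item 1." or "Item 1A." at line start.
# It does NOT handle 8-K event items like "Item 2.02".
ITEM_PATTERN = re.compile(r"^\s*item\s+(\d+[a-z]?)\.", re.IGNORECASE | re.MULTILINE)


def filter_index(index_text, filing_types):
    """Keep index rows whose form type is in filing_types."""
    rows = []
    for line in index_text.splitlines():
        parts = line.split("|")
        # Skip the column header and non-data lines (e.g. the dashed rule).
        if len(parts) != 5 or parts[0] == "CIK":
            continue
        cik, company, form_type, date_filed, filename = parts
        if form_type in filing_types:
            rows.append({
                "cik": cik,
                "company": company,
                "filing_type": form_type,
                "filing_date": date_filed,
                "filename": filename,
            })
    return rows


def extract_items(text):
    """Split plain filing text into {"item_<n>": section_text} pairs."""
    matches = list(ITEM_PATTERN.finditer(text))
    items = {}
    for i, match in enumerate(matches):
        start = match.end()
        # Each section runs until the next item heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # A repeated heading (e.g. from a table of contents) overwrites
        # the earlier entry, so the last occurrence wins.
        items["item_" + match.group(1).upper()] = text[start:end].strip()
    return items


def build_record(row, text):
    """Merge index metadata and extracted items into one JSON-ready dict."""
    record = dict(row)
    record.update(extract_items(text))
    return record


if __name__ == "__main__":
    ten_ks = filter_index(SAMPLE_INDEX, {"10-K"})
    print(json.dumps(build_record(ten_ks[0], SAMPLE_10K), indent=2))
```

Keeping the last occurrence of each heading is only a crude way to skip table-of-contents entries; a production parser would need sturdier heading detection and separate patterns for 10-Q parts and 8-K event items.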

491 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25


Stars: 491
Forks: 125
Language: Python
License: GPL-3.0
Last pushed: Jul 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lefterisloukas/edgar-crawler"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.