lefterisloukas/edgar-crawler
The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files. Presented at WWW 2025 @ Sydney, Australia (https://dl.acm.org/doi/10.1145/3701716.3715289)
The toolkit uses EDGAR API filtering by year, quarter, and filing type to enable targeted bulk downloads across US public companies. It then parses each filing at item level, isolating and cleanly extracting the standardized sections (Item 1, 1A, etc. for 10-K; Part I/II items for 10-Q; event items for 8-K). It is designed specifically to bootstrap financial NLP research: the JSON output is machine-readable and directly consumable by language models and text analysis pipelines. It was also used to generate EDGAR-CORPUS, a large-scale dataset hosted on Hugging Face.
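To make the idea of item-level parsing concrete, here is a minimal Python sketch that splits the plain text of a 10-K style filing into item sections and serializes them as JSON. It is an illustration only, not edgar-crawler's actual parser: the `ITEM_HEADING` pattern, the `split_items` helper, the `item_1`-style key names, and the sample text are all assumptions made for this example.

```python
import json
import re

# Hypothetical heading pattern (an assumption for this sketch): matches
# line-initial headings such as "Item 1.", "ITEM 1A." or "Item 7:".
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[A-C]?)\s*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(filing_text: str) -> dict[str, str]:
    """Map keys like 'item_1A' to the text between consecutive item headings."""
    matches = list(ITEM_HEADING.finditer(filing_text))
    sections = {}
    # Pair each heading with the next one; the last heading runs to the end.
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(filing_text)
        key = f"item_{current.group(1).upper()}"
        # If a heading repeats (e.g. in a table of contents), the later
        # occurrence overwrites the earlier one.
        sections[key] = filing_text[current.end():end].strip()
    return sections


# Made-up sample text standing in for a downloaded filing.
sample = """Item 1. Business
We make widgets.
Item 1A. Risk Factors
Demand for widgets may fall.
Item 7. Management's Discussion
Revenue grew.
"""

record = {"filing_type": "10-K", **split_items(sample)}
print(json.dumps(record, indent=2))
```

Real filings are usually HTML and contain tables of contents, cross-references, and inconsistent heading styles, so a production parser needs considerably more handling than this single regular expression.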
491 stars. No commits in the last 6 months.
Stars: 491
Forks: 125
Language: Python
License: GPL-3.0
Category
Last pushed: Jul 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/lefterisloukas/edgar-crawler"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Related tools
yya518/FinBERT
A Pretrained BERT Model for Financial Communications. https://arxiv.org/abs/2006.08097
shirosaidev/stocksight
Stock market analyzer and predictor using Elasticsearch, Twitter, News headlines and Python...
Shubxam/Nifty-500-Live-Sentiment-Analysis
Live Sentiment Analysis dashboard of NIFTY 500 universe of stocks using plotly and streamlit
louisowen6/SENN
Code implementation of "SENN: Stock Ensemble-based Neural Network for Stock Market Prediction...
databricks-industry-solutions/esg-scoring
In this solution, we offer a novel approach to sustainable finance by combining NLP techniques...