explosion/spacy-layout

📚 Process PDFs, Word documents and more with spaCy

Score 51 / 100 (Established)

Uses Docling for layout-aware document parsing and exposes structured content (sections, headings, tables) as labeled spaCy spans with bounding box coordinates, with tables converted to pandas DataFrames. The resulting Doc carries the plain text alongside a Markdown representation and preserves layout information through custom extension attributes, so downstream NLP tasks such as entity recognition and RAG chunking can operate on semantically meaningful document regions.
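A minimal sketch of how the description above might look in practice, based on the usage pattern in the spacy-layout README. The helper name `describe_layout` and the shape of the returned dictionary are hypothetical, chosen only for illustration. The attribute names (`doc.spans["layout"]`, `span._.layout`, `span._.heading`, `doc._.tables`, `doc._.markdown`) are taken from the library's documentation as I understand it and may differ between versions.

```python
def describe_layout(path):
    """Parse a document and collect its layout regions and tables.

    Hypothetical helper for illustration; not part of spacy-layout itself.
    """
    # Imported lazily so this module loads even without spacy-layout installed.
    import spacy
    from spacy_layout import spaCyLayout

    nlp = spacy.blank("en")      # a blank pipeline is enough for layout parsing
    layout = spaCyLayout(nlp)    # wraps Docling's document conversion
    doc = layout(path)           # returns a regular spaCy Doc

    regions = []
    for span in doc.spans["layout"]:
        heading = span._.heading  # closest heading span, if one was found
        regions.append({
            "label": span.label_,      # e.g. "text", "title", "section_header"
            "text": span.text,
            "layout": span._.layout,   # bounding box and page information
            "heading": heading.text if heading is not None else None,
        })

    # Each table span is expected to expose its contents as a pandas DataFrame.
    tables = [table._.data for table in doc._.tables]

    return {
        "text": doc.text,
        "markdown": doc._.markdown,
        "regions": regions,
        "tables": tables,
    }
```

Calling `describe_layout("./report.pdf")` (the path is a placeholder) would return the regions in document order, which is a convenient starting point for chunking by section before indexing for retrieval.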

869 stars. No commits in the last 6 months. Available on PyPI.

Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 16 / 25


Stars 869
Forks 61
Language Python
License MIT
Last pushed Mar 08, 2025
Commits (30d) 0
Dependencies 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/explosion/spacy-layout"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
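The same request can be sketched in Python using only the standard library. This assumes the endpoint returns a JSON body and that the `rag` path segment is a category name; neither is confirmed by the listing, and the function names here are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="rag"):
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the first path segment ("rag") is a category name.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="rag", timeout=10.0):
    """Fetch the quality data for a repository.

    Assumption: the response body is JSON; its fields are not documented here.
    """
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_quality("explosion", "spacy-layout")` would request the URL shown in the curl command. With the keyless limit of 100 requests/day, caching the result locally is likely worthwhile when polling many repositories.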