QuivrHQ/MegaParse

File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

Score: 44 / 100 (Emerging)

Preserves structural elements such as tables, headers, footers, and images by using multimodal vision models (GPT-4o, Claude 3.5), with a reported similarity of 0.87 to the source documents. Offers both a Python library and a REST API, with a modular postprocessing architecture and benchmark evaluation tools for comparing parser performance.

7,347 stars. No commits in the last 6 months.

Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25
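
The four category scores above add up to the overall 44 / 100. The page does not state its formula, so treating the total as a plain sum of the categories is an assumption that happens to match these numbers. A minimal check:

```python
# Category scores as listed on this page (each out of 25).
subscores = {"Maintenance": 0, "Adoption": 10, "Maturity": 16, "Community": 18}

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(subscores.values())
max_total = 25 * len(subscores)

print(f"{total} / {max_total}")  # 44 / 100
```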


Stars: 7,347
Forks: 416
Language: Python
License: Apache-2.0
Last pushed: Feb 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/QuivrHQ/MegaParse"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
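
The same request can be sketched in Python using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical. The sketch also assumes that the endpoint returns JSON and that other repositories follow the same `owner/repo` URL pattern, neither of which this page confirms.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (pattern assumed from the example)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, counts against the daily limit):
# data = fetch_quality("QuivrHQ", "MegaParse")
```

With a limit of 100 keyless requests per day, it may be worth caching the response locally rather than calling `fetch_quality` on every run.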