NoEdgeAI/pdfdeal

A Python wrapper for the Doc2X API that also provides local text processing to improve PDF recall in RAG.

Score: 51 / 100 (Established)

Provides asynchronous batch PDF processing with configurable output formats (Markdown, LaTeX, DOCX, JSON) and coordinate metadata retention. Beyond Doc2X integration, includes post-processing tools for Markdown manipulation—HTML table conversion, remote image uploading, document splitting by headings—designed for seamless ingestion into RAG systems like GraphRAG, FastGPT, and Dify. The v3 API model includes helper scripts for extracting figures and tables as cropped image artifacts with bounding box metadata.
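To make the "document splitting by headings" step concrete, here is a minimal, standalone sketch in plain Python. It is not pdfdeal's implementation and does not use its API; the function name `split_by_headings` and the `max_level` parameter are hypothetical, chosen only for illustration. It assumes ATX-style headings (`#`, `##`, ...) and skips heading-like lines inside fenced code blocks.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str, max_level: int = 2):
    """Split Markdown into (title, body) sections at headings up to max_level.

    Hypothetical helper for illustration; deeper headings stay inside the body.
    """
    sections = []
    title, lines = "", []
    in_fence = False

    def flush():
        # Emit the current section unless it is completely empty.
        if title or any(l.strip() for l in lines):
            sections.append((title, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        # Track fenced code blocks so '# comment' lines inside them are ignored.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)

    flush()
    return sections
```

Each returned section could then be embedded or indexed as its own chunk, which is the usual reason to split on headings before loading a document into a RAG system.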

284 stars.

No package, no dependents
Maintenance 13 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 12 / 25


Stars: 284
Forks: 19
Language: Python
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/NoEdgeAI/pdfdeal"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
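The same request can be made from Python using only the standard library. This is a minimal sketch under stated assumptions: the URL pattern `/quality/<category>/<owner>/<repo>` is inferred from the single curl example above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. How an API key is supplied is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the endpoint returns a JSON body."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("rag", "NoEdgeAI", "pdfdeal")` issues the same GET request as the curl command. With the keyless limit of 100 requests per day, caching the result locally is advisable when polling many repositories.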