NoEdgeAI/pdfdeal
A Python wrapper for the Doc2X API that also includes local text processing (to improve PDF recall in RAG).
Provides asynchronous batch PDF processing with configurable output formats (Markdown, LaTeX, DOCX, JSON) and retains coordinate metadata. Beyond the Doc2X integration, it includes post-processing tools for Markdown (HTML table conversion, remote image uploading, and document splitting by headings) intended to ease ingestion into RAG systems such as GraphRAG, FastGPT, and Dify. The v3 API model includes helper scripts for extracting figures and tables as cropped image artifacts with bounding box metadata.
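To illustrate what "document splitting by headings" can look like, here is a minimal standalone sketch in plain Python. It is not pdfdeal's implementation and does not use its API. The function name `split_by_headings` and the `max_level` parameter are hypothetical, chosen only for this example.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Sub-title".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_by_headings(markdown, max_level=2):
    """Split Markdown text into sections at headings up to `max_level`.

    Hypothetical helper for illustration only; not part of pdfdeal.
    Deeper headings stay inside the text of their parent section.
    """
    sections = []
    current = {"title": "", "level": 0, "lines": []}
    in_fence = False

    def flush(section):
        # Keep a section only if it has a title or some non-blank text.
        if section["title"] or any(l.strip() for l in section["lines"]):
            sections.append({
                "title": section["title"],
                "level": section["level"],
                "text": "\n".join(section["lines"]).strip(),
            })

    for line in markdown.splitlines():
        # Track fenced code blocks so "# comment" lines inside them
        # are not mistaken for headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            flush(current)
            current = {
                "title": match.group(2).strip(),
                "level": len(match.group(1)),
                "lines": [],
            }
        else:
            current["lines"].append(line)

    flush(current)
    return sections
```

Each returned section carries its heading title, heading level, and body text, which is the kind of unit a RAG pipeline would typically embed or index separately. Text before the first heading is returned as a section with an empty title.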
Stars: 284
Forks: 19
Language: Python
License: MIT
Category:
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/NoEdgeAI/pdfdeal"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
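The same request can be sketched in Python with only the standard library. Two assumptions are made here that the page does not confirm: that the path segments after `/quality/` are category, owner, and repository (inferred from the single example URL above), and that the endpoint returns JSON. The function names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the request URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumes the response body is JSON; raises urllib.error.URLError
    on network failures and json.JSONDecodeError on a non-JSON body.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("rag", "NoEdgeAI", "pdfdeal")
# print(data)
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` on every run.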
Related tools
thiswillbeyourgithub/wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype,...
laxmimerit/RAGWire
Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction,...
Arterning/DeepParseX
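DeepParseX is a powerful multimodal document parsing and knowledge management platform. It supports intelligent parsing of many file formats, including PDF, Word, Excel, PPT, images, video, and audio, automatically extracts key information, and builds...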
atpuxiner/docsloader
This is a document loader. (Document parsing loader, RAG document parsing, RAG knowledge base construction)
David-Lolly/ViewRAG
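A PDF RAG system that combines text and images: supports layout-aware chunking, in-depth chart understanding, and precise visual source tracing. Multimodal PDF RAG: Features layout-aware chunking,...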