liunian-Jay/MU-GOT
PDF Parsing Tool: GOT's vLLM acceleration implementation, MinerU for layout recognition, and GOT for table formula parsing.
Implements an end-to-end PDF-to-Markdown pipeline that decouples layout recognition from table parsing. The system first converts PDFs to Markdown via MinerU's layout analysis, then applies GOT-OCR2.0 to convert tables and formulas to LaTeX. GOT inference is accelerated with vLLM 0.5.3 and batched, and data is passed between stages through variables rather than intermediate files, which eliminates that file I/O. The project targets Torch 2.3.1 and Qwen2 model architectures.
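The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `Block`, `chunked`, `blocks_to_markdown`, and the `recognize_batch` callable are hypothetical names, and the layout stage and the GOT/vLLM call are represented only by stand-ins. It shows table regions being kept in memory, sent to a recognizer in batches, and merged back into the Markdown output.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, Union


@dataclass
class Block:
    """One layout region (hypothetical structure, not MinerU's own type).

    kind    -- "text" or "table"
    payload -- Markdown text for text blocks, image bytes for table blocks
    """
    kind: str
    payload: Union[str, bytes]


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def blocks_to_markdown(
    blocks: List[Block],
    recognize_batch: Callable[[Sequence[bytes]], List[str]],
    batch_size: int = 8,
) -> str:
    """Replace every table block with recognizer output and join the result.

    `recognize_batch` stands in for a batched OCR call (assumption): it takes
    a list of table images and returns one LaTeX string per image.
    """
    # Collect table crops in memory instead of writing them to disk.
    table_positions = [i for i, b in enumerate(blocks) if b.kind == "table"]
    crops = [blocks[i].payload for i in table_positions]

    # Send the crops to the recognizer in batches.
    latex: List[str] = []
    for batch in chunked(crops, batch_size):
        latex.extend(recognize_batch(batch))
    if len(latex) != len(crops):
        raise ValueError("recognizer must return one result per table crop")

    # Merge the recognized tables back at their original positions.
    by_position = dict(zip(table_positions, latex))
    parts = [by_position.get(i, b.payload) for i, b in enumerate(blocks)]
    return "\n\n".join(parts)
```

Because the recognizer is passed in as a callable, the merge logic can be exercised with a fake function before wiring in a real model.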
No commits in the last 6 months.
Stars: 65
Forks: 5
Language: Python
License: —
Category:
Last pushed: Nov 07, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/liunian-Jay/MU-GOT"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
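The same request can be made from Python using only the standard library. The sketch below is hedged: `quality_url` and `fetch_quality` are hypothetical helper names, and it assumes the endpoint returns a JSON body, which the page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response (assumes the body is JSON)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("liunian-Jay", "MU-GOT")` issues the same GET request as the curl command; unauthenticated calls count against the daily request limit.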
Higher-rated alternatives
thiswillbeyourgithub/wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype,...
laxmimerit/RAGWire
Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction,...
Arterning/DeepParseX
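DeepParseX is a powerful multimodal document parsing and knowledge management platform that supports intelligent parsing of PDF, Word, Excel, PPT, image, video, audio, and other file formats, automatically extracts key information, and builds...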
NoEdgeAI/pdfdeal
A Python wrapper for the Doc2X API that comes with native text processing (to improve PDF recall...
atpuxiner/docsloader
This is a document loader. (Document parsing loader, RAG document parsing, RAG knowledge base construction)