liunian-Jay/MU-GOT

PDF parsing tool: a vLLM-accelerated implementation of GOT, using MinerU for layout recognition and GOT for table and formula parsing.


Experimental

Implements an end-to-end PDF-to-markdown pipeline that decouples layout recognition from table and formula parsing. GOT is accelerated with vLLM 0.5.3 and batch inference, and intermediate file I/O is eliminated by passing data between stages in memory (through variables). The system first converts PDFs to markdown using MinerU's layout analysis, then applies GOT-OCR2.0 to extract tables and formulas as LaTeX. It targets Torch 2.3.1 and the Qwen2 model architecture.
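The two-stage design can be sketched in Python. This is a minimal, hypothetical sketch, not MU-GOT's code: `Region`, `LayoutResult`, `run_pipeline`, `fake_layout`, `fake_recognizer`, and the `[[R0]]` placeholder convention are invented for illustration, and the layout and recognition stages are passed in as plain callables so that no real MinerU, GOT, or vLLM API is assumed. It shows the two ideas from the summary: regions move from the layout stage to the recognizer as in-memory objects instead of intermediate files, and the recognizer is called once per batch rather than once per region.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Region:
    """A table or formula crop flagged by the layout stage (hypothetical shape)."""
    kind: str         # "table" or "formula"
    image: bytes      # cropped region image, held in memory rather than written to disk
    placeholder: str  # marker left in the markdown where the recognized text belongs


@dataclass
class LayoutResult:
    """Output of the layout stage: markdown plus the regions still to be recognized."""
    markdown: str
    regions: List[Region] = field(default_factory=list)


def run_pipeline(
    pdf_bytes: bytes,
    layout_stage: Callable[[bytes], LayoutResult],
    recognize_batch: Callable[[List[Region]], List[str]],
    batch_size: int = 8,
) -> str:
    """Run layout analysis, then batched table/formula recognition, entirely in memory."""
    # Stage 1: layout analysis (MinerU's role in the description above).
    layout = layout_stage(pdf_bytes)
    markdown = layout.markdown
    regions = layout.regions
    # Stage 2: send regions to the recognizer in fixed-size batches
    # (the role of GOT served through vLLM in the description above).
    for start in range(0, len(regions), batch_size):
        batch = regions[start:start + batch_size]
        outputs = recognize_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("recognizer must return one result per region")
        # Splice each result back into the markdown at its placeholder.
        for region, text in zip(batch, outputs):
            markdown = markdown.replace(region.placeholder, text, 1)
    return markdown


# Hypothetical stand-in stages for a dry run; real MinerU and GOT/vLLM calls are not shown.
def fake_layout(pdf_bytes: bytes) -> LayoutResult:
    return LayoutResult(
        markdown="# Title\n\n[[R0]]\n\nBody text\n\n[[R1]]",
        regions=[
            Region("formula", b"<png bytes>", "[[R0]]"),
            Region("table", b"<png bytes>", "[[R1]]"),
        ],
    )


def fake_recognizer(batch: List[Region]) -> List[str]:
    # Returns a tag per region instead of real LaTeX output.
    return [f"<{region.kind}>" for region in batch]


if __name__ == "__main__":
    print(run_pipeline(b"%PDF-1.7", fake_layout, fake_recognizer, batch_size=2))
```

Running the file prints the markdown with both placeholders replaced by `<formula>` and `<table>`, after a single recognizer call for the batch of two regions. Injecting the stages as callables keeps the sketch testable without model weights; in a real setup, `recognize_batch` would wrap a batched vLLM generation call.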

No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 9 / 25



Language: Python

License: None

Higher-rated alternatives

thiswillbeyourgithub/wdoc

Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype,...

laxmimerit/RAGWire

Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction,...

Arterning/DeepParseX

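DeepParseX is a powerful multimodal document parsing and knowledge management platform. It supports intelligent parsing of many file formats, including PDF, Word, Excel, PPT, images, video, and audio, automatically extracts key information, and builds...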

NoEdgeAI/pdfdeal

A Python wrapper for the Doc2X API that comes with native text processing (to improve PDF recall...

atpuxiner/docsloader

This is a document loader (document parsing loader, RAG document parsing, RAG knowledge base construction).
