ck-unifr/pdf_parsing

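PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction built on large language models (ChatGLM2-6B, RWKV) + langchain + streamlit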

/ 100

Emerging

Combines PyMuPDF and PyPDF2 for multi-modal PDF extraction (text hierarchy, tables, images, references) with separate LLM pipelines: RWKV-Raven-7B for summarization and ChatGLM2-6B for structured reference metadata extraction (author, title, year). Provides a complete Streamlit+LangChain QA interface with vector-based retrieval over parsed content. The project acknowledges table extraction as a current limitation that requires alternative approaches such as LayoutLM or table-transformer.
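To illustrate what "vector-based retrieval over parsed content" means in practice, here is a minimal sketch using only the Python standard library. It is not the repository's code: the project is described as using LangChain with a real embedding model, whereas this sketch swaps in a toy bag-of-words vector and cosine similarity. The function names (`embed`, `cosine`, `retrieve`) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts.

    A real pipeline would call an embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks standing in for text parsed out of a PDF.
chunks = [
    "Section 1 Introduction: we study document layout analysis.",
    "Table 2 reports extraction accuracy for tables and images.",
    "References: Smith, J. (2020). Parsing PDFs at scale.",
]

top = retrieve("table extraction accuracy", chunks, k=1)
print(top[0])
```

In a QA interface, the retrieved chunks would then be passed to the language model as context for answering the user's question.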

211 stars. No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 19 / 25

Stars

211

Forks

Language

Python

License

—

Higher-rated alternatives

sudan94/chat-pdf-hugginface

This is a fun Python project that allows you to chat with a chatbot about the PDF you uploaded....

amitgupta4407/All_About_PDF

This is a complete website in which you can chat with pdf, extract meta data, text, links,...

rahul2002m/ChatPDF

ChatPDF is a Streamlit app allowing users to query PDF & DOCX content via natural language. It...

benthecoder/chatpdf

chat with pdf with mistral.ai + streamlit

Hashir-Ahmad1/Train-AI-agent-on-mutiple-PDF

The Multi-PDF's Chat Agent is a Streamlit-based web application designed to facilitate...
