iamarunbrahma/pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval-Augmented Generation (RAG) and other NLP tasks. Extracts text, tables, and images with their formatting preserved for better information retrieval and processing.
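
Preserving table formatting usually means emitting tables in Markdown pipe syntax. The sketch below shows that last step. `rows_to_markdown` is a hypothetical helper, not this repository's API. It assumes the table cells have already been extracted from the PDF as a list of rows of strings, and it treats the first row as the header.

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render extracted table rows as a GitHub-Flavored Markdown pipe table.

    Hypothetical helper (not part of pdf-to-markdown): the first row is the header.
    """
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def clean(cell) -> str:
        # Collapse internal whitespace/line breaks and escape pipes
        # so each cell stays on a single table row.
        text = "" if cell is None else str(cell)
        text = " ".join(text.split())
        return text.replace("|", "\\|")

    def line(cells) -> str:
        # Pad short rows so every row has the same number of columns.
        padded = [clean(c) for c in cells] + [""] * (width - len(cells))
        return "| " + " | ".join(padded) + " |"

    header, *body = rows
    out = [line(header), "| " + " | ".join(["---"] * width) + " |"]
    out.extend(line(r) for r in body)
    return "\n".join(out)


print(rows_to_markdown([["Name", "Value"], ["a|b", "1"]]))
```

Padding short rows and escaping `|` keeps the output a valid pipe table even when the PDF extractor returns ragged or messy cells.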

Quality score: 39 / 100 (Emerging)

115 stars. No commits in the last 6 months.

Stale (6 months) · No package · No dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 13 / 25


Stars: 115
Language: Python
License: MIT
Category: web-to-markdown-rag
Last pushed: Nov 22, 2024
Commits (30d): 0

Web-to-Markdown RAG · 101 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/iamarunbrahma/pdf-to-markdown"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
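
The same request can be made from Python with only the standard library. This is a minimal sketch of the keyless tier. The helper names are hypothetical, and because the page does not say how a free key is passed or what the JSON response contains, the sketch sends no key and returns the parsed response unchanged.

```python
import json
import urllib.parse
import urllib.request

# Endpoint prefix taken from the curl example; the repo slug is appended.
BASE = "https://pt-edge.onrender.com/api/v1/quality/rag/"


def quality_url(repo: str) -> str:
    # repo is "owner/name"; keep the slash, percent-encode anything else unsafe.
    return BASE + urllib.parse.quote(repo, safe="/")


def fetch_quality(repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key (100 requests/day tier).

    Assumes the endpoint returns JSON; the schema is not documented here,
    so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(fetch_quality("iamarunbrahma/pdf-to-markdown"))
```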

Higher-rated alternatives

any4ai/AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...

kreuzberg-dev/html-to-markdown

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...

lightfeed/extractor

Using LLMs and AI browser automation to robustly extract web data

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

paulpierre/markdown-crawler

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file...
