KylinMountain/markify
Converts files into Markdown so that RAG pipelines and LLMs can understand them. Built on markitdown and MinerU for high-quality PDF parsing.
Exposes two PDF parsing modes, fast pdfminer-based extraction and advanced MinerU-powered deep analysis, via FastAPI with async job queuing and a Streamlit UI. Handles 10+ file formats (PDFs, Office documents, images, HTML, CSV, JSON, archives) and produces unified Markdown output optimized for RAG/LLM ingestion. Deployable as a containerized service or a standalone CLI with automatic format detection.
133 stars. No commits in the last 6 months.
Stars: 133
Forks: 16
Language: Python
License: none listed
Category:
Last pushed: Mar 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/KylinMountain/markify"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
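The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the path segments after `/quality/` are category, owner, and repository (inferred from the curl URL above, not from any API documentation) and that the response body is JSON; the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the three path segments are category/owner/repo.
    """
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the API returns JSON; its fields are not documented here.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("rag", "KylinMountain", "markify"))
```

Calling `fetch_quality("rag", "KylinMountain", "markify")` performs the network request; with the keyless limit of 100 requests/day, caching the result locally is a reasonable precaution.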
Higher-rated alternatives
any4ai/AnyCrawl
AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
lightfeed/extractor
Using LLMs and AI browser automation to robustly extract web data