kreuzberg-dev/langchain-kreuzberg
LangChain document loader for Kreuzberg
# Technical Summary

Wraps Kreuzberg's Rust-powered extraction API to handle 88+ file formats with true async processing via tokio, producing LangChain `Document` objects enriched with metadata including quality scores, detected languages, and extracted keywords. Supports configurable OCR backends (Tesseract, EasyOCR, PaddleOCR), per-page splitting for RAG pipelines, multiple output formats (text, Markdown, HTML, structured), and direct bytes input from API responses or cloud storage.
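To make the per-page splitting idea concrete, here is a minimal, library-free sketch of the output shape the summary describes: one document per page, each carrying metadata. It does not call langchain-kreuzberg or Kreuzberg. The `Document` dataclass is a stand-in for LangChain's class of the same name, and `pages_to_documents`, its parameters, and the metadata keys (`source`, `page`, `total_pages`) are hypothetical names chosen for illustration, not the package's actual API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def pages_to_documents(
    pages: list[str], source: str, per_page: bool = True
) -> list[Document]:
    """Hypothetical helper: wrap already-extracted page texts as Documents.

    With per_page=True each page becomes its own Document (convenient for
    RAG chunking); otherwise all pages are joined into a single Document.
    """
    total = len(pages)
    if per_page:
        return [
            Document(
                page_content=text,
                metadata={"source": source, "page": number, "total_pages": total},
            )
            for number, text in enumerate(pages, start=1)
        ]
    return [
        Document(
            page_content="\n\n".join(pages),
            metadata={"source": source, "total_pages": total},
        )
    ]


# Example: two extracted pages become two Documents with page numbers.
docs = pages_to_documents(["Intro text.", "Results text."], source="report.pdf")
for doc in docs:
    print(doc.metadata["page"], doc.page_content)
```

Splitting per page keeps each chunk traceable to its location in the source file, which is useful when a retrieval step needs to cite where an answer came from.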
Available on PyPI.
Stars: 4
Forks: n/a
Language: Python
License: MIT
Category:
Last pushed: Mar 04, 2026
Monthly downloads: 248
Commits (30d): 0
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/kreuzberg-dev/langchain-kreuzberg"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
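The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository, escaping each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the repository's data. Assumes the endpoint returns JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Building the URL needs no network access; fetch_quality performs the request.
print(quality_url("kreuzberg-dev", "langchain-kreuzberg"))
```

Calling `fetch_quality("kreuzberg-dev", "langchain-kreuzberg")` counts against the daily request limit described above.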
Higher-rated alternatives
run-llama/llama_index
LlamaIndex is the leading document agent and OCR platform
emarco177/documentation-helper
Reference implementation of a RAG-based documentation helper using LangChain, Pinecone, and Tavily..
janus-llm/janus-llm
Leveraging LLMs for modernization through intelligent chunking, iterative prompting and...
JetXu-LLM/llama-github
Llama-github is an open-source Python library that empowers LLM Chatbots, AI Agents, and...
Vasallo94/ObsidianRAG
RAG system to query your Obsidian notes using LangGraph and local LLMs (Ollama)