velocitybolt/open-extract
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.
It supports multi-document and multi-schema extraction across diverse file types without requiring vector databases or page specifications, and built-in caching speeds up reprocessing. The architecture is model-agnostic: it works with any LLM provider and integrates directly into agentic frameworks such as LangGraph, AG2, and CrewAI. User-defined schemas are given as key-value pairs describing the extraction targets, and results come back as JSON or Markdown suitable for downstream agent workflows.
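To make the "schema as key-value pairs" idea concrete, here is a minimal sketch of what such a schema and its two output shapes might look like. It does not call open-extract: the field names, the sample values, and the `to_markdown` helper are all hypothetical and only illustrate the data shapes described above.

```python
import json

# Hypothetical schema: each key names a field to extract and each value
# describes what to look for. Not taken from open-extract's documentation.
schema = {
    "invoice_number": "The invoice identifier printed on the document",
    "total_amount": "The final amount due, including tax",
}

# Hypothetical extraction result, keyed the same way as the schema.
result = {"invoice_number": "INV-0001", "total_amount": "120.00"}


def to_markdown(schema: dict, result: dict) -> str:
    """Render a result as a Markdown table with one row per schema key."""
    rows = ["| Field | Value |", "| --- | --- |"]
    for key in schema:
        # Fields missing from the result are rendered as empty cells.
        rows.append(f"| {key} | {result.get(key, '')} |")
    return "\n".join(rows)


as_json = json.dumps(result, indent=2)
as_markdown = to_markdown(schema, result)
print(as_json)
print(as_markdown)
```

Keying the result by the same names as the schema is what would let a downstream agent consume either format without extra mapping.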
Stars: 185
Forks: 21
Language: Python
License: MIT
Category:
Last pushed: Jan 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/velocitybolt/open-extract"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
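The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the page does not confirm: that the endpoint returns a JSON body, and that the owner/repo path pattern in the curl command applies to other repositories. The function names are illustrative.

```python
import json
import urllib.request

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the request URL for a repository (path pattern is an assumption)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("velocitybolt", "open-extract")` would issue the same GET request as the curl command. Without a key, the 100 requests/day limit applies.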
Higher-rated alternatives
PaddlePaddle/PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR...
kreuzberg-dev/kreuzberg
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and...
yfedoseev/pdf_oxide
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown...
opendataloader-project/opendataloader-pdf
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
NanoNets/docext
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking...