katanaml/sparrow

Structured data extraction and instruction calling with ML, LLM and Vision LLM

Quality score: 66 / 100 (Established)

Combines pluggable extraction pipelines (Sparrow Parse for vision, Instructor for text) with multi-backend support (MLX for Apple Silicon, Ollama, vLLM, HuggingFace Cloud) to extract data from diverse document types as schema-validated JSON. Includes agent-based workflow orchestration for multi-step processing, OCR preprocessing, and a web UI with real-time visualization and bounding-box annotations for extracted data.

5,129 stars. Actively maintained with 20 commits in the last 30 days.

No package, no dependents
Maintenance 20 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25
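
The four category scores listed above add up to the overall score (20 + 10 + 16 + 20 = 66). The following is a minimal Python sketch of that arithmetic, assuming the total is a plain sum of the four categories, each out of 25. The page does not state the actual formula, and the names `subscores` and `overall_score` are illustrative only.

```python
# Category scores as listed on this page; each is out of 25.
subscores = {
    "Maintenance": 20,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 20,
}
MAX_PER_CATEGORY = 25


def overall_score(scores):
    """Sum the category scores (assumed formula, not confirmed by the source)."""
    return sum(scores.values())


total = overall_score(subscores)
maximum = MAX_PER_CATEGORY * len(subscores)
print(f"{total} / {maximum}")  # prints "66 / 100"
```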


Stars: 5,129
Forks: 511
Language: Python
License: GPL-3.0
Last pushed: Mar 12, 2026
Commits (30d): 20

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/katanaml/sparrow"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
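
The same request can be made from Python. The sketch below uses only the standard library and assumes the endpoint returns a JSON body; the page does not show the response format, so no fields are read from it. Only the `katanaml/sparrow` URL appears on this page. Generalizing it to an `owner/repo` pattern is an assumption, and `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# Base path taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner, repo):
    """Build the endpoint URL (the owner/repo pattern is an assumption)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the quality record and decode it, assuming a JSON response."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs a network request and counts against the daily limit):
#   data = fetch_quality("katanaml", "sparrow")
#   print(json.dumps(data, indent=2))
```

With the keyless limit of 100 requests/day, caching the result locally is likely preferable to calling the endpoint repeatedly.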