aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
Sycamore uses Aryn DocParse, a GPU-powered document segmentation API built on a DETR vision model trained on 80k+ enterprise documents, to partition complex PDFs, images, tables, and infographics while preserving their semantic structure. It is built on a scalable DocSet abstraction with functional Python transforms for data extraction, enrichment, and cleaning. It loads the results into vector databases (OpenSearch, Elasticsearch, Pinecone, DuckDB, Qdrant, Weaviate) and uses Ray as the backend for distributed processing.
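The chained, functional-transform style that a DocSet-like abstraction enables can be illustrated with a small in-process sketch. This is not Sycamore's API. `ToyDocSet`, `Document`, `split_paragraphs`, and `add_length` are hypothetical names for illustration. The sketch assumes lazy evaluation, meaning nothing runs until `take_all()` is called, and it runs in a single process rather than on Ray.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class Document:
    # Hypothetical document record: text plus free-form metadata.
    text: str
    properties: dict[str, Any] = field(default_factory=dict)


class ToyDocSet:
    """Hypothetical lazy, chainable document collection (not Sycamore's DocSet)."""

    def __init__(self, source: Callable[[], Iterable[Document]]):
        # Store a zero-argument factory so the pipeline re-reads its input
        # each time it is executed instead of running eagerly.
        self._source = source

    def map(self, fn: Callable[[Document], Document]) -> "ToyDocSet":
        # One document in, one document out (e.g. enrichment).
        return ToyDocSet(lambda: (fn(d) for d in self._source()))

    def filter(self, pred: Callable[[Document], bool]) -> "ToyDocSet":
        # Keep only documents that satisfy the predicate (e.g. cleaning).
        return ToyDocSet(lambda: (d for d in self._source() if pred(d)))

    def flat_map(self, fn: Callable[[Document], Iterable[Document]]) -> "ToyDocSet":
        # One document in, many out (e.g. splitting a file into chunks).
        return ToyDocSet(lambda: (out for d in self._source() for out in fn(d)))

    def take_all(self) -> list[Document]:
        # Execute the whole chain and materialize the results.
        return list(self._source())


def split_paragraphs(doc: Document) -> list[Document]:
    # Hypothetical stand-in for partitioning: split on blank lines and
    # copy the parent's metadata onto each non-empty chunk.
    return [Document(p.strip(), dict(doc.properties))
            for p in doc.text.split("\n\n") if p.strip()]


def add_length(doc: Document) -> Document:
    # Return a new Document instead of mutating, so the input stays unchanged.
    return Document(doc.text, {**doc.properties, "n_chars": len(doc.text)})


raw = [Document("Intro paragraph.\n\nTable: revenue by quarter.\n\n", {"path": "a.pdf"})]

pipeline = (
    ToyDocSet(lambda: raw)
    .flat_map(split_paragraphs)                     # partition into chunks
    .map(add_length)                                # enrich with metadata
    .filter(lambda d: d.properties["n_chars"] >= 20)  # drop short fragments
)
results = pipeline.take_all()
```

Here `results` holds a single chunk, `"Table: revenue by quarter."`, with properties `{"path": "a.pdf", "n_chars": 26}`. Each transform returns a new collection rather than mutating the old one. That design choice is what lets a real engine such as Ray schedule the steps in parallel, although this toy version does not.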
592 stars. Actively maintained with 5 commits in the last 30 days.
Stars: 592
Forks: 68
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/aryn-ai/sycamore"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Related tools
deepset-ai/haystack-tutorials
Here you can find all the Tutorials for Haystack 📓
unum-cloud/USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.
pingcap/pytidb
TiDB AI SDK: Unified Multi-Modal Data Platform for AI Apps & Agents - https://pingcap.github.io/ai/