datachain-ai/datachain

Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images

86 / 100 (Verified)

Provides a Python dataframe-like API with vectorized operations and delta/retry processing for efficient incremental workflows on unstructured data stored in S3, GCP, Azure, or local filesystems. Integrates with LLM APIs and ML frameworks (PyTorch, TensorFlow) for enrichment and model application. Data is referenced in place rather than duplicated, and metadata is kept in an internal queryable database.

2,729 stars and 17,066 monthly downloads. Used by 1 other package. Actively maintained with 40 commits in the last 30 days. Available on PyPI.

Maintenance 23 / 25
Adoption 21 / 25
Maturity 25 / 25
Community 17 / 25
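
The page does not state how the overall score is derived, but the four subscores listed above are consistent with a simple sum out of 100. A minimal sketch of that arithmetic, assuming equal weighting (the variable names are illustrative, not part of any published API):

```python
# Subscores as listed on this page, each out of 25.
subscores = {
    "Maintenance": 23,
    "Adoption": 21,
    "Maturity": 25,
    "Community": 17,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the subscores.
total = sum(subscores.values())
maximum = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {maximum}")  # 86 / 100
```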

Stars: 2,729
Forks: 136
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Monthly downloads: 17,066
Commits (30d): 40
Dependencies: 36
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/datachain-ai/datachain"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
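
The same request can be made from Python using only the standard library. This is a rough sketch: it assumes the endpoint returns a JSON body and that the path follows a category/owner/repo pattern as in the curl command above. Neither is documented on this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (assumed pattern: category/owner/repo)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, counts against the daily limit):
# data = fetch_quality("mlops", "datachain-ai", "datachain")
# print(json.dumps(data, indent=2))
```

Since the response schema is not shown here, inspect the returned object before relying on any particular field.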