All Data Engineering Tools
517 tools ranked by quality score
| # | Tool | Description | Tier |
|---|---|---|---|
| 1 | PrefectHQ/prefect | Prefect is a workflow orchestration framework for building resilient data... | Verified |
| 2 | dagster-io/dagster | An orchestration platform for the development, production, and observation... | Verified |
| 3 | dlt-hub/dlt | data load tool (dlt) is an open source Python library that makes data... | Verified |
| 4 | growthbook/growthbook | Open Source Feature Flags, Experimentation, and Product Analytics | Verified |
| 5 | pathwaycom/pathway | Python ETL framework for stream processing, real-time analytics, LLM... | Verified |
| 6 | supabase/supabase-py | Python Client for Supabase. Query Postgres from Flask, Django, FastAPI.... | Verified |
| 7 | Unstructured-IO/unstructured | Convert documents to structured data effortlessly. Unstructured is... | Verified |
| 8 | bruin-data/ingestr | ingestr is a CLI tool to copy data between any databases with a single... | Verified |
| 9 | koopjs/koop | Transform, query, and download geospatial data on the web. | Verified |
| 10 | mage-ai/mage-ai | 🧙 Build, run, and manage data pipelines for integrating and transforming data. | Verified |
| 11 | meltano/meltano | Meltano: the declarative code-first data integration engine that powers your... | Verified |
| 12 | pyjanitor-devs/pyjanitor | Clean APIs for data cleaning. Python implementation of R package Janitor | Verified |
| 13 | quiltdata/quilt | Quilt is a Scientific Data Management Platform on AWS that helps teams and... | Verified |
| 14 | databricks/dbt-databricks | A dbt adapter for Databricks. | Verified |
| 15 | debezium/debezium | Change data capture for a variety of databases. Please log issues at... | Verified |
| 16 | apache/flink-cdc | Flink CDC is a streaming data integration tool | Verified |
| 17 | airbytehq/airbyte | The leading data integration platform for ETL / ELT data pipelines from... | Verified |
| 18 | apache/superset | Apache Superset is a Data Visualization and Data Exploration Platform | Verified |
| 19 | apache/shardingsphere | Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,... | Verified |
| 20 | apache/seatunnel | SeaTunnel is a multimodal, high-performance, distributed, massive data... | Verified |
| 21 | vaexio/vaex | Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML,... | Verified |
| 22 | datajoint/datajoint-python | Relational data pipelines for the science lab | Verified |
| 23 | open-metadata/OpenMetadata | OpenMetadata is a unified metadata platform for data discovery, data... | Verified |
| 24 | apache/incubator-devlake | Apache DevLake is an open-source dev data platform to ingest, analyze, and... | Verified |
| 25 | crate/crate | CrateDB is a distributed and scalable SQL database for storing and analyzing... | Verified |
| 26 | dathere/qsv | Blazing-fast Data-Wrangling toolkit | Verified |
| 27 | capitalone/locopy | locopy: Loading/Unloading to Redshift and Snowflake using Python. | Verified |
| 28 | vectordotdev/vector | A high-performance observability data pipeline. | Verified |
| 29 | treeverse/lakeFS | lakeFS - Data version control for your data lake \| Git for data | Verified |
| 30 | fugue-project/fugue | A unified interface for distributed computing. Fugue executes SQL, Python,... | Verified |
| 31 | dbeaver/dbeaver | Free universal database tool and SQL client | Verified |
| 32 | dagu-org/dagu | A local-first workflow engine built the way it should be: declarative,... | Verified |
| 33 | cloudquery/cloudquery | Data pipelines for cloud config and security data. Build cloud asset... | Verified |
| 34 | risingwavelabs/risingwave | Event streaming platform for agents, apps, and analytics. Continuously... | Verified |
| 35 | PeerDB-io/peerdb | Fast, Simple and a cost effective tool to replicate data from Postgres to... | Established |
| 36 | apache/hop | Hop Orchestration Platform | Established |
| 37 | thorsten/phpMyFAQ | phpMyFAQ - Open Source FAQ web application for PHP 8.3+ and MySQL,... | Established |
| 38 | catalyst-cooperative/pudl | The Public Utility Data Liberation Project provides analysis-ready energy... | Established |
| 39 | networktocode/diffsync | A utility library for comparing and synchronizing different datasets. | Established |
| 40 | snowplow/snowplow | The leader in Customer Data Infrastructure | Established |
| 41 | steedos/steedos-platform | The AI-Native Infrastructure for Enterprise Apps. Powered by ObjectStack... | Established |
| 42 | scribe-org/Scribe-Data | Wikidata and Wiktionary language data extraction | Established |
| 43 | mayneyao/eidos | An extensible framework for Personal Data Management. | Established |
| 44 | biglocalnews/warn-scraper | Command-line interface for downloading WARN Act notices of qualified plant... | Established |
| 45 | SQLMesh/sqlmesh | Scalable and efficient data transformation framework - backwards compatible with dbt. | Established |
| 46 | elastic/eland | Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL... | Established |
| 47 | dataform-co/dataform | Dataform is a framework for managing SQL based data operations in BigQuery | Established |
| 48 | odpi/egeria | Egeria core | Established |
| 49 | laminlabs/lamindb | Open-source data framework for biology. Context and memory for datasets and... | Established |
| 50 | aws/aws-sdk-pandas | pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,... | Established |
| 51 | astronomer/airflow-provider-fivetran-async | A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran | Established |
| 52 | datazip-inc/olake | OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain... | Established |
| 53 | datavane/tis | Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI | Established |
| 54 | nordquant/complete-dbt-bootcamp-zero-to-hero | Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp... | Established |
| 55 | wgzhao/Addax | A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL... | Established |
| 56 | ariacom/Seal-Report | Database Reporting Tool and Tasks (.Net) | Established |
| 57 | datavane/datavines | Know your data better! Datavines is Next-gen Data Observability Platform,... | Established |
| 58 | nightscape/spark-excel | A Spark plugin for reading and writing Excel files | Established |
| 59 | knime/knime-core | KNIME Analytics Platform | Established |
| 60 | vietvudanh/vietlott-data | Automation fetching data for Vietlott. Just for fun. | Established |
| 61 | datagouv/csv-detective | Inspection of tabular (csv, xls-like) files to guess the columns' content | Established |
| 62 | sodadata/soda-core | Data Contracts engine for the modern data stack. https://www.soda.io | Established |
| 63 | xorq-labs/xorq | A compute manifest and composable tools for data, built on Ibis, DataFusion,... | Established |
| 64 | apache/hamilton | Apache Hamilton helps data scientists and engineers define testable,... | Established |
| 65 | redpanda-data/connect | Fancy stream processing made operationally mundane | Established |
| 66 | elementary-data/elementary | The dbt-native data observability solution for data & analytics engineers.... | Established |
| 67 | fkie-cad/Logprep | log data pre processing, generation and shipping in python | Established |
| 68 | ucbepic/docetl | A system for agentic LLM-powered data processing and ETL | Established |
| 69 | rusq/slackdump | Save or export your private and public Slack messages, threads, files, and... | Established |
| 70 | iBridges-for-iRODS/iBridges | A wrapper around the python-irodsclient to allow for easy interaction with... | Established |
| 71 | apecloud/ape-dts | ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data... | Established |
| 72 | jtablesaw/tablesaw | Java dataframe and visualization library | Established |
| 73 | VisActor/VStory | Use data to tell stories. An intelligent Visualization Narrative Development... | Established |
| 74 | datacleaner/DataCleaner | The premier open source Data Quality solution | Established |
| 75 | bitpicky/dbt-sugar | dbt-sugar is a CLI tool that allows users of dbt to have fun and ease... | Established |
| 76 | cre-dev/xml2db | A Python package to load complex XML files into a relational database | Established |
| 77 | evinism/mistql | A query / expression language for performing computations on JSON-like... | Established |
| 78 | turbot/steampipe | Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No... | Established |
| 79 | DataTalksClub/data-engineering-zoomcamp | Data Engineering Zoomcamp is a free 9-week course on building... | Established |
| 80 | heavyai/heavydb | HeavyDB (formerly MapD/OmniSciDB) | Established |
| 81 | slingdata-io/sling-cli | Sling is a CLI tool that extracts data from a source storage/database and... | Established |
| 82 | rudderlabs/rudder-server | Privacy and Security focused Segment-alternative, in Golang and React | Established |
| 83 | biglocalnews/warn-transformer | Consolidate, enrich and republish the data gathered by warn-scraper | Established |
| 84 | nshiab/simple-data-analysis | Easy-to-use and high-performance TypeScript library for data analysis. Works... | Established |
| 85 | Data-Centric-AI-Community/ydata-profiling | 1 Line of code data quality profiling & exploratory data analysis for Pandas... | Established |
| 86 | timeplus-io/proton | ⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream... | Established |
| 87 | debba/tabularis | A lightweight, developer-focused database management tool. Supports MySQL,... | Established |
| 88 | apache/wayang | Apache Wayang is the first cross-platform data processing system. | Established |
| 89 | fedspendingtransparency/usaspending-api | Server application to serve U.S. federal spending data via a RESTful API | Established |
| 90 | amphi-ai/amphi-etl | visual data prep powered by python | Established |
| 91 | snowflakedb/snowpark-python | Snowflake Snowpark Python API | Established |
| 92 | dotflow-io/dotflow | 🎲 Business Logic Code in a flow! | Established |
| 93 | Desbordante/desbordante-core | Desbordante is a high-performance data profiler that is capable of... | Established |
| 94 | osalvador/ReplicaDB | ReplicaDB is open source tool for database replication, designed for... | Established |
| 95 | turbot/steampipe-plugin-aws | Use SQL to instantly query AWS resources across regions and accounts. Open... | Established |
| 96 | ohs-foundation/fhir-data-pipes | A collection of tools for extracting FHIR resources and analytics services... | Established |
| 97 | langchain-ai/langchain-postgres | LangChain abstractions backed by Postgres Backend | Established |
| 98 | ConduitIO/conduit | Conduit streams data between data stores. Kafka Connect replacement. No JVM required. | Established |
| 99 | data-engineering-community/data-engineering-wiki | The best place to learn data engineering. Built and maintained by the data... | Established |
| 100 | opendatadiscovery/odd-platform | First open-source data discovery and observability platform. We make a life... | Established |