All Data Engineering Tools
517 tools ranked by quality score · Page 2 of 6
| # | Tool | Score | Tier |
|---|---|---|---|
| 101 | neo4j/neo4j-jdbc: Official Neo4j JDBC Driver | | Established |
| 102 | HariSekhon/SQL-scripts: 100+ SQL Scripts - PostgreSQL, MySQL, Oracle, Google BigQuery, MariaDB, AWS... | | Established |
| 103 | Breeze0806/go-etl: go-etl is a toolset for data extraction, transformation and loading. | | Established |
| 104 | DataKitchen/dataops-testgen: DataOps Data Quality TestGen is part of DataKitchen's Open Source Data... | | Established |
| 105 | sparklyr/sparklyr: R interface for Apache Spark | | Established |
| 106 | benjamin-awd/monopoly: Monopoly is a Python library & CLI that converts bank statement PDFs to CSV. | | Established |
| 107 | debba/tabularis: A lightweight, developer-focused database management tool. Supports MySQL,... | | Established |
| 108 | VisActor/VStory: Use data to tell stories. An intelligent Visualization Narrative Development... | | Established |
| 109 | bitol-io/open-data-contract-standard: Home of the Open Data Contract Standard (ODCS). | | Established |
| 110 | jtablesaw/tablesaw: Java dataframe and visualization library | | Established |
| 111 | kalininalab/DataSAIL: DataSAIL is a tool to split datasets while reducing information leakage. | | Established |
| 112 | HTTP-RPC/Kilo: Lightweight REST for Java | | Established |
| 113 | techascent/tech.ml.dataset: A Clojure high performance data processing system | | Established |
| 114 | cre-dev/xml2db: A Python package to load complex XML files into a relational database | | Established |
| 115 | turbot/steampipe: Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No... | | Established |
| 116 | dotnet/spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. | | Established |
| 117 | linkedpipes/etl: LinkedPipes ETL is an RDF based, lightweight ETL tool | | Established |
| 118 | dalenewman/Transformalize: Configurable Extract, Transform, and Load | | Established |
| 119 | DataTalksClub/data-engineering-zoomcamp: Data Engineering Zoomcamp is a free 9-week course on building... | | Established |
| 120 | quixio/quix-streams: Python Streaming DataFrames for Kafka | | Established |
| 121 | bacalhau-project/bacalhau: Community-driven, simple, yet powerful framework for fast, cost-effective... | | Established |
| 122 | turbot/steampipe-plugin-github: Use SQL to instantly query repositories, users, gists and more from GitHub.... | | Established |
| 123 | metafacture/metafacture-core: Core package of the Metafacture tool suite for metadata processing. | | Established |
| 124 | vmware/versatile-data-kit: One framework to develop, deploy and operate data workflows with Python and SQL. | | Established |
| 125 | Data-Centric-AI-Community/ydata-profiling: 1 Line of code data quality profiling & exploratory data analysis for Pandas... | | Established |
| 126 | heavyai/heavydb: HeavyDB (formerly MapD/OmniSciDB) | | Established |
| 127 | biglocalnews/warn-transformer: Consolidate, enrich and republish the data gathered by warn-scraper | | Established |
| 128 | alibaba/feathub: FeatHub - A stream-batch unified feature store for real-time machine learning | | Established |
| 129 | rudderlabs/rudder-server: Privacy and Security focused Segment-alternative, in Golang and React | | Established |
| 130 | dagster-io/community-integrations: Community supported integrations for the Dagster platform. | | Established |
| 131 | 9tigerio/db2rest: Instant no code DATA API platform for relational databases. Connect any... | | Established |
| 132 | Guepard-Corp/qwery-core: The Boring query platform - Connect and query anything | | Established |
| 133 | turbot/steampipe-plugin-gcp: Use SQL to instantly query GCP resources across regions, projects and... | | Established |
| 134 | h2oai/sparkling-water: Sparkling Water provides H2O functionality inside Spark cluster | | Established |
| 135 | dataflint/spark: Drop-in replacement for Apache Spark UI | | Established |
| 136 | datazip-inc/olake-ui: Frontend & BFF (Backend for frontend) for Olake. This includes the UI code... | | Established |
| 137 | dotflow-io/dotflow: 🎲 Business Logic Code in a flow! | | Established |
| 138 | turbot/steampipe-plugin-kubernetes: Use SQL to instantly query Kubernetes API resources. Open source CLI. No DB required. | | Established |
| 139 | dfpc-coe/CloudTAK: TAK Compatible, browser based Common Operation Picture & Situational Awareness tool | | Established |
| 140 | elyra-ai/pipeline-editor: Common pipeline-editor components used in different clients (e.g. Elyra... | | Established |
| 141 | turbot/steampipe-plugin-azure: Use SQL to instantly query Azure resources across regions and subscriptions.... | | Established |
| 142 | dbt-labs/jaffle-shop: 🥪🦘 An open source sandbox project exploring dbt workflows via a fictional... | | Established |
| 143 | starlake-ai/starlake: Declarative text based tool for data analysts and engineers to extract,... | | Established |
| 144 | flowsynx/flowsynx: A deterministic orchestrator for composable micro-workflows with reusable modules | | Established |
| 145 | CogStack/CogStack-NiFi: Building data processing pipelines for documents processing with NLP using... | | Established |
| 146 | DataSQRL/sqrl: Data Pipeline Automation Framework to build MCP servers, data APIs, and data... | | Established |
| 147 | spitfireuptown/datalinkx: 🔥🔥 DatalinkX is a data synchronization system for heterogeneous data sources, supporting incremental or full synchronization of massive data volumes as well as data flow between sources such as HTTP, Oracle, MySQL, and ES,... | | Established |
| 148 | arkflow-rs/arkflow: High performance Rust stream processing engine seamlessly integrates AI... | | Established |
| 149 | SentryPeer/SentryPeer: Protect your SIP Servers from bad actors at https://sentrypeer.org | | Established |
| 150 | turbot/steampipe-plugin-sdk: Steampipe Plugin SDK is a simple abstraction layer to write a Steampipe... | | Established |
| 151 | docwire/docwire: DocWire SDK: Award-winning modern data processing in C++20. SourceForge... | | Established |
| 152 | reductstore/reductstore: High Performance Storage and Streaming Solution for Data Acquisition Systems | | Established |
| 153 | Snowflake-Labs/emerging-solutions-toolbox: The Emerging Solutions Toolbox is a collection of solutions created by... | | Established |
| 154 | kay-ou/SimTradeData: SimTradeData is a utility library supporting SimTradeDesk, SimTradeLab and... | | Established |
| 155 | dflib/dflib: In-memory Java DataFrame library | | Established |
| 156 | kanton-bern/hellodata-be: The Open-Source Enterprise Data Platform in a single Portal | | Established |
| 157 | MLT-OSS/FirstData: The World's Most Comprehensive, Authoritative, and Structured Open Source... | | Established |
| 158 | akmalsoliev/Validoopsie: A simple and easy to use Data Validation library for Python. | | Established |
| 159 | airyhq/airy: 💬 Open Source App Framework to build streaming apps with real-time data - 💎... | | Established |
| 160 | OHDSI/ETL-Synthea: A package supporting the conversion from Synthea CSV to OMOP CDM | | Established |
| 161 | mprove-io/mprove: Open Source Business Intelligence with Malloy Semantic Layer 🎉 | | Established |
| 162 | GoPlasmatic/dataflow-rs: A high-performance rules engine for IFTTT-style automation in Rust with... | | Established |
| 163 | ogbinar/DataEngineeringPilipinas: Data Engineering Pilipinas is a community for data engineers, data analysts,... | | Established |
| 164 | JuliaML/TableTransforms.jl: Transforms and pipelines with tabular data in Julia | | Established |
| 165 | build-on-aws/rag-postgresql-agent-bedrock: This application is built in four stages using infrastructure as code with... | | Established |
| 166 | fdmorison/tiozin: Tiozin, your friendly ETL framework | | Established |
| 167 | halestudio/hale: (Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor) | | Established |
| 168 | DataRecce/recce: The data-validation toolkit for enhanced dbt (data build tool) PR review | | Established |
| 169 | DataKitchen/data-observability-installer: Installer for DataKitchen's Open Source Data Observability Products. Data... | | Established |
| 170 | ara3d/bim-open-schema: Representing BIM Data as Parquet | | Established |
| 171 | lakevision-project/lakevision: Lakevision is a tool which provides insights into your Apache Iceberg based... | | Established |
| 172 | Edwardvaneechoud/Flowfile: Flowfile is a visual ETL tool and Python library combining drag-and-drop... | | Established |
| 173 | xxh/xxh-shell-xonsh: Use @xonsh wherever you go through the SSH without installation on the host. | | Established |
| 174 | myriade-ai/myriade: AI Native Data Platform: explore, clean, transform and govern your data... | | Established |
| 175 | AndreaBozzo/dataprof: Library and CLI for profiling tabular data | | Established |
| 176 | byzer-org/byzer-lang: Byzer (former MLSQL): A low-code open-source programming language for data... | | Established |
| 177 | robert-koch-institut/mex-common: RKI Metadata Exchange \| Software development toolkit for the MEx project... | | Established |
| 178 | StructuredLabs/preswald: Preswald is a WASM packager for Python-based interactive data apps: bundle... | | Established |
| 179 | dashmug/glue-utils: glue-utils makes AWS Glue jobs less repetitive, more type-safe, and easier... | | Established |
| 180 | FalkorDB/falkordb-ts: FalkorDB Typescript Client | | Established |
| 181 | aws-samples/uncovering-hidden-connections-in-unstructured-financial-data: Uncovering Hidden Connections in Unstructured Financial Data using Amazon... | | Established |
| 182 | bitol-io/open-data-product-standard: Home of the Open Data Product Standard (ODPS). | | Established |
| 183 | mehd-io/pypi-duck-flow: end-to-end data engineering project to get insights from PyPi using python,... | | Established |
| 184 | libredb/libredb-studio: A modern, blazing-fast SQL IDE for the cloud era. Query PostgreSQL, MySQL,... | | Established |
| 185 | ashish10alex/vscode-dataform-tools: Dataform Tools - VS Code extension to run and visualise Dataform data... | | Established |
| 186 | pplu/aws-sdk-perl: A community AWS SDK for Perl Programmers | | Established |
| 187 | opensnowcat/opensnowcat-collector: OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License) | | Emerging |
| 188 | bradfitz/embiggen-disk: embiggen-disk live-resizes a filesystem after first live-resizing any... | | Emerging |
| 189 | Pipelex/pipelex-cookbook: Cookbook for Pipelex, the declarative language for composable AI workflows.... | | Emerging |
| 190 | turbot/steampipe-plugin-terraform: Use SQL to instantly query resources, data sources and more from Terraform... | | Emerging |
| 191 | koralium/flowtide: High-performance streaming SQL query engine designed for real-time data... | | Emerging |
| 192 | hi-primus/optimus: 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF,... | | Emerging |
| 193 | weifuwan/seatunnel-web: SeaTunnel Web is a visual platform for building, managing, and monitoring... | | Emerging |
| 194 | mc2-project/opaque-sql: An encrypted data analytics platform | | Emerging |
| 195 | MilkMp/CIA-World-Factbooks-Archive-1990-2025: Complete structured archive of every CIA World Factbook edition from... | | Emerging |
| 196 | capitalone/DataProfiler: What's in your data? Extract schema, statistics and entities from datasets | | Emerging |
| 197 | edrewitz/WxData: A Python library that acts as a client to download, pre-process and... | | Emerging |
| 198 | aartikis/RTEC: RTEC is an Event Calculus implementation optimised for stream reasoning | | Emerging |
| 199 | icoretech/airbroke: 🔥 Lightweight, Airbrake/Sentry-compatible, PostgreSQL-based Open Source Error Catcher | | Emerging |
| 200 | SETL-Framework/setl: A simple Spark-powered ETL framework that just works 🍺 | | Emerging |