All Data Engineering Tools

517 tools ranked by quality score

Showing 1–100 of 517
Rank Tool Score Tier
1 PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data...

95
Verified
2 dagster-io/dagster

An orchestration platform for the development, production, and observation...

94
Verified
3 dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data...

90
Verified
4 growthbook/growthbook

Open Source Feature Flags, Experimentation, and Product Analytics

90
Verified
5 pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM...

87
Verified
6 supabase/supabase-py

Python Client for Supabase. Query Postgres from Flask, Django, FastAPI....

87
Verified
7 Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is...

86
Verified
8 bruin-data/ingestr

ingestr is a CLI tool to copy data between any databases with a single...

84
Verified
9 koopjs/koop

Transform, query, and download geospatial data on the web.

82
Verified
10 mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

81
Verified
11 meltano/meltano

Meltano: the declarative code-first data integration engine that powers your...

80
Verified
12 pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

79
Verified
13 quiltdata/quilt

Quilt is a Scientific Data Management Platform on AWS that helps teams and...

77
Verified
14 databricks/dbt-databricks

A dbt adapter for Databricks.

77
Verified
15 debezium/debezium

Change data capture for a variety of databases. Please log issues at...

76
Verified
16 apache/flink-cdc

Flink CDC is a streaming data integration tool

76
Verified
17 airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from...

76
Verified
18 apache/superset

Apache Superset is a Data Visualization and Data Exploration Platform

76
Verified
19 apache/shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,...

76
Verified
20 apache/seatunnel

SeaTunnel is a multimodal, high-performance, distributed, massive data...

76
Verified
21 vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML,...

75
Verified
22 datajoint/datajoint-python

Relational data pipelines for the science lab

75
Verified
23 open-metadata/OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data...

75
Verified
24 apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

74
Verified
25 crate/crate

CrateDB is a distributed and scalable SQL database for storing and analyzing...

73
Verified
26 dathere/qsv

Blazing-fast Data-Wrangling toolkit

73
Verified
27 capitalone/locopy

locopy: Loading/Unloading to Redshift and Snowflake using Python.

72
Verified
28 vectordotdev/vector

A high-performance observability data pipeline.

71
Verified
29 treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

70
Verified
30 fugue-project/fugue

A unified interface for distributed computing. Fugue executes SQL, Python,...

70
Verified
31 dbeaver/dbeaver

Free universal database tool and SQL client

70
Verified
32 dagu-org/dagu

A local-first workflow engine built the way it should be: declarative,...

70
Verified
33 cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset...

70
Verified
34 risingwavelabs/risingwave

Event streaming platform for agents, apps, and analytics. Continuously...

70
Verified
35 PeerDB-io/peerdb

A fast, simple, and cost-effective tool to replicate data from Postgres to...

69
Established
36 apache/hop

Hop Orchestration Platform

69
Established
37 thorsten/phpMyFAQ

phpMyFAQ - Open Source FAQ web application for PHP 8.3+ and MySQL,...

69
Established
38 catalyst-cooperative/pudl

The Public Utility Data Liberation Project provides analysis-ready energy...

69
Established
39 networktocode/diffsync

A utility library for comparing and synchronizing different datasets.

69
Established
40 snowplow/snowplow

The leader in Customer Data Infrastructure

69
Established
41 steedos/steedos-platform

The AI-Native Infrastructure for Enterprise Apps. Powered by ObjectStack...

69
Established
42 scribe-org/Scribe-Data

Wikidata and Wiktionary language data extraction

68
Established
43 mayneyao/eidos

An extensible framework for Personal Data Management.

68
Established
44 biglocalnews/warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant...

67
Established
45 SQLMesh/sqlmesh

Scalable and efficient data transformation framework - backwards compatible with dbt.

67
Established
46 elastic/eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL...

67
Established
47 dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

67
Established
48 odpi/egeria

Egeria core

67
Established
49 laminlabs/lamindb

Open-source data framework for biology. Context and memory for datasets and...

66
Established
50 aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,...

66
Established
51 astronomer/airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

65
Established
52 datazip-inc/olake

OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain...

65
Established
53 datavane/tis

Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI

64
Established
54 nordquant/complete-dbt-bootcamp-zero-to-hero

Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp...

64
Established
55 wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL...

64
Established
56 ariacom/Seal-Report

Database Reporting Tool and Tasks (.Net)

64
Established
57 datavane/datavines

Know your data better! Datavines is a next-gen Data Observability Platform,...

64
Established
58 nightscape/spark-excel

A Spark plugin for reading and writing Excel files

64
Established
59 knime/knime-core

KNIME Analytics Platform

64
Established
60 vietvudanh/vietlott-data

Automation fetching data for Vietlott. Just for fun.

64
Established
61 datagouv/csv-detective

Inspection of tabular (csv, xls-like) files to guess the columns' content

64
Established
62 sodadata/soda-core

Data Contracts engine for the modern data stack. https://www.soda.io

63
Established
63 xorq-labs/xorq

A compute manifest and composable tools for data, built on Ibis, DataFusion,...

63
Established
64 apache/hamilton

Apache Hamilton helps data scientists and engineers define testable,...

63
Established
65 redpanda-data/connect

Fancy stream processing made operationally mundane

63
Established
66 elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers....

62
Established
67 fkie-cad/Logprep

log data pre processing, generation and shipping in python

62
Established
68 ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

62
Established
69 rusq/slackdump

Save or export your private and public Slack messages, threads, files, and...

62
Established
70 iBridges-for-iRODS/iBridges

A wrapper around the python-irodsclient to allow for easy interaction with...

61
Established
71 apecloud/ape-dts

ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data...

61
Established
72 jtablesaw/tablesaw

Java dataframe and visualization library

60
Established
73 VisActor/VStory

Use data to tell stories. An intelligent Visualization Narrative Development...

60
Established
74 datacleaner/DataCleaner

The premier open source Data Quality solution

60
Established
75 bitpicky/dbt-sugar

dbt-sugar is a CLI tool that allows users of dbt to have fun and ease...

60
Established
76 cre-dev/xml2db

A Python package to load complex XML files into a relational database

59
Established
77 evinism/mistql

A query / expression language for performing computations on JSON-like...

59
Established
78 turbot/steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No...

59
Established
79 DataTalksClub/data-engineering-zoomcamp

Data Engineering Zoomcamp is a free 9-week course on building...

59
Established
80 heavyai/heavydb

HeavyDB (formerly MapD/OmniSciDB)

58
Established
81 slingdata-io/sling-cli

Sling is a CLI tool that extracts data from a source storage/database and...

58
Established
82 rudderlabs/rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

58
Established
83 biglocalnews/warn-transformer

Consolidate, enrich and republish the data gathered by warn-scraper

58
Established
84 nshiab/simple-data-analysis

Easy-to-use and high-performance TypeScript library for data analysis. Works...

58
Established
85 Data-Centric-AI-Community/ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas...

58
Established
86 timeplus-io/proton

⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream...

58
Established
87 debba/tabularis

A lightweight, developer-focused database management tool. Supports MySQL,...

58
Established
88 apache/wayang

Apache Wayang is the first cross-platform data processing system.

57
Established
89 fedspendingtransparency/usaspending-api

Server application to serve U.S. federal spending data via a RESTful API

57
Established
90 amphi-ai/amphi-etl

visual data prep powered by python

57
Established
91 snowflakedb/snowpark-python

Snowflake Snowpark Python API

57
Established
92 dotflow-io/dotflow

🎲 Business Logic Code in a flow!

57
Established
93 Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of...

56
Established
94 osalvador/ReplicaDB

ReplicaDB is an open source tool for database replication, designed for...

56
Established
95 turbot/steampipe-plugin-aws

Use SQL to instantly query AWS resources across regions and accounts. Open...

56
Established
96 ohs-foundation/fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services...

56
Established
97 langchain-ai/langchain-postgres

LangChain abstractions backed by Postgres Backend

56
Established
98 ConduitIO/conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

56
Established
99 data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data...

56
Established
100 opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life...

55
Established